I am a Researcher at the Natural Language and Speech Group (NLX) of the Department of Informatics of the University of Lisbon, Faculty of Sciences, and the CTO of the PORTULAN CLARIN Research Infrastructure for the Science and Technology of Language.
pointers = { "email": "luismsgomes@gmail.com", "orcid": "https://orcid.org/0000-0003-3119-4189", "scholar": "http://scholar.google.com/citations?user=wiQsf7MAAAAJ", "publons": "https://publons.com/researcher/2776308/luis-gomes/", "github": "https://github.com/luismsgomes", }
Universal Grammatical Dependencies for Portuguese with CINTIL Data, LX Processing and CLARIN support António Branco, João Ricardo Silva, Luís Gomes, João António Rodrigues in Proceedings of The 13th Language Resources and Evaluation Conference (pdf, bibtex)
A Shared Task of a New, Collaborative Type to foster Reproducibility: A first exercise in the area of language science and technology with REPROLANG2020 António Branco, Nicoletta Calzolari, Piek Vossen, Gertjan van Noord, Dieter van Uytvanck, João Silva, Luís Gomes, André Moreira, Willem Elbers in Proceedings of The 12th Language Resources and Evaluation Conference (pdf, bibtex)
Infrastructure for the Science and Technology of Language PORTULAN CLARIN António Branco, Amália Mendes, Paulo Quaresma, Luís Gomes, João Silva, Andrea Teixeira in Proceedings of the 1st International Workshop on Language Technology Platforms (pdf, bibtex)
ELRI: A Decentralised Network of National Relay Stations to Collect, Prepare and Share Language Resources Thierry Etchegoyhen, Borja Anza Porras, Andoni Azpeitia, Eva Martínez Garcia, José Luis Fonseca, Patricia Fonseca, Paulo Vale, Jane Dunne, Federico Gaspari, Teresa Lynn, Helen McHugh, Andy Way, Victoria Arranz, Khalid Choukri, Hervé Pusset, Alexandre Sicard, Rui Neto, Maite Melero, David Perez, António Branco, Ruben Branco, Luís Gomes in Proceedings of the 1st International Workshop on Language Technology Platforms (pdf, bibtex)
Exploring the Relevance of Bilingual Morph-units in Automatic Induction of Translation Templates Kavitha Mahesh, Luís Gomes, and Gabriel Pereira Lopes in Advances in Artificial Intelligence — IBERAMIA 2018, 13–16 November 2018, Trujillo, Peru (pdf, bibtex)
Setting up the PORTULAN / CLARIN repository Luís Gomes, Frederico Apolónia, Ruben Branco, João Silva and António Branco in Proceedings of CLARIN Annual Conference 2018, 8-10 October 2018, Pisa, Italy (full proceedings pdf, pdf, bibtex, poster)
ELRI - European Language Resource Infrastructure Thierry Etchegoyhen, Borja Anza Porras, Andoni Azpeitia, Eva Martínez Garcia, Paulo Vale, José Luis Fonseca, Teresa Lynn, Jane Dunne, Federico Gaspari, Andy Way, Victoria Arranz, Khalid Choukri, Vladimir Popescu, Pedro Neiva, Rui Neto, Maite Melero, David Perez, António Branco, Ruben Branco, and Luís Gomes in Proceedings of the 21st Annual Conference of the European Association for Machine Translation: 28-30 May 2018, Universitat d'Alacant, Alacant, Spain, pp. 351 (full proceedings pdf, pdf, bibtex, poster)
Translation Alignment and Extraction Within a Lexica-Centered Iterative Workflow Luís Gomes PhD Thesis, December 2017, Universidade Nova de Lisboa (pdf, bibtex)
Using Bilingual Segments in Generating Word-to-word Translations Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes in Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra-6), December 2016, Osaka, Japan (pdf, bibtex)
First Steps Towards Coverage-based Document Alignment Luís Gomes and Gabriel Pereira Lopes in Proceedings of the First Conference on Machine Translation (WMT16), 11-12 August 2016, Berlin (Germany) (pdf, bibtex)
English-Portuguese Biomedical Translation Task Using a Genuine Phrase-Based Statistical Machine Translation Approach José Aires, Gabriel Pereira Lopes and Luís Gomes in Proceedings of the First Conference on Machine Translation (WMT16), 11-12 August 2016, Berlin (Germany) (pdf, bibtex)
SMT and Hybrid systems of the QTLeap project in the WMT16 IT-task Rosa Gaudio, Gorka Labaka, Eneko Agirre, Petya Osenova, Kiril Simov, Martin Popel, Dieke Oele, Gertjan van Noord, Luís Gomes, João António Rodrigues, Steven Neale, João Silva, Andreia Querido, Nuno Rendeiro and António Branco in Proceedings of the First Conference on Machine Translation (WMT16), 11-12 August 2016, Berlin (Germany) (pdf, bibtex)
First Steps Towards Coverage-based Sentence Alignment Luís Gomes and Gabriel Pereira Lopes in Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC 2016), 25-27 May 2016, Portorož (Slovenia) (pdf, bibtex, code)
Word Sense-Aware Machine Translation: Including Senses as Contextual Features for Improved Translation Models Steven Neale, Luís Gomes, Eneko Agirre, Oier Lopez de Lacalle and António Branco in Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC 2016), 25-27 May 2016, Portorož (Slovenia) (pdf, bibtex)
Seeking to Reproduce “Easy Domain Adaptation” Luís Gomes, Gertjan van Noord, António Branco and Steven Neale in 4REAL – Workshop on Research Results Reproducibility and Resources Citation in Science and Technology of Language, collocated with LREC 2016, 28 May 2016, Portorož (Slovenia) (pdf, bibtex)
Domain-Specific Hybrid Machine Translation from English to Portuguese João Rodrigues, Luís Gomes, Steven Neale, Andreia Querido, Nuno Rendeiro, Sanja Štajner, João Silva and António Branco in PROPOR 2016 – International Conference on the Computational Processing of the Portuguese Language, 13-15 July 2016, Tomar (Portugal), Springer (pdf, bibtex)
Learning Clusters of Bilingual Suffixes using Bilingual Translation Lexicon Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes in Mining Intelligence and Knowledge Exploration (MIKE 2015), Hyderabad, India, December 2015, Springer (pdf, bibtex)
Bilingually motivated segmentation and generation of word translations using relatively small translation data sets Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes in Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation (PACLIC 29), Shanghai, China, October 2015 (pdf, bibtex)
New Language Pairs in TectoMT Ondřej Dušek, Luís Gomes, Michal Novák, Martin Popel and Rudolf Rosa in Proceedings of the Tenth Workshop on Statistical Machine Translation (WMT15), Lisboa, Portugal, September 2015, ACL (pdf, bibtex, poster)
Bootstrapping a Hybrid Deep MT System João Silva, João Rodrigues, Luís Gomes and António Branco in Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra), Beijing, China, July 2015, ACL (pdf, bibtex)
Machine Translation for Multilingual Troubleshooting in the IT Domain: A Comparison of Different Strategies Sanja Štajner, João Rodrigues, Luís Gomes and António Branco in 1st Deep MT Workshop (DMTW), Prague, Czech Republic, September 2015 (pdf, bibtex)
First Steps in Using Word Senses as Contextual Features in Maxent Models for Machine Translation Steven Neale, Luís Gomes and António Branco in 1st Deep MT Workshop (DMTW), Prague, Czech Republic, September 2015 (pdf, bibtex)
Improving bilingual search performance using compact full-text indices Jorge Costa, Luís Gomes, Gabriel Pereira Lopes and Luís Russo in Computational Linguistics and Intelligent Text Processing, 16th International Conference, CICLing 2015, Cairo, Egypt, April 2015, Springer (pdf, bibtex)
Selecting Translation Candidates for Parallel Corpora Alignment Kavitha Mahesh, Luís Gomes, José Aires and Gabriel Pereira Lopes in Progress in Artificial Intelligence, 17th Portuguese Conference on Artificial Intelligence, EPIA 2015, Coimbra, Portugal, September 2015, Springer (pdf, bibtex)
Identification of Bilingual Segments for Translation Generation Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes in Advances in Intelligent Data Analysis XIII, 13th International Symposium, IDA 2014, Leuven, Belgium, October 2014, Springer (pdf, bibtex)
Identification of Bilingual Suffix Classes for Classification and Translation Generation Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes in Advances in Artificial Intelligence, 14th Ibero-American Conference on AI, IBERAMIA 2014, Santiago de Chile, Chile, November 2014, Springer (pdf, bibtex)
Compact and Fast Indexes for Translation Related Tasks Jorge Costa, Luís Gomes, Gabriel Pereira Lopes, Luís Russo and Nieves Brisaboa in Progress in Artificial Intelligence, 16th Portuguese Conference in Artificial Intelligence, EPIA 2013, Açores, Portugal, September 2013, Springer (pdf, bibtex)
Measuring Spelling Similarity for Cognate Identification Luís Gomes and Gabriel Pereira Lopes in Progress in Artificial Intelligence, 15th Portuguese Conference in Artificial Intelligence, EPIA 2011, Lisboa, Portugal, October 2011, Springer (pdf, bibtex, code)
Using SVMs for Filtering Translation Tables for Parallel Texts Alignment Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes in Proceedings of 15th Portuguese Conference in Artificial Intelligence, EPIA 2011, Lisboa, Portugal, October 2011 (pdf, bibtex)
Managing and Querying a Bilingual Lexicon with Suffix Trees Jorge Costa, Luís Gomes, Gabriel Pereira Lopes and Luís Russo in Proceedings of 15th Portuguese Conference in Artificial Intelligence, EPIA 2011, Lisboa, Portugal, October 2011 (pdf, bibtex)
Representing a Bilingual Lexicon with Suffix Trees Jorge Costa, Luís Gomes, Gabriel Pereira Lopes and Luís Russo in Proceedings of 26th Symposium On Applied Computing (SAC 2011), Taichung, Taiwan, March 2011, ACM (pdf, bibtex)
Parallel Texts Alignment Luís Gomes, José Aires, and Gabriel Pereira Lopes in New Trends in Artificial Intelligence, 14th Portuguese Conference in Artificial Intelligence, EPIA 2009, Aveiro, Portugal, October 2009 (pdf, bibtex)
Phrase Translation Extraction from Aligned Parallel Corpora Using Suffix Arrays and Related Structures José Aires, Gabriel Pereira Lopes, and Luís Gomes in Progress in Artificial Intelligence, 14th Portuguese Conference in Artificial Intelligence, EPIA 2009, Aveiro, Portugal, October 2009, Springer (pdf, bibtex)
Parallel Texts Alignment Luís Gomes Master's Thesis, February 2009, Universidade Nova de Lisboa (pdf, bibtex)
Extracção supervisionada de léxicos bilingues a partir de corpora paralelos at Faculdade de Ciências Sociais e Humanas, Universidade Nova de Lisboa: Jornadas dos Dicionários: Lexicografia e Dicionarística Portuguesas, Lisboa, Portugal, 11 and 12 July 2016.
Translation Services: Perspectives of a Portuguese Technological Start-Up at Universidade da Beira Interior: XXV Jornadas de informática do Núcleo de Informática da UBI, Covilhã, Portugal, 14 May 2014.
Translation Alignment and Translation Extraction at Université de Caen Basse Normandie, France, 13 September 2012.
Setting up the PORTULAN / CLARIN repository at the CLARIN Annual Conference 2018, 8-10 October 2018, Pisa, Italy. (pdf)
ELRI - European Language Resource Infrastructure at the 21st Annual Conference of the European Association for Machine Translation (EAMT 2018), from 28 to 30 May 2018 in Alacant/Alicante, Spain. (pdf)
Identification of Bilingual Segments for Translation Generation at the Thirteenth International Symposium on Intelligent Data Analysis (IDA 2014), Leuven, from 30 October to 1 November 2014. (pdf)
This section contains mostly old stuff. Newer software is available from my GitHub page.
multiwords (version 3) extracts multiword units (MWUs) from raw text.
spsim is a measure of spelling similarity that helps identify possible cognates.
unl-aligner aligns similarly spelled words (proper nouns, numbers, punctuation, cognates, etc.) in parallel texts.
hindi_stemmer is a Python implementation of the stemming algorithm described in "A Lightweight Stemmer for Hindi" by Ananthakrishnan Ramanathan and Durgesh D Rao.
czech_stemmer is a stemmer for Czech that I ported to Python from the Java implementation by Ljiljana Dolamic, University of Neuchatel.
croatian_stemmer is a stemmer for Croatian developed by Nikola Ljubešić and Ivan Pandžić. I modified it slightly to allow its use as a Python module. The original implementation is available here.
snowball_stemmer is a basic command-line program (reads stdin, writes stdout) based on libstemmer. It supports Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, and Turkish.
mosestokenizer is a Python package that provides wrappers for some pre-processing Perl scripts from the Moses toolkit (tokenizer, sentence splitter and punctuation normalizer).
stringology is a Python package that implements several classical string algorithms.
dicionário terminológico (DT) is a Portuguese linguistic terminology dictionary.
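To give a rough idea of what a spelling-similarity score for cognate candidates looks like, here is a minimal sketch in Python. It is not the spsim measure listed above: it is a plain normalized edit-distance baseline, and the function names `edit_distance` and `spelling_similarity` are chosen only for this illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def spelling_similarity(a: str, b: str) -> float:
    """Baseline similarity in [0, 1]: 1 minus the length-normalized edit distance.

    Illustrative assumption only; this is NOT the spsim algorithm.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    # Word pairs that may or may not be cognates; higher scores mean closer spelling.
    for pair in [("information", "informação"), ("pharmacy", "farmácia"), ("dog", "cão")]:
        print(pair, round(spelling_similarity(*pair), 3))
```

A baseline like this treats every character substitution as equally costly, so regular spelling correspondences between two languages count against a pair just as much as arbitrary differences do.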
Luís Gomes luismsgomes@gmail.com http://research.variancia.com/