I am a Researcher at the Natural Language and Speech Group (NLX) of the Department of Informatics of the University of Lisbon, Faculty of Sciences, and the CTO of the PORTULAN CLARIN Research Infrastructure for the Science and Technology of Language.
pointers = { "scholar": "http://scholar.google.com/citations?user=wiQsf7MAAAAJ", "github": "https://github.com/luismsgomes", "email": "luismsgomes@gmail.com", }
Open Sentence Embeddings for Portuguese with the Serafim PT* encoders family (accepted) Luís Gomes, António Branco, João Ricardo Silva, João Rodrigues and Rodrigo Santos (to appear) in Proceedings of the 23rd Edition of the EPIA Conference
Fostering the Ecosystem of Open Neural Encoders for Portuguese with Albertina PT* Family Rodrigo Santos, João Rodrigues, Luís Gomes, João Ricardo Silva, António Branco, Henrique Lopes Cardoso, Tomás Freitas Osório, Bernardo Leite in Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024 (pdf, bibtex)
Advancing Generative AI for Portuguese with Open Decoder Gervásio PT* Rodrigo Santos, João Ricardo Silva, Luís Gomes, João Rodrigues, António Branco in Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024 (pdf, bibtex)
Advancing Neural Encoding of Portuguese with Transformer Albertina PT-* João Rodrigues, Luís Gomes, João Silva, António Branco, Rodrigo Santos, Henrique Lopes Cardoso, and Tomás Osório in Progress in Artificial Intelligence. Cham: Springer Nature Switzerland, 2023, pp. 441–453 (pdf, preprint pdf, bibtex)
Open and Inclusive Language Processing: Language Processing Services by PORTULAN to Meet the Widest Needs of CLARIN users Luís Gomes, Ruben Branco, João Silva, and António Branco in CLARIN: The Infrastructure for Language Resources Berlin, Boston: De Gruyter, 2022. (pdf, bibtex)
Where do I Belong in Six Centuries of Literature? João Silva, Sara Grilo, Márcia Bolrinha, Rodrigo Santos, Luís Gomes, António Branco and Rui Vaz in CLARIN: The Infrastructure for Language Resources Berlin, Boston: De Gruyter, 2022. (pdf, bibtex)
Universal Grammatical Dependencies for Portuguese with CINTIL Data, LX Processing and CLARIN support António Branco, João Ricardo Silva, Luís Gomes, João António Rodrigues in Proceedings of The 13th Language Resources and Evaluation Conference (pdf, bibtex)
A Shared Task of a New, Collaborative Type to foster Reproducibility: A first exercise in the area of language science and technology with REPROLANG2020 António Branco, Nicoletta Calzolari, Piek Vossen, Gertjan van Noord, Dieter van Uytvanck, João Silva, Luís Gomes, André Moreira, Willem Elbers in Proceedings of The 12th Language Resources and Evaluation Conference (pdf, bibtex)
Infrastructure for the Science and Technology of Language PORTULAN CLARIN António Branco, Amália Mendes, Paulo Quaresma, Luís Gomes, João Silva, Andrea Teixeira in Proceedings of the 1st International Workshop on Language Technology Platforms (pdf, bibtex)
ELRI: A Decentralised Network of National Relay Stations to Collect, Prepare and Share Language Resources Thierry Etchegoyhen, Borja Anza Porras, Andoni Azpeitia, Eva Martínez Garcia, José Luis Fonseca, Patricia Fonseca, Paulo Vale, Jane Dunne, Federico Gaspari, Teresa Lynn, Helen McHugh, Andy Way, Victoria Arranz, Khalid Choukri, Hervé Pusset, Alexandre Sicard, Rui Neto, Maite Melero, David Perez, António Branco, Ruben Branco, Luís Gomes in Proceedings of the 1st International Workshop on Language Technology Platforms (pdf, bibtex)
Exploring the Relevance of Bilingual Morph-units in Automatic Induction of Translation Templates Kavitha Mahesh, Luís Gomes, and Gabriel Pereira Lopes in Advances in Artificial Intelligence – IBERAMIA 2018, 13–16 November 2018, Trujillo, Peru (pdf, bibtex)
Setting up the PORTULAN / CLARIN repository Luís Gomes, Frederico Apolónia, Ruben Branco, João Silva and António Branco in Proceedings of CLARIN Annual Conference 2018, 8-10 October 2018, Pisa, Italy (full proceedings pdf, pdf, bibtex, poster)
ELRI - European Language Resource Infrastructure Thierry Etchegoyhen, Borja Anza Porras, Andoni Azpeitia, Eva Martínez Garcia, Paulo Vale, José Luis Fonseca, Teresa Lynn, Jane Dunne, Federico Gaspari, Andy Way, Victoria Arranz, Khalid Choukri, Vladimir Popescu, Pedro Neiva, Rui Neto, Maite Melero, David Perez, António Branco, Ruben Branco, and Luís Gomes in Proceedings of the 21st Annual Conference of the European Association for Machine Translation: 28-30 May 2018, Universitat d'Alacant, Alacant, Spain, pp. 351 (full proceedings pdf, pdf, bibtex, poster)
Translation Alignment and Extraction Within a Lexica-Centered Iterative Workflow Luís Gomes PhD Thesis, December 2017, Universidade Nova de Lisboa (pdf, bibtex)
Using Bilingual Segments in Generating Word-to-word Translations Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes in Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra-6), December 2016, Osaka, Japan (pdf, bibtex)
First Steps Towards Coverage-based Document Alignment Luís Gomes and Gabriel Pereira Lopes in Proceedings of the First Conference on Machine Translation (WMT16), 11-12 August 2016, Berlin (Germany) (pdf, bibtex)
English-Portuguese Biomedical Translation Task Using a Genuine Phrase-Based Statistical Machine Translation Approach José Aires, Gabriel Pereira Lopes and Luís Gomes in Proceedings of the First Conference on Machine Translation (WMT16), 11-12 August 2016, Berlin (Germany) (pdf, bibtex)
SMT and Hybrid systems of the QTLeap project in the WMT16 IT-task Rosa Gaudio, Gorka Labaka, Eneko Agirre, Petya Osenova, Kiril Simov, Martin Popel, Dieke Oele, Gertjan van Noord, Luís Gomes, João António Rodrigues, Steven Neale, João Silva, Andreia Querido, Nuno Rendeiro and António Branco in Proceedings of the First Conference on Machine Translation (WMT16), 11-12 August 2016, Berlin (Germany) (pdf, bibtex)
First Steps Towards Coverage-based Sentence Alignment Luís Gomes and Gabriel Pereira Lopes in Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC 2016), 25-27 May 2016, Portorož (Slovenia) (pdf, bibtex, code)
Word Sense-Aware Machine Translation: Including Senses as Contextual Features for Improved Translation Models Steven Neale, Luís Gomes, Eneko Agirre, Oier Lopez de Lacalle and António Branco in Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC 2016), 25-27 May 2016, Portorož (Slovenia) (pdf, bibtex)
Seeking to Reproduce “Easy Domain Adaptation” Luís Gomes, Gertjan van Noord, António Branco and Steven Neale in 4REAL – Workshop on Research Results Reproducibility and Resources Citation in Science and Technology of Language, collocated with LREC 2016, 28 May 2016, Portorož (Slovenia) (pdf, bibtex)
Domain-Specific Hybrid Machine Translation from English to Portuguese João Rodrigues, Luís Gomes, Steven Neale, Andreia Querido, Nuno Rendeiro, Sanja Štajner, João Silva and António Branco in PROPOR 2016 – International Conference on the Computational Processing of the Portuguese Language, 13-15 July 2016, Tomar (Portugal), Springer (pdf, bibtex)
Learning Clusters of Bilingual Suffixes using Bilingual Translation Lexicon Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes in Mining Intelligence and Knowledge Exploration (MIKE 2015), Hyderabad, India, December 2015, Springer (pdf, bibtex)
Bilingually motivated segmentation and generation of word translations using relatively small translation data sets Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes in Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation (PACLIC 29), Shanghai, China, October 2015 (pdf, bibtex)
New Language Pairs in TectoMT Ondřej Dušek, Luís Gomes, Michal Novák, Martin Popel and Rudolf Rosa in Proceedings of the Tenth Workshop on Statistical Machine Translation (WMT15), Lisboa, Portugal, September 2015, ACL (pdf, bibtex, poster)
Bootstrapping a Hybrid Deep MT System João Silva, João Rodrigues, Luís Gomes and António Branco in Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra), Beijing, China, July 2015, ACL (pdf, bibtex)
Machine Translation for Multilingual Troubleshooting in the IT Domain: A Comparison of Different Strategies Sanja Štajner, João Rodrigues, Luís Gomes and António Branco in 1st Deep MT Workshop (DMTW), Prague, Czech Republic, September 2015 (pdf, bibtex)
First Steps in Using Word Senses as Contextual Features in Maxent Models for Machine Translation Steven Neale, Luís Gomes and António Branco in 1st Deep MT Workshop (DMTW), Prague, Czech Republic, September 2015 (pdf, bibtex)
Improving bilingual search performance using compact full-text indices Jorge Costa, Luís Gomes, Gabriel Pereira Lopes and Luís Russo in Computational Linguistics and Intelligent Text Processing, 16th International Conference, CICLing 2015, Cairo, Egypt, April 2015, Springer (pdf, bibtex)
Selecting Translation Candidates for Parallel Corpora Alignment Kavitha Mahesh, Luís Gomes, José Aires and Gabriel Pereira Lopes in Progress in Artificial Intelligence, 17th Portuguese Conference on Artificial Intelligence, EPIA 2015, Coimbra, Portugal, September 2015, Springer (pdf, bibtex)
Identification of Bilingual Segments for Translation Generation Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes in Advances in Intelligent Data Analysis XIII, 13th International Symposium, IDA 2014, Leuven, Belgium, October 2014, Springer (pdf, bibtex)
Identification of Bilingual Suffix Classes for Classification and Translation Generation Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes in Advances in Artificial Intelligence, 14th Ibero-American Conference on AI, IBERAMIA 2014, Santiago de Chile, Chile, November 2014, Springer (pdf, bibtex)
Compact and Fast Indexes for Translation Related Tasks Jorge Costa, Luís Gomes, Gabriel Pereira Lopes, Luís Russo and Nieves Brisaboa in Progress in Artificial Intelligence, 16th Portuguese Conference in Artificial Intelligence, EPIA 2013, Açores, Portugal, September 2013, Springer (pdf, bibtex)
Measuring Spelling Similarity for Cognate Identification Luís Gomes and Gabriel Pereira Lopes in Progress in Artificial Intelligence, 15th Portuguese Conference in Artificial Intelligence, EPIA 2011, Lisboa, Portugal, October 2011, Springer (pdf, bibtex, code)
Using SVMs for Filtering Translation Tables for Parallel Texts Alignment Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes in Proceedings of 15th Portuguese Conference in Artificial Intelligence, EPIA 2011, Lisboa, Portugal, October 2011 (pdf, bibtex)
Managing and Querying a Bilingual Lexicon with Suffix Trees Jorge Costa, Luís Gomes, Gabriel Pereira Lopes and Luís Russo in Proceedings of 15th Portuguese Conference in Artificial Intelligence, EPIA 2011, Lisboa, Portugal, October 2011 (pdf, bibtex)
Representing a Bilingual Lexicon with Suffix Trees Jorge Costa, Luís Gomes, Gabriel Pereira Lopes and Luís Russo in Proceedings of 26th Symposium On Applied Computing (SAC 2011), Taichung, Taiwan, March 2011, ACM (pdf, bibtex)
Parallel Texts Alignment Luís Gomes, José Aires, and Gabriel Pereira Lopes in New Trends in Artificial Intelligence, 14th Portuguese Conference in Artificial Intelligence, EPIA 2009, Aveiro, Portugal, October 2009 (pdf, bibtex)
Phrase Translation Extraction from Aligned Parallel Corpora Using Suffix Arrays and Related Structures José Aires, Gabriel Pereira Lopes, and Luís Gomes in Progress in Artificial Intelligence, 14th Portuguese Conference in Artificial Intelligence, EPIA 2009, Aveiro, Portugal, October 2009, Springer (pdf, bibtex)
Parallel Texts Alignment Luís Gomes Master's Thesis, February 2009, Universidade Nova de Lisboa (pdf, bibtex)
runseq (run sequentially) is a simple command-line tool for managing a queue of long-running processes to be executed sequentially. It allows adding processes to the queue, and removing them from it, without disrupting the process being executed. This tool might be useful in scenarios where Slurm is too complicated.
mosestokenizer is a Python package that provides wrappers for some pre-processing Perl scripts from the Moses toolkit (tokenizer, sentence splitter and punctuation normalizer).
stringology is a Python package that implements several classical string algorithms.
dicionário terminológico (DT) is a Portuguese linguistic terminology dictionary.
More is available from my GitHub page.
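As an illustration only, the idea behind runseq (a queue of commands executed one at a time, which can be edited while a command is running) can be sketched in Python as below. This is not the tool's actual implementation or interface: the `SequentialQueue` class and its method names are hypothetical, and the queue is kept in memory for simplicity, whereas a command-line tool would need to persist it between invocations.

```python
import subprocess
import sys
from collections import deque


class SequentialQueue:
    """Hypothetical in-memory queue of commands that run one at a time."""

    def __init__(self):
        self.pending = deque()  # commands waiting to run, oldest first
        self.finished = []      # (command, return code) pairs, in run order

    def add(self, command):
        """Append a command (a list of arguments) to the end of the queue."""
        self.pending.append(list(command))

    def remove(self, index):
        """Drop a pending command by position; finished commands are untouched."""
        del self.pending[index]

    def run_next(self):
        """Run the oldest pending command to completion and return its exit code.

        Returns None when nothing is pending.
        """
        if not self.pending:
            return None
        command = self.pending.popleft()
        returncode = subprocess.run(command).returncode
        self.finished.append((command, returncode))
        return returncode

    def run_all(self):
        """Run pending commands in order until the queue is empty."""
        while self.pending:
            self.run_next()


if __name__ == "__main__":
    queue = SequentialQueue()
    queue.add([sys.executable, "-c", "print('first job')"])
    queue.add([sys.executable, "-c", "print('never runs')"])
    queue.add([sys.executable, "-c", "print('second job')"])
    queue.remove(1)  # remove a job that has not started yet
    queue.run_all()
```

Because `run_next` takes a command off the queue before starting it, adding or removing pending entries never touches the process that is already running, which is the property described for runseq above.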
Luís Gomes luismsgomes@gmail.com http://research.variancia.com/