Luís Gomes' page

"Hello World!" myself

I am a Researcher at the Natural Language and Speech Group (NLX) of the Department of Informatics of the University of Lisbon, Faculty of Sciences, and the CTO of the PORTULAN CLARIN Research Infrastructure for the Science and Technology of Language.

pointers = {
    "scholar": "http://scholar.google.com/citations?user=wiQsf7MAAAAJ",
    "github": "https://github.com/luismsgomes",
    "email": "luismsgomes@gmail.com",
}

Publications

2024

Open Sentence Embeddings for Portuguese with the Serafim PT* encoders family (accepted)
Luís Gomes, António Branco, João Ricardo Silva, João Rodrigues and Rodrigo Santos
(to appear) in Proceedings of the 23rd Edition of the EPIA Conference

Fostering the Ecosystem of Open Neural Encoders for Portuguese with Albertina PT* Family
Rodrigo Santos, João Rodrigues, Luís Gomes, João Ricardo Silva, António Branco, Henrique Lopes Cardoso, Tomás Freitas Osório, Bernardo Leite
in Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
(pdf, bibtex)

Advancing Generative AI for Portuguese with Open Decoder Gervásio PT*
Rodrigo Santos, João Ricardo Silva, Luís Gomes, João Rodrigues, António Branco
in Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
(pdf, bibtex)

2023

Advancing Neural Encoding of Portuguese with Transformer Albertina PT-*
João Rodrigues, Luís Gomes, João Silva, António Branco, Rodrigo Santos, Henrique Lopes Cardoso, and Tomás Osório
in Progress in Artificial Intelligence. Cham: Springer Nature Switzerland, 2023, pp. 441–453
(pdf, preprint pdf, bibtex)

2022

Open and Inclusive Language Processing: Language Processing Services by PORTULAN to Meet the Widest Needs of CLARIN users
Luís Gomes, Ruben Branco, João Silva, and António Branco
in CLARIN: The Infrastructure for Language Resources. Berlin, Boston: De Gruyter, 2022.
(pdf, bibtex)

Where do I Belong in Six Centuries of Literature?
João Silva, Sara Grilo, Márcia Bolrinha, Rodrigo Santos, Luís Gomes, António Branco and Rui Vaz
in CLARIN: The Infrastructure for Language Resources. Berlin, Boston: De Gruyter, 2022.
(pdf, bibtex)

Universal Grammatical Dependencies for Portuguese with CINTIL Data, LX Processing and CLARIN support
António Branco, João Ricardo Silva, Luís Gomes, João António Rodrigues
in Proceedings of The 13th Language Resources and Evaluation Conference
(pdf, bibtex)

2020

A Shared Task of a New, Collaborative Type to foster Reproducibility: A first exercise in the area of language science and technology with REPROLANG2020
António Branco, Nicoletta Calzolari, Piek Vossen, Gertjan van Noord, Dieter van Uytvanck, João Silva, Luís Gomes, André Moreira, Willem Elbers
in Proceedings of The 12th Language Resources and Evaluation Conference
(pdf, bibtex)

Infrastructure for the Science and Technology of Language PORTULAN CLARIN
António Branco, Amália Mendes, Paulo Quaresma, Luís Gomes, João Silva, Andrea Teixeira
in Proceedings of the 1st International Workshop on Language Technology Platforms
(pdf, bibtex)

ELRI: A Decentralised Network of National Relay Stations to Collect, Prepare and Share Language Resources
Thierry Etchegoyhen, Borja Anza Porras, Andoni Azpeitia, Eva Martínez Garcia, José Luis Fonseca, Patricia Fonseca, Paulo Vale, Jane Dunne, Federico Gaspari, Teresa Lynn, Helen McHugh, Andy Way, Victoria Arranz, Khalid Choukri, Hervé Pusset, Alexandre Sicard, Rui Neto, Maite Melero, David Perez, António Branco, Ruben Branco, Luís Gomes
in Proceedings of the 1st International Workshop on Language Technology Platforms
(pdf, bibtex)

2018

Exploring the Relevance of Bilingual Morph-units in Automatic Induction of Translation Templates
Kavitha Mahesh, Luís Gomes, and Gabriel Pereira Lopes
in Advances in Artificial Intelligence — IBERAMIA 2018, 13–16 November 2018, Trujillo, Peru
(pdf, bibtex)

Setting up the PORTULAN / CLARIN repository
Luís Gomes, Frederico Apolónia, Ruben Branco, João Silva and António Branco
in Proceedings of CLARIN Annual Conference 2018, 8-10 October 2018, Pisa, Italy
(full proceedings pdf, pdf, bibtex, poster)

ELRI - European Language Resource Infrastructure
Thierry Etchegoyhen, Borja Anza Porras, Andoni Azpeitia, Eva Martínez Garcia, Paulo Vale, José Luis Fonseca, Teresa Lynn, Jane Dunne, Federico Gaspari, Andy Way, Victoria Arranz, Khalid Choukri, Vladimir Popescu, Pedro Neiva, Rui Neto, Maite Melero, David Perez, António Branco, Ruben Branco, and Luís Gomes
in Proceedings of the 21st Annual Conference of the European Association for Machine Translation: 28-30 May 2018, Universitat d'Alacant, Alacant, Spain, p. 351
(full proceedings pdf, pdf, bibtex, poster)

2017

Translation Alignment and Extraction Within a Lexica-Centered Iterative Workflow
Luís Gomes
PhD Thesis, December 2017, Universidade Nova de Lisboa
(pdf, bibtex)

2016

Using Bilingual Segments in Generating Word-to-word Translations
Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes
in Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra-6), December 2016, Osaka, Japan
(pdf, bibtex)

First Steps Towards Coverage-based Document Alignment
Luís Gomes and Gabriel Pereira Lopes
in Proceedings of the First Conference on Machine Translation (WMT16), 11-12 August 2016, Berlin (Germany)
(pdf, bibtex)

English-Portuguese Biomedical Translation Task Using a Genuine Phrase-Based Statistical Machine Translation Approach
José Aires, Gabriel Pereira Lopes and Luís Gomes
in Proceedings of the First Conference on Machine Translation (WMT16), 11-12 August 2016, Berlin (Germany)
(pdf, bibtex)

SMT and Hybrid systems of the QTLeap project in the WMT16 IT-task
Rosa Gaudio, Gorka Labaka, Eneko Agirre, Petya Osenova, Kiril Simov, Martin Popel, Dieke Oele, Gertjan van Noord, Luís Gomes, João António Rodrigues, Steven Neale, João Silva, Andreia Querido, Nuno Rendeiro and António Branco
in Proceedings of the First Conference on Machine Translation (WMT16), 11-12 August 2016, Berlin (Germany)
(pdf, bibtex)

First Steps Towards Coverage-based Sentence Alignment
Luís Gomes and Gabriel Pereira Lopes
in Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC 2016), 25-27 May 2016, Portorož (Slovenia)
(pdf, bibtex, code)

Word Sense-Aware Machine Translation: Including Senses as Contextual Features for Improved Translation Models
Steven Neale, Luís Gomes, Eneko Agirre, Oier Lopez de Lacalle and António Branco
in Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC 2016), 25-27 May 2016, Portorož (Slovenia)
(pdf, bibtex)

Seeking to Reproduce “Easy Domain Adaptation”
Luís Gomes, Gertjan van Noord, António Branco and Steven Neale
in 4REAL – Workshop on Research Results Reproducibility and Resources Citation in Science and Technology of Language, collocated with LREC 2016, 28 May 2016, Portorož (Slovenia)
(pdf, bibtex)

Domain-Specific Hybrid Machine Translation from English to Portuguese
João Rodrigues, Luís Gomes, Steven Neale, Andreia Querido, Nuno Rendeiro, Sanja Štajner, João Silva and António Branco
in PROPOR 2016 – International Conference on the Computational Processing of the Portuguese Language, 13-15 July 2016, Tomar (Portugal), Springer
(pdf, bibtex)


2015

Learning Clusters of Bilingual Suffixes using Bilingual Translation Lexicon
Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes
in Mining Intelligence and Knowledge Exploration (MIKE 2015), Hyderabad, India, December 2015, Springer
(pdf, bibtex)

Bilingually motivated segmentation and generation of word translations using relatively small translation data sets
Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes
in Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation (PACLIC 29), Shanghai, China, October 2015
(pdf, bibtex)

New Language Pairs in TectoMT
Ondřej Dušek, Luís Gomes, Michal Novák, Martin Popel and Rudolf Rosa
in Proceedings of the Tenth Workshop on Statistical Machine Translation (WMT15), Lisboa, Portugal, September 2015, ACL
(pdf, bibtex, poster)

Bootstrapping a Hybrid Deep MT System
João Silva, João Rodrigues, Luís Gomes and António Branco
in Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra), Beijing, China, July 2015, ACL
(pdf, bibtex)

Machine Translation for Multilingual Troubleshooting in the IT Domain: A Comparison of Different Strategies
Sanja Štajner, João Rodrigues, Luís Gomes and António Branco
in 1st Deep MT Workshop (DMTW), Prague, Czech Republic, September 2015
(pdf, bibtex)

First Steps in Using Word Senses as Contextual Features in Maxent Models for Machine Translation
Steven Neale, Luís Gomes and António Branco
in 1st Deep MT Workshop (DMTW), Prague, Czech Republic, September 2015
(pdf, bibtex)

Improving bilingual search performance using compact full-text indices
Jorge Costa, Luís Gomes, Gabriel Pereira Lopes and Luís Russo
in Computational Linguistics and Intelligent Text Processing, 16th International Conference, CICLing 2015, Cairo, Egypt, April 2015, Springer
(pdf, bibtex)

Selecting Translation Candidates for Parallel Corpora Alignment
Kavitha Mahesh, Luís Gomes, José Aires and Gabriel Pereira Lopes
in Progress in Artificial Intelligence, 17th Portuguese Conference on Artificial Intelligence, EPIA 2015, Coimbra, Portugal, September 2015, Springer
(pdf, bibtex)


2014

Identification of Bilingual Segments for Translation Generation
Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes
in Advances in Intelligent Data Analysis XIII, 13th International Symposium, IDA 2014, Leuven, Belgium, October 2014, Springer
(pdf, bibtex)

Identification of Bilingual Suffix Classes for Classification and Translation Generation
Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes
in Advances in Artificial Intelligence, 14th Ibero-American Conference on AI, IBERAMIA 2014, Santiago de Chile, Chile, November 2014, Springer
(pdf, bibtex)


2013

Compact and Fast Indexes for Translation Related Tasks
Jorge Costa, Luís Gomes, Gabriel Pereira Lopes, Luís Russo and Nieves Brisaboa
in Progress in Artificial Intelligence, 16th Portuguese Conference in Artificial Intelligence, EPIA 2013, Açores, Portugal, September 2013, Springer
(pdf, bibtex)


2011

Measuring Spelling Similarity for Cognate Identification
Luís Gomes and Gabriel Pereira Lopes
in Progress in Artificial Intelligence, 15th Portuguese Conference in Artificial Intelligence, EPIA 2011, Lisboa, Portugal, October 2011, Springer
(pdf, bibtex, code)

Using SVMs for Filtering Translation Tables for Parallel Texts Alignment
Kavitha Mahesh, Luís Gomes and Gabriel Pereira Lopes
in Proceedings of 15th Portuguese Conference in Artificial Intelligence, EPIA 2011, Lisboa, Portugal, October 2011
(pdf, bibtex)

Managing and Querying a Bilingual Lexicon with Suffix Trees
Jorge Costa, Luís Gomes, Gabriel Pereira Lopes and Luís Russo
in Proceedings of 15th Portuguese Conference in Artificial Intelligence, EPIA 2011, Lisboa, Portugal, October 2011
(pdf, bibtex)

Representing a Bilingual Lexicon with Suffix Trees
Jorge Costa, Luís Gomes, Gabriel Pereira Lopes and Luís Russo
in Proceedings of 26th Symposium On Applied Computing (SAC 2011), Taichung, Taiwan, March 2011, ACM
(pdf, bibtex)


2009

Parallel Texts Alignment
Luís Gomes, José Aires, and Gabriel Pereira Lopes
in New Trends in Artificial Intelligence, 14th Portuguese Conference in Artificial Intelligence, EPIA 2009, Aveiro, Portugal, October 2009
(pdf, bibtex)

Phrase Translation Extraction from Aligned Parallel Corpora Using Suffix Arrays and Related Structures
José Aires, Gabriel Pereira Lopes, and Luís Gomes
in Progress in Artificial Intelligence, 14th Portuguese Conference in Artificial Intelligence, EPIA 2009, Aveiro, Portugal, October 2009, Springer
(pdf, bibtex)

Parallel Texts Alignment
Luís Gomes
Master's Thesis, February 2009, Universidade Nova de Lisboa
(pdf, bibtex)

Software

runseq (run sequentially) is a simple command-line tool for managing a queue of long-running processes to be executed sequentially. It allows adding processes to the queue and removing them from it without disrupting the process being executed. This tool might be useful in scenarios where Slurm is too complicated.
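To give a rough idea of what running a queue sequentially means, here is a minimal Python sketch. It is not the runseq code and does not show its command-line interface: the class name `SequentialQueue` and all of its methods are hypothetical, made up for illustration. It also keeps the queue in memory, so unlike runseq it cannot be edited from another terminal while a job is running.

```python
import subprocess
import sys
from collections import deque


class SequentialQueue:
    """Hypothetical in-memory queue that runs commands one at a time."""

    def __init__(self):
        self.pending = deque()  # commands waiting to run, oldest first
        self.finished = []      # (command, exit code) pairs, in run order

    def add(self, command):
        """Append a command (a list of arguments) to the end of the queue."""
        self.pending.append(list(command))

    def remove(self, command):
        """Drop a waiting command; return True if it was in the queue."""
        try:
            self.pending.remove(list(command))
            return True
        except ValueError:
            return False

    def run_next(self):
        """Run the oldest waiting command to completion.

        Returns its exit code, or None if the queue is empty.
        """
        if not self.pending:
            return None
        command = self.pending.popleft()
        result = subprocess.run(command)  # blocks until the process exits
        self.finished.append((command, result.returncode))
        return result.returncode

    def run_all(self):
        """Run every waiting command, strictly one after the other."""
        while self.pending:
            self.run_next()


if __name__ == "__main__":
    queue = SequentialQueue()
    queue.add([sys.executable, "-c", "print('first job')"])
    queue.add([sys.executable, "-c", "print('second job')"])
    queue.run_all()
```

A failing job does not stop the queue in this sketch; its exit code is just recorded in `finished`. Whether runseq behaves the same way is not something this page states.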

mosestokenizer is a Python package that provides wrappers for some pre-processing Perl scripts from the Moses toolkit (tokenizer, sentence splitter and punctuation normalizer).

stringology is a Python package that implements several classical string algorithms.

dicionário terminológico (DT) is a Portuguese linguistic terminology dictionary.

More is available from my GitHub page.