Patrik Lambert

Senior Researcher

I now work as a Research Scientist at WebInterpret, mostly in the fields of machine translation and information extraction.

I previously worked at the GLiCom group of Pompeu Fabra University, implementing the CrossLingMind project (“Automated analysis of opinions in a multilingual context”) on Cross-lingual Opinion Mining, funded by the Marie Curie Intra-European Fellowship program. Then I joined the EUMSSI European project.

Previously I mostly worked on word and sentence alignment and machine translation. I was a Marie Curie fellow at Barcelona Media Innovation Center, and a postdoc at the Centre for Next Generation Localisation at Dublin City University and in the LST group at LIUM (Le Mans University). I completed my PhD in Artificial Intelligence at the Universitat Politècnica de Catalunya (UPC, Barcelona) in 2008. I participated in several Spanish (Aliado, Avivavoz), Irish (NGL), French (Cosmat, PEA Trad), and European (TC-STAR, EuroMatrix Plus, MateCAT) research projects.

Activities

Software

  • BIA: Discriminative Phrase Alignment Toolkit (based on log-linear alignment models).
  • Alignment Set Toolkit: library and command-line utilities to manage sets of sentence pairs aligned at a word or phrase level.

Data

Publications

2016

  • Jeremy Barnes, Patrik Lambert and Toni Badia. Exploring Distributional Representations and Machine Translation for Aspect-based Cross-lingual Sentiment Classification. In Proc. of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. Osaka, Japan.   pdf  bib
  • Marta R. Costa-jussà, Reinhard Rapp, Patrik Lambert, Kurt Eberle, Rafael Banchs, Bogdan Babych. Hybrid Approaches to Machine Translation. Book, in the Theory and Applications of Natural Language Processing collection. Springer. ISBN/ISSN: 978-3-319-21310-1;2192-032X   link
  • Maite Melero, Marta R. Costa-jussà, Patrik Lambert and Martí Quixal. Selection of correction candidates for the normalization of Spanish user-generated content. In Natural Language Engineering, 22 (1), pp 135-16. Copyright Cambridge University Press.    pdf   bib   link
  • Sadaf Abdul-Rauf, Holger Schwenk, Patrik Lambert and Mohammad Nawaz. Empirical use of Information Retrieval to build synthetic data for SMT domain adaptation. In IEEE/ACM Transactions on Audio, Speech and Language Processing, 24 (4), pp 745-754.    link
  • Marta R. Costa-jussà, Srinivas Bangalore, Patrik Lambert, Lluís Màrquez and Elena Montiel-Ponsoda. Introduction to the Special Issue on Cross-Language Algorithms and Applications. In Journal of Artificial Intelligence Research, 55, pp 1-15.    pdf   link

2015

  • Patrik Lambert. Aspect-Level Cross-lingual Sentiment Classification with Constrained SMT. In Proc. of the 53rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). pp 781-787. Beijing, China.   pdf  bib

2014

  • Patrik Lambert and Carlos Rodriguez-Penagos. Adapting Freely Available Resources to Build an Opinion Mining Pipeline in Portuguese. In Proc. of the Language Resources and Evaluation Conference (LREC). Reykjavik, Iceland.   pdf  bib

2013

  • Carlos Rodriguez-Penagos, Jordi Atserias Batalla, Joan Codina-Filbà, David García-Narbona, Jens Grivolla, Patrik Lambert and Roser Saurí. FBM: Combining lexicon-based ML and heuristics for Social Media Polarities. In Proc. of the Seventh International Workshop on Semantic Evaluation (SemEval). Atlanta, Georgia, USA.   pdf  bib
  • Marta R. Costa-jussà, Rafael Banchs, Reinhard Rapp, Patrik Lambert, Kurt Eberle and Bogdan Babych. Workshop on Hybrid Approaches to Translation: Overview and Developments. In Proc. of the Second Workshop on Hybrid Approaches to Translation (HyTra). Sofia, Bulgaria.   pdf  bib
  • Guillem Massó, Patrik Lambert, Carlos Rodríguez Penagos and Roser Saurí. Generating New LIWC Dictionaries by Triangulation. In Proc. of the Asia Information Retrieval Societies Conference (AIRS). pp 263-271. Singapore.   pdf  

2012

  • Christophe Servan, Patrik Lambert, Anthony Rousseau, Holger Schwenk and Loïc Barrault. LIUM's SMT Machine Translation Systems for WMT 2012. In Proc. of the Seventh Workshop on Statistical Machine Translation (WMT). pp. 369-373. Montréal, Canada.   pdf  bib
  • Patrik Lambert, Simon Petitrenaud, Yanjun Ma and Andy Way. What Types of Word Alignment Improve Statistical Machine Translation? In Machine Translation journal.    link
  • Sadaf Abdul-Rauf, Mark Fishel, Patrik Lambert, Sandra Noubours and Rico Sennrich. Extrinsic Evaluation of Sentence Alignment Systems. In Proc. of the LREC Workshop on Creating Cross-language Resources for Disconnected Languages and Styles (CREDISLAS). pp. 6-10. Istanbul, Turkey.   pdf  slides
  • Patrik Lambert, Holger Schwenk and Frédéric Blain. Automatic Translation of Scientific Documents in the HAL Archive. In Proc. of the Eighth International Conference on Language Resources and Evaluation (LREC'12). pp. 3933-3936. Istanbul, Turkey.   pdf  bib
  • Patrik Lambert, Jean Senellart, Laurent Romary, Holger Schwenk, Florian Zipser, Patrice Lopez and Frédéric Blain. Collaborative Machine Translation Service for Scientific texts. In Proc. of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL). pp. 11-15. Avignon, France.   pdf  bib
  • Patrik Lambert and Rafael E. Banchs. BIA: a Discriminative Phrase Alignment Toolkit. In The Prague Bulletin of Mathematical Linguistics (PBML). pp. 43-53.    pdf  software

2011

  • Loïc Barrault and Patrik Lambert. Machine Translation System Combination with MANY for ML4HMT. In Shared Task on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-2011). Barcelona, Spain.   pdf   software
  • Patrik Lambert and Rafael E. Banchs. BIA: a Discriminative Phrase Alignment Toolkit. In Sixth Machine Translation Marathon. Trento, Italy.   slides  Toolkit page
  • Patrik Lambert, Holger Schwenk, Christophe Servan, Sadaf Abdul-Rauf. Investigations on Translation Model Adaptation Using Monolingual Data. In Proc. of the Sixth Workshop on Statistical Machine Translation (WMT). pp. 284-293. Edinburgh, Scotland.   pdf  bib  slides
  • Holger Schwenk, Patrik Lambert, Loïc Barrault, Christophe Servan, Sadaf Abdul-Rauf, Haithem Afli, Kashif Shah. LIUM's SMT Machine Translation Systems for WMT 2011. In Proc. of the Sixth Workshop on Statistical Machine Translation (WMT). pp. 464-469. Edinburgh, Scotland.   pdf  bib

2010

  • Patrik Lambert, Simon Petitrenaud, Yanjun Ma and Andy Way. Statistical Analysis of Alignment Characteristics for Phrase-based Machine Translation. In Proc. of the 14th Annual Conference of the European Association for Machine Translation (EAMT). Saint-Raphaël, France.   pdf
  • Patrik Lambert, Sadaf Abdul-Rauf and Holger Schwenk. LIUM SMT Machine Translation System for WMT 2010. In Proc. of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR. pp. 121-126. Uppsala, Sweden.   pdf   bib

2009

  • Holger Schwenk, Loïc Barrault, Yannick Estève, and Patrik Lambert. LIUM's statistical machine translation systems for IWSLT 2009. In Proc. of International Workshop on Spoken Language Translation. pp. 65-70. Tokyo, Japan.   pdf
  • Patrik Lambert, Yanjun Ma, Sylwia Ozdowska and Andy Way. Tracking Relevant Alignment Characteristics for Machine Translation. In Proc. of Machine Translation Summit XII. pp. 268-275. Ottawa, Canada.   pdf
  • Yanjun Ma, Patrik Lambert and Andy Way. Tuning Syntactically Enhanced Word Alignment for Statistical Machine Translation. In Proc. of the 13th Annual Conference of the European Association for Machine Translation (EAMT). pp. 250-257. Barcelona, Spain.   pdf

2008

  • Patrik Lambert and Rafael E. Banchs. Word association models and search strategies for discriminative word alignment. In Proc. of the 12th Annual Conference of the European Association for Machine Translation (EAMT), pp 97-103. Hamburg, Germany, September.   pdf
  • Maxim Khalilov, Adolfo Hernández H., Marta R. Costa-jussà, Josep M. Crego, Carlos A. Henríquez Q., Patrik Lambert, José A. R. Fonollosa, José B. Mariño and Rafael E. Banchs. The TALP-UPC Ngram-based statistical machine translation system for ACL-WMT 2008. In Proc. of the Association for Computational Linguistics, Third Workshop on Statistical Machine Translation (ACL'08-SMT), pp. 127-130. Columbus, USA, June.   pdf
  • Patrik Lambert. Exploiting Lexical Information and Discriminative Alignment Training in Statistical Machine Translation. PhD Thesis. Software Department, Universitat Politècnica de Catalunya.   Thesis  Defense slides

2007

  • Patrik Lambert, Marta R. Costa-jussà, Josep M. Crego, Maxim Khalilov, José B. Mariño, Rafael Banchs, José A.R. Fonollosa and Holger Schwenk. The TALP Ngram-based SMT System for IWSLT 2007. In Proc. of the International Workshop on Spoken Language Translation, pp 169-174. Trento, Italy, October.   pdf
  • Marta R. Costa-jussà, Josep M. Crego, Patrik Lambert, Maxim Khalilov, José A. R. Fonollosa, José B. Mariño, Rafael E. Banchs. Ngram-Based Statistical Machine Translation Enhanced with Multiple Weighted Reordering Hypotheses. In Proc. of the Second Workshop on Statistical Machine Translation, pp 167-170. Prague, Czech Republic, June.   pdf
  • Patrik Lambert, Rafael Banchs and Josep M. Crego. Discriminative Alignment Training without Annotated Data for Machine Translation. In Proc. of the North American Chapter of the Association for Computational Linguistics, Human Language Technologies Conference (NAACL-HLT), pp 85-88. Rochester, NY, USA, April 23-25.   pdf  slides

2006

  • José B. Mariño, Rafael Banchs, Josep M. Crego, Adrià de Gispert, Patrik Lambert, José A. R. Fonollosa, Marta R. Costa-jussà. N-gram-based Machine Translation. Computational Linguistics, 32 (4) pp. 527-549. MIT Press.   link
  • Rafael E. Banchs, Josep M. Crego, Patrik Lambert, José B. Mariño. A Feasibility Study For Chinese-Spanish Statistical Machine Translation. Proc. of the 5th Int. Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 681-692. Kent Ridge, Singapore, December 13-16.   pdf
  • Patrik Lambert, Jesús Giménez, Marta R. Costa-jussà, Enrique Amigó, Rafael E. Banchs, Lluís Màrquez and José A. R. Fonollosa. Machine Translation System Development Based on Human Likeness. Proc. of the IEEE/ACL Workshop on Spoken Language Technology, pp. 246-249. Palm Beach, Aruba, December 10-13.   pdf
  • Patrik Lambert and Rafael E. Banchs. Tuning Machine Translation Parameters with SPSA. Proc. of the International Workshop on Spoken Language Translation, pp. 190-196. Kyoto, Japan, November 27-28.   pdf  slides
  • Josep M. Crego, Adrià de Gispert, Patrik Lambert, Maxim Khalilov, Marta R. Costa-jussà, José B. Mariño, Rafael E. Banchs, José A.R. Fonollosa. The TALP Ngram-based SMT System for IWSLT 2006. Proc. of the International Workshop on Spoken Language Translation, pp. 116-122. Kyoto, Japan, November 27-28.   pdf  slides
  • Marta R. Costa-jussà, Josep M. Crego, Adrià de Gispert, Patrik Lambert, Maxim Khalilov, José A.R. Fonollosa, José B. Mariño, Rafael E. Banchs. TALP Phrase-Based System and TALP System Combination for IWSLT 2006. Proc. of the International Workshop on Spoken Language Translation, pp. 123-129. Kyoto, Japan, November 27-28.   pdf  slides
  • Adrià de Gispert, Deepa Gupta, Maja Popovic, Patrik Lambert, José B. Mariño, Marcello Federico, Hermann Ney and Rafael Banchs. Improving Statistical Word Alignments with Morpho-syntactic Transformations. Proc. of 5th Int. Conf. on Natural Language Processing (FinTAL), pp. 368-379. Turku, Finland, August 23-25.   pdf
  • José B. Mariño, Rafael Banchs, Josep M. Crego, Adrià de Gispert, Patrik Lambert, José A. R. Fonollosa, Marta R. Costa-jussà, Maxim Khalilov. UPC's bilingual n-gram translation system. Proc. of the TC-Star Workshop on Speech-to-Speech Translation, pp. 43-48. Barcelona, Spain, June 19-21.   ps
  • Maja Popovic, Adrià de Gispert, Deepa Gupta, Patrik Lambert, Hermann Ney, José B. Mariño, Marcello Federico and Rafael Banchs. Morpho-syntactic Information for Automatic Error Analysis of Statistical Machine Translation Output. Proc. of the HLT-NAACL Workshop on Statistical Machine Translation, pp. 1-6. New York City, USA, June 8-9.   pdf
  • Marta R. Costa-jussà, Josep M. Crego, Adrià de Gispert, Patrik Lambert, Maxim Khalilov, José B. Mariño, José A. R. Fonollosa, Rafael Banchs. TALP Phrase-based statistical translation system for European language pairs. Proc. of the HLT-NAACL Workshop on Statistical Machine Translation, pp. 142-145. New York City, USA, June 8-9.   pdf
  • Josep M. Crego, Adrià de Gispert, Patrik Lambert, Marta R. Costa-jussà, Maxim Khalilov, Rafael Banchs, José B. Mariño, José A. R. Fonollosa. N-gram-based SMT System Enhanced with Reordering Patterns. Proc. of the HLT-NAACL Workshop on Statistical Machine Translation, pp. 162-165. New York City, USA, June 8-9.   pdf
  • Patrik Lambert and Rafael Banchs. Grouping Multi-word Expressions According to Part-Of-Speech in Statistical Machine Translation. Proc. of the EACL Workshop on Multi-Word-Expressions in a Multilingual Context, pp. 9-16. Trento, Italy, April 3rd.   pdf

2005

  • Patrik Lambert, Adrià de Gispert, Rafael Banchs and José B. Mariño. Guidelines for Word Alignment Evaluation and Manual Alignment. Language Resources and Evaluation, 39 (4) pp. 267-285. Springer.   link
  • Patrik Lambert and Rafael Banchs. Data Inferred Multi-word Expressions for Statistical Machine Translation. Proc. of Machine Translation Summit X, pp. 396-403. Phuket, Thailand.   pdf
  • José B. Mariño, Rafael Banchs, Josep M. Crego, Adrià de Gispert, Patrik Lambert, José A. R. Fonollosa and Marta R. Costa-jussà. Bilingual N-gram Statistical Machine Translation. Proc. of Machine Translation Summit X, pp. 275-82. Phuket, Thailand.   pdf
  • José B. Mariño, Rafael Banchs, Josep M. Crego, Adrià de Gispert, Patrik Lambert, José A. R. Fonollosa and Marta R. Costa-jussà. Modelo estocástico de traducción basado en N-gramas de tuplas bilingues y combinación log-lineal de características. Procesamiento del Lenguaje Natural, núm. 35 (SEPLN 2005), pp. 69-76. Granada, Spain.   pdf
  • Rafael Banchs, Josep M. Crego, Adrià de Gispert, Patrik Lambert and José B. Mariño. Statistical Machine Translation of Euparl Data by Using Bilingual N-grams. Proc. of the ACL Workshop on Building and Using Parallel Texts, pp. 133-36. Ann Arbor, Michigan, USA.   pdf

2004

  • Victoria Arranz, Núria Castell, Josep M. Crego, Jesús Giménez, Adrià de Gispert and Patrik Lambert. Bilingual Connections for Trilingual Corpora: An XML Approach. In Proc. of LREC 2004, Lisbon, Portugal, May 26-28.   ps
  • Patrik Lambert and Núria Castell. Alignment of parallel corpora exploiting asymmetrically aligned phrases. In Proc. of the LREC 2004 Workshop on the Amazing Utility of Parallel and Comparable Corpora, pp 26-29. Lisbon, Portugal, May 25.   pdf

Links

N-II: a statistical machine translator between Spanish and Catalan