Saturday 5 April 2008
Machine Translation Techniques and Open Source
By Maxime Biais, Saturday 5 April 2008 at 20:24 :: NLP
Today there are two main approaches to Machine Translation (MT):
- Rule-based MT (used by a number of companies working in the domain: Systran, Reverso, etc.). The only open source software I know of that works with this approach is Apertium.
- Statistical MT (used by Google and Language Weaver). Moses is an open source implementation of this approach. The learning process also relies on other open source components (for example, giza++ is an open source word aligner that Moses needs to prepare the corpus).
Pros and cons of rule-based machine translation
- It needs rules, dictionaries (general and contextual) and people with the know-how (linguists) to write these rules and fill the dictionaries.
- Translation costs (CPU and memory) are fairly low
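To make the rule-based idea more concrete, here is a minimal sketch in Python: a dictionary lookup followed by one hand-written reordering rule. The lexicon, the tag names and the `translate` function are made up for illustration; this is not how Apertium stores or applies its data, only a toy showing why such a system needs dictionaries and rules but very little CPU at translation time.

```python
# Toy rule-based translator (English -> French).
# LEXICON and the single transfer rule are hypothetical illustration data.
LEXICON = {
    "the": ("le", "DET"),
    "black": ("noir", "ADJ"),
    "cat": ("chat", "NOUN"),
    "sleeps": ("dort", "VERB"),
}


def translate(sentence):
    # 1. Dictionary lookup: each source word becomes a (target word, tag) pair.
    #    Unknown words are passed through unchanged with an UNK tag.
    tagged = [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]

    # 2. Transfer rule: English "ADJ NOUN" becomes French "NOUN ADJ".
    out = []
    i = 0
    while i < len(tagged):
        is_adj_noun = (
            i + 1 < len(tagged)
            and tagged[i][1] == "ADJ"
            and tagged[i + 1][1] == "NOUN"
        )
        if is_adj_noun:
            out.extend([tagged[i + 1][0], tagged[i][0]])
            i += 2
        else:
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)


print(translate("the black cat sleeps"))  # le chat noir dort
```

Every new word or construction means another dictionary entry or another rule, which is where the linguists come in.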
Pros and cons of statistical machine translation
- It needs a big bilingual corpus and computing resources to run the learning process
- The bilingual corpus has to be clean (automatic preprocessing and human checking)
- Translation costs (CPU and memory) are heavy
- You can translate between any language pair you want, provided you have the corpus
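As a rough illustration of what the "learning process" does with a bilingual corpus, here is a small Python sketch of IBM Model 1, the simplest of the word alignment models that tools like giza++ implement. The three-sentence corpus and the function names are made up for the example, and the sketch leaves out things a real aligner handles (a NULL word, smoothing, the more advanced models), so take it as a toy rather than as what Moses or giza++ actually run.

```python
from collections import defaultdict

# Tiny made-up parallel corpus (English, German); real systems need far more.
CORPUS = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]


def train_model1(corpus, iterations=20):
    """Estimate word translation probabilities with EM (IBM Model 1, no NULL word)."""
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]
    tgt_vocab = {f for _, fs in pairs for f in fs}

    # t[(f, e)] approximates P(target word f | source word e); start uniform.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: spread each target word over the source words of its sentence.
        for es, fs in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)


t = train_model1(CORPUS)
for word in ("the", "house", "book", "a"):
    print(word, "->", best_translation(t, word))
```

Nothing in the code knows anything about English or German, which is why the same approach works for any language pair as long as you have the corpus. It is also why a dirty corpus hurts: the probabilities are only as good as the sentence pairs they are counted from.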
Resources:
- Apertium wiki, great wiki about Apertium but also about other open source tools (word aligner, ...)
- OPUS, an open source parallel corpus (mixing different sources)
- My del.icio.us bookmarks on machine translation
Notes: there are other, less used techniques: word-to-word substitution (Linguaphile) and example-based translation (I didn't find an open source implementation of this one). Of course, you can also imagine mixed techniques.




