On the language model training process at Janus Worldwide
Machine translation has become a firm part of our lives: we encounter the results of its use every day.
Continual improvements in machine translation technologies and systems now allow businesses worldwide to translate almost any text or speech into the required language almost instantly, and the translation quality is often acceptable. This is especially true for technical texts, which follow set requirements for structure, content, terminology and style. The same quality is much harder to achieve for creative, "blue-sky thinking" texts, such as marketing and advertising materials, or for content localization. This is where engine customization, that is, preparing and training a machine translation language model for a specific client, helps.
The standard process of training a language model at Janus Worldwide consists of several steps:
- Gathering materials from the client: we are interested in translation memories, glossaries and similar resources.
- Preparing the materials for training: cleaning the translation memory (removing duplicate segments, "broken" and untranslated segments, segments where the translation does not match the source text, etc.). The cleaner the translation memory, the better the final translation quality.
- Training the language model itself.
- Analyzing the quality of translations produced by the trained (customized) engine. There are commonly accepted metrics for automatically evaluating machine translation quality, such as BLEU, which compare the engine's output with a human reference translation and express the result as a score, often on a percentage scale.
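The cleaning step above can be sketched in Python. This is a minimal illustration under our own assumptions, not Janus Worldwide's actual pipeline: the translation memory is modelled as a list of (source, target) string pairs, and the function names (`clean_tm`, `is_mismatched`, etc.) and the length-ratio threshold are hypothetical. Each check corresponds to one of the problems listed: duplicates, broken (empty) segments, untranslated segments and a crude source/target mismatch heuristic.

```python
import re
from typing import Iterable, List, Tuple

Segment = Tuple[str, str]  # (source, target) pair from a translation memory

# Hypothetical threshold; a real value would be tuned per language pair.
MAX_LENGTH_RATIO = 3.0
DIGITS_RE = re.compile(r"\d+")


def normalize(text: str) -> str:
    """Collapse runs of whitespace so near-identical copies count as duplicates."""
    return " ".join(text.split())


def is_broken(src: str, tgt: str) -> bool:
    """A segment with an empty source or target side is unusable."""
    return not src or not tgt


def is_untranslated(src: str, tgt: str) -> bool:
    """Target identical to source (ignoring case): likely never translated."""
    return src.casefold() == tgt.casefold()


def is_mismatched(src: str, tgt: str) -> bool:
    """Crude mismatch heuristic: implausible length ratio or differing numbers."""
    ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
    if ratio > MAX_LENGTH_RATIO:
        return True
    # Compare digit runs only, so "1.5" and "1,5" are treated as equal.
    return sorted(DIGITS_RE.findall(src)) != sorted(DIGITS_RE.findall(tgt))


def clean_tm(segments: Iterable[Segment]) -> List[Segment]:
    """Return normalized, deduplicated segments that pass every check."""
    seen = set()
    kept: List[Segment] = []
    for raw_src, raw_tgt in segments:
        src, tgt = normalize(raw_src), normalize(raw_tgt)
        # is_broken runs first, so is_mismatched never divides by zero.
        if is_broken(src, tgt) or is_untranslated(src, tgt) or is_mismatched(src, tgt):
            continue
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


if __name__ == "__main__":
    tm = [
        ("Press the Start button.", "Drücken Sie die Start-Taste."),
        ("Press  the Start button. ", "Drücken Sie die Start-Taste."),  # duplicate
        ("Version 2.1", "Version 2.1"),  # untranslated
        ("Close the lid.", ""),  # broken
        ("Set the timer to 10 minutes.", "Stellen Sie den Timer auf 15 Minuten."),  # mismatch
    ]
    print(clean_tm(tm))
```

Only the first segment survives. Heuristics such as the untranslated check can over-filter (product names or version strings may legitimately stay the same in translation), so removed segments would need a human review rather than being discarded blindly.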
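The article does not say which metrics are used in the evaluation step. Purely as an illustration, the sketch below computes BLEU, one widely used reference-based metric, in simplified form: whitespace tokenization, one reference per segment and no smoothing. The function name `corpus_bleu` is our own; for real evaluations an established implementation such as sacreBLEU is the safer choice, since its tokenization differs and scores from this sketch will not match it exactly.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Simplified corpus-level BLEU with one reference per segment, on a 0-100 scale."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # n-grams in the hypotheses, per order n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each n-gram to its count in the reference (modified precision).
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any zero precision makes the geometric mean zero.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the cat sat on a mat"]
    print(round(corpus_bleu(["the cat sat on the mat"], refs), 2))  # 53.73
```

A score of 100 means the output is word-for-word identical to the reference. Because the score depends on a single reference wording, it is best read as a relative signal, for example when comparing a customized engine with a generic one on the same test set.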
At our company, we train our language models regularly, because a well-trained machine translation engine saves time and effort and lowers translation costs. Another advantage of using such engines in a project is that they keep learning: they take in translators' corrections and accumulate more material, which steadily improves translation quality.
In the next article, we will describe one of our most recent successful cases: preparing a machine translation language model for one of our key clients.