Publications

TM and Glossary Integration: Maximum Quality for AI Translation

The Janus AI Ecosystem now supports the uploading of translation memories (TMs) and glossaries for use in the AI-powered translation process. Thanks to this new functionality, the system analyzes the text in the original language and compares it with data from the TM and glossary. In addition, all grammatical forms of words (declension, conjugation, cases) are taken into account. Therefore, if a glossary term appears in the text in any form, the system will see it. As a result, only those terms directly present in the text and those translation memory segments most similar to the sentences in the text are added to the prompt. This reduces the amount of data transferred, speeds up the translation, and makes it more accurate.

Figure 1 - Publication - TM Glossary integration

It is difficult to determine what will be more important for maximizing the quality of translation: using a TM or a glossary. This depends on numerous factors: the text type, the saturation of terms or clichés, the percentage of overlap between the original text and the TM and glossary, etc. However, research has shown that adding a TM and a glossary to a translation, both together and separately, consistently improves translation quality. This can be tracked in the graph below. Simultaneous integration of a TM and a glossary provides the greatest increase in quality, thereby reducing the time required for subsequent post-editing of the text and maintaining consistency of terminology, integrity, and style.

As mentioned above, quality gains mainly depend on how many TM segments or glossary terms are found directly in the text. For example, when translating from English to Japanese, the system showed a match between the text segment and the TM segment of 85.88%. Moreover, translation quality with the addition of TM increased from 66 to 95. The HLEPOR metric is used to evaluate translation quality. It assigns a rating based on the similarity of the machine translation to the reference translation prepared by a professional translator and approved by the client.

Figure 2 - Publication - TM Glossary integration

However, it is important to note that when working with new functionality through a CAT tool, it is necessary to consider how the CAT tool breaks the text into segments and to compose the TM in accordance with these principles. For example, memoQ splits text into segments based on end punctuation marks or line breaks. If you load a TM into an ecosystem where segments are not broken down according to the same principle, but, for example, by semantic paragraphs, the following may arise. In the example below, you can see that the original segment has a 100% TM match. However, when processing the file, memoQ split it into three separate segments and sent them to the system separately, which caused a failure in calculating text similarities.

Figure 3 - Publication - TM Glossary integration

The integration of translation memory and glossary represents a significant step for the development of machine translation, as it combines the two key resources of linguistic consistency (TM) and terminological accuracy (glossary). Maximum effectiveness is achieved by using both tools simultaneously. This ensures both improved translation quality and seamless preservation of style and terminology. However, technical synchronization with CAT tools is a key condition: mismatch of segmentation rules (for example, in memoQ) can significantly reduce the potential of even a perfectly prepared TM.

For enterprise teams, the main challenge is not simply to translate more content, but to do it predictably — with the same terminology, style and quality level across all languages.

Janus AI Ecosystem helps use AI translation within a controlled process, where TMs and glossaries support consistency from the very first draft.

Let`s discuss how to make your regular translation workflows more predictable in terms of timelines and quality? 

Machine Translation Specialist

Ksenia Dmitrieva

Ksenia has been with the Research & Development department at Janus since 2024. She graduated with a bachelor’s degree in Translation and Translation Studies from Herzen State Pedagogical University and is currently pursuing a master’s degree in Digital Linguistics. Ksenia specializes in training machine translation (MT) and LLM (AI) systems to improve translation quality and adapt solutions to the needs of specific sectors, as well as in evaluating translation quality using automatic metrics (such as BLEU, TER, and BERTScore).