TM and Glossary Integration: Maximum Quality for AI Translation
The Janus AI Ecosystem now supports the uploading of translation memories (TMs) and glossaries for use in the AI-powered translation process. Thanks to this new functionality, the system analyzes the text in the original language and compares it with data from the TM and glossary. In addition, all grammatical forms of words (declension, conjugation, cases) are taken into account. Therefore, if a glossary term appears in the text in any form, the system will see it. As a result, only those terms directly present in the text and those translation memory segments most similar to the sentences in the text are added to the prompt. This reduces the amount of data transferred, speeds up the translation, and makes it more accurate.

It is difficult to determine what will be more important for maximizing the quality of translation: using a TM or a glossary. This depends on numerous factors: the text type, the saturation of terms or clichés, the percentage of overlap between the original text and the TM and glossary, etc. However, research has shown that adding a TM and a glossary to a translation, both together and separately, consistently improves translation quality. This can be tracked in the graph below. Simultaneous integration of a TM and a glossary provides the greatest increase in quality, thereby reducing the time required for subsequent post-editing of the text and maintaining consistency of terminology, integrity, and style.
As mentioned above, quality gains mainly depend on how many TM segments or glossary terms are found directly in the text. For example, when translating from English to Japanese, the system showed a match between the text segment and the TM segment of 85.88%. Moreover, translation quality with the addition of TM increased from 66 to 95. The HLEPOR metric is used to evaluate translation quality. It assigns a rating based on the similarity of the machine translation to the reference translation prepared by a professional translator and approved by the client.

However, it is important to note that when working with new functionality through a CAT tool, it is necessary to consider how the CAT tool breaks the text into segments and to compose the TM in accordance with these principles. For example, memoQ splits text into segments based on end punctuation marks or line breaks. If you load a TM into an ecosystem where segments are not broken down according to the same principle, but, for example, by semantic paragraphs, the following may arise. In the example below, you can see that the original segment has a 100% TM match. However, when processing the file, memoQ split it into three separate segments and sent them to the system separately, which caused a failure in calculating text similarities.

The integration of translation memory and glossary represents a significant step for the development of machine translation, as it combines the two key resources of linguistic consistency (TM) and terminological accuracy (glossary). Maximum effectiveness is achieved by using both tools simultaneously. This ensures both improved translation quality and seamless preservation of style and terminology. However, technical synchronization with CAT tools is a key condition: mismatch of segmentation rules (for example, in memoQ) can significantly reduce the potential of even a perfectly prepared TM.
For enterprise teams, the main challenge is not simply to translate more content, but to do it predictably — with the same terminology, style and quality level across all languages.
Janus AI Ecosystem helps use AI translation within a controlled process, where TMs and glossaries support consistency from the very first draft.
Let`s discuss how to make your regular translation workflows more predictable in terms of timelines and quality?