Adding New Languages to Machine Translators
Let’s have another look at the latest news in the field of machine translation. I have highlighted the stories I consider to be the most important.
Microsoft adds 13 new African languages to Microsoft Azure Cognitive Services Translator
Microsoft has added 13 new African languages to its Microsoft Azure Cognitive Services Translator (https://azure.microsoft.com/ru-ru/products/cognitive-services/translator/), enabling text and documents to be translated to and from these languages. Sesotho (Southern Sotho), Sesotho sa Leboa (Northern Sotho), Setswana (Tswana), and Xhosa are the latest of South Africa’s official languages to be supported, following last year’s release of Zulu. The other languages included are chiShona, Hausa, Igbo, Kinyarwanda, Lingala, Luganda, Nyanja, Rundi, and Yoruba. This brings the total number of supported languages to 124 and adds language support for millions of people in Africa and worldwide.
Integration across Microsoft’s ecosystem includes Microsoft 365 for translating text and documents, the Microsoft Edge browser and Bing search engine for translating whole webpages, SwiftKey for translating messages, LinkedIn for translating user-submitted content, the Translator app for enabling multilingual conversations on the fly, and more.
Expanding language coverage in this way is valuable both for technological development and for competition: being able to translate to and from languages that most other providers do not support gives a company an important advantage. However, it is also worth remembering that there must be sufficient demand. In other words, these languages must be used by many people, and there must be a sufficiently large corpus of text available in them, one that is continually expanded by contributors and checked by professional linguists who are enthusiastic about their work.
University of Tartu machine translation engine now supports 17 new Finno-Ugric languages
The engine can be found at: https://neurotolge.ee
Researchers at the University of Tartu Institute of Computer Science have added Livonian, Komi, Veps, and 14 other low-resource Finno-Ugric languages to Neurotõlge, the University’s machine translation engine. In most cases, this is the first time these languages have been added to a public translation engine, as they are not included in Google Translate or similar services. In total, the translation engine supports 23 Finno-Ugric languages: in addition to the more commonly supported Estonian, Finnish, and Hungarian, the full list now includes Livonian, Võro, Proper Karelian, Livvi Karelian, Ludian, Veps, Northern Sami, Southern Sami, Inari Sami, Skolt Sami, Lule Sami, Komi, Komi-Permyak, Udmurt, Hill Mari, Meadow Mari, Erzya, Moksha, Mansi, and Khanty.
The research group is now inviting speakers and researchers of these languages to contribute corrected translations to improve the quality of the engine’s output. Texts in these languages, such as poems, articles, and books, are also enormously helpful and can be sent to ping@tartunlp.ai to be added to the corpus of text. The higher the quality of the text used to train the engine, the higher the quality of the machine translation output. This feedback is needed because, for many of these languages, the resources that can be used to develop such translation systems are extremely scarce. The work is being funded by the National Programme of Estonian Language Technology.
AI engines can prevent emergence of pirated manga translations
There is now an AI-powered manga translation engine that can easily translate Japanese manga into English! The latest issue of Kizuna explains how the engine can recognize manga text, quickly translate it into English, and work out the proper order of the text. Learn more at https://www.japan.go.jp/kizuna/2023/02/manga_translation_service.html
Japanese manga, such as Demon Slayer, One Piece, Slam Dunk, and Dragon Ball, has a huge following around the world, and pirated translations continue to be produced in vast numbers. “This system has already been adopted by more than 10 companies in Japan and overseas, where it supports the translation of 40,000 to 50,000 pages a month,” says Ishiwatari Shonosuke, co-founder and CEO of Mantra.
The engine’s AI learns from massive amounts of data, with a focus on manga graphics and translation. It has succeeded in accurately reading the location and content of the text in an image and in translating the words into colloquial language, while taking account of their order and the context. To translate and replace the text in an image, the user simply uploads the manga’s data into the system and selects the language; the entire process can take as little as a few seconds per page.
“If we increase efficiency and release translated versions with no time lag, we can prevent pirated manga translations from emerging. One pirate translation group has already announced that they will no longer produce titles that have been translated and released through our system,” notes Ishiwatari.