The Latest Language News
Let’s discuss the most important recent news in the world of languages and language technology.
Yandex Speech Service Now Understands Uzbek
https://turkmenportal.com/blog/63391/golosovoi-servis-yandeksa-osvoil-uzbekskii-yazyk
Yandex has expanded its SpeechKit service to include speech recognition and synthesis in Uzbek, making Uzbek the 16th language SpeechKit can work with. The Uzbek voice was trained on recordings of a real speaker. In addition to recognizing and synthesizing speech, the system can also transcribe Uzbek words. SpeechKit is expected to be a valuable tool for call centers and for the development of voice assistants.
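As a rough illustration, here is a minimal Python sketch of calling SpeechKit's public v1 REST endpoints for recognition and synthesis. The Uzbek language code ("uz-UZ"), the file names, and the sample text are assumptions; real use would follow Yandex Cloud's authentication setup and the current API documentation.

```python
import requests

API_KEY = "<your-api-key>"      # Yandex Cloud service-account API key (placeholder)
FOLDER_ID = "<your-folder-id>"  # Yandex Cloud folder ID (placeholder)

# Speech recognition (STT): send a short OggOpus recording, get Uzbek text back.
with open("question.ogg", "rb") as f:
    audio = f.read()

stt = requests.post(
    "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize",
    params={"lang": "uz-UZ", "folderId": FOLDER_ID},  # "uz-UZ" assumed as the Uzbek code
    headers={"Authorization": f"Api-Key {API_KEY}"},
    data=audio,
)
print(stt.json().get("result"))

# Speech synthesis (TTS): turn Uzbek text into audio.
tts = requests.post(
    "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize",
    headers={"Authorization": f"Api-Key {API_KEY}"},
    data={
        "text": "Salom! Sizga qanday yordam bera olaman?",  # "Hello! How can I help you?"
        "lang": "uz-UZ",
        "folderId": FOLDER_ID,
    },
)
with open("answer.ogg", "wb") as out:
    out.write(tts.content)
```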
Researchers Are Breaking Ancient Language Barriers With AI
https://decrypt.co/147176/ai-ancient-language-translation-cuneiform-akkadian
Deciphering ancient languages and texts has challenged archaeologists for generations. Now researchers are using artificial intelligence to translate ancient texts into English quickly, including texts written in cuneiform and Egyptian hieroglyphs. Despite challenges such as the scarcity of training data, researchers have managed to train AI models on tens of thousands of examples. This breakthrough enables the translation of Akkadian, the lingua franca of the ancient Middle East and Mesopotamia.
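To make the approach concrete, here is a generic Python sketch, not the researchers' actual pipeline, of fine-tuning a pretrained sequence-to-sequence model on a small parallel corpus of transliterated source lines paired with English translations, using the Hugging Face libraries. The base model, the toy sentence pairs, and the hyperparameters are all placeholders.

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "t5-small"  # placeholder base model, not the one used in the research
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy stand-in for the tens of thousands of aligned (source, English) pairs.
pairs = [
    {"src": "sharrum dannum", "tgt": "the mighty king"},
    {"src": "ana belia qibima", "tgt": "speak to my lord"},
]
dataset = Dataset.from_list(pairs)

def preprocess(batch):
    enc = tokenizer(batch["src"], truncation=True, max_length=128)
    enc["labels"] = tokenizer(text_target=batch["tgt"],
                              truncation=True, max_length=128)["input_ids"]
    return enc

tokenized = dataset.map(preprocess, batched=True, remove_columns=["src", "tgt"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="ancient-to-en",
                                  num_train_epochs=3,
                                  per_device_train_batch_size=8),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```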
Team Develops Faster, Cheaper Way to Train Large Language Models
https://techxplore.com/news/2023-07-team-faster-cheaper-large-language.html
A Stanford team has developed Sophia, a new way to optimize the pretraining of large language models (LLMs) that is twice as fast as current approaches. Applications like ChatGPT, which rely on LLMs, are gaining widespread use and drawing media attention. However, only a few large tech companies dominate the LLM space because of the exorbitant cost of pretraining these models: estimates suggest the cost starts at $10 million and can reach tens or even hundreds of times that amount.
These models consist of millions, or even billions, of parameters, which Hong Liu, a graduate student in computer science at Stanford University, likens to factory workers striving toward a common goal. One important property of these parameters is their curvature, which Liu describes as the maximum achievable speed at which they can progress toward the final goal of a pretrained LLM. In the factory analogy, curvature corresponds to a worker's workload; by estimating that workload accurately, an optimizer can make LLM pretraining more efficient.
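As a rough sketch of how such a curvature-aware optimizer works, the Python function below implements a Sophia-style update step: a momentum average of the gradients is divided by a diagonal curvature estimate, and the result is clipped element-wise before the parameters move. The hyperparameter values are illustrative rather than the paper's tuned settings, and the periodic re-estimation of the curvature is left out.

```python
import numpy as np

def sophia_style_step(theta, grad, m, h, lr=1e-4, beta1=0.96, rho=0.05, eps=1e-12):
    """One illustrative parameter update.

    theta : current parameters (1-D array)
    grad  : gradient of the loss at theta
    m     : exponential moving average of past gradients (momentum)
    h     : diagonal curvature estimate, i.e. each parameter's "workload";
            in the full method it is refreshed only every few steps
    """
    m = beta1 * m + (1 - beta1) * grad
    # Precondition by curvature, then clip so flat directions cannot take huge steps.
    update = np.clip(m / np.maximum(h, eps), -rho, rho)
    theta = theta - lr * update
    return theta, m
```

In the full method, the curvature vector would be refreshed only every few steps with a lightweight stochastic estimator, which keeps the per-step cost close to that of standard optimizers while still using second-order information.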
PM Modi Presents Indian AI-Based Language Platform
Artificial intelligence has immense potential, and the Indian government has unveiled Bhashini, a homegrown AI-based language platform. Bhashini breaks down language barriers by providing real-time translation, enabling speakers of different languages to interact digitally. PM Narendra Modi expressed his willingness to share the technology with other nations of the Shanghai Cooperation Organization (SCO). Currently, the official languages of the SCO are Mandarin and Russian, but India advocates adding English as an official language. Bhashini uses AI/ML and natural language processing (NLP) to develop and share open-source models and tools for Indian languages.
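For context, the snippet below shows the kind of open machine-translation tooling such a platform builds on. It is not Bhashini's own API; it simply loads a publicly released multilingual model (NLLB is used here purely as a stand-in) to translate a Hindi sentence into English.

```python
from transformers import pipeline

# Stand-in open model, not a Bhashini component.
translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="hin_Deva",  # Hindi, Devanagari script
    tgt_lang="eng_Latn",  # English, Latin script
)

# "Language is no longer a barrier."
print(translator("भाषा अब बाधा नहीं है।")[0]["translation_text"])
```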