Latest Language News
Let’s delve into the latest news in the world of languages. I have highlighted what I believe to be the most significant updates.
Microsoft Translator Now Supports 20 Indian Languages
Microsoft India has announced that Microsoft Translator now supports four new languages: Bhojpuri, Bodo, Dogri, and Kashmiri. The release brings the total number of Indian languages supported by Microsoft Translator to 20, alongside Assamese, Bengali, Gujarati, Hindi, Kannada, Konkani, Maithili, Malayalam, Marathi, Nepali, Odia, Punjabi, Sindhi, Tamil, Telugu, and Urdu, and moves the service one step closer to its goal of supporting all 22 official Indian languages; it now covers languages spoken by nearly 95% of the country’s population. This advancement will open up new economic opportunities for local artisans and businesses by allowing them to connect with a wider audience. More importantly, the expansion will help preserve indigenous knowledge and cultural identity by bridging the gap to the mainstream.
The translation feature can be accessed through the Microsoft Translator app, the Edge browser, Office 365, Bing Translator, and, for businesses and developers, the Azure AI Translator API, which is already used by companies such as Jio Haptik and Koo. With Azure AI Translator, users can translate between the newly introduced languages and more than 135 other languages in their apps, websites, workflows, and tools. Businesses can also take advantage of multi-language support for translating e-content, e-commerce product catalogs, product documentation, internal communications, and more.

The update will reach close to 61 million people. Bhojpuri is spoken by roughly 51 million people in eastern Uttar Pradesh, Bihar, and Jharkhand. Bodo is spoken by approximately 1.4 million people in the states of Assam and Meghalaya and in neighboring Bangladesh. Dogri is spoken by 1.6 million people in Jammu and Kashmir, Himachal Pradesh, and Punjab. Kashmiri is spoken by approximately 7 million people in Jammu and Kashmir and parts of neighboring Pakistan. Access to technology-based solutions across language barriers drives democratic empowerment.
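For developers curious about the API route mentioned above, here is a minimal sketch of a request against the public Azure AI Translator v3 REST endpoint. The subscription key, region, sample text, and the language codes for the four new languages (bho, brx, doi, ks) are placeholders and assumptions for illustration; check the service documentation for your resource’s actual values.

```python
import requests

# Public Azure AI Translator v3 endpoint; key and region are placeholders.
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"
HEADERS = {
    "Ocp-Apim-Subscription-Key": "<your-translator-key>",
    "Ocp-Apim-Subscription-Region": "<your-resource-region>",
    "Content-Type": "application/json",
}
# Assumed codes: bho = Bhojpuri, brx = Bodo, doi = Dogri, ks = Kashmiri.
params = {"api-version": "3.0", "from": "en", "to": ["bho", "brx", "doi", "ks"]}
body = [{"Text": "Handwoven baskets, shipped anywhere in India."}]

response = requests.post(ENDPOINT, params=params, headers=HEADERS, json=body)
response.raise_for_status()
for item in response.json()[0]["translations"]:
    print(f"{item['to']}: {item['text']}")
```

A single request can fan out to several target languages at once, which is how, say, an e-commerce product catalog could be localized into all four new languages in one pass.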
Research Suggests AI Will Become a Translator for Patients After Laryngectomy
https://medicalxpress.com/news/2023-10-ai-patients-laryngectomy.html
Laryngectomy, the most common surgical treatment for advanced laryngeal cancer, dramatically alters the patient’s voice and can significantly disrupt daily life. To improve the quality of life for patients post-laryngectomy, a team of researchers from Lithuania conducted a study using artificial intelligence to “clean” the speech of laryngectomy patients.
According to the researchers, laryngeal cancer patients often have to undergo extensive surgery to partially or completely remove their larynx. After such an operation, the patient’s vocal apparatus remains damaged. Voice function is severely impaired or completely absent, while breathing is done through a tracheostomy, an opening in the neck. In these patients, the voice produced using the remaining anatomical structures, which are not naturally designed to generate voice, is called a substitute voice.
The primary objective of this research is to develop artificial intelligence-based algorithms for the automatic enhancement and evaluation of substitute voice in patients following laryngeal cancer surgery. The developed algorithms are currently undergoing clinical trials at the largest Lithuanian university hospital, the Lithuanian University of Health Sciences Hospital, specifically at its Ear, Nose, and Throat Clinic. Voice alterations post-laryngectomy vary greatly with the severity of the case: some individuals may experience only minor changes in their voice, while others may sound robotic or raspy. Consequently, understanding what the patient is trying to communicate is not always easy. The algorithm’s speech processing involves calculating the spectrogram of the distorted voice, extracting frequency statistics, and assessing noise sensitivity, noise production, and other parameters that help “clean” the voice.
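The study’s exact algorithm isn’t public, but the pipeline it describes (a spectrogram of the distorted voice, followed by frequency statistics and noise measures) maps onto standard audio tooling. Below is a minimal sketch in Python using librosa; the file name, sample rate, and the particular spectral features are illustrative assumptions, not the researchers’ implementation.

```python
import librosa
import numpy as np

# Load a substitute-voice recording (placeholder file name and sample rate).
signal, sr = librosa.load("substitute_voice.wav", sr=16000)

# Short-time Fourier transform -> magnitude spectrogram of the distorted voice.
spectrogram = np.abs(librosa.stft(signal, n_fft=1024, hop_length=256))

# Frequency statistics of the kind such a pipeline might extract.
centroid = librosa.feature.spectral_centroid(S=spectrogram, sr=sr)
bandwidth = librosa.feature.spectral_bandwidth(S=spectrogram, sr=sr)
# Spectral flatness is a rough proxy for how noise-like the signal is.
flatness = librosa.feature.spectral_flatness(S=spectrogram)

print(f"mean spectral centroid:  {centroid.mean():7.1f} Hz")
print(f"mean spectral bandwidth: {bandwidth.mean():7.1f} Hz")
print(f"mean spectral flatness:  {flatness.mean():7.3f}")
```

Statistics like these can feed both sides of the task the researchers describe: scoring how degraded a substitute voice is, and conditioning an enhancement step that produces a cleaner one.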
How SignBank+ Enhances Multilingual Sign-to-Spoken Language Machine Translation
https://slator.com/how-signbank-improves-multilingual-sign-to-spoken-language-machine-translation/
Researchers at Bar-Ilan University and the University of Zurich conducted a study using SignBank, a multilingual, multidomain sign language dataset, and developed an enhanced version of it called SignBank+. The researchers aimed to streamline the translation process and improve model training and deployment. While their earlier research centered on machine translation between signed and spoken languages in both directions, the subsequent work focused on sign-to-spoken-language MT. They achieved this by training on SignBank+ and using SignWriting, a written notation for signs, as an intermediate step for generating the text translation.
The researchers compiled and annotated fingerspelling for letters and numbers, eliminated inconsistencies and errors from the original dataset, and expanded it by incorporating variations of multiple terms across a sample of 22 sign languages.
The dataset cleaning process relied on ChatGPT. For this stage, the researchers devised a “pseudo function” that, given the number of signs in an entry, its language code, and its existing terms, returned a cleaned, parallel version of those terms. They validated the method by running the gpt-3.5-turbo-0613 model on manually cleaned samples and comparing its output to the manual cleaning.
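The study’s exact prompt isn’t reproduced here, so the function signature and instructions below are illustrative assumptions, but the general pattern (describe a “pseudo function” to the model, then call it with each entry’s fields) can be sketched against the OpenAI chat API:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative "pseudo function" spec; the study's actual prompt differs.
SYSTEM_PROMPT = """You implement the function:
  clean_terms(num_signs: int, language_code: str, terms: list[str]) -> list[str]
Given the number of signs in a SignBank entry, its spoken-language code, and
the entry's existing parallel terms, return the terms with inconsistencies
and errors removed, one term per line and nothing else."""

def clean_entry(num_signs: int, language_code: str, terms: list[str]) -> list[str]:
    response = client.chat.completions.create(
        # The snapshot named in the study; it may no longer be served.
        model="gpt-3.5-turbo-0613",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",
             "content": f"clean_terms({num_signs}, {language_code!r}, {terms!r})"},
        ],
        temperature=0,  # deterministic output for repeatable cleaning
    )
    return response.choices[0].message.content.strip().splitlines()

print(clean_entry(1, "en", ["Hello!", "hello ", "helo"]))
```

Validating against manually cleaned samples, as the researchers did, turns an otherwise hard-to-verify LLM step into a measurable one.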
To assess the quality of the cleaning and expansion work, the researchers examined its impact on multilingual MT by training MT models. For the test set, they used manually annotated data, including tags identifying the source and target languages. Testing conditions covered various test frameworks, pre-trained, non-optimized models, and multilingual translation scenarios, evaluated on the first 3,000 entries.
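Source and target language tags of the kind mentioned above typically work by prepending language tokens to each example, so that one model can serve many language pairs. The tag format, the language codes, and the shortened Formal SignWriting (FSW) string below are illustrative assumptions, not the paper’s exact scheme:

```python
# Minimal sketch of language tagging for multilingual sign-to-spoken-language MT.
def tag_pair(sign_lang: str, spoken_lang: str, signwriting: str, text: str):
    # The model reads the tags first and learns to route between pairs.
    source = f"<{sign_lang}> <{spoken_lang}> {signwriting}"
    return source, text

# ase = American Sign Language; the FSW string is shortened for illustration.
src, tgt = tag_pair("ase", "en", "M518x529S14c20481x471", "hello")
print(src)  # "<ase> <en> M518x529S14c20481x471"
print(tgt)  # "hello"
```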