The Latest Language News
Let us talk again about the latest language news. I have highlighted the items that I consider the most important.
Echo viewers will be able to watch new Marvel series dubbed in the Choctaw language
https://ew.com/disney-made-choctaw-language-dub-echo-marvel-series-8421456
The upcoming Marvel miniseries stars Alaqua Cox as the mercenary Maya Lopez. Since Maya is a Native American character, director Sydney Freeland and her team consulted with the Choctaw Nation about the history, costumes, and characters. But the collaboration went even further than that. EW can exclusively reveal that the Echo team has produced a full audio dub of the series in the Choctaw language.
In the exclusive video (linked), you can watch Freeland and consultant Terry Billy work on recording the dub. You can also see some footage from Echo that teases just how much the show draws from Choctaw history and culture. Audio dubs in various languages are common for films released internationally. Hayao Miyazaki’s new animated film The Boy and the Heron, for example, is widely available in both its original Japanese and an English language version. The Choctaw language is not as widely spoken as either of those, which made the Choctaw dub challenging — but also extremely rewarding.
Microsoft’s small language model
https://www.infoworld.com/article/3712200/inside-phi-2-microsoft-s-small-language-model.html
Small language models offer several advantages: they are far easier to make portable, they can run when a cloud connection is unavailable, and we may not want a model trained on public data in the first place. It takes months to train a GPT-class LLM on a supercomputer. Building language models on smaller sets of private or domain-specific data makes it possible to deliver models that are both smaller and more specialized. Microsoft Research has used an approach it calls “textbooks are all you need” to train its Phi series of SLMs. The idea is to train the model strategically on authoritative sources so that it delivers responses in a clear and concise fashion. For the latest release, Phi 2, Microsoft’s training data mixed synthetic content and web-crawled information. The synthetic data gives the model foundational knowledge to support basic reasoning as well as a grounding in general knowledge, so the outputs are not limited to textbook-grade data and can respond to the user’s context more effectively. The results speak for themselves: Phi 2 has benchmarked as well as, and sometimes better than, models that are larger and considerably more complex.
Microsoft Research notes that the quality of the training data is key to delivering good results and to exhibiting the kind of behavior seen in much larger models. Instead of training the model on a large corpus of web data, which is inherently random, the team building the Phi models curates its training data with a focus on the quality of the content. The team has also used existing knowledge from earlier Phi models to kickstart Phi 2, speeding up training. Phi models receive no reinforcement learning from human feedback (RLHF); the careful curation of the training data makes that step unnecessary. It also makes the model less likely to deliver toxic or biased outputs.
One key advantage is that the size and resource requirements of SLMs make them economically attractive for tasks that would be too costly to perform with LLMs. Using SLMs like Phi in common workflows, such as quickly delivering readable and comprehensible summaries of key data, could prove quite useful. A team of SLMs like Phi, each powering an intelligent agent and providing an interface between us and a sea of unstructured data, could be one way of delivering the context-based, adaptive computing environment envisioned by early ubiquitous computing researchers.
Chinese military lab AI connects to commercial large language models for the first time to learn more about humans
Chinese scientists are teaching an experimental military artificial intelligence to better handle unpredictable human adversaries with the help of technologies similar to ChatGPT. A research laboratory affiliated with the People’s Liberation Army’s Strategic Support Force, which oversees space, cyber, intelligence, and electronic warfare for the Chinese military, has tested its AI system on Baidu’s Ernie and iFlyTek’s Spark, large language models comparable to ChatGPT. The military AI converts vast amounts of sensor data and information reported by frontline units into descriptive language or images and relays them to the commercial models. Once the commercial models confirm they understand the input, the military AI automatically generates prompts for deeper exchanges on tasks such as combat simulations.

This is the first time the Chinese military has publicly confirmed its use of commercial large language models. For security reasons, military information facilities are generally not directly connected to civilian networks. The team does not give details of the link between the two systems in the paper, but stresses that the work was preliminary and for research purposes only. Most existing military AI is based on traditional war-gaming systems; although their capabilities have progressed rapidly, to users they often feel more like machines than living beings. Commercial large language models may help military AI gain a deeper understanding of people.
The team also noted remaining issues in the communication between the military and commercial models, since the latter were not developed specifically for warfare. To address this, the team experimented with multimodal communication: the military AI creates a detailed military map, which is then passed to iFlyTek’s Spark for deeper analysis. The researchers found that this illustrative approach significantly improves the performance of the large language models, enabling them to produce analysis reports and predictions that meet the requirements for practical application. The team acknowledges in the paper that what it has disclosed is only the tip of the iceberg of this ambitious project.
China is not the only country conducting such research. Many generals from various US military branches have publicly expressed interest in ChatGPT and similar technologies and tasked corresponding military research institutions and defense contractors to explore the possible applications of generative AI in US military operations, such as intelligence analysis, psychological warfare, drone control, and the decryption of encoded communications. But a Beijing-based computer scientist has warned that while the military application of AI is inevitable, it warrants extreme caution.
Is Google’s Gemini good at machine translation?
https://slator.com/is-google-gemini-good-at-machine-translation/
Syeda Nahida Akter, Zichun Yu, Aashiq Muhamed, Tianyue Ou, Alex Bäuerle, Ángel Alexander Cabrera, Krish Dholakia, Chenyan Xiong, and Graham Neubig from Carnegie Mellon University and BerriAI have explored the translation abilities of Google’s Gemini, highlighting it as a “valuable tool.” The researchers explain that the recently introduced Google Gemini models are the first to comprehensively report results rivaling OpenAI’s GPT series on diverse tasks. However, there is a significant drawback: the absence of released evaluation details and model predictions. To address this, the researchers have conducted a “third-party, objective comparison” between OpenAI’s GPT and Google’s Gemini models, providing “reproducible code and fully transparent results.” In addition to translation, the evaluation includes other tasks, such as reasoning, knowledge-based question answering, math problem solving, code generation, and instruction following. The researchers compared Gemini Pro, GPT-3.5 Turbo, and GPT-4 Turbo against established systems like Google Translate and benchmarked them against NLLB-MoE, an open-source machine translation model known for its extensive language coverage.
These models were evaluated across 20 languages with varying levels of resource availability and translation difficulty, looking particularly at how well the models performed when translating from English into other languages. To score the outputs, the researchers used standard metrics such as BLEU and chrF2++. While Google Translate outperformed the other models, excelling in 10 languages, the language models demonstrated competitive performance but fell short in translation into non-English languages. GPT-4 Turbo’s performance diverged from that of GPT-3.5 Turbo and Gemini Pro: it showed larger improvements on low-resource languages, whereas all of the large language models performed similarly on high-resource languages. Gemini Pro outperformed both GPT-3.5 Turbo and GPT-4 Turbo in five out of 20 languages, achieving top performance in three. However, it tended to block responses when its confidence was low, doing so in approximately 10 language pairs. On unblocked samples, where its confidence was higher, Gemini Pro marginally outperformed GPT-3.5 Turbo and GPT-4 Turbo: it surpassed GPT-4 Turbo by 1.6 chrF in the 5-shot setting and 2.6 chrF in the 0-shot setting, and exceeded GPT-3.5 Turbo by 2.7 chrF and 2 chrF in the 5-shot and 0-shot settings, respectively. The code and data can be found at https://github.com/neulab/gemini-benchmark
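To make the chrF numbers above more concrete, here is a minimal, illustrative reimplementation of the basic chrF metric: character n-gram precision and recall averaged over n-gram orders and combined into an F-beta score (beta = 2 gives chrF2). This is a sketch for intuition only, not the sacreBLEU implementation the researchers presumably used, and it omits the word n-gram component that distinguishes chrF2++ from plain chrF2.

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n, with spaces removed."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))

def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Average precision/recall over n-gram orders 1..max_order,
    then return the F-beta score scaled to 0-100 (100 = perfect match)."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        # Clipped overlap: each n-gram counts at most as often as in the reference.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)
```

With beta = 2, recall is weighted more heavily than precision, so a hypothesis that misses reference content is penalized more than one that adds a little extra; a "2.6 chrF" gap between two systems means a 2.6-point difference on this 0-100 scale.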