The latest language news
Let’s talk once again about the latest language news. I have highlighted the news items that I consider the most important.
AI-based language translator DeepL raises over $100M at $1B+ valuation
https://techcrunch.com/2023/01/11/deepl-the-ai-based-language-translator-raises-over-100m-at-a-1b-valuation
AI startups, and specifically those helping humans communicate with each other, are commanding a lot of interest from investors. Today, the latest player to join the field is announcing a big round of funding. DeepL, a startup that provides instant translation-as-a-service to both businesses and individuals, competing with Google, Bing, and other online tools, has confirmed a fundraising round at a valuation of €1 billion (just over $1 billion at today’s rates). DeepL is not disclosing the total amount that it has raised (it does not want to focus on this aspect, CEO and founder Jaroslaw Kutylowski said in an interview), but a range of figures have been mentioned.
The startup is also not confirming or disclosing other financials, but an investor source claims that the $1 billion valuation was based on a 20x multiple of DeepL’s annual revenue: $50 million as of the end of last year. In the current fundraising climate, this is a pretty bullish multiple, but it speaks to the company’s growth, which the investor noted is currently at 100%, and the fact that DeepL is breaking even and is close to being profitable. Despite the pressure on deep learning companies these days, with investors wanting returns and commercial end points, moonshots remain a priority for the company, something DeepL has been able to retain because it has been growing its core translation services. Many startups have been struggling to raise rounds, and those that have say that there has been a lot of pressure on valuations as a result, but Kutylowski said that the rising tide for AI-based language services has helped DeepL on this front. The company has long competed with the likes of Google and Microsoft on the translation front, with the smaller upstart often compared favorably to those Goliaths. Notably, neither is an investor, and Kutylowski strongly declined to comment on whether either of them, or any other major tech company like Amazon, had ever approached the startup for investment, partnerships, or acquisitions.
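The investor's arithmetic is easy to check. The following is a minimal sketch that uses only the two figures quoted above ($50 million in annual revenue and a 20x multiple); the function and variable names are illustrative and do not come from the article.

```python
def implied_valuation(annual_revenue: float, revenue_multiple: float) -> float:
    """Valuation implied by pricing a company at a multiple of its annual revenue."""
    return annual_revenue * revenue_multiple


# Figures as reported by the investor source in the article.
annual_revenue_usd = 50_000_000
revenue_multiple = 20

valuation = implied_valuation(annual_revenue_usd, revenue_multiple)
print(f"Implied valuation: ${valuation:,.0f}")
```

The result matches the roughly $1 billion valuation reported for the round.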
Spotify Is Testing AI-Powered Podcast Language Translation Which Mimics the Podcaster’s Own Voice
Spotify is testing out a way for podcasters to reach listeners in different languages using artificial intelligence technology that emulates the podcaster’s own voice. As part of the pilot, Spotify worked with a “select group” of podcasters—Dax Shepard and Monica Padman (“Armchair Expert”), Lex Fridman, Steven Bartlett (“The Diary of a CEO”), and Bill Simmons of Spotify’s The Ringer—to generate AI-powered voice translations of several episodes in other languages, including Spanish, French, and German. Other shows expected to be included in Spotify’s voice translation test are Dax Shepard’s “eff won with DRS,” “The Rewatchables” from The Ringer, and Trevor Noah’s new original podcast, slated to launch later this year. Spotify’s tool, which was developed in-house, uses OpenAI’s recently released voice generation technology to match the style of the original speaker. That, according to Spotify, results in a “more authentic listening experience” that sounds “more personal and natural than traditional dubbing.” The voice-translated episodes from the creators in Spotify’s pilot will be available on the platform worldwide.
There are now options for relatively real-time translation
https://techwireasia.com/2023/09/is-realtime-translation-ready-for-mainstream-use/
The possibility of real-time translation has long been a transnational daydream. We all know the feeling: sitting down to order in a nice restaurant somewhere abroad (perhaps on the last night of a vacation or the evening after an international business meeting) and watching the waiter struggle to decipher what it is that you are trying to order. The days of phrasebooks seem like a very long time ago. Sure, you may have used the Internet to look up the name of the dish you want, or even listened to a robotic pronunciation guide, but in the moment, it just does not seem to cut it. If only it were as easy as turning on subtitles!
The new Cotopat translation tool consists of a transparent screen that converts speech to text in real time, displaying bidirectional translation between two speakers. It recognizes spoken words as they are said and displays the translated text and associated visuals as live subtitles. Currently, it can translate between Japanese and the following languages: Simplified Chinese, Traditional Chinese, English, Portuguese, Korean, and Vietnamese. Cotopat is designed to recognize each speaker’s voice and can identify synonyms, homophones, and word boundaries, things that Google Translate can be a little rough on. Like almost all emerging technologies at the moment, it uses a pre-trained AI to translate spoken words. While Cotopat is generating buzz, it is not the only live translation service out there. App stores list several variations on the same theme, each with different use cases and specializations. The options available include:
- Languageio – “Automatic translation software for live, text-based conversational channels.”
- Boostlingo – Allows users to translate “Anytime, anywhere, in any language.”
- Kudo – Offers live webinar translations with over “200 spoken and sign languages.”
- Stenomatic – Available in over 70 languages, Stenomatic offers “live translation and interpretation technology.”
- ModernMT – This “learns from linguists’ corrections in real time” and “improves from corrections and adapts to the context of the document. Like a human.”
Meta Open-Sources Multilingual Translation Foundation Model SeamlessM4T
https://www.infoq.com/news/2023/09/meta-seamless-translation/
Meta recently open-sourced its Massively Multilingual & Multimodal Machine Translation (SeamlessM4T), a multilingual translation AI that can translate both speech audio and text data across nearly 100 languages. SeamlessM4T is trained on 1 million hours of audio data and outperforms the current state-of-the-art speech-to-text translation model. SeamlessM4T is a multimodal model that can handle both text and audio data as input and output, allowing it to perform automatic speech recognition, text-to-text translation, speech-to-text translation, text-to-speech translation (T2ST), and speech-to-speech translation. The model has been released under the non-commercial CC BY-NC 4.0 license. Meta is also releasing its training dataset, SeamlessAlign, which contains 270,000 hours of audio data with corresponding text transcriptions, as well as its code for mining the data from the internet.
SeamlessM4T is based on the UnitY neural network architecture, which consists of a pipeline of three components. The first is an encoder that can handle both speech audio and text data input and recognize the meaning of the input. The audio subcomponent is based on w2v-BERT and the text component is based on NLLB. Next is a decoder, also based on NLLB, which converts that meaning into a textual output in the target language. Finally, there is a text-to-acoustic unit decoder to convert the target text into speech. The SeamlessM4T code and models are available on GitHub. Additionally, there is an interactive translation demo available on Hugging Face.
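To make the three-stage pipeline described above more concrete, here is a toy sketch of the data flow. It is not Meta's code or API: every class and function name below is hypothetical, and the stages are stubbed with strings where the real model uses neural networks operating on tensors.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Audio:
    """Hypothetical stand-in for an audio clip."""
    description: str


@dataclass
class Meaning:
    """Hypothetical stand-in for the encoder's internal representation."""
    source: str


@dataclass
class TranslationResult:
    text: str
    speech: Optional[Audio]


def encode(inp: Union[str, Audio]) -> Meaning:
    """Stage 1: accept speech or text and map it to a shared representation."""
    if isinstance(inp, Audio):
        return Meaning(source=f"speech<{inp.description}>")
    return Meaning(source=f"text<{inp}>")


def decode_text(meaning: Meaning, target_lang: str) -> str:
    """Stage 2: turn the representation into target-language text (stubbed)."""
    return f"[{target_lang}] {meaning.source}"


def text_to_speech_units(text: str) -> Audio:
    """Stage 3: convert the target text into speech (stubbed)."""
    return Audio(description=f"spoken: {text}")


def translate(inp: Union[str, Audio], target_lang: str,
              want_speech: bool = False) -> TranslationResult:
    """Run the stages in order; the speech stage is only needed for audio output."""
    meaning = encode(inp)
    text = decode_text(meaning, target_lang)
    speech = text_to_speech_units(text) if want_speech else None
    return TranslationResult(text=text, speech=speech)


# Text-to-text uses two stages; speech-to-speech uses all three.
print(translate("hello", "fra"))
print(translate(Audio("clip"), "deu", want_speech=True))
```

The point of the sketch is only the ordering: one encoder serves both input types, and the text decoder always runs, so the text and speech outputs are produced by the same pipeline.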
Large Language Model Market to Grow USD 40.8 Billion by 2029 at CAGR of 21.4%
The need for natural language processing technologies across a variety of applications, including chatbots, virtual assistants, content production, translation services, and more, is the main factor driving the expansion of the large language model market. Large language models, which can comprehend and generate human-like text, are at the forefront of this trend as companies and organizations look to use them to improve customer interactions, automate processes, conduct large-scale textual data analysis, and spur innovation across a range of industries. The popularity of large language models is expected to increase as their capabilities and adaptability continue to advance, thus boosting the development of the market. The growing need for NLP applications is the primary driver of the large language model market. These programs perform a wide range of functions, including text summarization, sentiment analysis, content production, language translation, chatbots, and virtual assistants. Large language models are essential in the era of conversational AI and data-driven decision making because they are at the forefront of enabling these applications by providing the underlying capability to interpret, analyze, and synthesize human-like text.
Large language models play a critical role in content generation. These models are increasingly used by businesses to automate the creation of marketing materials, journalism, and advertising content. Large language models have become important for content-driven enterprises as a result of this automation, which not only saves time and money but also guarantees consistent and high-quality output. Strong language models that can handle and understand their input data are now necessary because of the abundance of digital data, including text-based data from social media, websites, and papers. Large language models can now be trained more easily and with greater success, resulting in more accurate and contextually appropriate responses.
Key Players: Meta, AI21 Labs, Tencent, Yandex, DeepMind, Naver, OpenAI, Google, Microsoft, Amazon, Baidu, Anthropic, Alibaba, Huawei.
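For readers who want to see what a 21.4% CAGR means in practice, compound growth can be sketched as follows. The report excerpt does not state the base year or the starting market size, so the five-year horizon below is an assumption made purely for illustration, and the function names are hypothetical.

```python
def growth_factor(cagr: float, years: int) -> float:
    """Total growth multiple after compounding `cagr` once a year for `years` years."""
    return (1.0 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Annual growth rate that takes start_value to end_value in `years` years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


CAGR = 0.214  # rate from the headline
YEARS = 5     # ASSUMPTION: the forecast horizon is not given in the excerpt

factor = growth_factor(CAGR, YEARS)
print(f"At {CAGR:.1%} a year, a market grows about {factor:.2f}x over {YEARS} years")
```

Under that assumed horizon, the market would end up a little over two and a half times its starting size; a different horizon changes the multiple accordingly.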