Meta’s Artificial Intelligence Model for Voice Translation “Seamless Streaming”

Meta, Facebook’s parent company, continues its artificial intelligence work at full speed, and its latest announcement may be one of the most useful in recent memory! The project announced by Meta is called Seamless Communication.

This project, which comprises 4 models, focuses on AI-powered communication across languages.

The project consists of the following models:

  • SeamlessExpressive: A model that aims to preserve the expressiveness and subtleties of speech across languages. Translations should capture the nuances of human expression. While existing translation tools can capture the content of a conversation, they often rely on monotonous, robotic text-to-speech systems for their output. SeamlessExpressive aims to preserve the subtleties of speech, such as pauses and speaking rate, in addition to vocal style and emotional tone.
  • SeamlessStreaming: A model that delivers speech and text translations with a latency of approximately two seconds. It is the first large-scale multilingual model to do so with nearly the same accuracy as an offline model. Built on SeamlessM4T v2, SeamlessStreaming supports automatic speech recognition and speech-to-text translation for approximately 100 input and output languages, as well as speech-to-speech translation for approximately 100 input languages and 36 output languages.
  • SeamlessM4T v2: A foundational multilingual, multitask model that allows people to communicate effortlessly through speech and text. The first version of SeamlessM4T, introduced in August 2023, delivered state-of-the-art results for translation and transcription across speech and text. SeamlessM4T v2 improves on that work and forms the basis of the new SeamlessExpressive and SeamlessStreaming models. It features a new architecture with a non-autoregressive text-to-unit decoder that improves consistency between text and speech output.
  • Seamless: A model that combines the features of SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2 in a single model.
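
To make the difference between an offline model and a streaming model with a latency of about two seconds more concrete, here is a minimal toy sketch in Python. It is not Meta's algorithm and does not use any Seamless code: the `Segment` class, the function names and the fixed `lag` parameter are all hypothetical, and compute time is ignored. It only compares how long a listener waits for each part of a sentence in the two setups.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A piece of source speech, with start and end times in seconds (hypothetical type)."""
    start: float
    end: float


def offline_delays(segments: list[Segment]) -> list[float]:
    """Offline: nothing is translated until the whole utterance has ended."""
    utterance_end = segments[-1].end
    return [utterance_end - seg.end for seg in segments]


def streaming_delays(segments: list[Segment], lag: float = 2.0) -> list[float]:
    """Streaming: each segment is output a fixed lag after it was spoken,
    or at the end of the utterance if that comes sooner."""
    utterance_end = segments[-1].end
    return [min(lag, utterance_end - seg.end) for seg in segments]


# A 10-second utterance split into five 2-second segments.
segments = [Segment(float(t), float(t + 2)) for t in range(0, 10, 2)]

print("offline  :", offline_delays(segments))
print("streaming:", streaming_delays(segments))
```

In this toy setup the offline listener waits longest for the start of the sentence, while the streaming listener never waits more than the chosen lag, which is why a short latency matters for live conversation.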

The model called SeamlessStreaming is the one that interests us the most, because it can translate speech from approximately 100 input languages into 36 output languages with a latency of about two seconds! So, when we start using this artificial intelligence technology, we will be able to communicate easily with anyone anywhere in the world, even if we do not know a foreign language.

So When Will We Start Using This Technology?

The Seamless Communication technology can currently be used only as a demo that supports 4 languages. Meta has not announced when, in what form, or at what price the full version will be available.

If you know one of the 4 supported languages, you can try the demo of the translation system right away from the link below.

Supported languages: English, French, German and Spanish, for both input and output.

For more details:

This article has been automatically translated with Google Translate. Original Post.