Meta’s open-source speech AI models support over 1,100 languages


Advances in machine learning and speech recognition technology have made information more accessible to people, particularly those who rely on voice to access it. However, the lack of labelled data for numerous languages poses a significant challenge in developing high-quality machine-learning models.

In response to this problem, the Meta-led Massively Multilingual Speech (MMS) project has made remarkable strides in expanding language coverage and improving the performance of speech recognition and synthesis models.

By combining self-supervised learning techniques with a diverse dataset of religious readings, the MMS project has expanded coverage from the ~100 languages supported by existing speech recognition models to over 1,100 languages.

Breaking down language barriers

To address the scarcity of labelled data for most languages, the MMS project utilised religious texts, such as the Bible, which have been translated into numerous languages.

These translations provided publicly available audio recordings of people reading the texts, enabling the creation of a dataset comprising readings of the New Testament in over 1,100 languages.

By including unlabelled recordings of other religious readings, the project expanded language coverage to recognise over 4,000 languages.

Despite the dataset’s specific domain and predominantly male speakers, the models performed equally well for male and female voices. Meta also says it didn’t introduce any religious bias.

Overcoming challenges through self-supervised learning

Training conventional supervised speech recognition models with just 32 hours of data per language is inadequate.

To overcome this limitation, the MMS project leveraged the benefits of the wav2vec 2.0 self-supervised speech representation learning technique.

By training self-supervised models on approximately 500,000 hours of speech data across 1,400 languages, the project significantly reduced the reliance on labelled data.

The resulting models were then fine-tuned for specific speech tasks, such as multilingual speech recognition and language identification.

Impressive results

Evaluation of the models trained on the MMS data revealed impressive results. In a comparison with OpenAI’s Whisper, the MMS models exhibited half the word error rate while covering 11 times more languages.
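For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a system's transcript and a reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not the evaluation code used by Meta or OpenAI; the function name and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to reach an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, invented purely to show the calculation.
    reference = "the quick brown fox jumps"
    system_a = "the quick brown dog jumped"  # two substitutions
    system_b = "the quick brown fox jumped"  # one substitution
    print(word_error_rate(reference, system_a))
    print(word_error_rate(reference, system_b))
```

In this toy case the second transcript makes one error where the first makes two, so its WER is half as large, which is the kind of relationship the comparison above describes. Published evaluations typically also normalise casing and punctuation before scoring, which this sketch does not do.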

[Figure: word error rates of MMS compared with OpenAI’s Whisper]

Additionally, the MMS project successfully built text-to-speech systems for over 1,100 languages. Despite the limitation of having relatively few different speakers for many languages, the speech generated by these systems was of high quality.

While the MMS models have shown promising results, it’s essential to acknowledge their imperfections. Mistranscriptions or misinterpretations by the speech-to-text model could result in offensive or inaccurate language. The MMS project emphasises collaboration across the AI community to mitigate such risks.

You can read the MMS paper here or find the project on GitHub.


  • Ryan Daws

    Ryan is a senior editor at TechForge Media with over a decade of experience covering the latest technology and interviewing leading industry figures. He can often be sighted at tech conferences with a strong coffee in one hand and a laptop in the other. If it’s geeky, he’s probably into it. Find him on Twitter (@Gadget_Ry) or Mastodon (@[email protected])
