Meta, in collaboration with UNESCO, has announced the Language Technology Partner Program aimed at advancing AI-driven speech recognition and translation.
The initiative invites contributors to provide extensive speech recordings, written texts, and translated sentence sets in diverse languages.
The goal is to improve AI models and make them open-source, supporting the development of multilingual, inclusive language technology.
Program Objectives and Scope
The Language Technology Partner Program is designed to address challenges in speech recognition and translation, particularly for languages with limited digital corpora.
A related European effort, the OpenEuroLLM project, focuses on developing large language models that can operate proficiently in all European Union–recognized languages, along with additional languages such as Arabic, Chinese, and Hindi.
By fostering an open-source approach, the program seeks to promote transparency in AI development and provide companies and public organizations with the ability to fine-tune these models for industry-specific needs.
European Commission Support and STEP Seal Recognition
The OpenEuroLLM project has received backing from the European Commission and is the first recipient of the Strategic Technologies for Europe Platform (STEP) Seal this year.
The STEP Seal acts as a quality label under the Digital Europe Programme, enhancing the project’s visibility and positioning it to attract further investments from both public and private sectors.
Collaborative Efforts and Technical Workshops
OpenEuroLLM brings together a consortium of 20 research institutions, companies, and EuroHPC centers. It is coordinated by Jan Hajič of Charles University in Czechia and co-led by Peter Sarlin, Co-Founder and CVP at AMD Silo AI; work on the project began on February 1.
Separately, partners in the Language Technology Partner Program will have access to exclusive technical workshops led by Meta’s Fundamental AI Research (FAIR) teams.
These workshops are designed to help participants leverage open-source models and develop language technologies tailored to specific linguistic requirements.
Advancements in Speech and Translation Technology
Building on previous successes such as the “No Language Left Behind” (NLLB) project, Meta has introduced the Massively Multilingual Speech (MMS) initiative.
This project supports transcription in over 1,100 languages and features zero-shot speech recognition, which allows audio to be transcribed even in languages the model was never explicitly trained on.
In parallel, Meta has launched an open-source machine translation benchmark on the AI platform Hugging Face. This benchmark, designed by linguists, currently supports seven languages and evaluates translation model performance through carefully crafted sentences.
Partnership with the Nunavut Government
A notable early collaboration is with the government of Nunavut in Northern Canada.
This partnership aims to integrate Inuit languages such as Inuktitut and Inuinnaqtun into the AI models, reflecting the program’s commitment to supporting underserved languages and promoting linguistic diversity.
Addressing Data Transparency and Ethical Considerations
In keeping with its commitment to transparency, the OpenEuroLLM Project plans to release all documentation, training and testing code, and evaluation metrics along with the models.
Meta’s initiative emphasizes the importance of clear data sourcing practices, a critical aspect given past debates over the use of publicly available content.
The program seeks to ensure that contributors understand how their data will be used and aims to provide mechanisms for opting out when necessary.
Funding and Future Outlook
The OpenEuroLLM project is supported by an initial budget of €37.4 million, including €20.6 million from the EU’s Digital Europe Programme.
Although this funding is modest compared to investments in proprietary AI by other global players, the project is designed to advance open-source models that align with European values of data protection, inclusivity, and transparency.
The Language Technology Partner Program represents a strategic effort by Meta and UNESCO to foster multilingual AI innovations.
By supporting the development of high-performance, open-source language models, the initiative aims to improve speech recognition and translation technologies across diverse linguistic and cultural contexts.