EuroLLM-9B: Europe's New Multilingual AI Model

EuroLLM-9B, released on December 2, 2024, marks a new milestone in European artificial intelligence development. The open-source language model has 9 billion parameters and supports 35 languages, including all 24 official languages of the European Union. The project aims to preserve Europe's linguistic diversity in the digital space.

Training the model took substantial computing capacity: EuroLLM-9B was trained on roughly 4 trillion tokens using 400 Nvidia H100 GPUs on the MareNostrum5 supercomputer. Training ran in three phases. The first, foundational phase used 3.6 trillion tokens to establish the model's multilingual base, drawing on sources such as web data, Wikipedia, ArXiv scientific articles, and parallel language corpora. It was followed by a fine-tuning phase on 400 billion tokens, in which the proportion of web data was reduced and high-quality multilingual text received greater emphasis. In the final phase, on 40 billion tokens, the model learned exclusively from top-quality data to optimise its performance. EuroLLM-9B outperforms comparable European models and remains competitive with global models such as Gemma-2-9B.
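The three phase budgets quoted above can be checked with a little arithmetic. The sketch below is only an illustration: the phase labels and the function name `phase_shares` are made up for this example, and the token counts are the rounded figures from the text, not official numbers from the training logs.

```python
# Token budgets per training phase, using the rounded figures quoted in the
# article. The phase labels are illustrative, not official names.
PHASE_TOKENS = {
    "phase 1 (foundation)": 3_600_000_000_000,
    "phase 2 (higher-quality mix)": 400_000_000_000,
    "phase 3 (top-quality data only)": 40_000_000_000,
}


def phase_shares(phases):
    """Return the total token count and each phase's fraction of that total."""
    total = sum(phases.values())
    shares = {name: tokens / total for name, tokens in phases.items()}
    return total, shares


total, shares = phase_shares(PHASE_TOKENS)
print(f"total: {total / 1e12:.2f} trillion tokens")
for name, share in shares.items():
    print(f"{name}: {share:.1%} of all training tokens")
```

With these figures the phases add up to about 4.04 trillion tokens, consistent with the headline figure of 4 trillion, and the first phase accounts for roughly nine tenths of the total.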

The project received €20.6 million in European Union funding through the Digital Europe programme, a sign of its significance. Nine European research institutes and universities collaborated on the development, and a EuroHPC extreme-scale access grant provided the compute time on MareNostrum5. The model drew strong interest in its first week, with more than 50,000 downloads registered on the Hugging Face platform, and the research group is already working on a larger, 20 billion parameter version.
