The Hungarian Competition Authority Has Achieved a Historic Commitment from Microsoft to Develop Its AI Systems Using 10 Billion Hungarian Words, Making This Data Freely Available to Other Developers
The Hungarian Competition Authority (GVH) opened proceedings against Microsoft Ireland Operations Limited in July 2023 to investigate whether the company had adequately informed Hungarian users about its AI-based Bing service, launched in February 2023. As a result of the investigation, Microsoft submitted a comprehensive set of commitments, the most significant of which is the creation of a database containing at least 10 billion Hungarian words. The "pre-cooked" (cleaned, deduplicated, formatted) dataset will not only be integrated into the company's AI systems but will also be made available to other developers. For comparison, OpenAI's ChatGPT system was trained on only 120-130 million Hungarian words, a fraction of the corpus now committed.
Microsoft's commitments also extend to organising educational programmes for Hungarian civil servants, SMEs, and consumers to help them better understand the opportunities and risks of artificial intelligence. László Palkovics, government commissioner for artificial intelligence, emphasised to Index that the development of Hungarian-language artificial intelligence systems is not just a technological challenge but a matter of national interest. University professor Zoltán Szűts described the decision as a cultural milestone, stating that preserving the Hungarian language and cultural heritage requires artificial intelligence that speaks and thinks in Hungarian. The database created as a result of the GVH procedure can significantly improve the accuracy and reliability of Hungarian-language AI-based applications, thereby promoting Hungary's digital sovereignty.