Microsoft Develops AI Systems With 10 Billion Hungarian Words and Freely Shares Data Following Competition Authority Case

Image source: Tyler Lahti, CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0), via Wikimedia Commons

The Hungarian Competition Authority Has Achieved a Historic Commitment from Microsoft to Develop Its AI Systems Using 10 Billion Hungarian Words, Making This Data Freely Available to Other Developers

The Hungarian Competition Authority (GVH) initiated proceedings against Microsoft Ireland Operations Limited in July 2023, investigating whether the company had adequately informed Hungarian users about its AI-based Bing service, launched in February 2023. As a result of the investigation, Microsoft submitted a comprehensive set of commitments, the most significant of which is the creation of a database containing at least 10 billion Hungarian words. The "pre-cooked" (cleaned, deduplicated, formatted) dataset will not only be integrated into the company's AI systems but will also be made available to other developers. For comparison, OpenAI's ChatGPT system was reportedly trained on only 120-130 million Hungarian words, a small fraction of the corpus now committed.

Microsoft's commitment also extends to organising educational programmes that help Hungarian civil servants, SMEs, and consumers better understand the opportunities and risks of artificial intelligence. László Palkovics, government commissioner for artificial intelligence, told Index that the development of Hungarian-language artificial intelligence systems is not just a technological challenge but a national interest. University professor Zoltán Szűts described the decision as a cultural milestone, stating that preserving the Hungarian language and cultural heritage requires artificial intelligence that speaks and thinks in Hungarian. The database created as a result of the GVH procedure can significantly improve the accuracy and reliability of Hungarian-language AI-based applications, thereby promoting Hungary's digital sovereignty.

Sources:

1. Magyarul tanítja a Microsoft a mesterséges intelligenciát a GVH eljárása miatt (Microsoft Teaches Artificial Intelligence in Hungarian Because of the GVH Proceedings), Jogászvilág
The global technology giant is developing its own AI-based systems using 10 billion Hungarian words and is making the data freely available to other developers as well.

2. Váratlan magyar AI-siker, ami mindent megváltoztat (An Unexpected Hungarian AI Success That Changes Everything)
According to László Palkovics, what poses a technological challenge is now also a national interest.

3. Kitűnőre vizsgázna magyarból a Microsoft (Microsoft Would Pass Hungarian With Top Marks)
Although this was not necessarily the original goal, the linguistic accuracy of Hungarian-language results in Microsoft's AI-based systems may improve significantly as a consequence of a competition authority investigation.