Benchmark-based Evaluation of Language Models and Their Limits
Benchmarking is the practice of evaluating artificial intelligence models on a standard suite of tasks under controlled conditions. In the context of large language models (LLMs), benchmarks provide a common yardstick for measuring capabilities such as factual knowledge, reasoning, and conversational coherence. They emerged because the proliferation of new models made informal, ad hoc comparisons impractical, creating demand for standardized and reproducible ways to measure and compare systems.
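To make the "common yardstick" idea concrete, the sketch below scores a model on a fixed set of multiple-choice items and reports accuracy, the simplest form a benchmark score can take. This is a minimal illustration, not any particular benchmark's harness: `query_model`, the toy questions, and the answer keys are placeholders standing in for a real model API and a real dataset.

```python
def query_model(prompt: str) -> str:
    """Placeholder for a real LLM call; a real harness would query an API here."""
    return "A"  # dummy model that always picks option A

# Toy benchmark: real suites contain hundreds to thousands of such items.
BENCHMARK = [
    {"question": "What is the capital of France? (A) Berlin (B) Paris", "answer": "B"},
    {"question": "What is 2 + 2? (A) 4 (B) 5", "answer": "A"},
]

def evaluate(benchmark: list[dict]) -> float:
    """Run every item under identical conditions and return the fraction answered correctly."""
    correct = 0
    for item in benchmark:
        prediction = query_model(item["question"]).strip().upper()
        if prediction == item["answer"]:
            correct += 1
    return correct / len(benchmark)

if __name__ == "__main__":
    print(f"Accuracy: {evaluate(BENCHMARK):.2%}")
```

Because every model is asked the same questions and scored by the same rule, the resulting number is comparable across systems, which is precisely what ad hoc, anecdotal evaluation cannot provide.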