GenAI textbook

Why Size Matters: The Impact of Model Scale on Performance and Capabilities in Large Language Models

A defining characteristic of LLMs is their scale, measured by the number of parameters, which has grown exponentially in recent years. Models such as GPT-3, with 175 billion parameters, and its successors have demonstrated remarkable capabilities, raising questions about the relationship between model size and performance (Brown et al. 2020).
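
To make the 175-billion figure concrete, the sketch below estimates a GPT-style decoder's parameter count from its published hyperparameters (96 layers, model width 12,288, and a vocabulary of 50,257 tokens, per Brown et al. 2020). It is a back-of-the-envelope approximation that ignores biases, layer norms, and positional embeddings, not an exact accounting.

```python
def transformer_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Rough parameter count for a GPT-style decoder.

    Per layer: ~4*d^2 for attention (Q, K, V, and output projections)
    plus ~8*d^2 for a feed-forward block with hidden size 4*d,
    i.e. ~12*d^2 per layer, ignoring biases and layer norms.
    """
    per_layer = 12 * d_model ** 2
    embeddings = vocab_size * d_model  # token embedding matrix
    return n_layers * per_layer + embeddings

# GPT-3's published configuration (Brown et al. 2020)
print(f"{transformer_params(96, 12288, 50257):,}")  # ~174.6 billion, close to the quoted 175B
```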

Generative Artificial Intelligence: Just Hype or Reality?

Gartner’s 2024 Technology Hype Cycle, which outlines the dynamics of expectations surrounding technological innovations across five distinct phases, indicates that generative AI has passed the peak of inflated expectations. Nevertheless, the associated hype remains persistent, and the technology continues to hold the potential to become truly transformative.

The Place of GenAI in the AI Hierarchy: From Neural Networks to Large Language Models

Generative AI relies on a specialised branch of machine learning (ML), namely deep learning (DL), whose algorithms employ neural networks to detect and exploit patterns embedded within data. By processing vast volumes of information, these algorithms are capable of synthesising existing knowledge and applying it creatively. As a result, generative AI systems can produce novel content, such as text, images, or code, rather than merely analysing what already exists.
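
As a minimal illustration of a neural network detecting a pattern in data, the NumPy sketch below trains a tiny two-layer network on XOR, a toy relationship that no linear model can capture; the architecture and hyperparameters are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: XOR, a pattern no single linear layer can capture.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Two-layer network: 2 inputs -> 8 hidden units -> 1 output.
W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)

sigmoid = lambda z: 1 / (1 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass: gradients of the mean binary cross-entropy
    dp = (p - y) / len(X)
    dW2, db2 = h.T @ dp, dp.sum(0)
    dh = dp @ W2.T * (1 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ dh, dh.sum(0)
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= lr * grad

print(p.round(3).ravel())  # close to [0, 1, 1, 0]: the XOR pattern has been learned
```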

Model Evaluation and Performance Measurement: Methods for Determining Effectiveness in Language Model Creation

Creating effective large language models (LLMs) involves two critical stages: pre-training and fine-tuning. These stages enable models to progress from capturing broad linguistic knowledge to excelling in specific tasks, powering applications such as automated translation, sentiment analysis, and conversational agents. Rigorous evaluation and performance measurement ensure that LLMs meet both general-purpose and task-specific requirements.
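
One standard intrinsic measure used in such evaluation is perplexity, the exponentiated average negative log-likelihood per token. The sketch below computes it from a list of hypothetical token log-probabilities; a real evaluation would obtain these from the model itself on held-out text.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood per token.

    `token_log_probs` are natural-log probabilities the model assigned to
    each observed next token in held-out text (hypothetical values here).
    """
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Illustrative values: log probs closer to 0 mean the model found the
# text less surprising, yielding a lower (better) perplexity.
print(perplexity([-1.2, -0.4, -2.3, -0.9]))  # ~3.32
```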

Fine-tuning: Adapting General Models for Specific Tasks and Applications

The evolution of machine learning has led to the development of powerful general models, such as BERT, GPT-3, and Vision Transformers (ViT), which have transformed artificial intelligence applications across diverse domains. These models, pre-trained on extensive datasets like Common Crawl for natural language processing or ImageNet for computer vision, demonstrate broad capabilities that can be adapted, through fine-tuning, to a wide range of specific tasks and domains.
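
A common fine-tuning pattern, sketched below under illustrative hyperparameters, freezes a pre-trained encoder (here BERT, loaded via the Hugging Face transformers library) and trains only a small task-specific head; the two sentiment examples are hypothetical stand-ins for a real labelled dataset.

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

# Load a general-purpose pre-trained encoder (downloads weights on first use).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

# Freeze the pre-trained weights; only the new task head will be trained.
for param in encoder.parameters():
    param.requires_grad = False

# A small task head for binary sentiment classification.
head = nn.Linear(encoder.config.hidden_size, 2)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Hypothetical labelled examples standing in for a real dataset.
texts = ["a wonderful, moving film", "a dull and lifeless script"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, return_tensors="pt")
with torch.no_grad():
    # [CLS] token representation; computed once since the encoder is frozen.
    hidden = encoder(**batch).last_hidden_state[:, 0]

for step in range(20):
    loss = loss_fn(head(hidden), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(loss.item())
```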

The Pre-Training Process: Principles, Methods, and Mechanisms of Language Pattern Acquisition

Pre-training underpins the capabilities of large-scale language models like BERT and GPT, enabling them to capture linguistic patterns from extensive text corpora. This process equips models with versatile language understanding, adaptable through fine-tuning for tasks such as translation or sentiment analysis. The principles, methods, and mechanisms of pre-training reveal how models acquire this broad linguistic competence.
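
A concrete example of a pre-training objective is BERT-style masked language modelling, in which some input tokens are hidden and the model learns to recover them. The sketch below only constructs such a training pair; the 15% masking rate follows the original BERT recipe, and real pipelines also replace a fraction of selected tokens with random or unchanged tokens, which is omitted here for brevity.

```python
import random

random.seed(0)
MASK = "[MASK]"

def make_mlm_example(tokens, mask_prob=0.15):
    """BERT-style masked-language-modelling pair: corrupted input + targets.

    A pre-training step feeds `inputs` to the model and scores its
    predictions against `targets`, but only at the masked positions.
    """
    inputs, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)   # the model must recover this token
        else:
            inputs.append(tok)
            targets.append(None)  # position ignored by the loss
    return inputs, targets

sentence = "the cat sat on the mat because it was tired".split()
print(make_mlm_example(sentence))
```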

The Transformer Revolution: Breakthrough in Language Modelling and Its Impact on AI Development

The Transformer architecture, unveiled by Vaswani et al. (2017), has catalysed a seismic shift in natural language processing (NLP), redefining the boundaries of language modelling and accelerating advancements in artificial intelligence (AI). By introducing a novel approach that prioritises parallel computation and attention-driven processing, the Transformer has surpassed traditional recurrent and convolutional models in both output quality and training efficiency.
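
The PyTorch sketch below assembles one encoder block to make the architecture concrete; it uses a pre-norm layout, a common variant of the post-norm design in Vaswani et al. (2017), and all dimensions are illustrative. Note that every sequence position is processed at once, with no recurrence, which is what enables the parallelism the paragraph above describes.

```python
import torch
from torch import nn

class TransformerBlock(nn.Module):
    """One pre-norm Transformer encoder block: self-attention plus a
    feed-forward network, each wrapped in a residual connection."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        # All positions are processed in parallel: no sequential recurrence.
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.ff(self.norm2(x))
        return x

x = torch.randn(2, 10, 64)          # (batch, sequence length, d_model)
print(TransformerBlock()(x).shape)  # torch.Size([2, 10, 64])
```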

The Attention Mechanism: The Key to Understanding Linguistic Relationships

The attention mechanism has fundamentally reshaped natural language processing (NLP), enabling models to capture complex linguistic relationships with unprecedented accuracy. Introduced prominently in Vaswani et al. (2017), attention allows models to focus on relevant parts of input sequences, enhancing performance in tasks like machine translation and sentiment analysis. This essay examines how the attention mechanism works and why it has become central to modern NLP.
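
The core computation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, from Vaswani et al. (2017). The NumPy sketch below implements it for a single head with illustrative random inputs.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise relevance of queries to keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

# Self-attention over three positions with 4-dimensional representations:
# queries, keys, and values all come from the same sequence.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))  # each row sums to 1: how strongly each position attends to the others
```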

NLP Tasks and Applications: Core Techniques and Their Impact

Natural Language Processing (NLP) encompasses a variety of tasks, each with distinct methodologies and applications, including Named Entity Recognition (NER), sentiment analysis, classification, machine translation, summarisation, and information extraction. These tasks underpin numerous real-world applications, from virtual assistants to automated content analysis. This essay explores these core NLP tasks, their underlying techniques, and their impact on practical applications.
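
As a quick taste of one such task, the sketch below runs Named Entity Recognition with the spaCy library; it assumes the small English model en_core_web_sm has been installed separately, and the exact entities returned depend on the model version.

```python
import spacy

# Assumes the small English model has been installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple opened a new office in Berlin in March 2024.")

# Named Entity Recognition: typed spans such as ORG, GPE (location), DATE.
for ent in doc.ents:
    print(ent.text, ent.label_)
# Expected output along the lines of:
#   Apple ORG
#   Berlin GPE
#   March 2024 DATE
```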