Prompt Engineering: Shaping AI Performance Through Strategic Input Design


Prompt engineering is the science and art of crafting, testing, and optimising input queries to guide LLMs towards specific, high-quality outputs. Liu et al. (2023) describe it as an "empirical science" that combines linguistic precision with an understanding of model architectures, transforming human intent into machine action. Unlike traditional machine learning, which relies on fine-tuning with large datasets, prompt engineering manipulates the input context to direct model behaviour without altering the underlying weights (Brown et al. 2020). For example, a prompt might instruct an LLM to "explain quantum mechanics in simple terms for a secondary school audience", specifying both tone and audience. The discipline draws on historical principles of information retrieval and human-computer interaction: Salton and McGill (1983) highlighted the importance of query formulation in information retrieval systems, a concept that prompt engineering extends into the generative realm of LLMs. Techniques such as zero-shot prompting, where no examples are provided (Radford et al. 2019); few-shot prompting, which includes a handful of worked examples (Brown et al. 2020); and chain-of-thought (CoT) prompting, which guides models through step-by-step reasoning (Wei et al. 2022), illustrate the versatility of prompt engineering across diverse tasks.
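The difference between zero-shot and few-shot prompting can be sketched as plain string assembly. The `build_prompt` helper below is purely illustrative and model-agnostic, not part of any particular LLM API:

```python
def build_prompt(task, examples=None):
    """Assemble a zero-shot or few-shot prompt for a text-completion model.

    examples: optional list of (question, answer) demonstration pairs.
    With no examples, the result is a zero-shot prompt: the task alone.
    """
    parts = []
    for question, answer in (examples or []):
        parts.append(f"Q: {question}\nA: {answer}")
    parts.append(f"Q: {task}\nA:")  # leave the final answer for the model
    return "\n\n".join(parts)

# Zero-shot: the task with no demonstrations.
zero_shot = build_prompt("Translate 'bonjour' to English.")

# Few-shot: worked examples precede the task and fix the expected format.
few_shot = build_prompt(
    "Translate 'bonjour' to English.",
    examples=[
        ("Translate 'gracias' to English.", "thank you"),
        ("Translate 'danke' to English.", "thank you"),
    ],
)
```

In the few-shot case the demonstrations pin down both the task and the expected answer format, which is often what lifts output quality.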

Prompt engineering is grounded in the interplay between human language and machine cognition. LLMs, trained on vast datasets, encode statistical language patterns but lack human-like intentionality (Bender & Koller 2020). Prompt engineering bridges this gap by aligning outputs with human expectations through carefully structured inputs, akin to programming in natural language (Liu et al. 2023). This process reflects principles from cognitive science, such as schema theory, which posits that context shapes interpretation (Bartlett 1995). Computationally, prompt engineering leverages the attention mechanisms of transformer-based models (Vaswani et al. 2017): the syntax, semantics, and examples in a prompt condition the model's internal representations, steering generation towards contextually relevant outputs (Liu et al. 2023). For instance, CoT prompting enhances reasoning by creating a cognitive scaffold that guides the model through intermediate steps, significantly improving performance on complex tasks (Wei et al. 2022). This interdisciplinary foundation, combining linguistics, cognitive science, and computer science, underscores the sophistication of prompt engineering.
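The CoT scaffold described above amounts to prepending a worked example whose answer exposes its intermediate steps, then cueing the model to reason the same way. The helper below is a minimal, hypothetical sketch of that structure, not a prescribed format:

```python
def chain_of_thought_prompt(question):
    """Wrap a question in a minimal chain-of-thought scaffold:
    one worked example whose answer shows its intermediate steps,
    followed by the target question and a cue to reason stepwise."""
    demo = (
        "Q: A shop sells pens at 3 for £2. How much do 12 pens cost?\n"
        "A: 12 pens is 12 / 3 = 4 groups of three. "
        "4 groups at £2 each is 4 * 2 = £8. The answer is £8."
    )
    return f"{demo}\n\nQ: {question}\nA: Let's think step by step."
```

The demonstration does the real work: it shows the model what an answer with visible intermediate reasoning looks like before the new question arrives.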

Prompt engineering is critical for maximising LLM performance and unlocking latent capabilities. Early interactions with LLMs showed that simple prompts often produced generic responses, but structured prompts, such as few-shot or CoT, dramatically improve output quality (Brown et al. 2020; Wei et al. 2022). Techniques like self-consistency, where multiple reasoning paths are sampled to select the most consistent answer, further enhance reliability (Wang et al. 2022). These methods demonstrate that prompt engineering is not merely about clarity but about guiding robust reasoning processes. Beyond performance, prompt engineering is a vital tool for ethical AI deployment. LLMs can inherit biases or generate harmful content, but carefully designed prompts can mitigate these risks. System prompts or constitutions that define ethical boundaries or instruct models to provide evidence-based answers reduce misinformation and inappropriate outputs (Perez et al. 2022; Reynolds & McDonell 2021). However, the fragility of prompts poses challenges, as slight variations can sometimes undermine these safeguards (Bender & Koller 2020), highlighting the need for robust prompting strategies.
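Self-consistency can be sketched without calling any model: given several independently sampled reasoning chains, extract each final answer and take a majority vote. The last-word answer extraction below is a simplifying assumption for illustration; a real system would use a task-specific parser:

```python
from collections import Counter

def self_consistent_answer(samples):
    """Self-consistency sketch: given several sampled reasoning chains,
    extract each chain's final answer and return the most common one."""
    finals = [s.strip().split()[-1] for s in samples]  # naive extraction
    return Counter(finals).most_common(1)[0][0]

# Three sampled reasoning paths; two agree on 42, one diverges.
paths = [
    "6 * 7 = 42, so the answer is 42",
    "Half of 84 is 42, so the answer is 42",
    "A rough estimate gives an answer of 40",
]
```

Majority voting over diverse reasoning paths is what makes the technique robust: a single flawed chain is outvoted rather than trusted.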

Prompt engineering also democratises AI access, enabling non-experts to harness LLMs for domain-specific tasks. Educators, healthcare professionals, and other experts can encode their knowledge into prompts, creating bespoke AI tools without coding expertise. For instance, in education, prompts can generate personalised learning materials, enhancing student engagement (Liu et al. 2023). In healthcare, prompts can guide chatbots to provide patient-friendly information, improving accessibility (Perez et al. 2022). This fosters human-AI collaboration, shifting the focus from computational skills to subject-matter expertise. Economically, prompt engineering offers a cost-effective alternative to resource-intensive fine-tuning, allowing smaller organisations to adapt pre-trained models for tasks such as customer service or content generation. Because it avoids additional model training, it also aligns with sustainable AI development by reducing the associated environmental costs (Liu et al. 2023).
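Encoding domain expertise as a reusable template is the mechanism behind this democratisation. The `domain_prompt` function and its field names below are hypothetical, sketching how an educator might parameterise role, audience, and task without writing application code:

```python
def domain_prompt(role, audience, task):
    """Reusable template: a domain expert fills in role, audience,
    and task instead of writing application code."""
    return (
        f"You are a {role}. Explain the following to {audience}, "
        f"using plain language and no jargon.\n\nTask: {task}"
    )

# An educator instantiating the template for a physics lesson.
lesson = domain_prompt(
    "secondary-school physics teacher",
    "14-year-old students",
    "Why does a ball thrown upwards slow down and fall back?",
)
```

The same template could be refilled for a healthcare chatbot by swapping in a clinical role and a patient audience; the expertise lives in the field values, not in code.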

Prompt engineering faces challenges, including the labour-intensive nature of manual prompt design and the variability of prompt effectiveness across models (Liu et al. 2023). Emerging research into automated prompt optimisation, such as techniques that use algorithms to iteratively refine prompts, offers potential solutions (Reynolds & McDonell 2021). Ethical concerns also persist, as poorly designed prompts can amplify biases or elicit harmful outputs (Bender & Koller 2020). Integrating ethical frameworks into prompt design is essential for responsible AI development.
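One family of automated prompt optimisation can be sketched as a greedy search over candidate edits, scored by a task-specific evaluation function. Everything below (the `optimise_prompt` name, the suffix-appending move, the toy scoring function) is an illustrative assumption rather than a published algorithm:

```python
def optimise_prompt(base, variants, score, rounds=5):
    """Greedy prompt search: each round, try appending each candidate
    suffix to the current best prompt and keep any that improve the
    score; stop early when no candidate helps.

    `score` stands in for any task-specific evaluation, e.g. accuracy
    on a held-out set of labelled examples."""
    best, best_score = base, score(base)
    for _ in range(rounds):
        improved = False
        for v in variants:
            candidate = f"{best} {v}"
            s = score(candidate)
            if s > best_score:
                best, best_score, improved = candidate, s, True
        if not improved:
            break
    return best
```

In practice the scoring function is the expensive part, since each evaluation means running the model over a benchmark; the search loop itself is simple.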

References:

1. Bartlett, Frederic Charles. 1995. Remembering: A Study in Experimental and Social Psychology. Cambridge: Cambridge University Press.

2. Bender, Emily M., and Alexander Koller. 2020. “Climbing Towards NLU: On Meaning, Form, and Understanding in the Age of Data.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 5185–5198. https://doi.org/10.18653/v1/2020.acl-main.463

3. Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, and Sandhini Agarwal. 2020. “Language Models Are Few-Shot Learners.” Advances in Neural Information Processing Systems 33: 1877–1901. https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf

4. Liu, Pengfei, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2023. “Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing.” ACM Computing Surveys 55 (9): 1–35. https://doi.org/10.1145/3563334

5. Perez, Ethan, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, and Geoffrey Irving. 2022. “Red Teaming Language Models with Language Models.” arXiv preprint arXiv:2202.03286. https://arxiv.org/abs/2202.03286

6. Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. “Language Models Are Unsupervised Multitask Learners.” OpenAI Technical Report. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

7. Reynolds, Laria, and Kyle McDonell. 2021. “Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm.” In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, 1–7. https://doi.org/10.1145/3411763.3450381

8. Salton, Gerard, and Michael J. McGill. 1983. Introduction to Modern Information Retrieval. New York: McGraw-Hill.

9. Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” arXiv preprint arXiv:1706.03762. https://doi.org/10.48550/ARXIV.1706.03762

10. Wang, Xuezhi, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2022. “Self-Consistency Improves Chain of Thought Reasoning in Language Models.” arXiv preprint arXiv:2203.11171. https://arxiv.org/abs/2203.11171

11. Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V. Le, and Denny Zhou. 2022. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” Advances in Neural Information Processing Systems 35: 24824–24837. https://proceedings.neurips.cc/paper_files/paper/2022/file/9d5609613524fd7c23d61fefe3c413c9-Paper-Conference.pdf