Generative artificial intelligence (generative AI or GenAI) is an emerging branch of artificial intelligence (AI) algorithms (Tan et al. 2024, 168). In contrast to traditional AI models, which typically categorise data or make predictions based on learned patterns (Turchi et al. 2023, 35), generative AI goes beyond analysing existing data and produces novel, synthetic content in response to user prompts (Feuerriegel et al. 2024, 112). As described by Ian Goodfellow, the creator of generative adversarial networks (GANs) and a pioneering figure in machine learning, generative models are among the most powerful tools of machine creativity, enabling machines to move beyond what they have previously encountered and to create something entirely new (Bordas et al. 2024, 427).
Generative AI is a scientific field concerned with the automated construction of intelligence (Van Der Zant, Kouw, & Schomaker 2013, 113). It encompasses a class of models designed to learn the underlying distribution of a training dataset and to generate new data points that follow the same distribution (Goodfellow et al. 2014, 139). Based on learned patterns, generative AI enables flexible content creation across various modalities—such as text, audio, image, video, or code—making it adaptable to a wide range of tasks and applications.
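The idea of learning a distribution and then sampling from it can be illustrated with a deliberately simple sketch. It assumes a one-dimensional dataset and uses a single Gaussian as the "model", which is far simpler than the neural networks discussed in the cited literature; the function names fit_gaussian and sample are illustrative and do not come from any of the cited works.

```python
import random
import statistics


def fit_gaussian(data):
    """'Training': estimate the mean and standard deviation of the data."""
    return statistics.fmean(data), statistics.stdev(data)


def sample(params, n, rng):
    """'Generation': draw n new points from the fitted distribution."""
    mu, sigma = params
    return [rng.gauss(mu, sigma) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Assumed toy training set: 5000 points from a Gaussian with mean 5 and std 2.
training_data = [rng.gauss(5.0, 2.0) for _ in range(5000)]

params = fit_gaussian(training_data)
generated = sample(params, 5000, rng)

# The generated points are new values, yet their statistics resemble
# those of the training data.
print("training :", statistics.fmean(training_data), statistics.stdev(training_data))
print("generated:", statistics.fmean(generated), statistics.stdev(generated))
```

The generated values are not copies of the training points, yet their mean and spread stay close to those of the training set, which is the sense in which new data "follow the same distribution". Practical generative models replace the two-parameter Gaussian with a far more expressive model, but the fit-then-sample structure is the same.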
Previously, such models were typically limited to a single modality (unimodal), meaning they were designed to process and generate only one type of content. A prominent example is OpenAI’s GPT-3, which operated exclusively on textual inputs and generated text-based outputs. However, recent advancements have given rise to multimodal models, capable of processing and generating multiple types of content across different formats simultaneously (Banh & Strobel 2023, 7). One such example is the multimodal version of OpenAI’s GPT-4, which accepts both textual and visual inputs and generates text outputs that draw on both. The functioning of unimodal and multimodal models is illustrated in the figure below.
Although less common, there are also cross-modal models that specialise in transforming data from one modality into another (Zhang et al. 2021). These models enable, for example, the generation of images from text, as demonstrated by DALL·E. A related line of work, exemplified by CLIP, learns a shared representation of images and text, so that an image can be matched to the textual description that fits it best; CLIP itself scores image-text pairs rather than generating captions. Such models play a particularly important role in areas such as Visual Question Answering (VQA), text-to-image generation, and multimodal information retrieval.
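Multimodal information retrieval of the kind mentioned above is commonly framed as a nearest-neighbour search in an embedding space shared by text and images. The sketch below shows only that lookup step. The three-dimensional vectors are hand-written, hypothetical values, not the output of CLIP or any other real model, and the function names are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: in a real system these would be produced by
# trained text and image encoders that map into the same vector space.
text_embeddings = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a diagram of a circuit": [0.0, 0.1, 0.9],
}
image_embeddings = {
    "dog.jpg": [0.8, 0.2, 0.1],
    "circuit.png": [0.1, 0.0, 0.95],
}


def best_caption(image_vec):
    """Image-to-text retrieval: return the caption closest to the image."""
    return max(text_embeddings, key=lambda c: cosine(image_vec, text_embeddings[c]))


def best_image(text_vec):
    """Text-to-image retrieval: return the image closest to the text."""
    return max(image_embeddings, key=lambda i: cosine(text_vec, image_embeddings[i]))


print(best_caption(image_embeddings["dog.jpg"]))
print(best_image(text_embeddings["a diagram of a circuit"]))
```

Because both modalities live in one space, the same similarity function serves image-to-text and text-to-image lookup; only the direction of the search changes.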
References:
1. Banh, Leonardo, and Gero Strobel. 2023. ‘Generative Artificial Intelligence’. Electronic Markets 33 (1): 63. doi:10.1007/s12525-023-00680-1
2. Bordas, Antoine, Pascal Le Masson, Maxime Thomas, and Benoit Weil. 2024. ‘What Is Generative in Generative Artificial Intelligence? A Design-Based Perspective’. Research in Engineering Design 35 (4): 427–43. doi:10.1007/s00163-024-00441-x
3. Feuerriegel, Stefan, Jochen Hartmann, Christian Janiesch, and Patrick Zschech. 2024. ‘Generative AI’. Business & Information Systems Engineering 66 (1): 111–26. doi:10.1007/s12599-023-00834-7
4. Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. ‘Generative Adversarial Networks’. arXiv. doi:10.48550/ARXIV.1406.2661
5. Hariri, Walid. 2023. ‘Unlocking the Potential of ChatGPT: A Comprehensive Exploration of Its Applications, Advantages, Limitations, and Future Directions in Natural Language Processing’. arXiv. doi:10.48550/ARXIV.2304.02017
6. Tan, Yue Hern, Hui Na Chua, Yeh-Ching Low, and Muhammed Basheer Jasser. 2024. ‘Current Landscape of Generative AI: Models, Applications, Regulations and Challenges’. In 2024 IEEE 14th International Conference on Control System, Computing and Engineering (ICCSCE), 168–73. Penang, Malaysia: IEEE. doi:10.1109/ICCSCE61582.2024.10696569
7. Turchi, Tommaso, Silvio Carta, Luciano Ambrosini, and Alessio Malizia. 2023. ‘Human-AI Co-Creation: Evaluating the Impact of Large-Scale Text-to-Image Generative Models on the Creative Process’. In End-User Development, edited by Lucio Davide Spano, Albrecht Schmidt, Carmen Santoro, and Simone Stumpf, 13917:35–51. Lecture Notes in Computer Science. Cham: Springer Nature Switzerland. doi:10.1007/978-3-031-34433-6_3
8. Van Der Zant, Tijn, Matthijs Kouw, and Lambert Schomaker. 2013. ‘Generative Artificial Intelligence’. In Philosophy and Theory of Artificial Intelligence, edited by Vincent C. Müller, 5:107–20. Studies in Applied Philosophy, Epistemology and Rational Ethics. Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/978-3-642-31674-6_8
9. Zhang, Han, Jing Yu Koh, Jason Baldridge, Honglak Lee, and Yinfei Yang. 2021. ‘Cross-Modal Contrastive Learning for Text-to-Image Generation’. arXiv. doi:10.48550/ARXIV.2101.04702