Bias in artificial intelligence systems has become a critical concern as these technologies increasingly influence decision-making across domains such as healthcare, criminal justice, and employment. Bias manifests as systematic errors that lead to unfair or discriminatory outcomes, often disproportionately affecting marginalised groups. Understanding the origins of bias is essential for developing effective strategies to identify and mitigate it.

Dataset imbalance occurs when the data used to train AI models does not adequately represent the diversity of the target population or phenomenon. This imbalance can produce models that perform poorly for underrepresented groups or reinforce existing inequalities. For instance, facial recognition systems trained on datasets dominated by light-skinned individuals have historically shown lower accuracy for darker-skinned faces (Buolamwini and Gebru 2018). Such disparities arise because machine learning algorithms optimise aggregate performance, which is dominated by the majority class and can leave minority groups poorly served.
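A first diagnostic step is simply to quantify how far each group's share of the training data departs from its share of the intended population. The following sketch (Python; the group names and reference shares are illustrative assumptions rather than real figures) makes this concrete:

```python
from collections import Counter

def representation_gap(group_labels, reference_shares):
    """Difference between each group's share of the dataset and a
    reference share (e.g. its share of the target population)."""
    counts = Counter(group_labels)
    n = len(group_labels)
    return {g: counts.get(g, 0) / n - share
            for g, share in reference_shares.items()}

# Illustrative only: darker-skinned faces form 20% of the data
# but 50% of the intended user population.
labels = ["light"] * 80 + ["dark"] * 20
print(representation_gap(labels, {"light": 0.5, "dark": 0.5}))
# "light" over-represented by roughly 0.3, "dark" under-represented by roughly 0.3
```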
The roots of dataset imbalance lie in both practical and systemic factors. Collecting diverse data can be challenging due to logistical constraints, such as limited access to certain populations or regions. However, systemic issues, such as the historical exclusion of marginalised groups from data collection, exacerbate the problem (West et al. 2019). For example, medical datasets often underrepresent women and ethnic minorities, leading to biased diagnostic models that fail to account for their unique health profiles (Obermeyer et al. 2019). Mitigating dataset imbalance requires proactive strategies, such as oversampling underrepresented groups, generating synthetic data, or using transfer learning to adapt models to diverse contexts (Barocas et al. 2023). Validation techniques, such as stratified sampling and fairness-aware metrics like demographic parity, can help assess whether models perform equitably across groups. These approaches, however, must be complemented by efforts to address the structural factors that perpetuate imbalanced data collection.
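To illustrate two of these ideas, the sketch below (Python with NumPy; the function names and data are chosen for exposition rather than drawn from any particular library) performs simple random oversampling of an under-represented group and computes a demographic parity gap on model predictions:

```python
import numpy as np

def oversample_minority(X, y, group, minority_value, seed=0):
    """Randomly duplicate rows from the minority group until its count
    matches the rest of the data (simple random oversampling)."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(group == minority_value)
    majority_idx = np.flatnonzero(group != minority_value)
    n_extra = len(majority_idx) - len(minority_idx)
    if n_extra <= 0:
        return X, y, group
    extra = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep], group[keep]

def demographic_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rates between groups."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return float(max(rates) - min(rates))

# Illustrative check on hypothetical predictions
y_pred = np.array([1, 1, 1, 0, 0, 1, 0, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(demographic_parity_difference(y_pred, group))  # 0.5
```

Oversampling of this kind should be applied to the training split only, after the evaluation data have been set aside, so that duplicated rows do not leak into the test set.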
Annotation bias arises during the process of labelling data, where human annotators introduce subjective or erroneous judgements that skew the training data. Since many AI systems rely on supervised learning, the quality and impartiality of annotations directly influence model performance. Annotation bias can stem from annotators’ cultural, social, or personal biases, as well as from ambiguous labelling guidelines or inadequate training (Geiger et al. 2020). A notable example is in natural language processing (NLP), where sentiment analysis models trained on biased annotations may misinterpret expressions from certain cultural or linguistic groups. For instance, annotations that label African American Vernacular English as “negative” or “informal” can lead to models that unfairly penalise these dialects (Sap et al. 2019). Similarly, in image recognition, annotators may inadvertently prioritise certain visual features, such as gender or race, over others, embedding stereotypes into the model (Crawford and Paglen 2021). Managing annotation bias requires robust annotation protocols, including clear guidelines, diverse annotator pools, and inter-annotator agreement checks. Techniques such as active learning, where models iteratively query annotators to refine ambiguous labels, can also reduce bias (Settles 2011). Validation methods, such as auditing annotations for consistency and fairness, are critical to ensuring that labelled data accurately reflects the intended task. Engaging communities affected by the AI system in the annotation process can further enhance fairness and accountability (West et al. 2019).
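One common agreement check is Cohen's kappa, which corrects raw agreement between annotators for chance. A minimal sketch (Python; the labels are invented for illustration, and real audits would also examine where disagreements concentrate):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators' labels."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two annotators labelling the same five texts
annotator_1 = ["pos", "neg", "neg", "pos", "neutral"]
annotator_2 = ["pos", "neg", "pos", "pos", "neutral"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.69
```

Low kappa overall, or on a particular slice of the data, signals that guidelines are ambiguous or that annotators interpret the task through different cultural lenses, both of which merit review before training.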
Pre-existing modelling choices refer to the assumptions, algorithms, and architectures selected during the design and training of AI systems. These choices, often made before data is even processed, can introduce bias by embedding developers’ implicit assumptions or prioritising certain outcomes over others. For example, the choice of a loss function in a classification model may prioritise overall accuracy at the expense of fairness for minority groups (Hardt et al. 2016). One common source of bias in modelling choices is the reliance on proxy variables that inadvertently capture protected attributes such as race or gender; a model that draws on postal codes or purchasing histories, for example, can reproduce racial or socioeconomic disparities even when protected attributes are excluded from the inputs. Architectural decisions, such as the use of highly complex deep neural networks, can likewise amplify biases in imbalanced datasets by overfitting to majority patterns (Barocas et al. 2023). Addressing bias from modelling choices requires careful consideration of algorithmic design and evaluation. Fairness-aware algorithms, such as those that enforce equal opportunity or disparate impact constraints, can mitigate biased outcomes (Hardt et al. 2016). Additionally, model interpretability techniques, such as feature importance analysis, can help identify and address problematic assumptions. Validation strategies, including cross-validation and sensitivity analysis, are essential for assessing how modelling choices affect performance across diverse groups.
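Such fairness criteria can be checked directly on held-out predictions. The sketch below (Python with NumPy; the data are hypothetical) computes the gap in true positive rates between groups, the quantity that the equal opportunity criterion of Hardt et al. (2016) requires to be zero:

```python
import numpy as np

def true_positive_rate(y_true, y_pred):
    """P(prediction = 1 | true label = 1) on the given sample."""
    positives = y_true == 1
    return y_pred[positives].mean() if positives.any() else float("nan")

def equal_opportunity_difference(y_true, y_pred, group):
    """Gap in true positive rates across groups; zero indicates equal
    opportunity is satisfied on this sample."""
    tprs = [true_positive_rate(y_true[group == g], y_pred[group == g])
            for g in np.unique(group)]
    return float(max(tprs) - min(tprs))

# Illustrative held-out labels, predictions, and group membership
y_true = np.array([1, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 1, 1, 0, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(equal_opportunity_difference(y_true, y_pred, group))  # ≈ 0.17
```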
Identifying bias in AI systems is a multifaceted challenge that requires both technical and ethical considerations. Validation plays a central role in this process, enabling developers to assess whether models produce equitable outcomes. Common validation techniques include fairness metrics, such as equalised odds and calibration, which measure disparities in model performance across groups (Barocas et al. 2023). Adversarial testing, where models are evaluated on intentionally perturbed or diverse inputs, can also uncover hidden biases (Geiger et al. 2020). Managing bias, however, extends beyond technical fixes. It demands a commitment to ethical AI development, including transparency, stakeholder engagement, and continuous monitoring. Post-deployment audits, where models are evaluated in real-world settings, can detect biases that emerge over time (Obermeyer et al. 2019). Moreover, interdisciplinary collaboration between data scientists, ethicists, and affected communities is essential for designing systems that align with societal values (Crawford and Paglen 2021).
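Adversarial testing of this kind can be approximated with simple counterfactual probes. The sketch below (Python; the `predict` callable and the substitution pairs are hypothetical placeholders, and realistic audits would use linguistically informed perturbations) measures how often a model’s prediction changes when group-indicative terms are swapped:

```python
def counterfactual_flip_rate(texts, predict, substitutions):
    """Share of inputs whose predicted label changes when group-indicative
    terms are replaced (a simple perturbation-based fairness probe)."""
    flips = 0
    for text in texts:
        perturbed = text
        for original, replacement in substitutions:
            perturbed = perturbed.replace(original, replacement)
        if predict(perturbed) != predict(text):
            flips += 1
    return flips / len(texts)

# Illustrative use with a toy rule-based "model"
toy_predict = lambda t: "negative" if "slang" in t else "neutral"
texts = ["this slang is great", "what a lovely day"]
print(counterfactual_flip_rate(texts, toy_predict, [("slang", "phrase")]))  # 0.5
```

A non-trivial flip rate indicates that predictions depend on group-indicative surface forms rather than on the underlying content of the input.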
References:
1. Barocas, Solon, Moritz Hardt, and Arvind Narayanan. 2023. Fairness and Machine Learning: Limitations and Opportunities. MIT Press. https://fairmlbook.org/
2. Buolamwini, Joy, and Timnit Gebru. 2018. ‘Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification’. In Conference on Fairness, Accountability and Transparency, 77–91. Proceedings of Machine Learning Research.
3. Crawford, Kate, and Trevor Paglen. 2021. ‘Excavating AI: The Politics of Images in Machine Learning Training Sets’. AI & Society 36 (4): 1105–1116. https://doi.org/10.1007/s00146-020-00970-8
4. Geiger, R. Stuart, Kevin Yu, Yanlai Yang, Mindy Dai, Jie Qiu, Rebekah Tang, and Jenny Huang. 2020. ‘Garbage In, Garbage Out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes From?’. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 325–336. Association for Computing Machinery. https://doi.org/10.1145/3351095.3375624
5. Hardt, Moritz, Eric Price, and Nati Srebro. 2016. ‘Equality of Opportunity in Supervised Learning’. Advances in Neural Information Processing Systems 29. https://proceedings.neurips.cc/.../9d2682367c3935defcb1f9e247a97c0d
6. Obermeyer, Ziad, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. 2019. ‘Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations’. Science 366 (6464): 447–453. https://doi.org/10.1126/science.aax2342
7. Sap, Maarten, Dallas Card, Saadia Gabriel, Yejin Choi, and Noah A. Smith. 2019. ‘The Risk of Racial Bias in Hate Speech Detection’. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 1668–1678. Association for Computational Linguistics. https://aclanthology.org/P19-1163/
8. Settles, Burr. 2011. ‘Closing the Loop: Fast, Interactive Semi-Supervised Annotation with Queries on Features and Instances’. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 1467–1478. Association for Computational Linguistics. https://aclanthology.org/D11-1134/
9. West, Sarah Myers, Meredith Whittaker, and Kate Crawford. 2019. ‘Discriminating Systems’. AI Now 2019 Report, 1–33. https://ainowinstitute.org/discriminatingsystems.html