Artificial Intelligence in Social Science Interview Analysis: Automated Topic Detection in Qualitative Data

The study "Identification of Social Scientifically Relevant Topics in an Interview Repository: A Natural Language Processing Experiment" presents a pioneering experiment in the automated processing of qualitative data in the social sciences. The project, a collaboration between the Research Documentation Centre at the Centre for Social Sciences (TK KDK) and the Institute for Computer Science and Control (SZTAKI), was launched in 2020 to make social science interview archives easier to search using artificial intelligence.

The authors (Gárdos et al., 2023) selected 39 interviews (1,183 pages in total) to test machine learning models; 21 of these were ultimately included in the final training set. The selected materials comprised narrative, in-depth, and semi-structured interviews as well as focus group discussions. Of the five automated topic detection methods tested, the best-performing was the approach developed by SZTAKI, which analysed keywords based on assigned subject terms and achieved an outstanding F1 score. The NN-ensemble hybrid approach, which combined the Omikuji and TF-IDF algorithms in a 3:1 ratio, followed closely behind.
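
To make the 3:1 weighting and the F1 measure concrete, here is a minimal sketch in Python. It is not the study's implementation: the actual NN-ensemble probably does more than take a fixed weighted average, and the function names, topic labels, scores, and the 0.5 threshold are all hypothetical values chosen for illustration.

```python
from typing import Dict, Set


def ensemble_scores(omikuji: Dict[str, float], tfidf: Dict[str, float],
                    w_omikuji: float = 3.0, w_tfidf: float = 1.0) -> Dict[str, float]:
    """Combine two per-topic score dictionaries as a weighted average.

    The default weights 3 and 1 mirror the 3:1 ratio mentioned in the text;
    a topic missing from one model counts as a score of 0 for that model.
    """
    total = w_omikuji + w_tfidf
    topics = set(omikuji) | set(tfidf)
    return {
        t: (w_omikuji * omikuji.get(t, 0.0) + w_tfidf * tfidf.get(t, 0.0)) / total
        for t in topics
    }


def select_topics(scores: Dict[str, float], threshold: float) -> Set[str]:
    """Keep every topic whose combined score reaches the threshold."""
    return {t for t, s in scores.items() if s >= threshold}


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Harmonic mean of precision and recall for one document's topic set."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical scores from the two models for one interview.
    omikuji = {"migration": 0.8, "education": 0.2}
    tfidf = {"migration": 0.4, "family": 0.6}

    combined = ensemble_scores(omikuji, tfidf)
    predicted = select_topics(combined, threshold=0.5)

    # Hypothetical subject terms assigned by a human indexer.
    gold = {"migration", "family"}
    print(sorted(predicted), round(f1_score(predicted, gold), 2))
```

In this toy case only "migration" passes the threshold, so precision is 1, recall is 0.5, and F1 is about 0.67. Averaging F1 over many interviews is one common way to compare methods; the exact averaging used in the study is not stated here.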

The system is built on a 220-entry social science thesaurus that incorporates translated terms from the European Language Social Science Thesaurus (ELSST) and 48 newly added concepts reflecting the Hungarian social and historical context, such as "rendszerváltás" (regime change), "államosítás" (nationalisation), and "romafóbia" (anti-Roma sentiment). This approach enhances the searchability and analytical potential of social science interview archives.
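
As a rough illustration of how a thesaurus can drive topic lookup, the sketch below matches the three example terms against a text by plain substring search. This is a deliberately naive stand-in and not the method used in the study; the `THESAURUS` mapping, the `find_topics` function, and the sample sentence are hypothetical, and a real system would more likely rely on lemmatisation and trained models.

```python
from typing import Dict, Set

# Tiny hypothetical excerpt of a thesaurus: Hungarian term -> English label.
# Only the three terms quoted in the text are included.
THESAURUS: Dict[str, str] = {
    "rendszerváltás": "regime change",
    "államosítás": "nationalisation",
    "romafóbia": "anti-Roma sentiment",
}


def find_topics(text: str, thesaurus: Dict[str, str] = THESAURUS) -> Set[str]:
    """Return the labels of all thesaurus terms that occur in the text.

    Matching is case-insensitive substring search, so suffixed forms such as
    "államosításról" also match. It can over-match and misses synonyms.
    """
    lowered = text.lower()
    return {label for term, label in thesaurus.items() if term in lowered}


if __name__ == "__main__":
    sample = "A rendszerváltás után sokat beszéltek az államosításról."
    print(sorted(find_topics(sample)))
```

Substring matching is used here because Hungarian attaches suffixes to word stems, so an exact whole-word match would miss most inflected occurrences.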

Sources:

Gárdos, J., Egyed-Gergely, J., Horváth, A., Pataki, B., Vajda, R. and Micsik, A. (2023) Identification of social scientifically relevant topics in an interview repository: A natural language processing experiment. DOI: 10.1108/JD-12-2022-0269