Python is one of today's most popular programming languages, yet few bug databases contain real, reproducible Python bugs. PyBugHive addresses this gap as the first manually validated database of reproducible Python bugs.
The first version of the database contains 149 bugs from 11 open-source projects, enabling researchers to analyze them precisely. The database was developed by researchers at the University of Szeged, with support from the European Union project RRF-2.3.1-21-2022-00004 and the National Research, Development and Innovation Fund. They presented their results in the paper "PyBugHive: A Comprehensive Database of Manually Validated, Reproducible Python Bugs" (Gábor et al., 2024).
PyBugHive was compiled through a rigorous selection process. Each bug was manually verified, documented with the fix the developers actually applied, and reproduced by running tests in a configured environment. For every bug, the database includes a summary of the bug ticket, the related patch, and test cases that demonstrate the bug's presence. PyBugHive is also extensible, so additional bugs can be added. Neither Pylint nor Bandit, two static analysis tools, could automatically detect any of the bugs, which indicates that the database contains genuinely non-trivial cases.