Researchers at the University of Szeged developed the first manually validated and reproducible Python error collection


Python is one of today's most popular programming languages, yet few bug repositories contain actual, reproducible bugs. PyBugHive fills this gap as the first manually validated database containing reproducible Python bugs.

The first version of the database contains 149 bugs from 11 open-source projects, giving researchers a basis for precise, reproducible analysis. The database was developed by researchers at the University of Szeged, with support from the European Union RRF-2.3.1-21-2022-00004 project and the National Research, Development and Innovation Fund. They presented their results in the paper titled "PyBugHive: A Comprehensive Database of Manually Validated, Reproducible Python Bugs" (Gábor et al., 2024).

PyBugHive was compiled through a rigorous selection process. Each bug was manually verified, documented together with the real developer fix, and reproduced by running its tests in the appropriate environment. Every database entry includes a summary of the bug ticket, the corresponding patch, and the test cases that expose the bug. PyBugHive is also extensible, allowing additional bugs to be inserted. The non-triviality of the collection is underscored by the fact that neither Pylint nor the Bandit static analysis tool could automatically detect any of the bugs, confirming that the repository contains genuinely non-trivial cases.
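To make the described entry structure concrete, here is a minimal Python sketch of what a single PyBugHive-style record might hold. The field names and the reproducibility check are purely illustrative assumptions, not the project's actual schema or API:

```python
from dataclasses import dataclass, field

@dataclass
class BugEntry:
    """Hypothetical sketch of one bug record (illustrative field names)."""
    project: str               # the open-source project the bug comes from
    bug_id: int
    report_summary: str        # summary of the original bug ticket
    patch: str                 # the real developer fix, e.g. as a unified diff
    exposing_tests: list[str] = field(default_factory=list)  # tests failing on the buggy version

    def is_reproduced(self, failing_tests: set[str]) -> bool:
        # A bug counts as reproduced if all of its exposing tests
        # fail when run against the buggy version of the program.
        return bool(self.exposing_tests) and set(self.exposing_tests) <= failing_tests

entry = BugEntry(
    project="example-project",
    bug_id=1,
    report_summary="Crash when parsing empty input",
    patch="--- a/parser.py\n+++ b/parser.py\n...",
    exposing_tests=["tests/test_parser.py::test_empty_input"],
)
print(entry.is_reproduced({"tests/test_parser.py::test_empty_input"}))  # True
```

A structure like this is what makes the benchmark usable for tool comparison: the exposing tests give an executable ground truth, and the stored patch gives a reference fix to compare against.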

Sources:

PyBugHive: A Comprehensive Database of Manually Validated, Reproducible Python Bugs

Abstract: Python is currently the number one language in the TIOBE index and has been the second most popular language on GitHub for years. But so far, there are only a few bug databases that contain bugs for Python projects and even fewer in which bugs can be reproduced. In this paper, we present a manually curated database of reproducible Python bugs called PyBugHive. The initial version of PyBugHive is a benchmark of 149 real, manually validated bugs from 11 Python projects. Each entry in our database contains the summary of the bug report, the corresponding patch, and the test cases that expose the given bug. PyBugHive features a rich command line interface for accessing both the buggy and fixed versions of the programs and provides the abstraction for executing the corresponding test cases. The interface facilitates highly reproducible empirical research and tool comparisons in fields such as testing, automated program repair, or bug prediction. The usage of our database is demonstrated through a use case involving a large language model, GPT-3.5. First, we evaluated the bug detection capabilities of the model with the help of the bug repository. Using multiple prompts, we found out that GPT-3.5 was able to detect 67 out of 149 bugs (45%). Furthermore, we leveraged the constructed bug dataset in assessing the automatic program repair capabilities of GPT-3.5 by comparing the generated fixes with the real patches contained in the dataset. However, its performance was far worse in this task compared to bug detection, as it was able to fix only one of the detected issues.
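The abstract's repair evaluation compares model-generated fixes against the real patches stored in the dataset. The paper's own comparison method is not detailed here, but the general idea of diffing a candidate fix against the reference fix can be sketched with Python's standard difflib (file contents below are invented for illustration):

```python
import difflib

# Illustrative only: diff a hypothetical model-generated fix against
# the real developer patch by comparing the resulting file contents.
real_fixed = "def area(r):\n    return 3.14159 * r * r\n"
model_fixed = "def area(r):\n    return 3.14159 * r ** 2\n"

diff = list(difflib.unified_diff(
    real_fixed.splitlines(keepends=True),
    model_fixed.splitlines(keepends=True),
    fromfile="real_fix", tofile="model_fix",
))
identical = not diff  # True only when both fixes produce the same file

# A rough textual similarity score between the two fixed versions.
similarity = difflib.SequenceMatcher(None, real_fixed, model_fixed).ratio()
print(identical, round(similarity, 2))
```

Exact textual equality is a strict criterion; in practice, a semantically correct fix may differ from the developer's patch, which is one reason the stored exposing tests matter as a complementary check.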