Publications about Dataset
Articles in conference or workshop proceedings
-
P3: A Dataset of Partial Program Patches.
In Proc. MSR,
2024.
ACM.
doi:10.1145/3643991.3644889
Keyword(s):
Partial Fix,
Dataset,
Mining
Funding:
DFG-IDEFIX
Publisher's Version
PDF
Supplement
Artifact(s)
Abstract
Identifying and fixing bugs in programs remains a challenge and is one of the most time-consuming tasks in software development. But even after a bug is identified, and a fix has been proposed by a developer or tool, it is not uncommon that the fix is incomplete and does not cover all possible inputs that trigger the bug. This can happen quite often and leads to re-opened issues and inefficiencies. In this paper, we introduce P3, a curated dataset composed of in- complete fixes. Each entry in the set contains a series of commits fixing the same underlying issue, where multiple of the intermediate commits are incomplete fixes. These are sourced from real-world open-source C projects. The selection process involves both auto- mated and manual stages. Initially, we employ heuristics to identify potential partial fixes from repositories, subsequently we validate them through meticulous manual inspection. This process ensures the accuracy and reliability of our curated dataset. We envision that the dataset will support researchers while investigating par- tial fixes in more detail, allowing them to develop new techniques to detect and fix them.BibTeX Entry
@inproceedings{MSR24, author = {Dirk Beyer and Lars Grunske and Matthias Kettl and Marian Lingsch-Rosenfeld and Moeketsi Raselimo}, title = {P3: A Dataset of Partial Program Patches}, booktitle = {Proc.\ MSR}, pages = {}, year = {2024}, publisher = {ACM}, doi = {10.1145/3643991.3644889}, url = {https://gitlab.com/sosy-lab/research/data/partial-fix-dataset}, pdf = {}, abstract = {Identifying and fixing bugs in programs remains a challenge and is one of the most time-consuming tasks in software development. But even after a bug is identified, and a fix has been proposed by a developer or tool, it is not uncommon that the fix is incomplete and does not cover all possible inputs that trigger the bug. This can happen quite often and leads to re-opened issues and inefficiencies. In this paper, we introduce P3, a curated dataset composed of in- complete fixes. Each entry in the set contains a series of commits fixing the same underlying issue, where multiple of the intermediate commits are incomplete fixes. These are sourced from real-world open-source C projects. The selection process involves both auto- mated and manual stages. Initially, we employ heuristics to identify potential partial fixes from repositories, subsequently we validate them through meticulous manual inspection. This process ensures the accuracy and reliability of our curated dataset. We envision that the dataset will support researchers while investigating par- tial fixes in more detail, allowing them to develop new techniques to detect and fix them.}, keyword = {Partial Fix, Dataset, Mining}, annote = {}, artifact = {10.5281/zenodo.10319627}, funding = {DFG-IDEFIX}, }
Disclaimer:
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All person copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.