Towards an Empirical Model to Identify When Bugs are Introduced
Finding when bugs are introduced into source code is important because it is the first step towards understanding how code becomes buggy. This understanding is essential for improving other areas related to software bugs, such as bug detection, bug prevention, software quality, and software maintenance. However, finding when bugs are introduced is a difficult and tedious process that requires significant time and effort, to the point that it is not even clear how to define “when” a bug is introduced. Not all bugs are caused in the same way, and they do not all present the same symptoms, so they cannot be treated alike when locating the moment of bug introduction. Some bugs are not directly introduced into the system, and it is essential to distinguish the moment a bug is introduced into a system from the moment a bug manifests itself in the system. The first refers to the moment when the error is introduced into the project, whereas the second refers to the first moment when the bug manifests itself for reasons other than the insertion of buggy code. This happens, for instance, when the source code calls external APIs that changed without prior notification, causing the bug to manifest in some parts of the source code. To distinguish between these moments, this dissertation proposes a model to determine how bugs appear in software products. This model has proven useful for clearly defining the code change that introduced a bug, when such a change exists, and for finding the reasons that lead to the appearance of bugs. The model is based on the concept of when bugs manifest themselves for the first time, and how that can be determined by running a test; it also proposes a specialized terminology that helps to analyze the process formally. The validity of the model has been explored through a careful, manual analysis of a number of bugs in two different open source systems.
The analysis starts from changes that fixed a bug, from which a test is defined to determine whether or not the bug is present. The results of the analysis show that bugs are not always introduced in the source code, and this phenomenon should be investigated further to improve other disciplines of software engineering. Furthermore, the model has been placed in the context of the current literature on the introduction of bugs in source code. An interesting specific result of the model is that it provides a clear condition for determining whether a given algorithm for identifying the change that introduced a bug is correct when performing the identification. This allows (i) computing the “real” performance of algorithms based on backtracking the modified lines that fixed a bug, and (ii) a sound evaluation of those algorithms.
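
As a rough illustration of the idea of locating the first manifestation of a bug by running a test, the following minimal Python sketch scans an ordered history and checks a candidate change against it. It is not the dissertation's implementation: the names `first_failing_index` and `candidate_is_correct`, the representation of snapshots as plain strings, and the `test_passes` callback are all hypothetical. The sketch also assumes that the first failing snapshot corresponds to a bug-introducing change, which, as noted above, does not always hold (for example, when an external API changed).

```python
from typing import Callable, Optional, Sequence


def first_failing_index(
    snapshots: Sequence[str],
    test_passes: Callable[[str], bool],
) -> Optional[int]:
    """Return the index of the first snapshot where the test fails.

    `snapshots` is ordered from oldest to newest. A linear scan is used
    instead of bisection because nothing guarantees that a bug, once
    present, stays present in every later snapshot.
    """
    for index, snapshot in enumerate(snapshots):
        if not test_passes(snapshot):
            return index
    return None  # the test never fails in this history


def candidate_is_correct(
    candidate: str,
    snapshots: Sequence[str],
    test_passes: Callable[[str], bool],
) -> bool:
    """Check whether a change proposed by some identification algorithm
    matches the first snapshot in which the bug manifests itself."""
    index = first_failing_index(snapshots, test_passes)
    return index is not None and snapshots[index] == candidate


# Hypothetical history: the bug is present in c3 and c4 only.
history = ["c1", "c2", "c3", "c4"]
buggy = {"c3", "c4"}


def bug_absent(snapshot: str) -> bool:
    """Stand-in for a real test derived from the bug-fixing change."""
    return snapshot not in buggy


print(first_failing_index(history, bug_absent))
print(candidate_is_correct("c3", history, bug_absent))
print(candidate_is_correct("c2", history, bug_absent))
```

In a real setting, `test_passes` would check out each snapshot and run the test derived from the bug-fixing change; deciding whether the first failure stems from the change itself or from an external cause still requires the kind of manual analysis described above.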
Doctoral thesis defended at the Universidad Rey Juan Carlos, Madrid, in 2018. Thesis supervisors: Jesús M. González Barahona and Gregorio Robles