Empirical Software Engineering

ISSN (electronic): 1573-7616

ISSN (print): 1382-3256


Publisher: SPRINGER, Springer Netherlands

Field:
Software


Featured articles

Seeing confusion through a new lens: on the impact of atoms of confusion on novices’ code comprehension
Volume 28 - Pages 1-42 - 2023
José Aldo Silva da Costa, Rohit Gheyi, Fernando Castor, Pablo Roberto Fernandes de Oliveira, Márcio Ribeiro, Baldoino Fonseca
Code comprehension is crucial for software maintenance and evolution, but it can be hindered by tiny code snippets that can confuse developers, called atoms of confusion. Previous studies investigated how atoms impact code comprehension through the perspectives of time, accuracy, and the opinions of developers. However, more studies are needed that evaluate other perspectives and that combine these perspectives on common ground through experiments. In our study, we evaluate how the eye-tracking method can be used to gain new insights when we compare programs obfuscated by atoms with functionally equivalent clarified versions. We conduct a controlled experiment with 32 novices in Python and measure their time, number of attempts, and visual effort with eye tracking through fixation duration, fixation count, and regression count. We also conduct interviews and investigate the subjects' difficulties with the programs. In our results, the clarified version of the code with Operator Precedence reduced the time spent in the region that contains the atom by 38.6%, and the number of answer attempts by 28%. Most subjects found the obfuscated version more difficult to solve than the clarified one, and they reported the order of precedence to be difficult to validate. Analyzing their visual effort, we observed a 47.3% increase in the horizontal regression count in the atom region of the obfuscated version, making its reading more difficult. The additional atoms evaluated revealed other interesting nuances. Based on our findings, we encourage researchers to consider eye tracking combined with other perspectives to evaluate atoms of confusion, and educators to favor patterns that do not impact the understanding and visual effort of undergraduates.
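As a hypothetical illustration of the Operator Precedence atom of confusion studied above (the concrete expression is made up, not taken from the paper's stimuli), the same computation can be written in an obfuscated and a clarified form:

```python
# An "Operator Precedence" atom: correctness of the first version hinges
# on the reader recalling that * binds tighter than +.

def obfuscated(a, b, c):
    return a + b * c

def clarified(a, b, c):
    # Functionally equivalent; parentheses make the precedence explicit.
    return a + (b * c)

print(obfuscated(2, 3, 4), clarified(2, 3, 4))  # both print 14
```

The clarified form changes nothing about the program's behavior; it only removes the need for the reader to validate the order of precedence mentally.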
Are free Android app security analysis tools effective in detecting known vulnerabilities?
Volume 25 - Pages 178-219 - 2019
Venkatesh-Prasad Ranganath, Joydeep Mitra
Increasing interest in securing the Android ecosystem has spawned numerous efforts to assist app developers in building secure apps. These efforts have resulted in tools and techniques capable of detecting vulnerabilities and malicious behaviors in apps. However, there has been no evaluation of the effectiveness of these tools and techniques in detecting known vulnerabilities. The absence of such evaluations puts app developers at a disadvantage when choosing security analysis tools to secure their apps. In this regard, we evaluated the effectiveness of vulnerability detection tools for Android apps. We reviewed 64 tools and empirically evaluated 14 vulnerability detection tools against 42 known unique vulnerabilities captured by the Ghera benchmarks, which are composed of both vulnerable and secure apps. Of the 20 observations from the evaluation, the main one is that existing vulnerability detection tools for Android apps are very limited in their ability to detect known vulnerabilities: all of the evaluated tools together could detect only 30 of the 42 known unique vulnerabilities. More effort is required if security analysis tools are to help developers build secure apps. We hope the observations from this evaluation will help app developers choose appropriate security analysis tools, and persuade tool developers and researchers to identify and address limitations in their tools and techniques. We also hope this evaluation will spark a conversation in the software engineering and security communities about requiring a more rigorous and explicit evaluation of security analysis tools and techniques.
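The headline figure above ("all tools together detect 30 of 42") is a union over per-tool detection sets. A minimal sketch of that computation, with made-up tool names and detection sets standing in for the real evaluation data:

```python
# Combined coverage of several tools over a benchmark of known
# vulnerabilities, computed as the union of per-tool detections.

benchmark = set(range(42))  # 42 known unique vulnerabilities, by index
detections = {
    "tool_a": {0, 1, 2, 5, 8},   # hypothetical detection sets
    "tool_b": {1, 2, 3, 9},
    "tool_c": {5, 10, 11},
}

combined = set().union(*detections.values())
coverage = len(combined & benchmark) / len(benchmark)
print(f"{len(combined)} of {len(benchmark)} detected ({coverage:.0%})")
```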
In This Issue
Volume 8 - Pages 223-224 - 2003
Lionel Briand, Vic Basili
Analyzing source code vulnerabilities in the D2A dataset with ML ensembles and C-BERT
- 2024
Saurabh Pujar, Yunhui Zheng, Luca Buratti, Burn Lewis, Yunchung Chen, Jim Laredo, Alessandro Morari, Edward Epstein, Tsungnan Lin, Bo Yang, Zhong Su
Static analysis tools are widely used for vulnerability detection as they can analyze programs with complex behavior and millions of lines of code. Despite their popularity, static analysis tools are known to generate an excess of false positives. The recent ability of machine learning models to learn from programming language data opens new possibilities for reducing false positives when applied to static analysis. However, existing datasets for training vulnerability identification models suffer from multiple limitations, such as limited bug context, limited size, and synthetic, unrealistic source code. We propose Differential Dataset Analysis, or D2A, a differential-analysis-based approach to label issues reported by static analysis tools. The dataset built with this approach is called the D2A dataset. The D2A dataset is built by analyzing version pairs from multiple open source projects. From each project, we select bug-fixing commits and run static analysis on the versions before and after such commits. If some issues detected in a before-commit version disappear in the corresponding after-commit version, they are very likely real bugs that were fixed by the commit. We use D2A to generate a large labeled dataset. We then train both classic machine learning models and deep learning models for vulnerability identification using the D2A dataset. We show that the dataset can be used to build a classifier that identifies likely false alarms among the issues reported by static analysis, helping developers prioritize and investigate potential true positives first. To facilitate future research and contribute to the community, we make the dataset generation pipeline and the dataset publicly available. We have also created a leaderboard based on the D2A dataset, which has already attracted attention and participation from the community.
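The core of the differential-labeling idea described above can be sketched in a few lines. The representation of an issue as a hashable tuple (issue type, file, line) is an assumption for illustration, not the D2A pipeline's actual format:

```python
# Differential labeling: issues reported before a bug-fixing commit but
# gone after it are labeled likely true positives; issues that persist
# across the fix are labeled likely false positives.

def label_issues(before_issues, after_issues):
    before, after = set(before_issues), set(after_issues)
    likely_true = before - after    # disappeared after the fix
    likely_false = before & after   # unaffected by the fix
    return likely_true, likely_false

likely_bugs, likely_noise = label_issues(
    [("BUFFER_OVERRUN", "util.c", 42), ("NULL_DEREF", "main.c", 7)],
    [("NULL_DEREF", "main.c", 7)],
)
```

In this sketch the buffer-overrun report vanishes after the fix and is labeled a likely true positive, while the persisting null-dereference report is labeled a likely false positive.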
Task estimation for software company employees based on computer interaction logs
Volume 26 - Pages 1-48 - 2021
Florian Pellegrin, Zeynep Yücel, Akito Monden, Pattara Leelaprute
Digital tools and services collect a growing amount of log data. In the software development industry, such data are integral and contain valuable information on user and system behaviors, with significant potential for discovering various trends and patterns. In this study, we focus on one of those potential aspects: task estimation. In that regard, we perform a case study by analyzing computer-recorded activities of employees from a software development company. Specifically, our purpose is to identify the task of each employee. To that end, we build a hierarchical framework with two-stage recognition and devise a method relying on Bayesian estimation that accounts for the temporal correlation of tasks. After pre-processing, we run the proposed hierarchical scheme to initially distinguish infrequent and frequent tasks. At the second stage, infrequent tasks are further distinguished from one another so that each task is identified definitively. The proposed method's higher performance makes it preferable to association rule-based methods and conventional classification algorithms. Moreover, our method offers significant potential to be applied to similar software engineering problems. Our contributions include a comprehensive evaluation of a Bayesian estimation scheme on real-world data and reinforcements against several challenges in the dataset (samples with different measurement scales, dependence characteristics, imbalance, and insignificant pieces of information).
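Bayesian estimation that accounts for temporal correlation, as described above, can be illustrated with one step of a discrete Bayes filter: the posterior over tasks is propagated through a transition model before each observation update. This sketch is a generic illustration of the idea under assumed probabilities, not the authors' implementation:

```python
# One step of a discrete Bayes filter over tasks.
# prior[i]           : P(task at t-1 is i)
# transition[i][j]   : P(task at t is j | task at t-1 was i)
# likelihood[j]      : P(observed log entry | task at t is j)

def bayes_filter_step(prior, transition, likelihood):
    n = len(prior)
    # Predict: propagate the posterior through the transition model.
    predicted = [sum(prior[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Update: weight by the observation likelihood and normalize.
    unnorm = [predicted[j] * likelihood[j] for j in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Two hypothetical tasks; tasks tend to persist between log entries.
posterior = bayes_filter_step(
    prior=[0.5, 0.5],
    transition=[[0.9, 0.1], [0.1, 0.9]],
    likelihood=[0.8, 0.2],
)
```

The sticky transition matrix is what encodes temporal correlation: an employee observed doing task 0 a moment ago is a priori more likely to still be doing it now.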
Correction to: On the need of preserving order of data when validating within-project defect classifiers
Volume 25, Issue 6 - Pages 4831-4832 - 2020
Davide Falessi, Jacky Huang, Likhita Narayana, Jennifer Fong Thai, Burak Turhan
To fulfill the contractual requirement of the Compact agreement, the following funding note has to be added and placed in the Funding section of the original article: Open access funding provided by Università degli Studi di Roma Tor Vergata within the CRUI-CARE Agreement.
Guest editorial for special section on research in search-based software engineering
Volume 22 - Pages 849-851 - 2017
Claire Le Goues, Shin Yoo
ALFAA: Active Learning Fingerprint based Anti-Aliasing for correcting developer identity errors in version control systems
Volume 25 - Pages 1136-1167 - 2020
Sadika Amreen, Audris Mockus, Russell Zaretzki, Christopher Bogart, Yuxia Zhang
An accurate determination of developer identities is important for software engineering research and practice. Without it, even simple questions such as "how many developers does a project have?" cannot be answered. The commonly used version control data from Git is full of identity errors, and the existing approaches to correct these errors are difficult to validate at scale and cannot be easily improved. We therefore aim to develop a scalable, highly accurate, easy-to-use, and easy-to-improve approach to correct software developer identity errors. We first amalgamate developer identities from version control systems in open source software repositories; we then investigate the nature and prevalence of these errors, design corrective algorithms, and estimate the impact of the errors on networks inferred from this data. We investigate these questions using a collection of over 1B Git commits with over 23M recorded author identities. By inspecting the author strings that occur most frequently, we group identity errors into categories. We then augment the author strings with three behavioral fingerprints: time-zone frequencies, the set of files modified, and a vector embedding of the commit messages. We create a manually validated set of identities for a subset of OpenStack developers using an active learning approach and use it to fit supervised learning models to predict the identities for the remaining author strings in OpenStack. We then compare these predictions with a competing commercially available effort and a leading research method. Finally, we compare network measures for file-induced author networks based on corrected and raw data. We find commits done from different environments, misspellings, organizational ids, default values, and anonymous IDs to be the major sources of errors. We also find that supervised learning methods reduce errors severalfold in comparison to existing research and commercial methods, and that the active learning approach is an effective way to create validated datasets. Results also indicate that correction of developer identity has a large impact on the inference of the social network. We believe that our proposed Active Learning Fingerprint Based Anti-Aliasing (ALFAA) approach will expedite research progress in the software engineering domain for applications that involve developer identities.
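The fingerprint idea above can be sketched with a toy similarity check: two author records are compared on a time-zone profile and the set of files they modified. The field names, the averaging rule, and the threshold are all illustrative assumptions, not the trained ALFAA model:

```python
# Toy fingerprint-based identity matching: records are deemed the same
# developer when their behavioral fingerprints overlap enough.

def jaccard(a, b):
    """Set overlap in [0, 1]; 0 for two empty sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def same_developer(rec1, rec2, threshold=0.5):
    tz_sim = jaccard(rec1["timezones"], rec2["timezones"])
    file_sim = jaccard(rec1["files"], rec2["files"])
    return (tz_sim + file_sim) / 2 >= threshold

# Two spellings of one author string vs. an unrelated developer.
a = {"timezones": {"+0200"}, "files": {"core.py", "utils.py"}}
b = {"timezones": {"+0200"}, "files": {"core.py"}}
c = {"timezones": {"-0500"}, "files": {"other.js"}}
```

Here `a` and `b` share a time zone and a modified file, so they are matched; `c` shares neither fingerprint and is not. The actual approach fits supervised models over such fingerprints rather than a fixed threshold.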
Empirically evaluating flaky test detection techniques combining test case rerunning and machine learning models
Volume 28 - Pages 1-52 - 2023
Owain Parry, Gregory M. Kapfhammer, Michael Hilton, Phil McMinn
A flaky test is a test case whose outcome changes without modification to the code of the test case or the program under test. These tests disrupt continuous integration, cause a loss of developer productivity, and limit the efficiency of testing. Many flaky test detection techniques are rerunning-based, meaning they require repeated test case executions at a considerable time cost, or are machine learning-based, and thus they are fast but offer only an approximate solution with variable detection performance. These two extremes leave developers with a stark choice. This paper introduces CANNIER, an approach for reducing the time cost of rerunning-based detection techniques by combining them with machine learning models. The empirical evaluation involving 89,668 test cases from 30 Python projects demonstrates that CANNIER can reduce the time cost of existing rerunning-based techniques by an order of magnitude while maintaining a detection performance that is significantly better than machine learning models alone. Furthermore, the comprehensive study extends existing work on machine learning-based detection and reveals a number of additional findings, including (1) the performance of machine learning models for detecting polluter test cases; (2) using the mean values of dynamic test case features from repeated measurements can slightly improve the detection performance of machine learning models; and (3) correlations between various test case features and the probability of the test case being flaky.
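The central idea above, gating costly reruns behind a cheap model prediction, can be sketched as follows. The thresholds, the prediction function, and the rerun interface are illustrative assumptions, not CANNIER's actual pipeline:

```python
# Hybrid flaky-test detection: skip reruns when the model is confident,
# fall back to rerunning only for ambiguous predictions.

def detect_flaky(test, predict_proba, rerun, low=0.1, high=0.9, reruns=10):
    p = predict_proba(test)
    if p <= low:
        return False              # confidently non-flaky: no reruns spent
    if p >= high:
        return True               # confidently flaky: no reruns spent
    # Ambiguous band: pay the rerunning cost for a reliable answer.
    outcomes = {rerun(test) for _ in range(reruns)}
    return len(outcomes) > 1      # differing outcomes => flaky
```

Because most test cases fall in the confident bands of a reasonable model, only a small fraction incur the rerunning cost, which is how an order-of-magnitude time saving becomes possible while rerunning still decides the hard cases.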
Techniques for evaluating fault prediction models
Volume 13, Issue 5 - Pages 561-595 - 2008
Yue Jiang, Bojan Čukić, Yan Ma