
Labeling Issues through Semantic patterns in Open-source Agile practices

Ranković, Nevena
Rankovic, Dragica
Abstract
In this paper, our objective is to balance interpretability against accurate, reliable issue classification (specifically bug severity) by building an ensemble Machine Learning (ML) and Natural Language Processing (NLP) approach that improves software maintenance in Agile development environments. Using the TAWOS dataset, we explored the capabilities of state-of-the-art models such as eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost), combined with NLP techniques such as Term Frequency-Inverse Document Frequency (TF-IDF) and Singular Value Decomposition (SVD) for feature extraction. CatBoost performed best, with accuracies of 99.61% for bugs labeled 'high' severity, 97.73% for 'Critical', and 97.65% for 'Blocker'. SHapley Additive exPlanations (SHAP) analysis further identified key semantic descriptors, such as "crash" and "timeout," as critical predictors in these models. This research addresses a gap in software engineering by improving the precision and efficiency of bug triaging, thereby supporting more effective resource allocation and reducing costs in Agile software projects. Finally, the comprehensive preprocessing pipeline we developed, including lemmatization, outlier removal, and non-oversampling techniques, was essential to optimizing model performance and offers a robust framework for software quality assurance.
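As a rough illustration of the TF-IDF step named in the abstract, the sketch below computes term weights for a few issue titles using only the Python standard library. It is not the authors' pipeline: the issue titles are hypothetical (not drawn from TAWOS), the helper names `tokenize` and `tfidf` are made up for this example, and the weighting is one common variant (tf = term count divided by document length, idf = natural log of N over document frequency). The SVD and boosting stages are not shown.

```python
import math
from collections import Counter

PUNCT = ".,:;!?()\"'"

def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (tok.strip(PUNCT) for tok in text.lower().split())
    return [tok for tok in tokens if tok]

def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length,
    idf = ln(N / number of documents containing the term).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights

# Hypothetical issue titles, used only to show the weighting behaviour.
issues = [
    "App crash on login",
    "Timeout when saving report",
    "Typo on login page",
]
weights = tfidf(issues)
```

In this toy corpus, "crash" occurs in only one title while "login" occurs in two, so "crash" receives the larger weight in the first title. That is the kind of signal a downstream classifier could pick up on, though the descriptors reported in the paper come from SHAP analysis of the trained models, not from TF-IDF weights alone.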
Date
2025-10-25
Keywords
(Non)-oversampling, Agile development, Ensemble ML&NLP pipeline, Post-agnostic method, Semantic descriptors
Citation
Ranković, N & Rankovic, D 2025, 'Labeling Issues through Semantic patterns in Open-source Agile practices', Knowledge-Based Systems, vol. 328, 114197. https://doi.org/10.1016/j.knosys.2025.114197