Intl. Summer School on Search- and Machine Learning-based Software Engineering (SMILESENG)
Traceability Links Recovery in BPMN Models through Evolutionary Learning to Rank
Raúl Lapeña, Ana Marcén, Jaime Font, and Carlos Cetina
SVIT Research Group
Universidad San Jorge
Villanueva de Gállego, Spain
Email: [rlapena,acmarcen,jfont,ccetina]@usj.es
Abstract—Traceability Links Recovery (TLR), the software engineering task of automatically identifying dependencies between software artifacts, is key to success in industrial software development and has been a subject of both fundamental and applied research in the software engineering community for many years. Most TLR techniques recover links from the linguistic clues of the software artifacts under study. BPMN models therefore pose an additional challenge for TLR, since they tend to contain less textual information than other artifacts. Over the past few years, we have studied TLR between natural language requirements and Model Driven Development (MDD) models through an Evolutionary Learning to Rank approach (ELtoR), which retrieves traceability links by combining evolutionary computation and machine learning techniques, and which outperformed five other TLR approaches. One of the reasons behind the improvements is that ELtoR is not as dependent on the linguistic clues of the artifacts as the other TLR approaches. Our hypothesis is that ELtoR can be used to improve the state of the art in TLR between requirements and BPMN models. In this communication, we report new ideas on how to adapt ELtoR and the necessary encodings to work over BPMN models, along with our expectations regarding the outcomes of evaluating the approach in an industrial scenario.
I. INTRODUCTION
Traceability Links Recovery (TLR) is an important support activity for the development, management, and maintenance of software, and is considered a good practice by numerous major software standards [1]. Affordable TLR can be critical to the success of a project [2]. It leads to increased maintainability and reliability of software systems [3], and decreases the expected defect rate in developed software [4]. However, establishing and maintaining traceability links has proven to be a time-consuming, error-prone, and person-power-intensive task [1]. Automated TLR has therefore been a subject of investigation for many years within the software engineering community. In recent years, it has attracted more attention, becoming a subject of both fundamental and applied research [5].
Software engineers from our industrial partner, an international manufacturer in the railway domain, express system requirements in natural language, and use them to design BPMN models [6]. The BPMN models are used to describe the interactions that occur between the humans and the trains, and to design and derive other software artifacts. State-of-the-art automated TLR techniques rely greatly on the language and the syntactic, lexical, and semantic particularities of the software artifacts under study. For instance, Latent Semantic Indexing (LSI), which is the most popular TLR technique and the one that has yielded the best TLR results so far [7], is based on exploiting term similarities among the requirements and the software artifacts. BPMN models tend to present fewer terms and an overall lack of textual information in comparison to other artifacts. Since TLR techniques rely on the textual components of the artifacts under study, TLR becomes an even harder task when performed directly between requirements and BPMN models.
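The effect of sparse text on term-based matching can be illustrated with a minimal sketch. It is not LSI itself (LSI additionally applies a singular value decomposition to the term-document matrix), and it is not part of the approach reported here. It only computes the plain term-frequency cosine similarity that such techniques build on. The requirement and artifact texts are hypothetical examples, and the function names `term_vector` and `cosine_similarity` are our own assumptions.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text and count its alphabetic terms."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an artifact without terms cannot match anything
    return dot / (norm_a * norm_b)


# Hypothetical texts, invented for illustration only.
requirement = "The driver opens the doors when the train stops at the station"
text_rich_artifact = "open doors when train stops at station; driver confirms doors"
bpmn_labels = "Open doors"  # BPMN elements often carry only short labels

req_vec = term_vector(requirement)
sim_rich = cosine_similarity(req_vec, term_vector(text_rich_artifact))
sim_sparse = cosine_similarity(req_vec, term_vector(bpmn_labels))

print(f"text-rich artifact: {sim_rich:.3f}")
print(f"BPMN labels only:   {sim_sparse:.3f}")
```

In this toy setting the short BPMN label shares a single term with the requirement and scores lower than the text-rich artifact, which is the kind of situation that makes purely term-based recovery harder on BPMN models.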
TLR-ELtoR is a TLR approach based on an Evolutionary Algorithm and a Learning to Rank technique (ELtoR). The results obtained through this approach indicate that TLR-ELtoR may be a better alternative than LSI or other approaches when the software artifacts are incomplete or do not have much textual content [8]. This work is our first step in adapting TLR-ELtoR to BPMN models, taking into consideration the particularities of this kind of model [9]. We also report our expectations regarding the outcomes of evaluating the approach in a real-world industrial scenario.
II. RELATED WORK
Most existing works focus on Traceability Links Recovery between requirements and source code. CERBERUS [10] provides a hybrid technique that combines information retrieval, execution tracing, and prune dependency analysis to trace requirements to source code. Eaddy et al. [11] present a systematic methodology for identifying which code is related to which requirement, and a suite of metrics for quantifying the amount of crosscutting code. Some other works target TLR tasks on models. De Lucia et al. [12] present a Traceability Links Recovery method and tool based on LSI in the context of an artifact management system, which includes models. In contrast, this work does not focus on source code or MDD models. Rather, we propose new ideas on how to adapt the TLR-ELtoR approach to work over BPMN models, taking their particularities into account.
III. APPROACH
The ELtoR approach is based on an Evolutionary Algorithm that relies on genetic operations and a fitness function to