µBert: mutation testing using pre-trained language models
Mutation testing seeds faults using a predefined set of simple syntactic transformations, aka mutation operators, that are (typically) defined based on the grammar of the targeted programming language. As a result, mutation operators often alter the program semantics in ways that lead to unnatural code (unnatural in the sense that the mutated code is unlikely to be produced by a competent programmer).
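The mask-and-predict idea behind µBert can be sketched as follows. This is an illustrative outline only: `generate_mutants` and `toy_predict` are hypothetical names, and `toy_predict` is a hard-coded stand-in for the fill-mask predictions that the actual tool obtains from the pre-trained CodeBERT model.

```python
# Sketch of µBert-style mutant generation: mask each token of an input
# expression and ask a masked language model for likely replacements.
# `toy_predict` is a hypothetical stand-in for CodeBERT's fill-mask
# output; the real tool queries the pre-trained model instead.

def generate_mutants(tokens, predict):
    """Mask each token in turn; keep predictions that differ from the
    original token as candidate mutants."""
    mutants = []
    for i, original in enumerate(tokens):
        masked = tokens[:i] + ["<mask>"] + tokens[i + 1:]
        for candidate in predict(masked, i):
            if candidate != original:  # same prediction -> not a mutant
                mutants.append(tokens[:i] + [candidate] + tokens[i + 1:])
    return mutants

def toy_predict(masked_tokens, position):
    """Hard-coded 'predictions' purely for illustration."""
    table = {0: ["a", "b"], 1: ["+", "-"], 2: ["b", "a"]}
    return table.get(position, [])

mutants = generate_mutants(["a", "+", "b"], toy_predict)
print(mutants)  # [['b', '+', 'b'], ['a', '-', 'b'], ['a', '+', 'a']]
```

Because the replacements come from a model of what competent programmers actually write, the resulting mutants tend to be more natural than those produced by grammar-based operators.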
Saved in:

| Main authors: | Degiovanni, Renzo; Papadakis, Mike |
|---|---|
| Format: | Conference object (abstract) |
| Language: | English |
| Published: | 2022 |
| Subjects: | Ciencias Informáticas; Mutation testing; Faults |
| Online access: | http://sedici.unlp.edu.ar/handle/10915/151630 https://publicaciones.sadio.org.ar/index.php/JAIIO/article/download/278/259 |
| Contributed by: | |
| id | I19-R120-10915-151630 |
|---|---|
| record_format | dspace |
| spelling | I19-R120-10915-1516302023-05-03T20:04:19Z http://sedici.unlp.edu.ar/handle/10915/151630 https://publicaciones.sadio.org.ar/index.php/JAIIO/article/download/278/259 issn:2451-7496 µBert: mutation testing using pre-trained language models Degiovanni, Renzo Papadakis, Mike 2022-10 2022 2023-04-18T14:47:55Z en Ciencias Informáticas Mutation testing Faults [abstract text as in the description field below] Sociedad Argentina de Informática e Investigación Operativa Objeto de conferencia Resumen http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) application/pdf 64-64 |
| institution | Universidad Nacional de La Plata |
| institution_str | I-19 |
| repository_str | R-120 |
| collection | SEDICI (UNLP) |
| language | English |
| topic | Ciencias Informáticas; Mutation testing; Faults |
| spellingShingle | Ciencias Informáticas; Mutation testing; Faults; Degiovanni, Renzo; Papadakis, Mike; µBert: mutation testing using pre-trained language models |
| topic_facet | Ciencias Informáticas; Mutation testing; Faults |
| description | Mutation testing seeds faults using a predefined set of simple syntactic transformations, aka mutation operators, that are (typically) defined based on the grammar of the targeted programming language. As a result, mutation operators often alter the program semantics in ways that lead to unnatural code (unnatural in the sense that the mutated code is unlikely to be produced by a competent programmer). Such unnatural faults may not be convincing for developers, who might perceive them as unrealistic or uninteresting, thereby hindering the usability of the method. Additionally, the use of unnatural mutants may affect the guidance and assessment capabilities of mutation testing, because unnatural mutants often lead to exceptions, segmentation faults, infinite loops and other trivial cases. To deal with this issue, we propose forming mutants that are in some sense natural, meaning that the mutated code/statement follows the implicit rules, coding conventions and general representativeness of the code produced by competent programmers. We define and capture this naturalness of mutants using language models trained on big code, which learn (quantify) the likelihood of code tokens given their surrounding code. We introduce µBert, a mutation testing tool that uses a pre-trained language model (CodeBERT) to generate mutants, by masking a token of the expression given as input and using CodeBERT to predict it. |
| format | Conference object (abstract) |
| author | Degiovanni, Renzo; Papadakis, Mike |
| author_facet | Degiovanni, Renzo; Papadakis, Mike |
| author_sort | Degiovanni, Renzo |
| title | µBert: mutation testing using pre-trained language models |
| title_short | µBert: mutation testing using pre-trained language models |
| title_full | µBert: mutation testing using pre-trained language models |
| title_fullStr | µBert: mutation testing using pre-trained language models |
| title_full_unstemmed | µBert: mutation testing using pre-trained language models |
| title_sort | µbert: mutation testing using pre-trained language models |
| publishDate | 2022 |
| url | http://sedici.unlp.edu.ar/handle/10915/151630 https://publicaciones.sadio.org.ar/index.php/JAIIO/article/download/278/259 |
| work_keys_str_mv | AT degiovannirenzo μbertmutationtestingusingpretrainedlanguagemodels AT papadakismike μbertmutationtestingusingpretrainedlanguagemodels |
| _version_ | 1765659992858296320 |