Deobfuscating Name Scrambling as a Natural Language Generation Task

We are interested in data-driven approaches to Natural Language Generation, but semantic representations for human text are difficult and expensive to construct. By considering a method’s implementation as weak semantics for the English terms extracted from the method’s name, we can collect massive da...

Bibliographic Details
Main author: Duboue, Pablo Ariel
Format: Conference object
Language: English
Published: 2018
Subjects:
Online access: http://sedici.unlp.edu.ar/handle/10915/70714
http://47jaiio.sadio.org.ar/sites/default/files/ASAI-12.pdf
id I19-R120-10915-70714
record_format dspace
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
random forest model
bytecodes
Natural language
description We are interested in data-driven approaches to Natural Language Generation, but semantic representations for human text are difficult and expensive to construct. By considering a method’s implementation as weak semantics for the English terms extracted from the method’s name, we can collect massive datasets, akin to having words and sensor data aligned at a scale never seen before. We applied our learned model to name scrambling, a common technique used to protect intellectual property and increase the effort necessary to reverse engineer Java binary code: replacing all method and class names with random identifiers. Using 5.6M bytecode-compiled Java methods obtained from the Debian archive, we trained a Random Forest model to predict the first term in the method name. As features, we primarily use the opcodes of the bytecodes (that is, bytecodes without any parameters). Our results indicate that we can distinguish the 15 most popular terms from the others at 78% recall, which helps a programmer performing reverse engineering to halve the number of methods in a program that they need to investigate further.
format Conference object
author Duboue, Pablo Ariel
title Deobfuscating Name Scrambling as a Natural Language Generation Task
publishDate 2018
url http://sedici.unlp.edu.ar/handle/10915/70714
http://47jaiio.sadio.org.ar/sites/default/files/ASAI-12.pdf
work_keys_str_mv AT dubouepabloariel deobfuscatingnamescramblingasanaturallanguagegenerationtask
bdutipo_str Repositories
_version_ 1764820481707343875