Information extraction with active learning : a case study in legal text

Bibliographic Details
Main authors: Cardellino, Cristian Adrián, Villata, Serena, Alonso i Alemany, Laura, Cabrio, Elena
Format: article
Language: English
Published: 2022
Subjects: Active learning; Natural language processing; Ontology-based information extraction
Online access: http://hdl.handle.net/11086/27448
id I10-R141-11086-27448
record_format dspace
institution Universidad Nacional de Córdoba
institution_str I-10
repository_str R-141
collection Repositorio Digital Universitario (UNC)
language English
topic Active learning
Natural language processing
Ontology-based information extraction
spellingShingle Active learning
Natural language processing
Ontology-based information extraction
Cardellino, Cristian Adrián
Villata, Serena
Alonso i Alemany, Laura
Cabrio, Elena
Information extraction with active learning : a case study in legal text
topic_facet Active learning
Natural language processing
Ontology-based information extraction
description Active learning has been successfully applied to a number of NLP tasks. In this paper, we present a study on Information Extraction for natural language licenses that need to be translated to RDF. The final purpose of our work is to automatically extract, from a natural language document specifying a certain license, a machine-readable description of the terms of use and reuse identified in that license. This task presents some peculiarities that make it especially interesting to study: highly repetitive text, few annotated or unannotated examples available, and very fine precision needed. In this paper we compare different active learning settings for this particular application. We show that the most straightforward approach to instance selection, uncertainty sampling, does not perform well in this setting, doing even worse than passive learning. Density-based methods are the usual alternative to uncertainty sampling in contexts with very few labelled instances. We show that we can obtain an effect similar to that of density-based methods using uncertainty sampling, simply by reversing the ranking criterion and choosing the most certain instead of the most uncertain instances.
format article
author Cardellino, Cristian Adrián
Villata, Serena
Alonso i Alemany, Laura
Cabrio, Elena
author_facet Cardellino, Cristian Adrián
Villata, Serena
Alonso i Alemany, Laura
Cabrio, Elena
author_sort Cardellino, Cristian Adrián
title Information extraction with active learning : a case study in legal text
title_short Information extraction with active learning : a case study in legal text
title_full Information extraction with active learning : a case study in legal text
title_fullStr Information extraction with active learning : a case study in legal text
title_full_unstemmed Information extraction with active learning : a case study in legal text
title_sort information extraction with active learning : a case study in legal text
publishDate 2022
url http://hdl.handle.net/11086/27448
work_keys_str_mv AT cardellinocristianadrian informationextractionwithactivelearningacasestudyinlegaltext
AT villataserena informationextractionwithactivelearningacasestudyinlegaltext
AT alonsoialemanylaura informationextractionwithactivelearningacasestudyinlegaltext
AT cabrioelena informationextractionwithactivelearningacasestudyinlegaltext
bdutipo_str Repositorios
_version_ 1764820391734280194
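
Illustrative note on the description field: the abstract contrasts standard uncertainty sampling with a reversed ranking that picks the most certain instances. The sketch below shows that contrast in its simplest form. It is not the authors' implementation. The confidence measure (probability of the most likely class), the function names `confidence`, `rank_pool`, and `select_batch`, and the example probabilities are all assumptions made for illustration.

```python
from typing import List, Sequence


def confidence(probs: Sequence[float]) -> float:
    """Assumed confidence measure: probability of the most likely class."""
    return max(probs)


def rank_pool(pool_probs: List[Sequence[float]], most_certain: bool = False) -> List[int]:
    """Order unlabelled pool indices by classifier confidence.

    most_certain=False: least confident first (standard uncertainty sampling).
    most_certain=True:  most confident first (the reversed criterion).
    """
    return sorted(
        range(len(pool_probs)),
        key=lambda i: confidence(pool_probs[i]),
        reverse=most_certain,
    )


def select_batch(pool_probs: List[Sequence[float]], batch_size: int,
                 most_certain: bool = False) -> List[int]:
    """Pick the next instances to send to the annotator."""
    return rank_pool(pool_probs, most_certain)[:batch_size]


# Hypothetical class-probability predictions for four unlabelled instances.
pool = [
    [0.55, 0.45],
    [0.98, 0.02],
    [0.70, 0.30],
    [0.51, 0.49],
]

uncertain_first = select_batch(pool, 2)                   # indices 3 and 0
certain_first = select_batch(pool, 2, most_certain=True)  # indices 1 and 2
```

The only difference between the two strategies here is the sort direction, which matches the abstract's description of "just reversing the ranking criterion"; how the classifier produces its probabilities is left outside the sketch.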