Automatically Assessing the Need of Additional Citations for Information Quality Verification in Wikipedia Articles

Quality flaw prediction in Wikipedia is an ongoing research trend. In particular, in this work we tackle the problem of automatically assessing the need for additional citations to help verify the articles’ content, the so-called Refimprove quality flaw. This information qualit...

Bibliographic Details
Main authors: Bazán Pereyra, Gerónimo; Cuello, Carolina; Capodici, Gianfranco; Jofré, Vanessa; Ferretti, Edgardo; Errecalde, Marcelo Luis
Format: Conference object
Language: English
Published: 2019
Subjects:
Online access: http://sedici.unlp.edu.ar/handle/10915/90453
Contributed by:
id I19-R120-10915-90453
record_format dspace
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
Wikipedia
Information Quality
Quality Flaws Prediction
Refimprove Flaw
description Quality flaw prediction in Wikipedia is an ongoing research trend. In particular, in this work we tackle the problem of automatically assessing the need for additional citations to help verify the articles’ content, the so-called Refimprove quality flaw. This information quality flaw ranks among the five most frequent flaws and represents 12.4% of the flawed articles in the English Wikipedia. Three different state-of-the-art approaches (under-bagged decision trees, biased-SVM, and centroid-based balanced SVM) were evaluated, with the aim of handling the existing imbalance between the number of articles tagged as flawed and the remaining untagged documents in Wikipedia, which can help in the learning stage of the algorithms. Also, a uniformly sampled balanced SVM classifier was evaluated as a baseline. The results showed that under-bagged decision trees, with the min rule as the aggregation method, perform best, achieving an F1 score of 0.96 on the test corpus from the 1st International Competition on Quality Flaw Prediction in Wikipedia, a well-known uniform evaluation corpus in this research field. Likewise, biased-SVM also achieved an F1 score that outperforms previously published results.
format Conference object
author Bazán Pereyra, Gerónimo
Cuello, Carolina
Capodici, Gianfranco
Jofré, Vanessa
Ferretti, Edgardo
Errecalde, Marcelo Luis
title Automatically Assessing the Need of Additional Citations for Information Quality Verification in Wikipedia Articles
publishDate 2019
url http://sedici.unlp.edu.ar/handle/10915/90453
bdutipo_str Repositories
_version_ 1764820490048765952
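
The description field names under-bagged decision trees with the min rule as the best-performing approach. The record gives no implementation details, so the following is only a minimal sketch of the general technique, not the authors' code. It assumes a tiny hand-written decision tree as a stand-in base learner, two hypothetical features (reference count and article length), and made-up toy data. Each tree is trained on all flawed articles plus an equally sized random sample of untagged ones; the min rule then takes, for each class, the minimum support over all trees and predicts the class with the larger minimum.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(X, y, depth=0, max_depth=3):
    """Return a leaf (fraction of positives) or a (feature, threshold, left, right) node."""
    if depth == max_depth or len(set(y)) <= 1:
        return sum(y) / len(y)
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no valid split exists
        return sum(y) / len(y)
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))

def tree_proba(tree, x):
    """Walk the tree and return the estimated probability of the flawed class."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def fit_underbagging(X_pos, X_neg, n_trees=10, seed=0):
    """Train each tree on all minority examples plus an equal-size majority sample."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        neg_sample = rng.sample(X_neg, len(X_pos))  # undersample the majority class
        X = X_pos + neg_sample
        y = [1] * len(X_pos) + [0] * len(neg_sample)
        trees.append(build_tree(X, y))
    return trees

def predict_min_rule(trees, x):
    """Min rule: per-class support is the minimum over trees; ties go to class 0."""
    p_pos = min(tree_proba(t, x) for t in trees)
    p_neg = min(1.0 - tree_proba(t, x) for t in trees)
    return 1 if p_pos > p_neg else 0

# Hypothetical toy data: [reference_count, article_length]; 5 flawed vs 40 untagged.
X_pos = [[r, 50 + 5 * r] for r in range(5)]
X_neg = [[r, 100 + r] for r in range(10, 50)]
trees = fit_underbagging(X_pos, X_neg)
print(predict_min_rule(trees, [2, 60]), predict_min_rule(trees, [30, 130]))
```

The min rule is conservative: a single tree that gives the flawed class low support is enough to pull that class's score down, so an article is labelled flawed only when every tree in the ensemble agrees to some degree.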