Efficient repeat finding in sets of strings via suffix arrays
We consider two repeat finding problems relative to sets of strings: (a) Find the largest substrings that occur in every string of a given set; (b) Find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O...
Autores principales: | , , |
---|---|
Publicado: |
2013
|
Materias: | |
Acceso en línea: | https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_14627264_v15_n2_p59_Barenbaum http://hdl.handle.net/20.500.12110/paper_14627264_v15_n2_p59_Barenbaum |
Aporte de: |
id |
paper:paper_14627264_v15_n2_p59_Barenbaum |
---|---|
record_format |
dspace |
spelling |
paper:paper_14627264_v15_n2_p59_Barenbaum2023-06-08T16:16:17Z Efficient repeat finding in sets of strings via suffix arrays Barenbaum, Pablo Becher, Verónica Andrea Heiber, Pablo Ariel Longest maximal substring Repeats Stringology Suffix array Expensive parts Input string Maximal repeats Maximal substring Repeats Stringology Sub-strings Suffix arrays Algorithms Indexing (of information) We consider two repeat finding problems relative to sets of strings: (a) Find the largest substrings that occur in every string of a given set; (b) Find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O(m) memory, where m is the length of the longest input string, and O(n log m) time, where n is the the whole input size (the sum of the length of each string in the input). The most expensive part of our algorithms is the computation of several suffix arrays. We give an implementation and experimental results that evidence the efficiency of our algorithms in practice, even for very large inputs. © 2013 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France. Fil:Barenbaum, P. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. Fil:Becher, V. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. Fil:Heiber, P.A. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. 2013 https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_14627264_v15_n2_p59_Barenbaum http://hdl.handle.net/20.500.12110/paper_14627264_v15_n2_p59_Barenbaum |
institution |
Universidad de Buenos Aires |
institution_str |
I-28 |
repository_str |
R-134 |
collection |
Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA) |
topic |
Longest maximal substring Repeats Stringology Suffix array Expensive parts Input string Maximal repeats Maximal substring Repeats Stringology Sub-strings Suffix arrays Algorithms Indexing (of information) |
spellingShingle |
Longest maximal substring Repeats Stringology Suffix array Expensive parts Input string Maximal repeats Maximal substring Repeats Stringology Sub-strings Suffix arrays Algorithms Indexing (of information) Barenbaum, Pablo Becher, Verónica Andrea Heiber, Pablo Ariel Efficient repeat finding in sets of strings via suffix arrays |
topic_facet |
Longest maximal substring Repeats Stringology Suffix array Expensive parts Input string Maximal repeats Maximal substring Repeats Stringology Sub-strings Suffix arrays Algorithms Indexing (of information) |
description |
We consider two repeat finding problems relative to sets of strings: (a) Find the largest substrings that occur in every string of a given set; (b) Find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O(m) memory, where m is the length of the longest input string, and O(n log m) time, where n is the the whole input size (the sum of the length of each string in the input). The most expensive part of our algorithms is the computation of several suffix arrays. We give an implementation and experimental results that evidence the efficiency of our algorithms in practice, even for very large inputs. © 2013 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France. |
author |
Barenbaum, Pablo Becher, Verónica Andrea Heiber, Pablo Ariel |
author_facet |
Barenbaum, Pablo Becher, Verónica Andrea Heiber, Pablo Ariel |
author_sort |
Barenbaum, Pablo |
title |
Efficient repeat finding in sets of strings via suffix arrays |
title_short |
Efficient repeat finding in sets of strings via suffix arrays |
title_full |
Efficient repeat finding in sets of strings via suffix arrays |
title_fullStr |
Efficient repeat finding in sets of strings via suffix arrays |
title_full_unstemmed |
Efficient repeat finding in sets of strings via suffix arrays |
title_sort |
efficient repeat finding in sets of strings via suffix arrays |
publishDate |
2013 |
url |
https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_14627264_v15_n2_p59_Barenbaum http://hdl.handle.net/20.500.12110/paper_14627264_v15_n2_p59_Barenbaum |
work_keys_str_mv |
AT barenbaumpablo efficientrepeatfindinginsetsofstringsviasuffixarrays AT becherveronicaandrea efficientrepeatfindinginsetsofstringsviasuffixarrays AT heiberpabloariel efficientrepeatfindinginsetsofstringsviasuffixarrays |
_version_ |
1768545895213694976 |