Efficient repeat finding in sets of strings via suffix arrays

We consider two repeat finding problems relative to sets of strings: (a) Find the largest substrings that occur in every string of a given set; (b) Find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O...

Descripción completa

Detalles Bibliográficos
Autores principales: Barenbaum, Pablo, Becher, Verónica Andrea, Heiber, Pablo Ariel
Publicado: 2013
Materias:
Acceso en línea:https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_14627264_v15_n2_p59_Barenbaum
http://hdl.handle.net/20.500.12110/paper_14627264_v15_n2_p59_Barenbaum
Aporte de:
id paper:paper_14627264_v15_n2_p59_Barenbaum
record_format dspace
spelling paper:paper_14627264_v15_n2_p59_Barenbaum2023-06-08T16:16:17Z Efficient repeat finding in sets of strings via suffix arrays Barenbaum, Pablo Becher, Verónica Andrea Heiber, Pablo Ariel Longest maximal substring Repeats Stringology Suffix array Expensive parts Input string Maximal repeats Maximal substring Repeats Stringology Sub-strings Suffix arrays Algorithms Indexing (of information) We consider two repeat finding problems relative to sets of strings: (a) Find the largest substrings that occur in every string of a given set; (b) Find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O(m) memory, where m is the length of the longest input string, and O(n log m) time, where n is the the whole input size (the sum of the length of each string in the input). The most expensive part of our algorithms is the computation of several suffix arrays. We give an implementation and experimental results that evidence the efficiency of our algorithms in practice, even for very large inputs. © 2013 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France. Fil:Barenbaum, P. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. Fil:Becher, V. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. Fil:Heiber, P.A. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. 2013 https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_14627264_v15_n2_p59_Barenbaum http://hdl.handle.net/20.500.12110/paper_14627264_v15_n2_p59_Barenbaum
institution Universidad de Buenos Aires
institution_str I-28
repository_str R-134
collection Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic Longest maximal substring
Repeats
Stringology
Suffix array
Expensive parts
Input string
Maximal repeats
Maximal substring
Repeats
Stringology
Sub-strings
Suffix arrays
Algorithms
Indexing (of information)
spellingShingle Longest maximal substring
Repeats
Stringology
Suffix array
Expensive parts
Input string
Maximal repeats
Maximal substring
Repeats
Stringology
Sub-strings
Suffix arrays
Algorithms
Indexing (of information)
Barenbaum, Pablo
Becher, Verónica Andrea
Heiber, Pablo Ariel
Efficient repeat finding in sets of strings via suffix arrays
topic_facet Longest maximal substring
Repeats
Stringology
Suffix array
Expensive parts
Input string
Maximal repeats
Maximal substring
Repeats
Stringology
Sub-strings
Suffix arrays
Algorithms
Indexing (of information)
description We consider two repeat finding problems relative to sets of strings: (a) Find the largest substrings that occur in every string of a given set; (b) Find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O(m) memory, where m is the length of the longest input string, and O(n log m) time, where n is the the whole input size (the sum of the length of each string in the input). The most expensive part of our algorithms is the computation of several suffix arrays. We give an implementation and experimental results that evidence the efficiency of our algorithms in practice, even for very large inputs. © 2013 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France.
author Barenbaum, Pablo
Becher, Verónica Andrea
Heiber, Pablo Ariel
author_facet Barenbaum, Pablo
Becher, Verónica Andrea
Heiber, Pablo Ariel
author_sort Barenbaum, Pablo
title Efficient repeat finding in sets of strings via suffix arrays
title_short Efficient repeat finding in sets of strings via suffix arrays
title_full Efficient repeat finding in sets of strings via suffix arrays
title_fullStr Efficient repeat finding in sets of strings via suffix arrays
title_full_unstemmed Efficient repeat finding in sets of strings via suffix arrays
title_sort efficient repeat finding in sets of strings via suffix arrays
publishDate 2013
url https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_14627264_v15_n2_p59_Barenbaum
http://hdl.handle.net/20.500.12110/paper_14627264_v15_n2_p59_Barenbaum
work_keys_str_mv AT barenbaumpablo efficientrepeatfindinginsetsofstringsviasuffixarrays
AT becherveronicaandrea efficientrepeatfindinginsetsofstringsviasuffixarrays
AT heiberpabloariel efficientrepeatfindinginsetsofstringsviasuffixarrays
_version_ 1768545895213694976