Efficient repeat finding in sets of strings via suffix arrays

We consider two repeat finding problems relative to sets of strings: (a) Find the largest substrings that occur in every string of a given set; (b) Find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O...

Descripción completa

Detalles Bibliográficos
Autores principales: Barenbaum, P., Becher, V., Deymonnaz, A., Halsband, M., Heiber, P.A.
Formato: JOUR
Materias:
Acceso en línea:http://hdl.handle.net/20.500.12110/paper_14627264_v15_n2_p59_Barenbaum
Aporte de:
id todo:paper_14627264_v15_n2_p59_Barenbaum
record_format dspace
spelling todo:paper_14627264_v15_n2_p59_Barenbaum2023-10-03T16:16:38Z Efficient repeat finding in sets of strings via suffix arrays Barenbaum, P. Becher, V. Deymonnaz, A. Halsband, M. Heiber, P.A. Longest maximal substring Repeats Stringology Suffix array Expensive parts Input string Maximal repeats Maximal substring Repeats Stringology Sub-strings Suffix arrays Algorithms Indexing (of information) We consider two repeat finding problems relative to sets of strings: (a) Find the largest substrings that occur in every string of a given set; (b) Find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O(m) memory, where m is the length of the longest input string, and O(n log m) time, where n is the the whole input size (the sum of the length of each string in the input). The most expensive part of our algorithms is the computation of several suffix arrays. We give an implementation and experimental results that evidence the efficiency of our algorithms in practice, even for very large inputs. © 2013 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France. Fil:Barenbaum, P. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. Fil:Becher, V. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. Fil:Heiber, P.A. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. JOUR info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar http://hdl.handle.net/20.500.12110/paper_14627264_v15_n2_p59_Barenbaum
institution Universidad de Buenos Aires
institution_str I-28
repository_str R-134
collection Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic Longest maximal substring
Repeats
Stringology
Suffix array
Expensive parts
Input string
Maximal repeats
Maximal substring
Repeats
Stringology
Sub-strings
Suffix arrays
Algorithms
Indexing (of information)
spellingShingle Longest maximal substring
Repeats
Stringology
Suffix array
Expensive parts
Input string
Maximal repeats
Maximal substring
Repeats
Stringology
Sub-strings
Suffix arrays
Algorithms
Indexing (of information)
Barenbaum, P.
Becher, V.
Deymonnaz, A.
Halsband, M.
Heiber, P.A.
Efficient repeat finding in sets of strings via suffix arrays
topic_facet Longest maximal substring
Repeats
Stringology
Suffix array
Expensive parts
Input string
Maximal repeats
Maximal substring
Repeats
Stringology
Sub-strings
Suffix arrays
Algorithms
Indexing (of information)
description We consider two repeat finding problems relative to sets of strings: (a) Find the largest substrings that occur in every string of a given set; (b) Find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O(m) memory, where m is the length of the longest input string, and O(n log m) time, where n is the the whole input size (the sum of the length of each string in the input). The most expensive part of our algorithms is the computation of several suffix arrays. We give an implementation and experimental results that evidence the efficiency of our algorithms in practice, even for very large inputs. © 2013 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France.
format JOUR
author Barenbaum, P.
Becher, V.
Deymonnaz, A.
Halsband, M.
Heiber, P.A.
author_facet Barenbaum, P.
Becher, V.
Deymonnaz, A.
Halsband, M.
Heiber, P.A.
author_sort Barenbaum, P.
title Efficient repeat finding in sets of strings via suffix arrays
title_short Efficient repeat finding in sets of strings via suffix arrays
title_full Efficient repeat finding in sets of strings via suffix arrays
title_fullStr Efficient repeat finding in sets of strings via suffix arrays
title_full_unstemmed Efficient repeat finding in sets of strings via suffix arrays
title_sort efficient repeat finding in sets of strings via suffix arrays
url http://hdl.handle.net/20.500.12110/paper_14627264_v15_n2_p59_Barenbaum
work_keys_str_mv AT barenbaump efficientrepeatfindinginsetsofstringsviasuffixarrays
AT becherv efficientrepeatfindinginsetsofstringsviasuffixarrays
AT deymonnaza efficientrepeatfindinginsetsofstringsviasuffixarrays
AT halsbandm efficientrepeatfindinginsetsofstringsviasuffixarrays
AT heiberpa efficientrepeatfindinginsetsofstringsviasuffixarrays
_version_ 1807324311775608832