Longest repeated substring problem

A suffix tree of the letters ATCGATCGA$

In computer science, the longest repeated substring problem is the problem of finding the longest substring of a string that occurs at least twice.

This problem can be solved in linear time and space by building a suffix tree for the string (with a special end-of-string symbol like '$' appended), and finding the deepest internal node in the tree with more than one child. Depth is measured by the number of characters traversed from the root. The string spelled by the edges from the root to such a node is a longest repeated substring. The problem of finding the longest substring with at least occurrences can be solved by first preprocessing the tree to count the number of leaf descendants for each internal node, and then finding the deepest node with at least leaf descendants. To avoid overlapping repeats, you can check that the list of suffix lengths has no consecutive elements with less than prefix-length difference.

In the figure with the string "ATCGATCGA$", the longest substring that repeats at least twice is "ATCGA".

  • Allison, L. "Suffix Trees". Retrieved 2008-10-14.
  • C implementation of Longest Repeated Substring using Suffix Tree
  • Online Demo: Longest Repeated Substring


Content Disclaimer

Informasi ini disarikan dari Wikipedia dan disajikan kembali untuk tujuan edukasi. Konten tersedia di bawah lisensi CC BY-SA 3.0. Kami tidak bertanggung jawab atas ketidakakuratan data yang bersumber dari kontribusi publik tersebut.

  1. The information displayed on this website is sourced in part or in whole from Wikipedia and has been adapted for the purpose of restating it. We strive to provide accurate and relevant information, however:
  2. There is no guarantee of absolute accuracy. Wikipedia is an open, collaborative project that can be edited by anyone, so information is subject to change.
  3. It is not intended to constitute professional advice. The content displayed is for informational and educational purposes only. For important decisions (e.g., medical, legal, or financial), please consult a professional.
  4. Content copyright. Wikipedia is licensed under the Creative Commons Attribution-ShareAlike License (CC BY-SA). This means that content may be reused with appropriate attribution and shared under a similar license.
  5. Responsible use. Any risk arising from the use of information from this website is entirely the responsibility of the user.