Approximate String Matching with q-grams and Maximal Matches
Abstrak
We study approximate string matching in connection with two string distance functions that are computable in linear time. The first function is based on the so-called $q$-grams. An algorithm is given for the associated string matching problem that finds the locally best approximate occurences of pattern $P$, $|P|=m$, in text $T$, $|T|=n$, in time $O(n\log (m-q))$. The occurences with distance $\leq k$ can be found in time $O(n\log k)$. The other distance function is based on finding maximal common substrings and allows a form of approximate string matching in time $O(n)$. Both distances give a lower bound for the edit distance (in the unit cost model), which leads to fast hybrid algorithms for the edit distance based string matching.
Topik & Kata Kunci
Penulis (1)
E. Ukkonen
Akses Cepat
- Tahun Terbit
- 1992
- Bahasa
- en
- Total Sitasi
- 701×
- Sumber Database
- Semantic Scholar
- DOI
- 10.1016/0304-3975(92)90143-4
- Akses
- Open Access ✓