Exact pattern matching. Current achievements and research
Keywords:pattern matching, exact pattern matching, string algorithms, text processing
The problem of exact pattern matching is an essential programming problem. Different algorithms that solve this problem are core elements of search engines, version control systems, text editors, DNA analyzers, and many others. For simplification reasons articles usually denote pattern as P or p and pattern length as M or m. Similarly, the text is usually denoted as T or t and its length - N or n. Alphabet is denoted Σ and its length - |Σ|. Based on these notations the problem of pattern matching can be written as follows:
Find all positions/ amount of i, such that P[0...m] = T[i...i + m], or: Find all positions i in text for which substring starting at position i of the text of length m is equal to the pattern.
The main parameters of this problem are pattern length and alphabet size. The length of the text usually doesn’t matter because, for any long enough text of a specific structure, the run time of the algorithm per character will be close to constant. Besides that, the specifics of the input data and text may also impact the performances of the algorithms. All of that makes the problem both very nuanced and interesting to investigate. This problem features a lot of different existing solutions developed over the course of the last 5 decades. The main part of the work provides short descriptions and analyses of a set of algorithms that are still relevant in the field. Besides that, some remarks are made on the topic of their theoretical regions of efficiency and how they depend on the specifics of the input. The results of the practical experimentation on the variety of randomly generated test data are provided. The conclusion provides some analysis of the received results and algorithms’ class efficiency based on the input as well as a visual representation of the received results in a form of a table representing the most efficient algorithm for each pair of pattern length and alphabet size.
Pages of the article in the issue: 79 - 88
Language of the article: Ukrainian
ZAVADSKYI, I.O. A family of exact pattern matching algorithms with multiple adjacent search windows. Proceedings of the Prague Stringology Conference, p.152–166. J. Holub and J. Zdarek, Eds. Czech Technical University in Prague, Czech Republic (2017).
ZAVADSKYI, I.O Fast exact pattern matching by the means of a character bit representation SN Computer Science. 3. Springer (2022).
DURIAN, B., PELTOLA, H., SALMELA, L., TARHIO, J. Bit-parallel search algorithms for long patterns. 9th International Symposium on Experimental Algorithms. Ischia Island, Naples, Italy., pp. 129–140. Springer (2010).
KULEKCI, M. Filter based fast matching of long patterns by using simd instructions. Proceedings of the Prague Stringology Conference, pp. 118–128. J. Holub and J. Zdarek, Eds. Czech Technical University in Prague, Czech Republic (2009).
LECROQ, T. Fast exact string matching algorithms. Information Processing Letters 102(6), 229–235 (2007).
PELTOLA, H., TARHIO, J. String matching with lookahead. Discrete Applied Mathematics 163, 352–360 (2014).
FARO, S., LECROQ, T. Efficient variants of the backward-oracle-matching algorithm. Proceedings of the Prague Stringology Conference, pp. 146–160. J. Holub and J. Zdarek, Eds. Czech Technical University in Prague, Czech Republic (2008). DOI 10.1142/S0129054109006991
PELTOLA, H., TARHIO, J. Alternative algorithms for bit-parallel string matching. pp. 80–94 (2003)
HUDAIB, A., AL-KHALID, R., SULEIMAN, D., ITRIQ, M.A.A., AL-ANANI, A. A fast pattern matching algorithm with two sliding windows (tsw). Journal of Computer Science 4(5), 393–401. DOI 10.3844/jcssp.2008.393.401
FARO, S., LECROQ, T. Efficient pattern matching on binary strings. 35th International Conference on Current Trends in Theory and Practice of Computer Science, p. Poster (2009).
ZAVADSKYI, I.O Fast exact pattern matching in a bitstream and 256-ary strings.
How to Cite
Copyright (c) 2023 Anton Zuiev
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).