Pattern matching by the terms of cache memory limitations

Authors

  • I. O. Zavadskyi Taras Shevchenko National University of Kyiv

DOI:

https://doi.org/10.17721/1812-5409.2019/3.8

Abstract

A few known techniques of exact pattern matching, such as 2-byte read, skip loop, and sliding search windows, are improved and applied to pattern matching algorithms, performing over 256-ary alphabets. Instead of 2-byte read, we offer “1.5-byte read”, i.e. reading more than 8 but less than 16 bits of two sequential bytes of a text at each iteration of a search loop. This allows us to fit the search table into L1 cache memory, which significantly improves the algorithm performance. Also, we introduce the so-called double skip loop instead of single one, resolve problems caused by endianness of a machine, and adopt the sliding windows technique to our algorithms. The experimental results averaged over 500 runs of algorithms on 40 different computers show that our algorithms outperform all other tested methods for all tested pattern lengths.

Key words: pattern matching, Boyer-Moore-Horspool, fast search, text search, sliding windows.

Pages of the article in the issue: 56 - 59

Language of the article: Ukrainian

References

FARO S., LECROQ T. (2013) The Exact Online String Matching Problem: a Review of the Most Recent Results, ACM Computing Surveys (CSUR) Surveys Homepage archive, 45(2), Art. 13.

CANTONE D., FARO S. (2005) Fastsearch algorithms: New efficient variants of the Boyer-Moore pattern-matching algorithm. J. Automata, Languages and Combinatorics, 10(5/6), pp. 589–608.

HORSPOOL R. N. (1980) Practical fast search in strings, Softw. Pract. Exp. 10(6), pp. 501–506.

PELTOLA H., TARHIO J. (2014) String matching with lookahead, Discrete Applied Mathematics, 163, pp. 352–360.

HUME A., SUNDAY D. (1991) Fast string searching, Softw. Pract. Exp. 21(11), pp. 1221–1248.

SUNDAY D. M. (1990) A very fast substring search algorithm, Commun. ACM 33(8), pp. 132–142.

Downloads

How to Cite

Zavadskyi, I. O. (2020). Pattern matching by the terms of cache memory limitations. Bulletin of Taras Shevchenko National University of Kyiv. Physical and Mathematical Sciences, (3), 56–59. https://doi.org/10.17721/1812-5409.2019/3.8

Issue

Section

Computer Science and Informatics