I am looking to construct an algorithm for discovering repeating patterns in raw data (non-ASCII).
The shortest and longest pattern lengths should be configurable. The data to search will be in the tens of thousands of bytes.
For example, given the following data:
AB CD 01 AB CD 02 EF 03 02 EF 04 02 EF
The algorithm would output the number of times each repeating pattern occurs. In this case:
ABCD x2
02EF x3
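As a baseline, here is a minimal brute-force sketch. It assumes that overlapping occurrences count and that every length from the minimum to the maximum is checked independently. It slides a window of each length over the bytes and tallies every slice. Python `bytes` slices are hashable, so they work directly as dictionary keys with no string conversion. The name `count_repeats` is hypothetical.

```python
from collections import Counter


def count_repeats(data: bytes, min_len: int, max_len: int) -> dict[bytes, int]:
    """Count every byte sequence of length min_len..max_len that occurs at least twice.

    Overlapping occurrences are counted, so b"\x00\x00\x00" contains b"\x00\x00" twice.
    """
    counts = Counter()
    for length in range(min_len, max_len + 1):
        # Slide a window of this length over the data; each slice is a hashable key.
        for start in range(len(data) - length + 1):
            counts[data[start:start + length]] += 1
    # Keep only sequences that actually repeat.
    return {pattern: n for pattern, n in counts.items() if n >= 2}


# The example data from the question, parsed from hex (whitespace is ignored).
data = bytes.fromhex("AB CD 01 AB CD 02 EF 03 02 EF 04 02 EF")
for pattern, n in sorted(count_repeats(data, 2, 4).items()):
    print(pattern.hex().upper(), f"x{n}")
# prints:
# 02EF x3
# ABCD x2
```

For n bytes and lengths from min to max, this performs about n × (max − min + 1) slice-and-hash steps. That is manageable for tens of thousands of bytes and short patterns. Note that it also reports every sub-pattern of a longer repeat: if `ABCDEF` repeats, `ABCD` is reported too whenever 4 is within the configured range.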
I have looked at several algorithms, such as suffix trees, but they generally seem to be string-based.
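Suffix trees and suffix arrays are usually described over strings, but they only require a sequence of comparable symbols. A Python `bytes` object works directly, with each byte treated as one symbol from an alphabet of 256. The sketch below is a suffix-array variant under a few assumptions. It builds the array by prefix doubling, chosen for brevity rather than speed (O(n log² n)), and computes the LCP (longest common prefix) array with Kasai's algorithm. It then uses the fact that all occurrences of a length-k pattern sit in one contiguous run of the suffix array, where each neighbouring LCP is at least k. The run's length is the occurrence count, overlapping occurrences included. The names `suffix_array`, `lcp_array` and `find_repeats` are hypothetical.

```python
def suffix_array(data: bytes) -> list[int]:
    """Prefix-doubling suffix array over raw bytes."""
    n = len(data)
    if n == 0:
        return []
    rank = list(data)  # initial rank of each suffix = its first byte (0..255)
    sa = list(range(n))
    k = 1
    while True:
        # Sort by the first 2k bytes, using the ranks of the two k-byte halves;
        # -1 marks "past the end", so a shorter suffix sorts first.
        def key(i: int) -> tuple[int, int]:
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j - 1]) != key(sa[j]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: suffixes fully sorted
            return sa
        k *= 2


def lcp_array(data: bytes, sa: list[int]) -> list[int]:
    """Kasai's algorithm: lcp[i] = common prefix length of sa[i-1] and sa[i]."""
    n = len(data)
    rank = [0] * n
    for pos, suffix in enumerate(sa):
        rank[suffix] = pos
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix just before i in sorted order
            while i + h < n and j + h < n and data[i + h] == data[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 bytes
        else:
            h = 0
    return lcp


def find_repeats(data: bytes, min_len: int, max_len: int) -> dict[bytes, int]:
    """Map each pattern of length min_len..max_len seen 2+ times to its count."""
    sa = suffix_array(data)
    lcp = lcp_array(data, sa)
    result = {}
    for k in range(min_len, max_len + 1):
        run = 1  # length of the current run of suffixes sharing a k-byte prefix
        for i in range(1, len(sa) + 1):
            if i < len(sa) and lcp[i] >= k:
                run += 1
            else:
                if run >= 2:
                    start = sa[i - 1]
                    result[data[start:start + k]] = run
                run = 1
    return result


data = bytes.fromhex("AB CD 01 AB CD 02 EF 03 02 EF 04 02 EF")
print({p.hex().upper(): n for p, n in find_repeats(data, 2, 4).items()})
```

The suffix and LCP arrays are built once and then reused for every pattern length. Each extra length only costs a linear scan, and this becomes the more attractive option when the length range is wide.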
This will be written in Python, but I'm more interested in the concepts involved rather than an actual implementation.
Many thanks for your help.