
I have a complex problem at hand: I have a huge string (more than 200,000 characters), for example:

'1213 1242 1213 49 1213 12134 4561213 154816 4631 154816'

The output should be something like:

1. No. of distinct recurrent patterns
2. Each pattern's repetition count #=> ([12], 6), ([121], 6), ([1213], 6), ([213], 6), ([21], 6), ([13], 6), ...

There are lots of solutions for finding the longest repeating substring using Ruby/C/C++, but very few for finding all recurring substrings.

I am looking for a conventional algorithm to perform this operation, in the way that Floyd's cycle-finding algorithm is the standard choice for identifying cycles. Something of that sort would be great to get started with.

Nishutosh Sharma

1 Answer


A cycle refers to the repetition of the entire set from beginning to end, over and over. You are looking for recurring patterns within the set, which is not the same thing as a cycle.

One brute-force approach to your problem would be to slide a window of length two over the entire input, looking at each pattern of two characters. If you haven't seen that pattern yet, store it in a map with a count of 1; otherwise, increase its count. Then do the same for patterns of three, and so on. This would be pretty slow with a big input, because every pattern length needs its own full pass, so there are optimizations you would have to make.
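The brute-force idea above can be sketched roughly as follows in Python. This is only an illustration, not a tuned solution: the function name `count_repeated_substrings` and its parameters `min_len` and `max_len` are made up for this example, and two choices are assumptions rather than anything stated in the question: overlapping occurrences are counted, and patterns containing whitespace are ignored (to match the digit-only patterns in the sample output).

```python
from collections import Counter


def count_repeated_substrings(text, min_len=2, max_len=None):
    """Return {substring: count} for substrings occurring at least twice.

    Hypothetical helper for illustration. Assumptions: overlapping
    occurrences are counted, and substrings containing whitespace
    are skipped.
    """
    if max_len is None:
        max_len = len(text)

    repeated = {}
    for length in range(min_len, max_len + 1):
        # One full pass over the input per pattern length.
        counts = Counter(
            text[i:i + length]
            for i in range(len(text) - length + 1)
        )
        found = {
            s: c for s, c in counts.items()
            if c > 1 and not any(ch.isspace() for ch in s)
        }
        # If nothing of this length repeats, nothing longer can repeat
        # either, because the prefix of a repeated pattern also repeats.
        if not found:
            break
        repeated.update(found)
    return repeated


if __name__ == "__main__":
    sample = "1213 1242 1213 49 1213 12134 4561213 154816 4631 154816"
    result = count_repeated_substrings(sample)
    print(len(result), "distinct recurring patterns")
    for pattern, count in sorted(result.items()):
        print(pattern, count)
```

The early exit is one simple optimization of the kind mentioned above: the loop stops at the first length for which no pattern repeats. Passing a small `max_len` caps the work further when only short patterns are of interest.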

j_buckley