I have a string:

str_x = "121001221122010120211122211122222222112222"

I want to find out how many times a given pattern is observed in the string, but the pattern should be seen as flexible:

The pattern I'm looking for is thus:

  • at least three 2's followed by at least two 1's followed by at least three 2's

A pattern satisfying this condition will thus for example be "22211222", but also "2222111222" and "222222221111111111222"

I want to find out how many times this "flexible pattern" is seen in str_x.

The correct answer here is 2 times.

Any ideas how to do this? Thanks a bunch.

EDIT

Given the definition I placed above, the answer of 2 times is actually incorrect, since valid patterns overlap... e.g. "222111222", "2221112222", "22211122222" etc. are all patterns satisfying the objective.

What I want is the number of occurrences that do not overlap (that is, still 2).

Emjora

2 Answers

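You can solve this with a regular expression: https://docs.python.org/2/library/re.html

The regular expression:
regex = r"2{3,}?1{2,}?2{3,}?"
means: at least three 2's, followed by at least two 1's, followed by at least three 2's.

2{3,} matches a run of three or more 2's.
A ? after a quantifier makes it lazy (non-greedy): it matches as few characters as possible.
re.finditer never returns overlapping matches; each search resumes where the previous match ended. Because the final 2{3,}? is lazy, it takes only three 2's, so the rest of a long run of 2's can start the next match. That gives 2 matches for the string in the question. If you remove the ?, the greedy first match swallows the whole run of eight 2's and only 1 match is found.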
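
To see how the lazy and greedy forms behave on the string from the question, here is a minimal sketch. The variable names lazy and greedy are only for illustration; re.findall returns the same non-overlapping matches that re.finditer would yield.

```python
import re

test_str = "121001221122010120211122211122222222112222"

# Lazy quantifiers: each part matches as few characters as it can
lazy = re.findall(r"2{3,}?1{2,}?2{3,}?", test_str)

# Greedy quantifiers: each part matches as many characters as it can
greedy = re.findall(r"2{3,}1{2,}2{3,}", test_str)

print(len(lazy), lazy)
print(len(greedy), greedy)
```

The lazy form leaves the tail of the long run of 2's available to begin a second match, which is why it reports 2 here while the greedy form reports 1.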

import re

regex = r"2{3,}?1{2,}?2{3,}?"

test_str = "121001221122010120211122211122222222112222"

# finditer yields non-overlapping matches, scanning left to right
total = 0
for total, match in enumerate(re.finditer(regex, test_str), start=1):
    print("Match {num} was found at {start}-{end}: {match}".format(num=total, start=match.start(), end=match.end(), match=match.group()))

# total stays 0 when nothing matches
print("total matches: {total}".format(total=total))
Laser

Here's a piece of code which works:

    def count_pattern(s):
        # one_count is the length of the current run of contiguous 1s
        # the pattern is checked at the first 2 right after a run of 1s
        # count is the number of patterns found
        count = 0
        one_count = 0
        for i in range(1, len(s)):
            if s[i] == '1':
                if s[i-1] == '1':
                    one_count = one_count + 1
                else:
                    one_count = 1
            elif (s[i] == '2' and s[i-1] == '1'
                  and one_count > 1                          # at least two 1s
                  and i > one_count + 2                      # room for three 2s before the 1s
                  and s[i-one_count-3:i-one_count] == '222'  # three 2s before the 1s
                  and s[i:i+3] == '222'):                    # three 2s after the 1s
                count = count + 1
        return count


    print("Number of times the pattern occurs =",
          count_pattern('121001221122010120211122211122222222112222'))
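
The same idea can also be written over run lengths, which may be easier to read. This is only a sketch of an alternative formulation, not a different algorithm: the function name count_pattern_runs is made up for illustration, and it uses itertools.groupby to collapse the string into (character, run length) pairs before checking each run of 1s against its two neighbours.

```python
from itertools import groupby


def count_pattern_runs(s):
    # Run-length encode the string, e.g. "22211" -> [("2", 3), ("1", 2)]
    runs = [(ch, len(list(group))) for ch, group in groupby(s)]
    count = 0
    # Only interior runs can have a neighbour on both sides
    for k in range(1, len(runs) - 1):
        prev_ch, prev_len = runs[k - 1]
        ch, n = runs[k]
        next_ch, next_len = runs[k + 1]
        if (ch == "1" and n >= 2
                and prev_ch == "2" and prev_len >= 3
                and next_ch == "2" and next_len >= 3):
            count += 1
    return count


print(count_pattern_runs("121001221122010120211122211122222222112222"))
```

One caveat applies to this way of counting: a run of 2's sitting between two runs of 1's is allowed to serve both neighbours, even when it is shorter than six. For "2221122211222" this count is 2, although only one non-overlapping occurrence fits, so it matches the non-overlapping requirement only when shared runs of 2's are long enough.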
StatsML