I have a bunch of strings such as

1245046126856123
5293812332348977
1552724141123171
7992612370048696
6912394320472896

I give the program a pattern, which in this case is '123', and the expected output is

1245046126856[123]
52938[123]32348977
1552724141[123]171
79926[123]70048696
69[123]94320472896

To do this I record the indices where the pattern occurs in an array, then create an empty array of chars and place '[' and ']' according to those indices. So far this works fine, but when I have a string such as

12312312312312312123

I should get

[123][123][123][123][123]12[123]

However, I cannot find a way to record the indices correctly in such a case. I use the Rabin-Karp algorithm for pattern matching, and the section where I calculate the indices at which to put the brackets is as follows

if(j == M){
      index[k] = i; //beginning index
      index[k+1] = i+M+1; //finishing index
      if((k!=0)&&(index[k-1] == index[k])){
           index[k]++;
           index[k+1]++;
      }

      if((k!=0)&&(index[k] < index[k-1])){
           index[k] = index[k-2]+M+1;
           index[k+1] = i-index[k-1]+M+1;
      }
      k += 2;
}

i is the index where the pattern starts to occur, j is the index at which the algorithm stops checking the pattern (the last character of the given pattern), k is the current position in the index array, and M is the length of the pattern.

This results in a string (where only the brackets are placed) like this

[   ][   ][   ][   ][   ][   ]

but as you can see, there should be two empty spaces between the last two sets of brackets. How can I adjust the way I calculate the indices so that the brackets are placed properly?

STT
  • 1
    You could use a State Machine (with three states) The {begin,end} pointers could also serve as a surrogate for the state. (or just: the matched length) – wildplasser Jan 20 '22 at 12:21
  • I doubt we are allowed to do that as this is a school project and we haven't learned about state machines. – STT Jan 20 '22 at 12:22
  • 2
    @STT: All restrictions should be stated in the question itself. If you don't specify any restrictions in the question, then you will likely get answers that violate your restrictions. – Andreas Wenzel Jan 20 '22 at 12:25
  • 2
    Please provide a more complete code. See [MCVE](https://stackoverflow.com/help/mcve) for details. – Gerhardh Jan 20 '22 at 12:26
  • 1
    A state machine can here be represented by a variable that counts the number of characters just read that correspond to the searched string. 0, 1, 2, 3 -> go! You are not allowed to use a state machine, but a counter? – Damien Jan 20 '22 at 12:30
  • Please also provide values for `i`, `j`, `k` when you execute that snippet. `M` should be 3 I assume. – Gerhardh Jan 20 '22 at 12:34
  • yeah, state machine is just the fancy and more academic term for it – user1984 Jan 20 '22 at 12:35
  • what is the expected output if the pattern is `"121"` and the string is `"12121"`? is it `"[12[1]21]"` or `"[121]12"` – Daniel Jan 20 '22 at 12:43
  • @Andreas Wenzel it isn't explicitly restricted, it's simply something we haven't learned yet and I don't know how to implement it. – STT Jan 20 '22 at 12:44
  • @Daniel that's a good question, to be honest, the assignment paper doesn't give too much explanation about the outputs, it just gives one example. But I believe they would expect us to do it like "[121]12" since they ask us to highlight it with brackets. – STT Jan 20 '22 at 12:48
  • @STT if that's the case I got a solution for you to find the indexes, I'll post it and hope it helps. – Daniel Jan 20 '22 at 13:00
  • @Daniel sure thing, that'd be awesome! – STT Jan 20 '22 at 13:45
  • @STT my bad I thought it was a python question.. I can still post the pythonic solution if you'd like to try and implement it in C. it looks like pseudo code so you can understand it easily and I'm pretty sure it won't take too much effort to convert it to C code. – Daniel Jan 20 '22 at 13:57
  • @Daniel sure thing that works for me, too! – STT Jan 20 '22 at 14:09
  • Still trying to minimise the amount of state needed! (there seems to be a logical XOR issue here, and I like it!) – wildplasser Jan 21 '22 at 00:16
  • BTW: I now have three versions, all seeming to work. The OP has none. – wildplasser Jan 21 '22 at 00:37
  • @wildplasser I did find a working solution. Should I edit my post to add the working one? It only does not work for Daniel's case, which is I think something out of scope for the test cases of the assignment. – STT Jan 21 '22 at 05:20

1 Answer

EDIT

I thought this was a Python question at first, so this is a Python answer, but it may help as pseudocode.

This piece of code finds every index in the string at which the pattern starts, skipping occurrences that overlap a previous match.

    string = "12312312312312123"
    ptrn = "123"
    i = 0
    indexes = []  # dynamic array; in C a fixed array of len(string) // len(ptrn) entries (or just len(string)) is enough
    while True:
        i = string.find(ptrn, i)  # next occurrence of the pattern at or after position i
        if i == -1:  # -1 means the pattern does not occur again
            break
        indexes.append(i)  # record where this occurrence starts
        i += len(ptrn)  # skip past the match so that overlapping occurrences are ignored

    print(indexes)
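
Since the question is about C, here is a rough sketch of how the same loop might be ported. It swaps `string.find` for the standard `strstr` rather than Rabin-Karp, and it writes the bracketed output directly instead of storing indices first. The function name `highlight` is made up for this illustration; it is not part of the asker's program.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of text in which every
 * non-overlapping occurrence of pat is wrapped in brackets.
 * Returns NULL if pat is empty or allocation fails; the caller frees the result. */
char *highlight(const char *text, const char *pat)
{
    size_t n = strlen(text);
    size_t m = strlen(pat);
    if (m == 0)
        return NULL; /* an empty pattern would match everywhere */

    /* At most n / m non-overlapping matches, each adding two brackets. */
    char *out = malloc(n + 2 * (n / m) + 1);
    if (out == NULL)
        return NULL;

    const char *src = text; /* next unread position in the input */
    char *dst = out;        /* next free position in the output */
    const char *hit;

    while ((hit = strstr(src, pat)) != NULL) {
        size_t gap = (size_t)(hit - src);
        memcpy(dst, src, gap); /* copy the unmatched text before the hit */
        dst += gap;
        *dst++ = '[';
        memcpy(dst, pat, m);
        dst += m;
        *dst++ = ']';
        src = hit + m; /* resume after the match, like i += len(ptrn) */
    }
    strcpy(dst, src); /* copy the tail, including the terminator */
    return out;
}
```

Calling `highlight("12312312312312312123", "123")` should give `[123][123][123][123][123]12[123]`. Because each search resumes after the previous match, the brackets can never interleave, so no correction of earlier indices is needed.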

If you would like every match reported, even one that lies inside another match, advance by one character instead of skipping the whole match: replace `i += len(ptrn)` with `i += 1`. In C this amounts to testing every start position of the string with a loop such as `for(int i=0; i<strlen(string); i++)`.
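
For the overlapping case, a minimal C sketch could test every start position with `memcmp`. The name `find_all` and the caller-supplied index array are assumptions made for this example, not something from the question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores the start index of every occurrence of pat in
 * text, overlapping ones included, into idx (which has room for cap entries).
 * Returns the number of occurrences found, which may exceed cap. */
size_t find_all(const char *text, const char *pat, size_t *idx, size_t cap)
{
    size_t n = strlen(text);
    size_t m = strlen(pat);
    size_t count = 0;

    if (m == 0 || m > n)
        return 0;

    /* Try every start position; the step is 1 rather than m. */
    for (size_t i = 0; i + m <= n; i++) {
        if (memcmp(text + i, pat, m) == 0) {
            if (count < cap)
                idx[count] = i;
            count++;
        }
    }
    return count;
}
```

With the pattern `"121"` and the string `"12121"` from the comments, this reports two matches, at indices 0 and 2. How overlapping matches should then be bracketed is a separate decision that the assignment would have to specify.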

Daniel