I am trying to count how many times a pattern occurs as a subsequence of a string and also keep the indices where the match happens.

The counting is easy using recursive calls.

function count(str, pattern, strInd, patternInd) {
    if (patternInd === 0) {
        return 1;
    }

    if (strInd === 0) {
        return 0;
    }

    if (str.charAt(strInd - 1) === pattern.charAt(patternInd - 1)) {
        const count1 = count(str, pattern, strInd - 1, patternInd - 1);
        const count2 = count(str, pattern, strInd - 1, patternInd);
        return count1 + count2;
    } else {
        return count(str, pattern, strInd - 1, patternInd);
    }
}

To keep the indices, my idea is to push the current index of str to a "local indices" array within a recursive call whenever the pattern character matches the string character. Once the pattern is finished, I push the "local indices" to the "global indices" and reset the "local indices" for the next recursion path.

Resetting the local indices is where I am facing problems:

function count(str, pattern, strInd, patternInd) {
    if (patternInd === 0) {
        // when this path ends, add to list the indices found on this path
        globalIndices.push(localIndices);
        // reset the local indices
        localIndices = [];
        console.log("found");
        return 1;
    }

    if (strInd === 0) {
        return 0;
    }

    if (str.charAt(strInd - 1) === pattern.charAt(patternInd - 1)) {
        localIndices.push(strInd);
        const count1 = count(str, pattern, strInd - 1, patternInd - 1);
        const count2 = count(str, pattern, strInd - 1, patternInd);
        return count1 + count2;
    } else {
        return count(str, pattern, strInd - 1, patternInd);
    }
}

This way it loses the previous path information after every bifurcation, because once a matched subpath is consumed, it is removed from localIndices, and localIndices starts keeping track of the matches after the bifurcation happened.

For example, if str is "abab" and pattern is "ab", I would like globalIndices to be [[4,3], [4,1], [2,1]], but instead I get [[4,3], [1], [2,1]].

I would like to reset "local indices" to the previous bifurcation.

Am I going in the right direction, or do these kinds of problems need a different implementation altogether?

gaurav5430

1 Answer

First, when you collect the indices, you don't need to keep a count, as the length of the final array will be the count: each array element will correspond to a match, and be a list of the relevant indices.

You can make the return value of the function the array of (partial) matches and extend each array with an extra index (for when that character is taken in the match) while backtracking:

function count(str, pattern, strInd = str.length, patternInd = pattern.length) {
    if (patternInd === 0) {
        return [[]]; // A match. Provide an array with an empty array for that match
    }

    if (strInd === 0) {
        return []; // No match. Provide empty array.
    }

    if (str.charAt(strInd - 1) === pattern.charAt(patternInd - 1)) {
        const matches1 = count(str, pattern, strInd - 1, patternInd - 1);
        const matches2 = count(str, pattern, strInd - 1, patternInd);
        // For the first case, add the current string index to the partial matches:
        return [...matches1.map(indices => [...indices, strInd-1]), ...matches2];
    } else {
        return count(str, pattern, strInd - 1, patternInd);
    }
}

console.log(count("abab", "ab")); // [[2,3],[0,3],[0,1]]

Note that the indices are zero-based, so they are one less than the values in your expected output. Also, within each match the indices are listed from left to right, which seems more useful.
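To relate this to the output format in the question, here is a minimal sketch. It restates the same recursion under the hypothetical name `findMatches` so that it runs on its own, and adds a hypothetical helper `toQuestionFormat` that converts each match to 1-based indices listed from right to left. Neither name is part of the original code.

```javascript
// Same recursion as above, restated so this snippet is self-contained.
// `findMatches` is a hypothetical name chosen for this sketch.
function findMatches(str, pattern, strInd = str.length, patternInd = pattern.length) {
    if (patternInd === 0) return [[]]; // whole pattern consumed: one (still empty) match
    if (strInd === 0) return [];       // string exhausted first: no match
    // Matches that do not use the character at strInd - 1:
    const skip = findMatches(str, pattern, strInd - 1, patternInd);
    if (str.charAt(strInd - 1) !== pattern.charAt(patternInd - 1)) return skip;
    // Matches that do use it: extend each partial match with this index.
    const take = findMatches(str, pattern, strInd - 1, patternInd - 1);
    return [...take.map(indices => [...indices, strInd - 1]), ...skip];
}

// Hypothetical helper: 1-based indices, rightmost first, as in the question.
function toQuestionFormat(matches) {
    return matches.map(indices => indices.map(i => i + 1).reverse());
}

const zeroBased = findMatches("abab", "ab");
console.log(JSON.stringify(zeroBased));                   // [[2,3],[0,3],[0,1]]
console.log(JSON.stringify(toQuestionFormat(zeroBased))); // [[4,3],[4,1],[2,1]]
```

The conversion is kept outside the recursion so that the recursive function only has to return the solution for its own parameters.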

General idea

Generally it is best to avoid global variables and to rely as much as possible on the return value of the recursive function. What you get back from it concerns only the "subtree" that the recursive call visited. In the above case, that subtree is a shorter version of both the string and the pattern. What the recursive function returns should be consistent with the parameters passed (it should be the "solution" for those parameters).

Return values can be complex: when you need to return more than "one thing", you can put the different parts in an object or array and return that. The caller can then unpack it into the individual parts again. For instance, if we had also returned the count in the above code, we would have done:

function count(str, pattern, strInd = str.length, patternInd = pattern.length) {
    if (patternInd === 0) {
        return { count: 1, matches: [[]] };
    }

    if (strInd === 0) {
        return { count: 0, matches: [] };
    }

    if (str.charAt(strInd - 1) === pattern.charAt(patternInd - 1)) {
        const { count: count1, matches: matches1 }  = 
             count(str, pattern, strInd - 1, patternInd - 1);
        const { count: count2, matches: matches2 } = 
             count(str, pattern, strInd - 1, patternInd);
        // For the first case, add the current string index to the partial matches:
        return {
            count: count1 + count2,
            matches: [...matches1.map(indices => [...indices, strInd-1]), ...matches2]
        };
    } else {
        return count(str, pattern, strInd - 1, patternInd);
    }
}

It should always be possible to solve a recursion problem this way. If that proves too difficult, an alternative is to pass an extra object (or array) to which the recursive calls add their results: it acts as a collector that gradually grows into the final solution. The downsides are that it goes against the best practice of keeping functions free of side effects, and that the caller must prepare an empty object and pass it in to get the results.
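As a rough sketch of that collector alternative (not the approach recommended above), the function below takes a `results` array supplied by the caller and a `path` array holding the indices taken so far. The names `collect`, `results`, `path` and `collected` are assumptions made for this illustration. Popping the index after the "take" branch restores `path` to its state at the bifurcation, which is the reset the question was looking for.

```javascript
// Hypothetical collector variant: results are added to `results` as a side effect.
function collect(str, pattern, results, strInd = str.length, patternInd = pattern.length, path = []) {
    if (patternInd === 0) {
        // Copy the path (it keeps changing) and reverse it to left-to-right order.
        results.push([...path].reverse());
        return;
    }
    if (strInd === 0) return; // string exhausted: this path is not a match
    if (str.charAt(strInd - 1) === pattern.charAt(patternInd - 1)) {
        path.push(strInd - 1); // take this character
        collect(str, pattern, results, strInd - 1, patternInd - 1, path);
        path.pop();            // undo, so the "skip" branch starts from the same state
    }
    // Skip this character and keep looking for the same pattern character.
    collect(str, pattern, results, strInd - 1, patternInd, path);
}

const collected = []; // the caller has to prepare the collector
collect("abab", "ab", collected);
console.log(JSON.stringify(collected)); // [[2,3],[0,3],[0,1]]
```

Pushing a copy of `path` matters here: pushing the array itself would store the same reference for every match, and later pops would empty it.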

Finally, don't be tempted to use global variables for such data collection. It would be a bit better if such "global" variables were actually local variables in a closure. But still, the other option is to be preferred.
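For completeness, this is roughly what the closure variant could look like. It is a sketch under assumed names (`findAllMatches`, `recur`, `found`, `path`), none of which come from the original code. The collecting arrays are local to the outer function, so nothing leaks into the global scope and the caller does not need to prepare anything.

```javascript
// Hypothetical closure variant: `found` and `path` are visible only to `recur`.
function findAllMatches(str, pattern) {
    const found = []; // completed matches
    const path = [];  // indices taken on the current recursion path

    function recur(strInd, patternInd) {
        if (patternInd === 0) {
            found.push([...path].reverse()); // store a copy, in left-to-right order
            return;
        }
        if (strInd === 0) return;
        if (str.charAt(strInd - 1) === pattern.charAt(patternInd - 1)) {
            path.push(strInd - 1);
            recur(strInd - 1, patternInd - 1);
            path.pop(); // restore the state from before this character was taken
        }
        recur(strInd - 1, patternInd);
    }

    recur(str.length, pattern.length);
    return found;
}

console.log(findAllMatches("abab", "ab").length); // 3
```

The inner function still works through side effects, which is why relying purely on the return value remains the cleaner option.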

trincot
  • Why are we adding the strInd - 1 to all childarrays of matches 1? – gaurav5430 Jun 19 '19 at 20:28
  • Also why only for matches1 and not for matches2 – gaurav5430 Jun 19 '19 at 20:29
  • because those childarrays all represent matches that should include that index. You decreased the pattern index in that recursive call, which indicates you "used" that character, and so it must be included in all matches. – trincot Jun 19 '19 at 20:30
  • not for matches2 because there the recursive call is made without considering that the string character matches the pattern character, even though they are the same character. Note how you don't decrease the pattern index in the second recursive call, meaning you still want that pattern character to be matched (with a string character at a different index) – trincot Jun 19 '19 at 20:32
  • Does this answer your question? – trincot Jun 20 '19 at 04:52
  • yeah thanks a lot, your comments answer my question about your approach. For the overall question, I am just waiting for more inputs on how to approach these kind of problems in general, where we have to keep track of things in the recursion path, possibly with some examples. – gaurav5430 Jun 20 '19 at 06:47
  • I added some more information, but in essence I believe this way of working (relying completely on the return value of the recursive function) will always work. – trincot Jun 20 '19 at 07:27