Count the number of times a string appears in a sequence

Question

I have a matrix X that consists of some sequences I obtained from a Markov chain with 5 states: 1, 2, 3, 4, 5. Each row is a separate, independent sequence; for example, row 1 is one sequence and row 2 is another.

    X =
        4   4   4   4   4   5   3   0   0   0
        1   4   2   2   2   2   3   4   0   0
        4   4   1   2   1   3   1   0   0   0
        2   4   4   2   4   3   3   5   0   0
        4   4   5   4   2   1   2   4   3   5

I'd like to count the number of transitions between states 1..5, i.e. 1→1, 1→2, 1→3, 1→4, 1→5, 2→1, etc. For example, 1→1 happens 0 times, whereas 4→4 happens 6 times. The zeros can be ignored; they are an artefact of importing an Excel file.

This is similar to this question, except that there the sequence has been concatenated. Please let me know if you need further clarification.

Actually, you have 7 4to4 transitions (4 in the 1st, 1 in the 3rd, 1 in the 4th and 1 in the 5th)... — Eitan T, Sep 16 '12 at 00:05

Eitan T · Accepted Answer · 2012-09-23T11:55:21.180


Here's code that does what you want:

N = max(max(X));                                   %# Number of states
[P, Q] = meshgrid(1:N, 1:N);
Y = [X, zeros(size(X, 1), 1)]';                    %# Pad for concatenation
count_func = @(p, q)numel(strfind(Y(:)', [p, q])); %# Counts p->q transitions
P = reshape(arrayfun(count_func, P, Q), N, N)

Short explanation: all rows of X are joined into one long vector, Y(:)' (the zero padding is necessary so that the last state of one row and the first state of the next row do not form a spurious transition). P and Q hold all possible combinations of states, and count_func counts the number of p → q transitions in Y for a specific p and q. arrayfun invokes count_func for every combination of p and q and produces matrix P accordingly.
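A minimal sketch of the two details this relies on, using made-up toy data rather than the question's matrix: `strfind` reports overlapping matches (so a run of five 4s yields four 4 → 4 transitions), and the zero padding keeps the end of one row from pairing with the start of the next. The names `Xtoy`, `unpadded`, `padded` and `run_count` are illustrative only.

```matlab
%# Toy data (hypothetical): two independent sequences, one per row.
Xtoy = [1 2
        2 1];

%# Without padding, row 1 ends in 2 and row 2 starts with 2, so Y(:)' = [1 2 2 1].
Yraw = Xtoy';
unpadded = numel(strfind(Yraw(:)', [2 2]));     %# spurious 2->2 match

%# With a zero column appended, Y(:)' = [1 2 0 2 1 0]; 0 is never a state.
Y = [Xtoy, zeros(size(Xtoy, 1), 1)]';
padded = numel(strfind(Y(:)', [2 2]));          %# no 2->2 match

%# strfind returns overlapping matches: five 4s in a row give four 4->4 steps.
run_count = numel(strfind([4 4 4 4 4], [4 4]));
```

The transpose in `Y(:)'` matters: `strfind` expects row vectors in MATLAB, which is exactly the "Input strings must have one row" error discussed in the comments.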

For your example, this code yields matrix P:

P =
     0   2   1   1   0
     2   3   0   3   0
     1   1   1   2   1
     1   3   1   7   1
     0   0   2   2   0

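where P(m, n) is the number of transitions from the n-th state to the m-th state. Because meshgrid(1:N, 1:N) varies P along the columns and Q along the rows, the rows of the result index the destination state and the columns the origin state; use P' if you prefer rows to index the origin state.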
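A toy check of which index is the origin, using a made-up sequence 1 → 2 → 3 → 3 instead of the question's data. It shows that meshgrid(x, y) copies x across the columns, so each column of the arrayfun result corresponds to an origin state and each row to a destination state:

```matlab
%# meshgrid(x, y): x varies along the columns, y along the rows.
[P, Q] = meshgrid(1:3, 1:3);
%# P = [1 2 3; 1 2 3; 1 2 3]  -> origin p is the column index
%# Q = [1 1 1; 2 2 2; 3 3 3]  -> destination q is the row index

%# Toy sequence (hypothetical): transitions 1->2, 2->3, 3->3.
Y = [1 2 3 3];
count_func = @(p, q) numel(strfind(Y, [p, q]));
C = arrayfun(count_func, P, Q);   %# C(m, n) counts n -> m

%# Transpose for the row = origin, column = destination layout.
T = C';
```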

EDIT: If you're interested in finding the 2-step transition matrix (that is, i-th state → j-th state → i-th state) as in your follow-up question, you just need to slightly alter count_func, like so:

count_func = @(p, q)numel(strfind(Y(:)', [p, q, p]));

For the example data, this yields the following matrix (here P(m, n) counts n → m → n):

P =

   0   1   0   0   0
   1   2   0   1   0
   1   0   0   0   0
   0   0   0   3   0
   0   0   0   1   0
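The `[p, q, p]` pattern counts only i → j → i round trips. If you instead want what the follow-up comment describes, where the (1, 1) entry would count 111, 121, 131, etc. (that is, every i → k → j path for any intermediate state k), a plain loop over each row is one simple sketch. This is a different quantity from the matrix above, not a drop-in replacement for it. It assumes zeros only ever pad the end of a row, and the result `C2` (a hypothetical name) uses the row = origin, column = destination layout:

```matlab
%# Example data from the question (rows are independent sequences, 0 = padding).
X = [4 4 4 4 4 5 3 0 0 0
     1 4 2 2 2 2 3 4 0 0
     4 4 1 2 1 3 1 0 0 0
     2 4 4 2 4 3 3 5 0 0
     4 4 5 4 2 1 2 4 3 5];
N = max(X(:));

%# C2(i, j) counts i -> k -> j for ANY intermediate state k.
C2 = zeros(N, N);
for r = 1:size(X, 1)
    s = X(r, X(r, :) > 0);          %# drop the zero padding (assumed trailing only)
    for t = 1:numel(s) - 2
        C2(s(t), s(t + 2)) = C2(s(t), s(t + 2)) + 1;
    end
end
```

The diagonal of `C2` then holds the "start and end in the same state after two steps" counts from the comment, for example C2(1, 1) = 2 from the 1 2 1 and 1 3 1 runs in the third row.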

edited Sep 23 '12 at 11:55

answered Sep 16 '12 at 00:06

Eitan T


I see what this code does but it complains about needing only one row for some reason. Did you use the above matrix as an input? Regards, – HCAI Sep 16 '12 at 08:14
I ran this code again and it works. Can you tell me which line triggers the error exactly? – Eitan T Sep 16 '12 at 12:26
It's the `findstr` function. Gives me the same with `strfind`. Error using findstr Input strings must have one row. Error in @(u)numel(findstr(Y(:),[p(u),q(u)])) – HCAI Sep 16 '12 at 12:40
Then try changing `Y(:)` to `Y(:)'`. Maybe that's the problem. – Eitan T Sep 16 '12 at 12:53
Perfect! Thank you very much for introducing me to `strfind`! I wonder why Y' was making a difference though. Any thoughts? – HCAI Sep 16 '12 at 12:58
I don't know. Although `strfind` can be applied to vectors, it is meant for strings, so it expects a row vector. In Octave it works on column vectors as well, so I didn't think it would be a problem. Anyway, I'm glad it works now. – Eitan T Sep 16 '12 at 13:01
@EitanT According to MATLAB R2012a Product Help on `findstr`: "Note: `findstr` will be removed in a future release. Use `strfind` instead." So it seems to me that using `strfind` is the better way to do it, for future compatibility. However, [in the MATLAB online documentation on findstr](http://www.mathworks.com/help/matlab/ref/findstr.html) the message is different: "Note: `findstr` is not recommended. Use `strfind` instead." So I wonder whether they'll actually remove `findstr` in a future release of MATLAB. – nrz Sep 17 '12 at 08:14
@nrz Yeah, it doesn't really make a difference to the solution. I changed it to `strfind` though so the solution won't use a deprecated function. – Eitan T Sep 17 '12 at 09:07
Any thoughts on how you could extend that gem to calculate the 2-step transition matrix? – HCAI Sep 20 '12 at 21:26
Can you give me an example of the desired result? – Eitan T Sep 20 '12 at 22:02
@EitanT For example, a transition from state 1 to state 1 with one other transition in between. This could be 111, 121, 131, 141, 151, etc. So it would still be a 5x5 matrix, but instead of comparing just 2 numbers side by side we'd be looking at 3 together. Like this example: http://stackoverflow.com/questions/11072206/constructing-a-multi-order-markov-chain-transition-matrix-in-matlab but for our matrix here. Do you think it would be easy? – HCAI Sep 20 '12 at 22:17
Sure. This just requires a slight modification in `count_func`. Please see my updated answer. – Eitan T Sep 21 '12 at 21:04
Many thanks again, this is genius! I'm loving that `strfind` command! – HCAI Sep 23 '12 at 11:53

nrz · Answer 2 · 2012-09-17T08:25:19.193

An alternative solution:

%# Define the example data:
x = [
4 4 4 4 4 5 3 0 0 0
1 4 2 2 2 2 3 4 0 0
4 4 1 2 1 3 1 0 0 0
2 4 4 2 4 3 3 5 0 0
4 4 5 4 2 1 2 4 3 5
];

%# Number of different states.
N = max(max(x));

%# Original states.
OrigStateVector = repmat((1:N)', N, 1);

%# Destination states corresponding to OrigStateVector.
DestStateVector = reshape(repmat((1:N)', 1, N)', N^2, 1);

%# Pad rows of x with zeros and reshape it to a horizontal vector.
xVector = reshape([ x, zeros(size(x,1),1) ]', 1, numel(x)+size(x,1));

%# Count every origin->destination pair in xVector and arrange the counts in an N-by-N matrix.
Counts = arrayfun(@(o, d) numel(strfind(xVector, [o d])), OrigStateVector, DestStateVector);
ResultMatrix = reshape(Counts, N, N)';

ResultMatrix =
 0     2     1     1     0
 2     3     0     3     0
 1     1     1     2     1
 1     3     1     7     1
 0     0     2     2     0
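For comparison, here is a sketch of a third approach that avoids `strfind` altogether: take the consecutive pairs within each row, discard any pair that touches the zero padding, and let `accumarray` tally them. This is a swapped-in technique rather than either answer's method. Note that it produces the row = origin, column = destination layout, i.e. the transpose of ResultMatrix above.

```matlab
%# Example data from the question.
x = [4 4 4 4 4 5 3 0 0 0
     1 4 2 2 2 2 3 4 0 0
     4 4 1 2 1 3 1 0 0 0
     2 4 4 2 4 3 3 5 0 0
     4 4 5 4 2 1 2 4 3 5];
N = max(x(:));

%# Consecutive pairs within each row; pairs never cross row boundaries.
from = x(:, 1:end-1);
to   = x(:, 2:end);
keep = from > 0 & to > 0;          %# ignore pairs that involve the zero padding

%# Add 1 at (from, to) for every kept pair: rows = origin, columns = destination.
Counts = accumarray([from(keep), to(keep)], 1, [N, N]);
```

Working on row-wise pairs means no padding trick is needed, because a pair is only ever formed from two neighbours in the same row.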
