Last week I read a paper suggesting MDPs as an alternative approach for recommender systems. The core of that paper is the representation of the recommendation process in terms of an MDP: states, actions, transition probabilities, a reward function, and so on.
If we assume for simplicity a single-user system, then states look like k-tuples (x1, x2, ..., xk), where the last element xk represents the very last item purchased by the user. For example, suppose our current state is (x1, x2, x3), which means the user purchased x1, then x2, then x3, in chronological order. Now if he purchases x4, the new state is going to be (x2, x3, x4).
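The sliding-window state update described above can be sketched in a few lines (the item names and window size k=3 are just placeholders from the example):

```python
from collections import deque

def next_state(state, purchased_item, k=3):
    """Slide the k-item window: drop the oldest purchase, append the newest."""
    window = deque(state, maxlen=k)  # bounded deque keeps only the last k items
    window.append(purchased_item)    # oldest item falls off automatically
    return tuple(window)

state = ("x1", "x2", "x3")
state = next_state(state, "x4")
print(state)  # ('x2', 'x3', 'x4')
```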
Now, what the paper suggests is that these state transitions are triggered by actions, where an action is "recommending an item x_i to the user". But the problem is that such an action may lead to more than one state.
For example, if our current state is (x1, x2, x3) and the action is "recommend x4 to the user", then the outcome will be one of two:

- the user accepts the recommendation of x4, and the new state will be (x2, x3, x4);
- the user ignores the recommendation of x4 (i.e. buys something else), and the new state will be some state (x2, x3, xi) where xi != x4.
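This branching can be written down as a transition table P(s' | s, a) that maps a (state, action) pair to a distribution over successor states. A minimal sketch, with entirely made-up probabilities just to show the shape:

```python
# Hypothetical transition probabilities: one action, several possible
# successor states. P[state][action] maps each next state to its probability.
P = {
    ("x1", "x2", "x3"): {
        "recommend x4": {
            ("x2", "x3", "x4"): 0.6,  # user accepts the recommendation
            ("x2", "x3", "x5"): 0.3,  # user buys x5 instead
            ("x2", "x3", "x6"): 0.1,  # user buys x6 instead
        }
    }
}

successors = P[("x1", "x2", "x3")]["recommend x4"]
# the probabilities over all successor states must sum to 1
assert abs(sum(successors.values()) - 1.0) < 1e-9
```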
My question is: does an MDP actually support the same action triggering two or more different states?
UPDATE. I think the actions should be formulated as "gets a recommendation of item x_i and accepts it" and "gets a recommendation of item x_i and rejects it", rather than simply "gets a recommendation of item x_i".