Questions tagged [markov-decision-process]

51 questions
0
votes
1 answer

How to define an MDP as a Python function?

I’m interested in defining a Markov Decision Process as a Python function. It would need to interface with the PyTorch API for reinforcement learning, and that constraint shapes the function’s form, inputs, and outputs. For context, my problem…
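A minimal sketch of this idea: express the MDP as a pure step function mapping `(state, action)` to `(next_state, reward, done)`, which any RL training loop (including one driving a PyTorch policy network) can call like an environment. The toy 5-state chain and all names below are hypothetical, not from the question.

```python
N_STATES = 5  # states 0..4; state 4 is terminal (hypothetical toy chain)

def mdp_step(state, action):
    """Pure MDP step: action 1 moves right, anything else moves left (clipped)."""
    next_state = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done
```

Because the function is pure (no hidden state), it is easy to test and to wrap in whatever interface the training code expects.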
0
votes
0 answers

List of Map of Tuples for MDP

I am trying to implement an MDP for the first time. Each state is a tuple of four variables. I want to implement a transition table that maps each (state, action) pair to the next state. transition_model = [tuple(x) for x in…
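One common sketch for this: because Python tuples are hashable, a plain dict keyed by `(state, action)` is usually simpler than a list of maps of tuples. The four-variable states and actions below are hypothetical placeholders.

```python
# Hypothetical transition table: dict keyed by (state, action) tuples.
transition_model = {
    ((0, 0, 0, 0), "up"):    (0, 1, 0, 0),
    ((0, 1, 0, 0), "right"): (1, 1, 0, 0),
}

def next_state(state, action):
    # .get returns None for transitions that are not defined
    return transition_model.get((state, action))
```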
Schnee • 1 • 1
0
votes
0 answers

Markov chain: understanding estimation of missing transition probabilities

The following Markov chain, with missing transition probabilities p and q, is given. It is also known how often the different final states occur. A = ['E', 'D', 'F', 'D', 'D', 'F', 'E', 'D', 'F', 'F', 'D', 'E'] The goal is to estimate p and…
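Since the chain's structure is not shown here, only the generic recipe can be sketched: the maximum-likelihood estimate of each final-state probability is its relative frequency in the observations, and p and q can then be solved from those frequencies once the chain's equations are written down.

```python
from collections import Counter

# Observed final states, as in the question.
A = ['E', 'D', 'F', 'D', 'D', 'F', 'E', 'D', 'F', 'F', 'D', 'E']

# Maximum-likelihood estimate: relative frequency of each final state.
counts = Counter(A)
freq = {state: c / len(A) for state, c in counts.items()}
```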
0
votes
0 answers

How did AlphaGo visualize the Markov decision process?

Does anyone know how to visualize the Markov decision process and Monte Carlo tree search like in this video? https://www.youtube.com/watch?v=MgowR4pq3e8&ab_channel=ArxivInsights At 7:12 in the above video, each step span is visualized with a new…
0
votes
0 answers

How can we approximate an infinite-horizon MDP with a finite-horizon MDP in the context of reinforcement learning?

For a given discount factor (and range of reward values) in a finite-horizon Markov decision process (MDP), to how many steps do we have to extend this MDP so that it approximates the corresponding infinite-horizon MDP? I am…
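A standard back-of-the-envelope bound: rewards after step H contribute at most gamma**H * R_max / (1 - gamma) to the discounted return, so choosing H large enough that this tail is below eps makes the finite-horizon value eps-close to the infinite-horizon one. The parameter values below are illustrative examples.

```python
import math

def horizon_for_eps(gamma, r_max, eps):
    """Smallest H with gamma**H * r_max / (1 - gamma) <= eps (tail bound)."""
    return math.ceil(math.log(eps * (1 - gamma) / r_max) / math.log(gamma))

# Example: gamma = 0.9, rewards bounded by 1, tolerance 1e-3.
H = horizon_for_eps(gamma=0.9, r_max=1.0, eps=1e-3)
```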
0
votes
1 answer

Shaping theorem for MDPs

I need help with understanding the shaping theorem for MDPs. Here's the relevant paper: https://people.eecs.berkeley.edu/~pabbeel/cs287-fa09/readings/NgHaradaRussell-shaping-ICML1999.pdf It basically says that a Markov decision process that has some…
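The theorem's construction can be sketched directly: a shaping reward of the form F(s, a, s') = gamma * phi(s') - phi(s), for any potential function phi over states, leaves the optimal policy unchanged, because along any trajectory the added terms telescope and only shift every return by a policy-independent constant. The potential below is a hypothetical example.

```python
GAMMA = 0.9

def phi(state):
    # Hypothetical potential: any real-valued function of state works.
    return float(state)

def shaped_reward(r, state, next_state):
    """Potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return r + GAMMA * phi(next_state) - phi(state)
```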
0
votes
1 answer

How should I code the Gambler's Problem with Q-learning (without any reinforcement learning packages)?

I would like to solve the Gambler's problem as an MDP (Markov Decision Process). Gambler's problem: A gambler has the opportunity to make bets on the outcomes of a sequence of coin flips. If the coin comes up heads, he wins as many dollars as he has…
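A sketch of tabular Q-learning for this problem without any RL packages: capital s, bets up to min(s, GOAL - s), the coin lands heads with probability P_H, and reward 1 is given only on reaching the goal. The goal is scaled down from 100 to 20 here so the sketch runs quickly; the hyperparameters are illustrative, not tuned.

```python
import random

GOAL, P_H, ALPHA, GAMMA, EPS = 20, 0.4, 0.1, 1.0, 0.1
# Tabular Q: one entry per (capital, bet) pair.
Q = {(s, a): 0.0 for s in range(1, GOAL) for a in range(1, min(s, GOAL - s) + 1)}

def actions(s):
    return range(1, min(s, GOAL - s) + 1)

def run_episode(rng):
    s = rng.randrange(1, GOAL)
    while 0 < s < GOAL:
        # epsilon-greedy action selection
        a = (rng.choice(list(actions(s))) if rng.random() < EPS
             else max(actions(s), key=lambda b: Q[(s, b)]))
        s2 = s + a if rng.random() < P_H else s - a
        r = 1.0 if s2 == GOAL else 0.0
        target = r if s2 in (0, GOAL) else GAMMA * max(Q[(s2, b)] for b in actions(s2))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

rng = random.Random(0)
for _ in range(2000):
    run_episode(rng)
```

The learned Q-values approximate the probability of reaching the goal from each (capital, bet) pair, so they stay in [0, 1].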
0
votes
1 answer

MDP Policy Iteration example calculations

I am new to RL and following lectures from UWaterloo. In lecture 3a on Policy Iteration, the professor gave an example of an MDP involving a company that must choose between Advertise (A) and Save (S) actions in states Poor Unknown (PU),…
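The calculation pattern can be sketched on a small hypothetical MDP (the lecture's exact Advertise/Save transition numbers are not reproduced here): alternate iterative policy evaluation with greedy policy improvement until the policy stops changing. `P[s][a]` holds `(probability, next_state)` pairs and `R[s][a]` the immediate reward; all values below are made up for illustration.

```python
GAMMA = 0.9
STATES, ACTIONS = [0, 1], ["A", "S"]
# Hypothetical dynamics: state 1 pays 10 under "S"; "A" risks falling back.
P = {0: {"A": [(1.0, 1)], "S": [(1.0, 0)]},
     1: {"A": [(0.5, 0), (0.5, 1)], "S": [(1.0, 1)]}}
R = {0: {"A": 0.0, "S": 0.0}, 1: {"A": 0.0, "S": 10.0}}

def q_value(s, a, V):
    return R[s][a] + GAMMA * sum(p * V[s2] for p, s2 in P[s][a])

def policy_iteration():
    policy = {s: "A" for s in STATES}
    while True:
        # Policy evaluation: iterate the Bellman expectation backup.
        V = {s: 0.0 for s in STATES}
        for _ in range(500):
            V = {s: q_value(s, policy[s], V) for s in STATES}
        # Policy improvement: act greedily with respect to V.
        new = {s: max(ACTIONS, key=lambda a: q_value(s, a, V)) for s in STATES}
        if new == policy:
            return policy, V
        policy = new

policy, V = policy_iteration()
```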
0
votes
0 answers

Error in if (temp < vmin) { : argument is of length zero

I am trying to code the Markov chain approximation for some control problems, but I have the following bug in R. I checked similar questions on Stack Overflow and still have no idea how to solve it. Any help will be greatly appreciated. The bug…
0
votes
0 answers

Simplest way to define an MDP in OpenAI Gym?

I'm looking for an example-based answer, whether that's code directly in the answer or a link to a tutorial, but in any case more than a text-only answer. I'm curious: how would one define an arbitrary Markov Decision Process in OpenAI Gym for…
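The classic OpenAI Gym pattern is a class exposing `reset()` and `step(action)`. A minimal sketch is shown here without inheriting `gym.Env`, so the snippet runs standalone; with Gym installed you would subclass `gym.Env` and declare `action_space` / `observation_space` as well. The 3-state chain is hypothetical.

```python
class ChainMDP:
    """Gym-style toy MDP: states 0, 1, 2; action 1 moves right, 0 stays.
    State 2 is terminal and yields reward 1."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state = min(2, self.state + 1)
        reward = 1.0 if self.state == 2 else 0.0
        done = self.state == 2
        return self.state, reward, done, {}

env = ChainMDP()
obs = env.reset()
```

A driver loop then looks exactly like one written against a built-in Gym environment: call `reset()`, then `step(action)` until `done`.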
0
votes
1 answer

How to build a Markov Decision Process model in Python for string data?

I have a dataset whose items are represented as URIs. I'd like to model the data so that I can predict the predecessor and successor of a sample in my sequential data. The dataset looks like this: e.g. given "HTTP://example.com/112", the…
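One sketch for this: treat the URI sequence as a first-order Markov chain and count bigrams in both directions, so the most frequent neighbor serves as the predicted successor or predecessor. The example sequence below is hypothetical (only the first URI appears in the question).

```python
from collections import Counter, defaultdict

# Hypothetical sequential data of URI strings.
sequence = ["HTTP://example.com/112", "HTTP://example.com/113",
            "HTTP://example.com/112", "HTTP://example.com/113",
            "HTTP://example.com/114"]

succ, pred = defaultdict(Counter), defaultdict(Counter)
for a, b in zip(sequence, sequence[1:]):
    succ[a][b] += 1   # counts of what follows a
    pred[b][a] += 1   # counts of what precedes b

def most_likely_successor(uri):
    return succ[uri].most_common(1)[0][0] if succ[uri] else None

def most_likely_predecessor(uri):
    return pred[uri].most_common(1)[0][0] if pred[uri] else None
```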
Nili • 91 • 8
0
votes
2 answers

Why does initialising the variable inside or outside of the loop change the code behaviour?

I am implementing policy iteration in python for the gridworld environment as a part of my learning. I have written the following code: ### POLICY ITERATION ### def policy_iter(grid, policy): ''' Perform policy iteration to find the best…
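The usual culprit in this situation can be sketched: in iterative policy evaluation the convergence measure `delta` must be reset to 0 inside the outer sweep loop. Initialising it once outside keeps the maximum over all sweeps, so the stopping condition may never become true. The `update` rule in the example call is a hypothetical contraction with fixed point 2.

```python
def evaluate(values, update, theta=1e-6):
    """Sweep until the largest per-state change in one sweep is below theta."""
    while True:
        delta = 0.0                      # reset each sweep: correct placement
        for s in range(len(values)):
            old = values[s]
            values[s] = update(s, values)
            delta = max(delta, abs(old - values[s]))
        if delta < theta:                # moving `delta = 0.0` above `while`
            return values                # would break this test

# Hypothetical update: v <- 0.5*v + 1, which converges to 2 per state.
V = evaluate([0.0, 0.0], lambda s, vals: 0.5 * vals[s] + 1.0)
```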
0
votes
2 answers

How to ignore certain parts of a line in text file in Python?

I'm attempting to extract the numerical information from an input.txt file I have below. size : 5 4 walls : 2 2 , 2 3 reward : -0.04 transition_probabilities : 0.8 0.1 0.1 0 discount_rate : 0.85 epsilon : 0.001 As you…
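A sketch of one way to do this: split each line on the colon and parse whatever follows as numbers, so the labels (and the commas in the walls line) are ignored. The file contents below mirror the question.

```python
raw = """size : 5 4
walls : 2 2 , 2 3
reward : -0.04
transition_probabilities : 0.8 0.1 0.1 0
discount_rate : 0.85
epsilon : 0.001"""

config = {}
for line in raw.splitlines():
    key, _, rest = line.partition(':')
    # treat commas as whitespace, then parse every remaining token
    numbers = [float(tok) for tok in rest.replace(',', ' ').split()]
    config[key.strip()] = numbers
```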
0
votes
2 answers

Problems with coding Markov Decision Process

I am trying to code a Markov decision process (MDP) and I am facing a problem. Could you please check my code and find why it doesn't work? I have tried it with some small data, and it works and gives me the necessary results, which I feel is…
David • 35 • 7
0
votes
1 answer

Interrogating the results of the Markov simulation - Help and feedback highly appreciated

I have built a Markov chain with which I can simulate the daily routine of people (activity patterns). Each simulation day is divided into 144 time steps, and the person can carry out one of fourteen activities. These are: Away - work (1), Away -…
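The simulation side of this setup can be sketched as one day of 144 Markov steps over 14 activities. The uniform transition matrix below is a placeholder; a real model would use the transition probabilities estimated from the activity data.

```python
import random

N_ACTIVITIES, STEPS = 14, 144
# Placeholder: uniform transitions; replace with estimated probabilities.
transitions = [[1.0 / N_ACTIVITIES] * N_ACTIVITIES for _ in range(N_ACTIVITIES)]

def simulate_day(start, rng):
    day, state = [start], start
    for _ in range(STEPS - 1):
        # sample the next activity from the current activity's row
        state = rng.choices(range(N_ACTIVITIES), weights=transitions[state])[0]
        day.append(state)
    return day

day = simulate_day(start=0, rng=random.Random(0))
```

Simulating many such days and tabulating how often each activity occurs at each time step is then a straightforward way to interrogate the model's output.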