import numpy as np 
import gym
import random
import time
from IPython.display import clear_output

env = gym.make("FrozenLake-v0")

action_space_size = env.action_space.n
state_space_size = env.observation_space.n

q_table = np.zeros((state_space_size, action_space_size))

num_episodes = 10000
max_steps_per_episode = 100

learning_rate = 0.1
discount_rate = 0.99

exploration_rate = 1
max_exploration_rate = 1
min_exploration_rate = 0.01
exploration_decay_rate = 0.01

reward_all_episodes = []

#Q-learning algorithm
for episode in range(num_episodes):
    state = env.reset()

    done = False
    reward_current_episode = 0

    for step in range(max_steps_per_episode):

        exploration_rate_threshold = random.uniform(0, 1)
        if exploration_rate_threshold > exploration_rate: #Exploit
            action = np.argmax(q_table[state, :])
        else:
            action = env.action_space.sample() #Explore

        new_state, reward, done, info = env.step(action)

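        # Q-learning (Bellman) update: move Q(s, a) toward reward + discounted best value of the next state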
        q_table[state, action] = (1 - learning_rate) * q_table[state, action] + \
            learning_rate * (reward + discount_rate * np.max(q_table[new_state]))

        state = new_state

        reward_current_episode += reward

        if done:
            break

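    # Decay the exploration rate exponentially toward its minimum after each episode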
    exploration_rate = min_exploration_rate + \
        (max_exploration_rate - min_exploration_rate) * np.exp(-exploration_decay_rate * episode)

    reward_all_episodes.append(reward_current_episode)

reward_per_thousand_episodes = np.split(np.array(reward_all_episodes), num_episodes // 1000)
count = 1000
print("Average Reward per thousand episode \n")
for r in reward_per_thousand_episodes:
    print(count, ":", str(sum(r/1000)))
    count += 1000

print("\n ***************Q-table****************\n\n")
print(q_table)

I am new to AI and need a bit of help. I have completed the FrozenLake exercise with MVP / Q-learning. Someone told me that I can approximate the Q-function using a deep neural network, and that this is called deep Q-learning. How can I improve this code using deep Q-learning and PyTorch? In other words, how can I approximate the Q-function using a NN here?

J.Doe

1 Answer

This is a slightly broad question, but here's a breakdown.

Firstly, NNs are just function approximators. Give them some inputs and outputs and they will find f(input) = output, but only if such a function exists and the loss/cost is differentiable.

So the Q function is Q(state, action) = futureReward for that action taken in that state.

Alternatively, we can change the Q function to take in just the current state, and output an array of the estimated future rewards for each action. Example:

[7, 5, 1, 8] for actions a, b, c, d

So now the Q function becomes Q(state) = futureRewardMatrix, and the chosen action (action*) is the index of the highest value.

Now all you need is a neural network that takes in the current state and outputs the estimated rewards for each action. (This only works for discrete actions [a, b, c, d, ...].)
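
For instance, here is a minimal PyTorch sketch of such a network for FrozenLake. The class name QNetwork, the hidden size, and the one_hot helper are my own assumptions for illustration, not part of the question's code:

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Maps a one-hot encoded FrozenLake state to one Q-value per action
    def __init__(self, state_space_size, action_space_size, hidden_size=64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_space_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, action_space_size),
        )

    def forward(self, state_one_hot):
        return self.layers(state_one_hot)  # shape: (action_space_size,)

def one_hot(state, state_space_size):
    # FrozenLake states are plain integers, so encode them as one-hot vectors
    x = torch.zeros(state_space_size)
    x[state] = 1.0
    return x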

How to train the network:

  • Collect a training batch by recording states, actions, rewards, and nextState at each step.
  • To get actions, use nn.predict(state), factoring in epsilon to choose random actions (epsilon-greedy; see the sketch after this list).
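
A rough sketch of that epsilon-greedy action selection in PyTorch, assuming the hypothetical QNetwork and one_hot helper above and an epsilon decayed the same way as in the question's code:

import random
import torch

def select_action(q_net, state, epsilon, state_space_size, action_space_size):
    # With probability epsilon explore, otherwise exploit the network's estimate
    if random.uniform(0, 1) < epsilon:
        return random.randrange(action_space_size)
    with torch.no_grad():
        q_values = q_net(one_hot(state, state_space_size))
    return int(torch.argmax(q_values).item())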

Training:

x_train = state
y_train[action] = reward + self.gamma * np.amax(nn.predict(nextState))

Next, we train on a relatively large batch of x_train and y_train samples:

nn.train_on_batch(x_train_batch, y_train_batch)

Then repeat the process of collecting batches for every step of the environment.
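
PyTorch has no train_on_batch helper like Keras, so one possible single-step update looks like this. It is only a sketch under my own assumptions: it reuses the hypothetical QNetwork and one_hot from above, uses a plain MSE loss, and omits the replay buffer and target network that a full DQN would normally add:

import torch
import torch.nn as nn

q_net = QNetwork(state_space_size, action_space_size)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
gamma = 0.99

def train_step(state, action, reward, next_state, done):
    # TD target: reward + gamma * max_a Q(next_state, a), without bootstrapping on terminal states
    with torch.no_grad():
        max_next_q = q_net(one_hot(next_state, state_space_size)).max().item()
    target = torch.tensor(float(reward) + (0.0 if done else gamma * max_next_q),
                          dtype=torch.float32)

    # Current estimate for the action actually taken
    predicted = q_net(one_hot(state, state_space_size))[action]

    loss = loss_fn(predicted, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()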

I recommend you check out Medium and Towards Data Science DQN articles and their respective GitHub repos to get a full code implementation.

Bob Kimani
  • Would you be willing to show what I have to modify in my code, with an explanation? I want to change my code to use deep Q-learning instead of standard MVP/Q-learning. I am very close to understanding the differences between Q-learning and deep Q-learning. I have tried to implement Breakout-v0 and CartPole-v0, but it is too hard for me at the moment. I know that my FrozenLake exercise is not worth using deep Q-learning for because the Q-table is way too small, but I think it will help me understand better because it is much simpler than CartPole-v0, in my opinion. – J.Doe Mar 23 '20 at 12:16
  • Sadly I am not that familiar with PyTorch, but check out this Keras/TensorFlow implementation: https://github.com/tokb23/dqn/blob/master/dqn.py – Bob Kimani Mar 23 '20 at 12:28