The problem I actually want to solve is not this simple; this is a toy game to help me work toward the bigger problem.
I have a 5x5 matrix with all values equal to 0:
structure = np.zeros(25).reshape(5, 5)
The goal is for the agent to turn all values into 1, so I have:
goal_structure = np.ones(25).reshape(5, 5)
I created a class Player with 5 actions: go left, right, up, down, or flip (turn a 0 into 1 or a 1 into 0). For the reward: if the agent flips a 0 to 1, it gets +1; if it flips a 1 to 0, it gets a negative reward (I tried many values, from -1 to 0, even -0.1); and if it just moves left, right, up, or down, it gets a reward of 0.
Because I want to feed the state to my neural net, I reshaped the state as below:
reshaped_structure = np.reshape(structure, (1, 25))
Then I append the normalized position of the agent to the end of this array (because I suppose the agent should have a sense of where it is):
reshaped_state = np.append(reshaped_structure, (np.float64(self.x/4), np.float64(self.y/4)))
state = reshaped_state
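To double-check the shape: np.append flattens its inputs, so the final state is a 1-D array of 27 values (25 grid cells plus the two normalized coordinates), matching the observation_space = 27 in my class. For example, with the agent at (2, 3):

```python
import numpy as np

structure = np.zeros(25).reshape(5, 5)
reshaped_structure = np.reshape(structure, (1, 25))
# np.append flattens both arguments, so the result is 1-D
reshaped_state = np.append(reshaped_structure,
                           (np.float64(2 / 4), np.float64(3 / 4)))
print(reshaped_state.shape)  # (27,)
```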
But I don't get any good results! The agent behaves as if it were acting randomly. I tried different reward functions and different algorithms and tricks, such as experience replay, a target network, Double DQN, and dueling DQN, but none of them seem to work. I guess the problem is in how I define the state. Can anyone help me define a good state?
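One variant I was considering (just a sketch, I have not verified it helps; the function name is made up) is replacing the two normalized coordinates with a one-hot encoding of the agent's position, so the network sees position in the same "grid" form as the cell values:

```python
import numpy as np

def encode_state(structure, x, y):
    """Hypothetical state: 25 cell values + 25-way one-hot agent position."""
    grid = structure.flatten()                # the 25 cell values
    position = np.zeros(25)                   # one-hot over the 25 cells
    position[x * 5 + y] = 1.0
    return np.concatenate([grid, position])   # shape (50,)

state = encode_state(np.zeros((5, 5)), 2, 3)
print(state.shape)  # (50,)
```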
Thanks a lot!
PS: here is my Player class (including the step function):
import numpy as np
from gym import spaces

# Action indices and grid bounds (these were used but not defined in my snippet)
right, left, up, down, flip = 0, 1, 2, 3, 4
x_min, y_min = 0, 0
x_threshold, y_threshold = 4, 4

# The shared 5x5 grid (module-level, as created above)
structure = np.zeros(25).reshape(5, 5)

class Player:
    def __init__(self):
        self.x = 0
        self.y = 0
        self.max_time_step = 50
        self.time_step = 0
        self.reward_list = []
        self.sum_reward_list = []
        self.sum_rewards = []
        self.gather_positions = []
        # self.dict = {}
        self.action_space = spaces.Discrete(5)
        self.observation_space = 27

    def get_done(self, time_step):
        # Episode ends after a fixed number of steps
        return time_step == self.max_time_step

    def flip_pixel(self):
        if structure[self.x][self.y] == 1:
            structure[self.x][self.y] = 0.0
        elif structure[self.x][self.y] == 0:
            structure[self.x][self.y] = 1

    def step(self, action, time_step):
        reward = 0
        # Movement actions, clamped to the grid bounds
        if action == right:
            self.y = min(self.y + 1, y_threshold)
        if action == left:
            self.y = max(self.y - 1, y_min)
        if action == up:
            self.x = max(self.x - 1, x_min)
        if action == down:
            self.x = min(self.x + 1, x_threshold)
        if action == flip:
            self.flip_pixel()
            if structure[self.x][self.y] == 1:
                reward = 1
            else:
                reward = -0.1
        self.reward_list.append(reward)
        done = self.get_done(time_step)
        reshaped_structure = np.reshape(structure, (1, 25))
        reshaped_state = np.append(reshaped_structure,
                                   (np.float64(self.x / 4), np.float64(self.y / 4)))
        state = reshaped_state
        return state, reward, done

    def reset(self):
        # Bug fix: without `global`, this assignment only created a local
        # variable, so the grid was never actually cleared between episodes
        global structure
        structure = np.zeros(25).reshape(5, 5)
        reset_reshaped_structure = np.reshape(structure, (1, 25))
        reset_reshaped_state = np.append(reset_reshaped_structure, (0, 0))
        state = reset_reshaped_state
        self.x = 0
        self.y = 0
        self.reward_list = []
        self.gather_positions = []
        # self.dict.clear()
        return state