(All references to code can be found at https://github.com/EXJUSTICE/Doom_DQN_GC/blob/master/TF2_Doom_GC_CNN.ipynb)
Background
I apologize for the length of this post; I wanted it to be as clear as possible.
I've been adapting some Atari OpenAI Gym code of mine to work with the VizDoom package in order to build a DQN model in Doom, using 480 x 640 input frames. While running some initial demos (cell marked 8) with a completely random policy, I noticed that the model would always skip the first episode; when that happened, requesting a state would return a None object. As a result, I adapted my original code to check whether an episode was done before performing any training (cell marked 24).
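For reference, that guard looks roughly like the sketch below. This is a simplification of the loop in cell 24 rather than a verbatim copy, so treat the variable names as approximations.

# Rough sketch of the done-check (simplified from cell 24; names are approximate).
if game.is_episode_finished():
    # The episode is over: game.get_state() returns None here, so skip
    # preprocessing/training for this step and start a fresh episode instead.
    done = True
else:
    state = game.get_state()        # vizdoom GameState object
    frame = state.screen_buffer     # raw frame (480 x 640, format set in the config)
    done = False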
Stacking
One of the common techniques when building reinforcement learning agents is frame stacking. The idea is to capture motion by stacking a few consecutive preprocessed frames along the channel axis and feeding the stack to the network as a single observation (the element-wise maximum of neighbouring frames is sometimes taken first to reduce flicker, though that line is commented out in my code). My implementation is shown below for reference:
stacked_frames = deque([np.zeros((84, 84), dtype=np.int) for i in range(stack_size)], maxlen=4)

def stack_frames(stacked_frames, state, is_new_episode):
    # Preprocess frame
    frame = preprocess_observation(state)

    if is_new_episode:
        # Clear our stacked_frames
        stacked_frames = deque([np.zeros((84, 84), dtype=np.int) for i in range(stack_size)], maxlen=4)

        # Because we're in a new episode, copy the same frame 4x
        stacked_frames.append(frame)
        stacked_frames.append(frame)
        stacked_frames.append(frame)
        stacked_frames.append(frame)

        # Stack the frames along the channel axis
        stacked_state = np.stack(stacked_frames, axis=2)
    else:
        # Since deque append adds to the right, we can fetch the rightmost element
        # maxframe = np.maximum(stacked_frames[-1], frame)

        # Append frame to deque; the oldest frame is removed automatically
        stacked_frames.append(frame)

        # Build the stacked state (the third dimension indexes the different frames)
        stacked_state = np.stack(stacked_frames, axis=2)

    return stacked_state, stacked_frames
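For clarity, this is roughly how the function is used: at the start of an episode the single frame is duplicated four times, and on later steps each new frame pushes out the oldest one. A minimal usage sketch, assuming game is an initialized vizdoom DoomGame whose screen format yields RGB frames:

# Minimal usage sketch (assumes `game` is an initialized DoomGame returning RGB frames).
state = game.get_state()
obs, stacked_frames = stack_frames(stacked_frames, state.screen_buffer, True)
print(obs.shape)   # (84, 84, 4) -- the same frame repeated four times

# On subsequent steps, only the newest frame is appended and the oldest dropped.
next_state = game.get_state()
obs, stacked_frames = stack_frames(stacked_frames, next_state.screen_buffer, False)
print(obs.shape)   # still (84, 84, 4)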
Problem
All of my attempts to run my agent resulted in the following error:
/usr/local/lib/python3.6/dist-packages/skimage/transform/_warps.py in warp(image, inverse_map, map_args, output_shape, order, mode, cval, clip, preserve_range)
805
806 if image.size == 0:
--> 807 raise ValueError("Cannot warp empty image with dimensions", image.shape)
808
809 image = convert_to_float(image, preserve_range)
ValueError: ('Cannot warp empty image with dimensions', (0, 24))
Upon closer inspection, this error comes from the preprocessing/resizing function, which invokes scikit-image's transform module in order to convert a cropped grayscale frame to the input shape (84, 84). In my original OpenAI code I used .reshape() instead of transform.resize(), but that gave me errors with the VizDoom frames, hence I stuck with transform.
def preprocess_observation(frame):
    # Crop and resize the image into a square, as we don't need the excess information
    cropped = frame[60:-60, 30:-30]
    normalized = cropped / 255.0
    img_gray = rgb2gray(normalized)
    preprocessed_frame = transform.resize(img_gray, [84, 84])
    return preprocessed_frame
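To narrow things down, a defensive check along the lines below (purely a debugging sketch, not part of the linked notebook) makes it obvious when the crop comes out empty before transform.resize is ever reached:

def preprocess_observation_checked(frame):
    # Debugging variant of preprocess_observation (hypothetical, not in the notebook)
    cropped = frame[60:-60, 30:-30]
    if cropped.size == 0:
        # Exactly the situation skimage's warp() complains about
        raise ValueError("Empty crop from a frame of shape {}".format(frame.shape))
    normalized = cropped / 255.0
    img_gray = rgb2gray(normalized)
    return transform.resize(img_gray, [84, 84])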
Since the method appeared to be trying to resize an empty image, I naturally inspected the done section of the agent.
next_obs = np.zeros((84, 84), dtype=np.int)
next_obs, stacked_frames = stack_frames(stacked_frames, next_obs, False)
exp_buffer.append([obs, action, next_obs, reward, done])

step = max_steps
history.append(episodic_reward)
print('Episode: {}'.format(len(history)),
      'Total reward: {}'.format(episodic_reward))
game.new_episode()
I believe it was the stacking function that was giving me the problem, so I did some experimentation to confirm this.
Solutions Attempted
1. Shifting the stacking function and attempting to invoke an observation from the environment itself results in a NoneType error, which is understandable given that the environment is dead.
2. Increasing the deque memory buffer length seems to allow for slightly longer training time (from 5 to 10 episodes).
3. If one removes all stacking from the done section, some training does take place:
step = max_steps
history.append(episodic_reward)
print('Episode: {}'.format(len(history)),
      'Total reward: {}'.format(episodic_reward))
game.new_episode()
This results in around 10 episodes of training before we observe the series of errors in cell 24 (shortened below).
ValueError Traceback (most recent call last)
ValueError: setting an array element with a sequence.
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-24-fa4adb5665e3> in <module>()
158
159 # merge all summaries and write to the file
--> 160 mrg_summary = merge_summary.eval(feed_dict={X:o_obs, y:np.expand_dims(y_batch, axis=-1), X_action:o_act, in_training_mode:False})
161 file_writer.add_summary(mrg_summary, global_step)
162
4 frames
/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
83
84 """
---> 85 return array(a, dtype, copy=False, order=order)
86
87
ValueError: setting an array element with a sequence.
Some searching on Stack Overflow suggests that the observation array being fed to the model ends up with an inconsistent shape somewhere in the sampling process, which leads me to believe that the lack of stacking is the problem.
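For what it's worth, the error message itself is easy to reproduce with a toy batch in which one observation has a different shape from the rest, which is my current suspicion about the sampled replay buffer (a minimal sketch, independent of the notebook):

import numpy as np

# Toy replay-buffer column: two stacked observations and one unstacked one.
obs_batch = [np.zeros((84, 84, 4)), np.zeros((84, 84, 4)), np.zeros((84, 84))]

# With a non-object dtype, NumPy cannot build one rectangular array from the
# mixed shapes and raises "setting an array element with a sequence"
# (the exact wording varies across NumPy versions).
arr = np.asarray(obs_batch, dtype=np.float32)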
Any advice is welcome to solve this headache. Thank you for your time!