I have set up a tf_agents DqnAgent with a regular feed-forward network as the q-net to learn from trajectories, and that works fine. However, I'd now like to try a QRnnNetwork and train/learn from sequences of events, but I can't get it to work.
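
For context, the non-recurrent setup that works looks roughly like this (a simplified sketch; the fc_layer_params value here is just an example, not my exact config):

from tf_agents.networks.q_network import QNetwork

q_net = QNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=(40, 40),
)

With that network and num_steps=2 in the dataset, training works as expected.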

The action and observation specs in my custom env look as follows:

        self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=num_actions-1, name='action'
        )
        self._observation_spec = array_spec.BoundedArraySpec(
            shape=(num_features,), dtype=np.int32, name='observation'
        )

Next, I've set up the q-net and the agent:

lstm_neurons = 50
train_sequence_length = 5

input_fc_layer_params=(40,)
output_fc_layer_params=(40,)

rnn_network = QRnnNetwork(
    train_env.observation_spec(), 
    train_env.action_spec(), 
    lstm_size=(lstm_neurons,),
    input_fc_layer_params=input_fc_layer_params,
    output_fc_layer_params=output_fc_layer_params,
)

agent = dqn_agent.DqnAgent(
    time_step_spec=train_env.time_step_spec(),
    action_spec=train_env.action_spec(),
    q_network=rnn_network,
    optimizer=optimizer,
    target_update_period=target_update_period,
    td_errors_loss_fn=tf.keras.losses.Huber(reduction="none"),
    gamma=discount,
    epsilon_greedy=lambda: epsilon_fn(train_step_counter),
    train_step_counter=train_step_counter,
)

Next, to generate and use training data, I understood that I should pass train_sequence_length + 1 as num_steps; this was previously 2 and is now larger because of the longer sequences.

# replay buffer and driver for training
replay_buffer = TFUniformReplayBuffer(
    agent.collect_data_spec,
    batch_size=replay_buffer_batch_size,
    max_length=replay_buffer_max_size
)

replay_buffer_observer = replay_buffer.add_batch
train_metrics = [tf_metrics.AverageReturnMetric()]

# Create q-policy to plot the learned q-values
qpolicy = QPolicy(train_env.time_step_spec(), train_env.action_spec(), rnn_network)

collect_driver = dynamic_step_driver.DynamicStepDriver(
    train_env,
    agent.collect_policy,
    observers=[replay_buffer_observer] + train_metrics,
    num_steps=collect_steps)

print('Initial data generation and setting up the dataset for training')
initial_collect_policy = random_tf_policy.RandomTFPolicy(train_env.time_step_spec(), train_env.action_spec())

init_driver = dynamic_step_driver.DynamicStepDriver(
    train_env,
    initial_collect_policy,
    observers=[replay_buffer.add_batch, ShowProgress(replay_buffer_max_size)],
    num_steps=replay_buffer_max_size)

final_time_step, final_policy_state = init_driver.run()

dataset = replay_buffer.as_dataset(sample_batch_size=16,
                                   num_steps=(train_sequence_length + 1),  # previously 2
                                   num_parallel_calls=4).prefetch(4)

collect_driver.run = common.function(collect_driver.run)
agent.train = common.function(agent.train)

Once I start training using the commands below, I get the error: "ValueError: Dimensions must be equal, but are 5 and 16 for '{{node gradient_tape/loss/mul_4/Mul}} = Mul[T=DT_FLOAT](loss/Cast, gradient_tape/loss/Tile_1)' with input shapes: [16,5], [16,16]." Here 16 is the sample_batch_size and 5 is the sequence length I'm trying to pass. Please advise on what I'm missing here.

time_step = None
policy_state = agent.collect_policy.get_initial_state(train_env.batch_size)
iterator = iter(dataset)
time_step, policy_state = collect_driver.run(time_step, policy_state)
trajectories, buffer_info = next(iterator)
train_loss = agent.train(trajectories)

The observations now have shape (16, 6, 9), which seems correct (sample_batch_size, train_sequence_length + 1, num_features). The code runs fine with num_steps=2.
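
For what it's worth, this is roughly how I checked the shape of a sampled batch (just a debugging snippet, not part of the training loop):

trajectories, buffer_info = next(iterator)
print(trajectories.observation.shape)  # -> (16, 6, 9)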
