
Since I'm a beginner in ML, this question (or the overall design) may sound silly; sorry about that. I'm open to any suggestions.

I have a simple network with three linear layers, one of which is the output layer.

self.fc1 = nn.Linear(in_features=2, out_features=12)
self.fc2 = nn.Linear(in_features=12, out_features=16)
self.out = nn.Linear(in_features=16, out_features=4)

My states consist of two values, the coordinates x and y. That's why the input layer has two features.
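For reference, here is a minimal standalone sketch (with made-up coordinates) of the batch shape this network expects:

import torch
import torch.nn as nn

fc1 = nn.Linear(in_features=2, out_features=12)
fc2 = nn.Linear(in_features=12, out_features=16)
out = nn.Linear(in_features=16, out_features=4)

# A batch of 3 states, each an (x, y) pair -> shape [3, 2]
states = torch.tensor([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
print(out(fc2(fc1(states))).shape)  # torch.Size([3, 4]): 4 Q-values per state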

In main.py I sample experiences from the ReplayMemory class, extract them as tensors, and pass them to the get_current function:

    experiences = memory.sample(batch_size)
    states, actions, rewards, next_states = qvalues.extract_tensors(experiences)

    current_q_values = qvalues.QValues.get_current(policy_net, states, actions)

Since a single state consists of two values, the length of the states tensor is batch_size x 2, while the length of the actions tensor is batch_size. (Maybe that's the problem?)
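To illustrate the shapes involved, here is a small standalone sketch (made-up values) of how torch.cat behaves on 1-D versus 2-D per-state tensors:

import torch

# If each stored state is 1-D with shape [2] ...
s1 = torch.tensor([1.0, 1.0])
s2 = torch.tensor([2.0, 2.0])
print(torch.cat((s1, s2)).shape)  # torch.Size([4]) -> flattened, no batch dim

# ... but if each stored state is 2-D with shape [1, 2],
# concatenating along dim 0 keeps the batch dimension
s1 = torch.tensor([[1.0, 1.0]])
s2 = torch.tensor([[2.0, 2.0]])
print(torch.cat((s1, s2)).shape)  # torch.Size([2, 2]) -> batch of 2 states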

When I pass states to my network in the get_current function to obtain the predicted Q-values, I get this error:

size mismatch, m1: [1x16], m2: [2x12]

It looks like the whole states tensor is being treated as a single state: a 1x16 input would be eight (x, y) states flattened into one row, which can't be multiplied by fc1's 2x12 weight matrix. I don't want that. In the tutorial I follow, they pass a states tensor that is a stack of multiple states, and there is no problem. What am I doing wrong? :)

This is how I store an experience:

memory.push(dqn.Experience(state, action, next_state, reward))

This is my extract_tensors function:

def extract_tensors(experiences):
    # Convert batch of Experiences to Experience of batches
    # (not used below; the fields are extracted by position instead)
    batch = dqn.Experience(*zip(*experiences))

    # Positions follow the push order: (state, action, next_state, reward)
    state_batch = torch.cat(tuple(d[0] for d in experiences))
    action_batch = torch.cat(tuple(d[1] for d in experiences))
    nextState_batch = torch.cat(tuple(d[2] for d in experiences))
    reward_batch = torch.cat(tuple(d[3] for d in experiences))

    print(action_batch)  # debug output

    return (state_batch, action_batch, reward_batch, nextState_batch)
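Note that the shape of each stored tensor decides what torch.cat returns. If each stored state is a 1-D tensor of shape [2] (an assumption, since the storage shape isn't shown here), torch.stack would be the way to get the missing batch dimension:

# Sketch, assuming each d[0] is a 1-D tensor of shape [2]:
# torch.stack inserts a new batch dimension, giving shape [batch_size, 2],
# while torch.cat would flatten everything into shape [batch_size * 2]
state_batch = torch.stack(tuple(d[0] for d in experiences))
nextState_batch = torch.stack(tuple(d[2] for d in experiences))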

The tutorial I'm following belongs to this project:

https://github.com/nevenp/dqn_flappy_bird/blob/master/dqn.py

Look between lines 148 and 169, especially line 169, where the states batch is passed to the network.

  • Could you post your complete code? Also, a link to the tutorial you are following might help. – rawwar Nov 27 '19 at 14:39
  • One thing I would suggest is to set a breakpoint, keep track of the variables you care about, and check whether their shapes differ from what you expect. – rawwar Nov 27 '19 at 14:40
  • @InAFlash The full project of the tutorial has been added; you can check the lines I mentioned. By the way, I'm a newbie in this area, so I don't really know what to expect. For a start, I'm just following tutorials and only changing the environment (the game). – K.Yazoglu Nov 27 '19 at 14:45
  • Does this answer your question? [RuntimeError: size mismatch m1: \[a x b\], m2: \[c x d\]](https://stackoverflow.com/questions/53828518/runtimeerror-size-mismatch-m1-a-x-b-m2-c-x-d) – prosti Nov 27 '19 at 19:24
  • @prosti No, it doesn't. My problem is about passing a batch. – K.Yazoglu Nov 28 '19 at 06:11

1 Answer


SOLVED. It turned out that I didn't know how to properly create a 2-D tensor. A 2-D tensor must be created like this:

states = torch.tensor([[1, 1], [2, 2]], dtype=torch.float)
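For completeness, a short sketch (made-up values) of why the explicit batch dimension fixes the original size mismatch:

import torch
import torch.nn as nn

fc1 = nn.Linear(in_features=2, out_features=12)

# Each state now carries an explicit batch dimension: shape [1, 2]
s1 = torch.tensor([[1.0, 1.0]])
s2 = torch.tensor([[2.0, 2.0]])

states = torch.cat((s1, s2))  # shape [2, 2]: a batch of 2 states
print(fc1(states).shape)      # torch.Size([2, 12]) -> no size mismatch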
