I have a dataframe of time series data like so

import pandas as pd

df = pd.DataFrame({'TimeStep': [1, 2, 3, 1, 2, 3],
                   'Feature1': [100, 250, 300, 400, 100, 50],
                   'Feature2': [2, 5, 100, 10, 42, 17]})

    TimeStep | Feature1 | Feature2
    1        | 100      | 2
    2        | 250      | 5
    3        | 300      | 100
    1        | 400      | 10
    2        | 100      | 42
    3        | 50       | 17

Now I would like to feed these to a SimpleRNN layer in Keras. For the example above, the batch size would be 2, timesteps = 3 and input_dim = 2.

I tried df.to_numpy().reshape((2, 3, 2)) (with the actual dimensions of the real df, of course), but that reshape didn't work.

I'm grateful for any pointers you could give me. A while back I did something similar with a plain NumPy array, without specifying input_dim, and that worked.

Thanks in advance!

Olli

1 Answer

You are close! If you reshape the dataframe excluding the TimeStep column (via iloc[:, 1:]), it should do:

>>> df.iloc[:, 1:].to_numpy().reshape(2, 3, 2)
array([[[100,   2],
        [250,   5],
        [300, 100]],

       [[400,  10],
        [100,  42],
        [ 50,  17]]], dtype=int64)

which has the (batch_size, seq_len, num_features) shape.
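
As a minimal sketch of the same idea for a frame of any length: the original attempt fails because the full frame holds 6 x 3 = 18 values, while a (2, 3, 2) array holds only 12. The helper name `to_rnn_input` below is hypothetical (not part of pandas or Keras), and the sketch assumes the rows are already ordered sequence by sequence, time step by time step, with the same number of time steps in every sequence.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'TimeStep': [1, 2, 3, 1, 2, 3],
                   'Feature1': [100, 250, 300, 400, 100, 50],
                   'Feature2': [2, 5, 100, 10, 42, 17]})

# With the TimeStep column included there are 18 values,
# which cannot fill a (2, 3, 2) array of 12 elements.
print(df.to_numpy().size)  # 18


def to_rnn_input(frame, timesteps, feature_cols):
    """Hypothetical helper: stack consecutive rows into
    a (batch_size, timesteps, num_features) array."""
    values = frame[feature_cols].to_numpy()
    if len(values) % timesteps != 0:
        raise ValueError("row count is not a multiple of timesteps")
    # -1 lets NumPy infer the batch size from the number of rows.
    return values.reshape(-1, timesteps, len(feature_cols))


X = to_rnn_input(df, timesteps=3, feature_cols=['Feature1', 'Feature2'])
print(X.shape)  # (2, 3, 2)
```

Selecting the feature columns by name rather than by position is a design choice here; it keeps the reshape correct even if the column order of the real frame differs.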

Mustafa Aydın
  • @Mustafa Aydın Thank you so much. Would you be willing to explain the logic behind this to a confused me? Isn't it now 4 dimensions? I'm sorry, I'd just like to understand what I'm doing :-) – Olli Apr 14 '21 at 12:17
  • Oh, I got it, I think! I couldn't edit above, but I see it now! Sorry for the mess :-) – Olli Apr 14 '21 at 12:23
  • @Olli Glad to be of help! You already mentioned that you know the batch size, the number of time steps per sample and the number of features per time step. The information in the columns `Feature1` and `Feature2` is then enough to construct the desired data. You can think of it like this: we are given 12 numbers, but these numbers have a hierarchy. At the highest level, we know that they are in 2 groups (batch size), so we actually have 6 + 6 numbers. One level down, we know that every 3 consecutive rows are temporally dependent (time steps).... [contd] – Mustafa Aydın Apr 14 '21 at 12:26
  • [contd] ...so each group of 6 splits into 3 time steps: (2 + 2 + 2) + (2 + 2 + 2). The remaining info is then automatically revealed: we *must* have 2 features per step to satisfy 6 = 3 * 2. Overall, we separated these 12 numbers into a (2, 3, 2)-shaped hierarchy. – Mustafa Aydın Apr 14 '21 at 12:26
  • @Mustafa Aydın Thank you very much for this detailed explanation, it's been a great help in understanding it! – Olli Apr 14 '21 at 14:05
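
The hierarchy described in the comments can be checked step by step with plain NumPy. This is only an illustrative sketch: the flat array is the two feature columns read row by row, and the variable names are made up for the example.

```python
import numpy as np

# The 12 feature values, read row by row (Feature1, Feature2, Feature1, ...).
flat = np.array([100, 2, 250, 5, 300, 100, 400, 10, 100, 42, 50, 17])

# Level 1: split into 2 groups (batch size), 6 numbers each.
groups = flat.reshape(2, 6)
print(groups[0].tolist())  # [100, 2, 250, 5, 300, 100]

# Level 2: split each group into 3 time steps, leaving 2 features per step.
steps = flat.reshape(2, 3, 2)
print(steps[0, 1].tolist())  # [250, 5]
print(steps.shape)           # (2, 3, 2)
```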