0

My Dataset consists of 3535560 rows × 16 columns. However it contains three 'index-variables, I would like to use to reshape the dataset: 1830 days, 46 latitude values and 42 longitude values. The transformed dataset should therefore become (1830,46,42) but I have no idea how to do this. I saw that I could maybe use pd.pivot or pd.multiindex, but I am not able to find a solution.

In short data looks like:

      time          lat     lon         var 1       var 2       var 3       var 4      var 5
0     2021-01-01    60.125  -120.125    0.381828    0.917779    0.718022    0.064032    0.886050
1     2021-01-01    60.125  -119.875    0.221697    0.232657    0.298497    0.680900    0.124440
...
41    2021-01-01    60.125  -109.125    0.922149    0.708139    0.778329    0.267685    0.552542
42    2021-01-01    59.875  -120.125    0.569874    0.053829    0.740229    0.747286    0.194214
43    2021-01-01    59.875  -119.875    0.500091    0.185990    0.845510    0.877692    0.556584
....
1931  2021-01-01    48.875  -109.125    0.221697    0.232657    0.298497    0.680900    0.124440
1932  2021-01-02    60.125  -120.125    0.666589    0.849857    0.338648    0.552114    0.730678
1933  2021-01-02    60.125  -119.875    0.351144    0.467692    0.161488    0.530906    0.277561

As you can see in the table, after 42 longitude values, it takes a new latitude value and loops over all longitude values again. it does this for all latitude values before going to the next day (after 1932 rows (46x42)).

Would someone be able to help me fix this?

  • Something like: `df = df.pivot(index='time', columns=['lat', 'lon']).to_numpy()`? – David Erickson Apr 01 '21 at 12:19
  • Tried that one already. Unfortunately, I receive an error: `length mismatch: Expected 3535560 rows, received array of length 2` – John van de Ven Apr 01 '21 at 12:22
  • check out this answer: https://stackoverflow.com/questions/59546746/pivot-a-pandas-dataframe-into-a-3d-numpy-array Can you give a sample output of the 3D array from the pandas dataframe? – David Erickson Apr 01 '21 at 12:24
  • Thanks for the link! Although the methods explained in the video do seem to work for a single variable, they do not work when I try to incorporate all variables (except time,lat,lon) of my dataset – John van de Ven Apr 01 '21 at 12:59

0 Answers0