
I have a time-indexed dataframe with two columns, "USER_ID" and "Daily_BALANCE". It shows the daily balance of each user for April 2018 (so 30 observations per user). From it I need to generate a 3D numpy array of shape (N, 30, 1), where N is the number of users in the dataframe, 30 is the number of daily dates, and 1 is the balance, holding the corresponding values. Can anyone help me with code for this? For example, with 100 users in the dataframe I would get an array of shape (100, 30, 1), equivalent to 100 2D arrays with 30 rows and one column each.

Thanks

alya
  • When you have 100 users and 30 days, you will have 30 balance values for each day for each user, so the shape of your 3D array will be (100, 30, 30). Correct? – stahamtan Sep 23 '19 at 20:19
  • In a situation like this I would highly recommend to create a [dataframe with multi-level index](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html) instead of a 3D numpy array. – Xukrao Sep 24 '19 at 00:31

1 Answer

I am assuming you want to keep only the daily balance values in the final 3D array, so that the balance of the ith user on the jth day is arr[i-1, j-1, 0].

If so, first sort the dataframe by USER_ID and DATE, then take the values as a numpy array, and finally reshape it. This relies on every user having exactly one row per day.

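# Assumes DATE is a regular column; if the dates are in the index, move them into a column named DATE first.
n_users = df['USER_ID'].nunique()
n_days = df['DATE'].nunique()

# Sort so each user's rows are contiguous and in date order, then reshape to (users, days, 1).
arr = df.sort_values(by=['USER_ID', 'DATE'])['Daily_BALANCE'].to_numpy().reshape(n_users, n_days, 1)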
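
To check the sort-then-reshape approach end to end, here is a minimal, self-contained sketch on made-up data. The toy dataframe, the user names `u1` to `u3`, and the balance formula are all hypothetical and only there so the result can be verified by eye; it also assumes DATE is an ordinary column. The `pivot` variant at the end is an alternative that is not part of the answer above, shown only as a cross-check.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 users, 30 days of April 2018, rows deliberately shuffled.
dates = pd.date_range("2018-04-01", "2018-04-30", freq="D")
users = ["u1", "u2", "u3"]
df = pd.DataFrame(
    [(d, u, 100.0 * ui + di) for ui, u in enumerate(users) for di, d in enumerate(dates)],
    columns=["DATE", "USER_ID", "Daily_BALANCE"],
).sample(frac=1, random_state=0)

n_users = df["USER_ID"].nunique()
n_days = df["DATE"].nunique()

# The reshape is only meaningful if every user has exactly one row per day.
assert len(df) == n_users * n_days

# Sort by user, then date, so the flat value array is laid out user by user.
arr = (
    df.sort_values(by=["USER_ID", "DATE"])["Daily_BALANCE"]
    .to_numpy()
    .reshape(n_users, n_days, 1)
)

# Alternative (not from the answer): pivot to a users x days table, then add a trailing axis.
arr_pivot = df.pivot(index="USER_ID", columns="DATE", values="Daily_BALANCE").to_numpy()[:, :, np.newaxis]

print(arr.shape)                       # (3, 30, 1)
print(np.array_equal(arr, arr_pivot))  # True
```

With this layout, `arr[1, 4, 0]` is the balance of the second user (in sorted USER_ID order) on the fifth day. One reason to consider the pivot form is that a missing user/day pair shows up as NaN there instead of silently shifting the values that follow it.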
stahamtan
  • Exactly what I needed. I didn't know about the reshape function. Thanks a lot for the info :) – alya Sep 24 '19 at 08:37