0

I have the following row dataset that is containing disparate dates.

         Id       Date  Quantity
0  122222ST 2020-10-19      -1.0
1  122222ST 2020-10-22     75.0
2  122222ST 2020-10-22    1100.0
3    100128 2020-10-20       5.0

that I want to turn into a time serie for each Id such that every dates are here, if an Id has multiple for a date, they add up, and if there is none, it shows a 0. a minimal reproducible example would like this:

Date        122222ST 100128

2020-10-19  -1.0      0
2020-10-20   0        5
2020-10-21   0        0
2020-10-22   1175     0

I have been able to do this so far:

#read data
df = pd.read_csv('historical_data2.csv', sep=";")

#remove unwanted column
df = df.drop('NetAmount',axis=1)

#set date to datetime 
df['Date'] = pd.to_datetime(df['Date'])

#get unique Ids and sum quantities per date
df = df.groupby(['Id', 'Date']).agg({'Quantity':'sum'}).reset_index()

#adding all missing dates
df = df.groupby(pd.Grouper(key='Date', freq='D'))

From this point on I am stuck because I cannot find a way to set Id values as the horizontal index. I have seen this example Python Pandas: How to set Dataframe Column value as X-axis labels but it regards plotting which I am interested in. Any clue?

Murcielago
  • 905
  • 1
  • 8
  • 30

2 Answers2

1

You can do it with a pivot_table:

import pandas as pd
import numpy as np


data={"Id":["122222ST",'122222ST','122222ST','100128'],
        "Date":['2020-10-19','2020-10-22','2020-10-22','2020-10-20'],
        "Quantity": [-1,75,1100,50]}


df = pd.DataFrame(data)
df["Date"]=pd.to_datetime(df["Date"], format="%Y-%m-%d")
new_df= pd.pivot_table(df, values='Quantity', index=['Date'],
                    columns=['Id'], aggfunc=np.sum,fill_value=0)



#for missing days (e.g 2020-10-21)
df_Date=pd.date_range(start=new_df.index.min(), end=new_df.index.max(), freq='d')
new_df=new_df.reindex(df_Date,fill_value=0)


print(new_df)

result:

Id          100128  122222ST
2020-10-19       0        -1
2020-10-20      50         0
2020-10-21       0         0
2020-10-22       0      1175
Renaud
  • 2,709
  • 2
  • 9
  • 24
0

The operation to make the values of a column your new columns is called pivot. You have the additional complication that you want to sum quantities if they happen on the same day. This must be done before the pivot operation. Your solution

df = df.groupby(['Id', 'Date']).agg({'Quantity':'sum'}).reset_index()

should give the same result but

In [11]: df.groupby(['Id', 'Date']).sum().reset_index()                                                                                                                                                    
Out[11]: 
         Id        Date  Quantity
0    100128  2020-10-20       5.0
1  122222ST  2020-10-19      -1.0
2  122222ST  2020-10-22    1175.0

is a bit shorter

Then you can just apply pivot to the result and use fillna to set the not specified values to 0:

In [10]: df.groupby(['Id', 'Date']).sum().reset_index().pivot(index='Date', columns='Id').fillna(0)                                                                                                        
Out[10]: 
           Quantity         
Id           100128 122222ST
Date                        
2020-10-19      0.0     -1.0
2020-10-20      5.0      0.0
2020-10-22      0.0   1175.0
maow
  • 2,712
  • 1
  • 11
  • 25