2

I have a dataframe with values as below-

               Amount  Product  DocDate
0              1099.0   1100 2018-01-02
1              1234.0   1100 2018-01-04
2              1000.0   1100 2018-01-06
3              8000.0   1100 2018-01-28
4              3000.0   1100 2018-02-09
5              4500.0   1100 2018-02-20

I need to calculate the sum of amount field calculated on end of each fortnight.

Example:

Product Amount FortNight
1100    3333.0  Jan 1st Fortnight (this could be date format too!!)
1100    3000.0  Feb 2nd Fortnight
1100    4500.0  Feb 1st Fortnight

It would be great if it did not involve loops( since I have loops for product too) I tried separating (grouping) both week and month and calculating, but couldn't get expected results since its a column in dataframe.

Thanks in advance. Appreciate help.

jpp
  • 159,742
  • 34
  • 281
  • 339
  • Would this help: https://stackoverflow.com/questions/3430369/group-together-arbitrary-date-objects-that-are-within-a-time-range-of-each-other – zoran119 Jun 29 '18 at 12:03

2 Answers2

2

First of all, I am a little worried since you said you were using loops with pandas... You should NEVER use loops with pandas they are not made for it and are extremely slow and ineffective. There are some really rare cases where you cannot avoid it, but even then there are options to optimize it better.

To answer your question, you first need to convert the DocDate to datetime format:

from datetime import datetime
df.DocDate = df.DocDate.apply(lambda d: datetime.strptime(d, %Y-%m-%d))

You can then use the datetimeIndex.resample function which works exactly like groupby but allows you to group data by time constraint :

df = df.set_index('DocDate').resample('2W').Amount.sum()

The resample('2W') here means group by 2 weeks periods.

Zuma
  • 806
  • 1
  • 7
  • 10
  • Sorry, maybe my question framing was wrong.. I would be looping through a product list [1100,1200,1300.. etc].. for each of the product I would be creating a dataframe containing the dataset pertaining to that product and need to perform the summation.. – Sparsha Devapalli Jun 29 '18 at 13:32
1

Need:

#if necessary convert column to datetime
df['DocDate'] = pd.to_datetime(df['DocDate'])
#generate Fortnight https://stackoverflow.com/a/34428879
s =  np.where(df['DocDate'].dt.day < 15, '1st Fortnight', '2nd Fortnight')
#create new column
df['FortNight'] = df['DocDate'].dt.strftime('%b ') + s
#aggregate sum
df = df.groupby(['Product','FortNight'], as_index=False, sort=False)['Amount'].sum()
print (df)
   Product          FortNight  Amount
0     1100  Jan 1st Fortnight  3333.0
1     1100  Jan 2nd Fortnight  8000.0
2     1100  Feb 1st Fortnight  3000.0
3     1100  Feb 2nd Fortnight  4500.0

If need datetimes in output:

s =  np.where(df['DocDate'].dt.day < 15, '-01', '-15')
df['FortNight'] = pd.to_datetime(df['DocDate'].dt.strftime('%Y-%m') + s)

df = df.groupby(['Product','FortNight'], as_index=False, sort=False)['Amount'].sum()
print (df)
   Product  FortNight  Amount
0     1100 2018-01-01  3333.0
1     1100 2018-01-15  8000.0
2     1100 2018-02-01  3000.0
3     1100 2018-02-15  4500.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252