1

I'm a newbie and I've tried looking at several posts and can't seem to get this to work... I'm the problem, I'm sure.

Trying to combine and condense two data sets, one has the date and weight of fruit purchased and the other has historical daily prices. I'm trying to condense these together to a weekly basis.

The raw data I'm trying to create Weights from looks like this:

Date        Product       Weight
1-1-12      Strawberry     15
1-2-12      Bananna        56
1-2-12      Apple          98
1-5-12      Strawberry    115
1-5-12      Bananna       516
1-5-12      Apple         981

Here's the code I've tried to create a dataframe but returns a series:

df_GrossWeight = pd.read_csv('mydata.csv', encoding='utf-8')
df_GrossWeight_Indexed = df_GrossWeight.set_index(pd.DatetimeIndex(df_GrossWeight['Date']))
grouper = df_GrossWeight_Indexed.groupby([pd.TimeGrouper('W'),'Product'])
Weights = grouper['Weight'].sum()

I want to merge it with the series I created Prices that lists the weekly prices:

(datetimeindex)     Product       Price
2012-01-1           Strawberry    2.10
2012-01-1           Banana        0.55
2012-01-1           Apple         1.25

Here's the code to create Prices that I used:

df_Price = pd.read_csv('Price_Hist.csv')
df_Indexed = df_Price.set_index(pd.DatetimeIndex(df_Price['Date']), drop = True)
df_Price_Indexed = df_Indexed['Price']
Prices = df_Price_Indexed.resample('W').mean()

The end dataframe I'm trying to make will have the weekly prices and weekly sums of how much we bought. It will look like this:

  (datetimeindex)       Product       Price     Weight
    2012-01-1           Strawberry    2.10       130
    2012-01-1           Banana        0.55       572
    2012-01-1           Apple         1.25      1079

I'm getting the feeling this can be done a lot simpler than how I'm trying, so any help is much appreciated.

Thank you in advance, Me

Programming_Learner_DK
  • 1,509
  • 4
  • 23
  • 49

2 Answers2

2

One way to do this is 'round' all the dates to the nearest weekday. Once you have this 'rounded' date. You can join the two dataframes.

df['Date'] = pd.to_datetime(df['Date'])
df2['(datetimeindex)'] = pd.to_datetime(df2['(datetimeindex)'])

round to nearest sunday

df2['Week_Sunday'] = df2['(datetimeindex)'] + pd.tseries.offsets.Week(weekday=6)
df['Week_Sunday'] = df.Date + pd.tseries.offsets.Week(weekday=6)

now merge the data

df_all = pd.merge(df2, df, on = ['Week_Sunday', 'Product'])
print(df_all)

Output

  (datetimeindex)     Product  Price Week_Sunday       Date  Weight
0      2012-01-01  Strawberry   2.10  2012-01-08 2012-01-01      15
1      2012-01-01  Strawberry   2.10  2012-01-08 2012-01-05     115
2      2012-01-01      Banana   0.55  2012-01-08 2012-01-02      56
3      2012-01-01      Banana   0.55  2012-01-08 2012-01-05     516
4      2012-01-01       Apple   1.25  2012-01-08 2012-01-02      98
5      2012-01-01       Apple   1.25  2012-01-08 2012-01-05     981

Groupby and sum

df_all.groupby(['(datetimeindex)', 'Product', 'Price'], as_index=False)['Weight'].sum()

 (datetimeindex)     Product  Price  Weight
0      2012-01-01       Apple   1.25    1079
1      2012-01-01      Banana   0.55     572
2      2012-01-01  Strawberry   2.10     130
Ted Petrou
  • 59,042
  • 19
  • 131
  • 136
1

Make sure your Date column are dates

Weights.Date = pd.to_datetime(Weights.Date)

Also make sure to fix the banana typo.

We can use pd.merge_asof to find the most recent date that is less than or equal to the target date.

pd.merge_asof(
    Weights, Prices, left_on=['Date'], right_on=['(datetimeindex)'], by='Product'
).groupby(
    ['(datetimeindex)', 'Product']
).agg(dict(Weight='sum', Price='mean')).reset_index()

  (datetimeindex)     Product  Price  Weight
0      2012-01-01       Apple   1.25    1079
1      2012-01-01      Banana   0.55     572
2      2012-01-01  Strawberry   2.10     130
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • If the person that down voted is still looking at this post, I'd like to address your concerns and possibly convince you to remove the down vote. If you could, please let me know what is wrong with this post. Was it not useful? – piRSquared Mar 24 '17 at 16:24