4

I'm trying to upsample weekly data to daily data, however, I'm having difficulty upsampling the last edge. How can I go about this?

import pandas as pd
import datetime
df = pd.DataFrame({
    'wk start': ['2018-08-12', '2018-08-12', '2018-08-19'], 
    'car': ['tesla model 3', 'tesla model x', 'tesla model 3'],
    'sales': [38000, 98000, 40000]})
df['wk start'] = df['wk start'].apply(
    lambda x: datetime.datetime.strptime(x, '%Y-%m-%d'))
df.set_index('wk start').groupby('car').resample('D').pad()

This returns:

                             car            sales
car             wk start        
tesla model 3   2018-08-12  tesla model 3   38000
                2018-08-13  tesla model 3   38000
                2018-08-14  tesla model 3   38000
                2018-08-15  tesla model 3   38000
                2018-08-16  tesla model 3   38000
                2018-08-17  tesla model 3   38000
                2018-08-18  tesla model 3   38000
                2018-08-19  tesla model 3   40000

tesla model x   2018-08-12  tesla model x   98000

My desired output is:

                             car            sales
car             wk start        
tesla model 3   2018-08-12  tesla model 3   38000
                2018-08-13  tesla model 3   38000
                2018-08-14  tesla model 3   38000
                2018-08-15  tesla model 3   38000
                2018-08-16  tesla model 3   38000
                2018-08-17  tesla model 3   38000
                2018-08-18  tesla model 3   38000
                2018-08-19  tesla model 3   40000
                2018-08-20  tesla model 3   40000
                2018-08-21  tesla model 3   40000
                2018-08-22  tesla model 3   40000
                2018-08-23  tesla model 3   40000
                2018-08-24  tesla model 3   40000
                2018-08-25  tesla model 3   40000
tesla model x   2018-08-12  tesla model x   98000
                2018-08-13  tesla model x   98000
                2018-08-14  tesla model x   98000
                2018-08-15  tesla model x   98000
                2018-08-16  tesla model x   98000
                2018-08-17  tesla model x   98000
                2018-08-18  tesla model x   98000

I looked at this, but they're using periods and I'm looking at datetimes.

wjandrea
  • 28,235
  • 9
  • 60
  • 81
Mariah Akinbi
  • 386
  • 1
  • 5
  • 19
  • it's impossible to generate a wk start with those values given the input – Yuca Aug 10 '18 at 16:50
  • @Yuca why is it impossible? – Mariah Akinbi Aug 10 '18 at 16:56
  • the dates are bounded by 2018-08-12 and 2018-08-19, why do you expect your output to have data with dates after 2018-08-20? – Yuca Aug 10 '18 at 16:58
  • @Yuca I would think that whatever it does for 8/12, it can also do for 8/19. I think I may have to reindex instead of resample – Mariah Akinbi Aug 10 '18 at 17:00
  • that is not what I'm saying. Check your desired output, you have dates that are beyond the scope of your inputs. You need something to generate the future dates beyond 2018-08-19 – Yuca Aug 10 '18 at 17:03

3 Answers3

2

Assign a column for the end of each of your weeks and stack, before using your previous groupby attempt:

(df.assign(end=df['wk start'].add(pd.DateOffset(6))).set_index(
    ['car', 'sales']).stack()
    .rename('wk start').reset_index([0, 1])
    .set_index('wk start').groupby('car')
    .resample('D').pad()
)

Output:

                                    car  sales
car           wk start
tesla model 3 2018-08-12  tesla model 3  38000
              2018-08-13  tesla model 3  38000
              2018-08-14  tesla model 3  38000
              2018-08-15  tesla model 3  38000
              2018-08-16  tesla model 3  38000
              2018-08-17  tesla model 3  38000
              2018-08-18  tesla model 3  38000
              2018-08-19  tesla model 3  40000
              2018-08-20  tesla model 3  40000
              2018-08-21  tesla model 3  40000
              2018-08-22  tesla model 3  40000
              2018-08-23  tesla model 3  40000
              2018-08-24  tesla model 3  40000
              2018-08-25  tesla model 3  40000
tesla model x 2018-08-12  tesla model x  98000
              2018-08-13  tesla model x  98000
              2018-08-14  tesla model x  98000
              2018-08-15  tesla model x  98000
              2018-08-16  tesla model x  98000
              2018-08-17  tesla model x  98000
              2018-08-18  tesla model x  98000
user3483203
  • 50,081
  • 9
  • 65
  • 94
  • thank you for the response! This works on the data set given, but is a bit tricky to debug on my actual dataset (which has many more columns) and it is a bit slower than the accepted answer. I do appreciate the one-liner, though! @user3483203 – Mariah Akinbi Aug 10 '18 at 19:24
  • one can even use append.. there are several options to this I believe – Onyambu Aug 10 '18 at 21:14
2

Yes, you are right, last edge data are excluded. Solution is add them to input DataFrame - my solution creates a helper Dataframe using drop_duplicates, adds 6 days and concat's to original df before using your solution:

df1 = df.sort_values('wk start').drop_duplicates('car', keep='last').copy()
df1['wk start'] = df1['wk start'] + pd.Timedelta(6, unit='d')

df = pd.concat([df, df1], ignore_index=True)
df = df.set_index('wk start').groupby('car').resample('D').pad()
print (df)
                                    car  sales
car           wk start                        
tesla model 3 2018-08-12  tesla model 3  38000
              2018-08-13  tesla model 3  38000
              2018-08-14  tesla model 3  38000
              2018-08-15  tesla model 3  38000
              2018-08-16  tesla model 3  38000
              2018-08-17  tesla model 3  38000
              2018-08-18  tesla model 3  38000
              2018-08-19  tesla model 3  40000
              2018-08-20  tesla model 3  40000
              2018-08-21  tesla model 3  40000
              2018-08-22  tesla model 3  40000
              2018-08-23  tesla model 3  40000
              2018-08-24  tesla model 3  40000
              2018-08-25  tesla model 3  40000
tesla model x 2018-08-12  tesla model x  98000
              2018-08-13  tesla model x  98000
              2018-08-14  tesla model x  98000
              2018-08-15  tesla model x  98000
              2018-08-16  tesla model x  98000
              2018-08-17  tesla model x  98000
              2018-08-18  tesla model x  98000
user3483203
  • 50,081
  • 9
  • 65
  • 94
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Ah this works beautifully! Thank you! Though I must note for others that the values have to be wk values have to be sorted prior to dropping duplicates. Thanks again @jezrael – Mariah Akinbi Aug 10 '18 at 17:54
  • 1
    (and others) I've accepted this answer because 1) it is 0.5 sec faster than the one-liner answer and 2) it is easier to debug - thanks again! – Mariah Akinbi Aug 10 '18 at 19:17
1

you could also do:

(pd.melt(df.assign(w = df['wk start']+pd.DateOffset(6)),df.columns[1:],value_name =
  "wk start").drop('variable',1).set_index('wk start').groupby('car').resample('D').pad())


                                    car  sales
car           wk start                        
tesla model 3 2018-08-12  tesla model 3  38000
              2018-08-13  tesla model 3  38000
              2018-08-14  tesla model 3  38000
              2018-08-15  tesla model 3  38000
              2018-08-16  tesla model 3  38000
              2018-08-17  tesla model 3  38000
              2018-08-18  tesla model 3  38000
              2018-08-19  tesla model 3  40000
              2018-08-20  tesla model 3  40000
              2018-08-21  tesla model 3  40000
              2018-08-22  tesla model 3  40000
              2018-08-23  tesla model 3  40000
              2018-08-24  tesla model 3  40000
              2018-08-25  tesla model 3  40000
tesla model x 2018-08-12  tesla model x  98000
              2018-08-13  tesla model x  98000
              2018-08-14  tesla model x  98000
              2018-08-15  tesla model x  98000
              2018-08-16  tesla model x  98000
              2018-08-17  tesla model x  98000
              2018-08-18  tesla model x  98000
Onyambu
  • 67,392
  • 3
  • 24
  • 53