1

I have a pandas dataframe of electricity consumption data throughout a year, but would like to update the table to be in another year. I would like the data values to fall on the same weekdays as before.

What I have:

Date          00:00   ...     WeekDay   requiredDate  requiredWeekDay
25/11/2018       20            Sunday     25/11/2018           Sunday
26/11/2018       30            Monday     26/11/2018           Monday
27/11/2018       25           Tuesday     27/11/2018          Tuesday
28/11/2018       35         Wednesday     28/11/2018        Wednesday
29/11/2018       40          Thursday     29/11/2018         Thursday
30/11/2018       15            Friday     30/11/2018           Friday
01/12/2017       65            Sunday     01/12/2018         Saturday
02/12/2017       34            Monday     02/12/2018           Sunday
03/12/2017       81           Tuesday     03/12/2018           Monday
04/12/2017       62         Wednesday     04/12/2018          Tuesday
...

What I would like:

Date          00:00   ...     WeekDay     
25/11/2018       20            Sunday               
26/11/2018       30            Monday              
27/11/2018       25           Tuesday         
28/11/2018       35         Wednesday        
29/11/2018       40          Thursday           
30/11/2018       15            Friday               
01/12/2018                   Saturday            
02/12/2018       65            Sunday              
03/12/2018       34            Monday               
04/12/2018       81           Tuesday            
...

What I have tried:

df['Day'] = df['Date'].dt.day
df['Month'] = df['Date'].dt.month
df['Year'] = df['Date'].dt.year
requiredYear = str(df['Year'].median()).replace(".0","")

df = df.sort_values(by = ['Month', 'Day']).reset_index()

df['RemappedDate']= np.nan

for index, row in df.iterrows():
  if row['Weekday'] != row['requiredWeekday']:
    while row[row['Day']]<31:
      row['Day'] = row['Day']-1    
      row['RemappedDate'] = pd.to_datetime(str(row['Month'])+"/"+ 
                            str(row['Day'])+"/"+requiredYear)
  else:
    print("Already equal")

df['Date'] = df['RemappedDate']
df['Weekday'] = df['requiredWeekday']

Probably nowhere near, so sorry if not. I'm a beginner.

N.Homer
  • 25
  • 5
  • is the dtype string? or `datetime`? – anky Jan 25 '19 at 17:27
  • Do you have specific dates you want to change ? I think we can solve your problem, but we lack of information so that the solution will solve every problem like this in the future. – IMCoins Jan 25 '19 at 17:28
  • @IMCoins I might be combining data from numerous years i.e. Jan-March 2016, April-July 2017 and July-December 2018. I was using `row['Day']-1` for ease in this case, but am aware this would not work for all cases. – N.Homer Jan 25 '19 at 17:33

2 Answers2

2

If I was you, I'd just "keep" the 2 columns that are already made for you and "shift" the something column such as...

mask = df['Date'] <= '2018-01-01'
df['something'][mask] = df['something'][mask].shift(1)

And you could keep the 2 columns "new_date" and "new_day". Drop the others and rename those 2, whatever you feel like doing. :)

IMCoins
  • 3,149
  • 1
  • 10
  • 25
  • Good one. But this would be subjective to the fact that the dates are in order. Hence i didnot go wit it since not generic. +1 from me. :) – anky Jan 25 '19 at 18:15
1

IIUC , you can just increase the year without the help of required date column:

print(df)

        Date  something    WeekDay
0 2018-11-25         20     Sunday
1 2018-11-26         30     Monday
2 2018-11-27         25    Tuesday
3 2018-11-28         35  Wednesday
4 2018-11-29         40   Thursday
5 2018-11-30         15     Friday
6 2017-12-01         65     Sunday
7 2017-12-02         34     Monday
8 2017-12-03         81    Tuesday
9 2017-12-04         62  Wednesday

df['new_Date']=df['Date'].mask(df['Date'].dt.year == 2017, df['Date'] + pd.to_timedelta(1, unit='y') + pd.to_timedelta(12, unit='h'))
df['required_date'] = df.new_Date.dt.date
df['new_day']=df.new_Date.dt.day_name()
df['new_value']=np.where(df.WeekDay==df.new_day,df.something,df.new_day.map(dict(zip(df.loc[df.WeekDay!=df.new_day,'WeekDay'],df.loc[df.WeekDay!=df.new_day,'something']))))

print(df)

        Date  something    WeekDay            new_Date required_date  \
0 2018-11-25         20     Sunday 2018-11-25 00:00:00    2018-11-25   
1 2018-11-26         30     Monday 2018-11-26 00:00:00    2018-11-26   
2 2018-11-27         25    Tuesday 2018-11-27 00:00:00    2018-11-27   
3 2018-11-28         35  Wednesday 2018-11-28 00:00:00    2018-11-28   
4 2018-11-29         40   Thursday 2018-11-29 00:00:00    2018-11-29   
5 2018-11-30         15     Friday 2018-11-30 00:00:00    2018-11-30   
6 2017-12-01         65     Sunday 2018-12-01 17:49:12    2018-12-01   
7 2017-12-02         34     Monday 2018-12-02 17:49:12    2018-12-02   
8 2017-12-03         81    Tuesday 2018-12-03 17:49:12    2018-12-03   
9 2017-12-04         62  Wednesday 2018-12-04 17:49:12    2018-12-04   

     new_day  new_value  
0     Sunday       20.0  
1     Monday       30.0  
2    Tuesday       25.0  
3  Wednesday       35.0  
4   Thursday       40.0  
5     Friday       15.0  
6   Saturday        NaN  
7     Sunday       65.0  
8     Monday       34.0  
9    Tuesday       81.0 
anky
  • 74,114
  • 11
  • 41
  • 70
  • Thanks for your answer. But I am looking to keep data on the same weekday. I.e. if it was on a Sunday in 2017, it should fall on a Sunday in 2018. – N.Homer Jan 25 '19 at 17:46
  • @N.Homer which data , you mean you want the values i.e 20,30 etc on same day as the orginal df? – anky Jan 25 '19 at 17:48
  • @N.Homer then what should be the value for Saturday since it is initially not there in Weekday column – anky Jan 25 '19 at 17:53
  • Since there is no Saturday data in my example data set, I think I will just leave it empty. – N.Homer Jan 25 '19 at 17:57
  • @N.Homer check updated solution, should work as you want – anky Jan 25 '19 at 18:04