I know that chained assignment in pandas is a hot topic and that there are a huge number of questions about it, but I am still unable to find a solution that works in my case.

I am working with irradiance and PV time series data (a pandas DataFrame with a DatetimeIndex). There are gaps in my series, some during the night and others during the day. I would like to replace all the NaNs during the night with zeros, because irradiance and PV production are zero at night.

What I came up with so far is something like:

hour_range = [*range(17, 24)] + [*range(0, 9)]
mask = df['irradiance'].isna() & df['irradiance'].index.hour.isin(hour_range)
df.loc[mask, 'irradiance'] = 0

I also tried other solutions, like combining between_time with fillna or using df.mask directly with the inplace option, but I keep getting the dreaded SettingWithCopyWarning. I decided not to use between_time because it does not return a boolean Series and does not make it easy to combine multiple conditions. Maybe I am wrong about this. I would like to modify df in place for memory efficiency. Is there a cleaner and safer solution to my problem? Thanks.

Rick

1 Answer

Here is an example of how to create a time range (if needed), how to build an array of the times you wish to manipulate, and how to alter the 'Data' column based on that array:

import pandas as pd
import numpy as np
import datetime

#Making example data
start_date = datetime.datetime.now()
period_end_date = start_date + datetime.timedelta(hours=24)
dates = np.arange(np.datetime64(start_date), np.datetime64(period_end_date), np.timedelta64(1, 'h'), dtype='datetime64[h]')
data = np.random.randint(1, 100, 24)
df = pd.DataFrame(dates, columns = ['Dates'])
df['Data'] = data
df['Data'] = np.where(df['Data']%2 == 0, np.nan, df['Data'])

#Creating a dynamic time range and replacing nan with "Something Else"
start_time = datetime.datetime.now() + datetime.timedelta(hours = 5)
end_time = start_time + datetime.timedelta(hours = 5)
#Creates the time range you wish to manipulate
time_range = np.arange(np.datetime64(start_time), np.datetime64(end_time), np.timedelta64(1, 'h'), dtype='datetime64[h]')
#Replaces all the np.nan within the "time_range" variable with "Something Else" otherwise leave it as it is
df['Data'] = np.where((df['Dates'].isin(time_range)), df['Data'].fillna('Something Else'), df['Data'])
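
For the case in the question (a DatetimeIndex and a recurring night-time window rather than one fixed span), the boolean-mask approach from the question should already work. The sketch below is only an illustration under some assumptions: the sample data, the names raw, night_hours and is_night, and the idea that df was created by slicing another frame are all made up here. SettingWithCopyWarning typically appears when df is itself a slice of another DataFrame, so taking an explicit .copy() once is one common way to silence it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two days of hourly irradiance with random gaps.
idx = pd.date_range("2023-06-01", periods=48, freq="h")
rng = np.random.default_rng(0)
values = rng.uniform(0.0, 800.0, size=len(idx))
values[rng.random(len(idx)) < 0.3] = np.nan
raw = pd.DataFrame({"irradiance": values}, index=idx)

# Assumption: df comes from slicing a larger frame. The explicit .copy()
# makes it an independent object, so assigning into it does not warn.
df = raw.loc["2023-06-01":"2023-06-02"].copy()

# Night-time hours as defined in the question: 17:00-23:59 and 00:00-08:59.
night_hours = [*range(17, 24), *range(0, 9)]
is_night = df.index.hour.isin(night_hours)  # boolean NumPy array

# NaN and night-time at the same time.
mask = df["irradiance"].isna() & is_night

# A single .loc assignment on the frame itself (no chained indexing).
df.loc[mask, "irradiance"] = 0.0
```

After this, night-time gaps are zero and day-time gaps are still NaN. The copy costs memory once; if df is not derived from another frame, the .copy() can probably be dropped and the same .loc assignment used directly.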
ArchAngelPwn