0

Any give me a tip for filtering a dataset to contain only data for Monday through Friday on the times of 8AM to 4PM?

import numpy as np
import pandas as pd
np.random.seed(11)

rows,cols = 50000,2
data = np.random.rand(rows,cols) 
tidx = pd.date_range('2019-01-01', periods=rows, freq='H') 

df = pd.DataFrame(data, columns=['Temperature','Value'], index=tidx)

#filter for time
df = df.between_time('8:00', '16:00')

#filter out weekends
df = df[(df.index.weekday < 6)]

df

This only seems to be Tuesdays for some reason

df['dayName'] = df.index.day_name()[0]

df.dayName.describe()

prints

count       16074
unique          1
top       Tuesday
freq        16074
Name: dayName, dtype: object
bbartling
  • 3,288
  • 9
  • 43
  • 88
  • `df.index.day_name()[0]` returns a scalar 'Tuesday' a single value, just the value of the first record, now you are setting this as a column in the dataframe. This is not the correct way. `df['dayName'] = df.index.day_name()` Non need for the slicing with `[0]`. – Scott Boston Mar 02 '21 at 17:17

1 Answers1

2

Try this:

import numpy as np
import pandas as pd
np.random.seed(11)

rows,cols = 50000,2
data = np.random.rand(rows,cols) 
tidx = pd.date_range('2019-01-01', periods=rows, freq='H') 

df = pd.DataFrame(data, columns=['Temperature','Value'], index=tidx)

#filter for time
df1 = df.between_time('8:00', '16:00')

#filter out weekends
df1 = df1[df1.index.weekday <= 4]

df1

Then, check:

df1.index.day_name().value_counts()

Output:

Thursday     2682
Tuesday      2682
Wednesday    2682
Friday       2682
Monday       2673
dtype: int64
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • Could you ever help me with this SO post? https://stackoverflow.com/questions/74908952/pandas-calculating-time-deltas-from-index – bbartling Dec 24 '22 at 17:34