
Let's say that I have the following data in test.csv:

dates,qq
1900-01-01,1
1900-01-02,2
1900-01-03,3
1900-01-04,4
1900-01-05,5
1900-01-06,6
1900-01-07,7
1901-01-01,8
1901-01-02,9
1901-01-03,10
1901-01-04,11
1901-01-05,12
1901-01-06,13
1901-01-07,14
1902-01-01,15
1902-01-02,16
1902-01-03,17
1902-01-04,18
1902-01-05,19
1902-01-06,20
1902-01-07,21
1903-12-30,22
1903-12-31,23

First, I read it into a dataframe with:

dfr = pd.read_csv('test.csv', sep=',',index_col=0,parse_dates=True)

Now I would like to select all the values around a specific day of the year, regardless of the year.

Let's say I select January 1 with a window of ±2 days:

delta_d = pd.Timedelta(2., unit='d') 
dates   = pd.date_range(start='1/1/1400', end='1/1/1401')
day_c = dates[0]

I thought a good approach would be to create a boolean mask, so I tried the following:

mask = (dfr.index.day>= day_c.day) & (dfr.index.day < day_c.day+delta_d)

However, as I might have expected, I get the following error:

OutOfBoundsDatetime: Cannot cast 1400-01-01 00:00:00 to unit='ns' without overflow.
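The error most likely comes from the pd.date_range line rather than from the mask: by default pandas stores timestamps with nanosecond resolution, which only covers roughly the years 1677 to 2262, so a range starting in 1400 cannot be represented. Since only the month and day of the reference date matter, a minimal sketch is to pick any in-range year; the year 2000 below is an arbitrary assumption. Separately, day_c.day + delta_d adds an integer to a Timedelta, which pandas does not allow, so the mask would likely still fail even with a valid date.

```python
import pandas as pd

# Default (nanosecond) Timestamps only cover roughly 1677-09-21 .. 2262-04-11,
# which is why a date_range starting in year 1400 overflows
print(pd.Timestamp.min)
print(pd.Timestamp.max)

# Only the month and day of the reference date matter, so any in-range year
# can stand in; 2000 is an arbitrary (assumed) choice
day_c = pd.Timestamp("2000-01-01")
delta_d = pd.Timedelta(2, unit="d")

# delta_d is a Timedelta, not an int; its whole-day count is available as .days
print(day_c.month, day_c.day, delta_d.days)
```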

I expect the following result (the qq values of the selected rows):

res = [1,2,3,8,9,10,15,16,17,22,23]
diedro
  • According to your problem description, 22 should not be in the expected result, @diedro. – PaulS Jun 09 '23 at 19:04
  • You are right. I have edited my question accordingly, adding a ±delta and changing some dates. – diedro Jun 10 '23 at 13:31

3 Answers


Try this:

import pandas as pd

# For each row, s is Jan 1 of the same year
s = pd.to_datetime(
    pd.DataFrame({"year": dfr.index.year, "month": 1, "day": 1})
).to_numpy()

# Window of +-2 days around that Jan 1
start_date = s - pd.Timedelta(days=2)
end_date = s + pd.Timedelta(days=2)

# Filtering (only Jan 1 of the row's own year is checked)
dfr[(start_date <= dfr.index) & (dfr.index <= end_date)]
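A possible extension, not part of the answer above: because s is always January 1 of the row's own year, rows such as 1903-12-30 and 1903-12-31 (which sit just before January 1, 1904) are not selected, so 22 and 23 from the expected result are missing. The sketch below also compares each row with January 1 of the following year. It rebuilds the question's data inline instead of reading test.csv, and the helper name jan1 is hypothetical.

```python
import pandas as pd

# Rebuild the question's data inline (same values as test.csv)
dates = [f"{y}-01-0{d}" for y in (1900, 1901, 1902) for d in range(1, 8)]
dates += ["1903-12-30", "1903-12-31"]
dfr = pd.DataFrame({"qq": range(1, len(dates) + 1)},
                   index=pd.DatetimeIndex(dates, name="dates"))

delta = pd.Timedelta(days=2)

def jan1(years):
    # Hypothetical helper: Jan 1 of each given year, as a datetime64 NumPy array
    return pd.to_datetime(
        pd.DataFrame({"year": years, "month": 1, "day": 1})
    ).to_numpy()

this_year = jan1(dfr.index.year)
next_year = jan1(dfr.index.year + 1)

# Keep rows within +-delta of Jan 1 of either the same year or the next one
near_this = (this_year - delta <= dfr.index) & (dfr.index <= this_year + delta)
near_next = (next_year - delta <= dfr.index) & (dfr.index <= next_year + delta)

print(dfr[near_this | near_next]["qq"].tolist())
```

For a target late in December, the anchor to add would instead be the previous year's (year - 1), so that early-January rows can match it.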
Code Different

Another possible solution:

dfr[[x == 1 and y in range(1, 4) for x, y in zip(dfr.index.month, dfr.index.day)]]

Output:

            qq
dates         
1900-01-01   1
1900-01-02   2
1900-01-03   3
1901-01-01   8
1901-01-02   9
1901-01-03  10
1902-01-01  15
1902-01-02  16
1902-01-03  17
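The same filter can be written without a Python-level loop by combining the boolean arrays from dfr.index.month and dfr.index.day directly. This is a sketch, with delta_days as an illustrative parameter name and the data rebuilt inline. Like the original, it only looks at days after January 1, so it does not pick up late-December rows.

```python
import pandas as pd

# Rebuild the question's data inline (same values as test.csv)
dates = [f"{y}-01-0{d}" for y in (1900, 1901, 1902) for d in range(1, 8)]
dates += ["1903-12-30", "1903-12-31"]
dfr = pd.DataFrame({"qq": range(1, len(dates) + 1)},
                   index=pd.DatetimeIndex(dates, name="dates"))

delta_days = 2  # illustrative window size; only days after Jan 1 are covered

# Vectorized form of the list comprehension: January, day 1 .. 1 + delta_days
mask = (dfr.index.month == 1) & (dfr.index.day <= 1 + delta_days)
print(dfr[mask])
```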
PaulS
  • I am sorry. Thanks to your previous comment, I have updated my question by adding a ±delta. – diedro Jun 10 '23 at 13:35
  • With negative deltas, I would recommend using @CodeDifferent's solution, which is more suitable for that purpose. – PaulS Jun 11 '23 at 18:38

Try this approach. It works for the days after the target date; you still have to handle the days before it.

import pandas as pd

data = {
    'dates': ['1900-01-01', '1900-01-02', '1900-01-03', '1900-01-04', '1900-01-05', '1900-01-06', '1900-01-07',
              '1901-01-01', '1901-01-02', '1901-01-03', '1901-01-04', '1901-01-05', '1901-01-06', '1901-01-07',
              '1902-01-01', '1902-01-02', '1902-01-03', '1902-01-04', '1902-01-05', '1902-01-06', '1902-01-07',
              '1903-12-30', '1903-12-31'],
    'qq': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
}

df = pd.DataFrame(data)
df['dates'] = pd.to_datetime(df['dates'])

# Define target date (month-day only) and delta
target_date_str = '01-01'
delta = pd.Timedelta(2, unit='d')
target = pd.to_datetime(target_date_str, format='%m-%d')  # parsed once; only .month and .day are used

# Filter the DataFrame based on the target date and delta
mask = ((df['dates'].dt.month == target.month) &
        df['dates'].dt.day.between(target.day - delta.days, target.day + delta.days))
result = df.loc[mask, 'qq'].tolist()

print(result)

Output: [1, 2, 3, 8, 9, 10, 15, 16, 17]
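One way to also handle the days behind the target, sketched here as an alternative technique rather than the answerer's: shift every date by each whole-day offset k from -delta to +delta and keep the row if any shifted date lands exactly on the target month and day. Crossing New Year then comes for free (1903-12-30 plus 2 days is 1904-01-01). The data is rebuilt compactly and assumed identical to the dictionary above; the cost grows with the number of offsets, so this suits small windows.

```python
import pandas as pd

# Same data as the dictionary above, rebuilt compactly (assumed identical)
dates = [f"{y}-01-0{d}" for y in (1900, 1901, 1902) for d in range(1, 8)]
dates += ["1903-12-30", "1903-12-31"]
df = pd.DataFrame({"dates": pd.to_datetime(dates), "qq": range(1, len(dates) + 1)})

target = pd.to_datetime("01-01", format="%m-%d")  # only month and day are used
delta_days = 2

# A row matches if shifting it by some k in [-delta_days, delta_days] days
# lands exactly on the target month/day; shifting across New Year covers
# the "days behind" case
mask = pd.Series(False, index=df.index)
for k in range(-delta_days, delta_days + 1):
    shifted = df["dates"] + pd.Timedelta(days=k)
    mask |= (shifted.dt.month == target.month) & (shifted.dt.day == target.day)

result = df.loc[mask, "qq"].tolist()
print(result)
```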