0

I am trying to only get the date of IDs with history starting on 1/1/2022 through today or a further date out. I have imported a csv from excel and have imported pandas numpy and timedelta. The current "start_date" is an object in string format. Here is an example of what I have.

 ID        start_date        end_date 
  1      3/25/2021       9/17/2021
  1       9/17/2021        2/3/2022
  2       8/23/2021        3/3/2022
  2       3/3/2022         9/25/2022
  2       9/25/2022        12/29/2022
  3       4/19/2022        11/8/2022
  3       11/8/2022        1/23/2023

I would like to filter only on those start dates that start after 1/1/2022. Something like this

ID      start_date      end_date 
 2      3/3/2022        9/25/2022
 2     9/25/2022        12/29/2022
 3     4/19/2022        11/8/2022   
 3     11/8/2022        1/23/2023

I have code that filters but it doesn't do it correctly:

 filtered_df = df.loc[(df['start_date'] >= '1/1/2022') & (df['start_date'] < '12/31/2099')]

This code still has dates out of the range of my desired dates.

  • convert your dates to datetime first. `s = pd.to_datetime(df['start_date'], dayfirst=False) ; filtered_df = df.loc[s.ge(pd.Timestamp('2022-01-01')) & s.lt(pd.Timestamp('2099-12-31'))]` – mozway Mar 01 '23 at 14:32

0 Answers0