2

I have a dataframe say df1 which has three fields that contain date type data. Call them 'Date 1', 'OC_Date', 'Date 2'. I want to filter this dataframe to obtain another dataframe such that it gives me the rows where 'OC_Date' is between 'Date 1' and 'Date 2':

Date 1 < OC_Date < Date 2

The format of the date in these three fields is as follows:

Date 1 : YYYY-MM-DD
OC_Date: DD-MM-YYYY:HH:MM:SS # (MM is text, eg. JAN for January)
Date 2 : YYYY-MM-DD

Thanks in Advance!

Kartik
  • 8,347
  • 39
  • 73
Ruffy26
  • 109
  • 1
  • 12
  • I have tried using the operator chaining like df[df] but I'm getting an error saying Cannot convert string to timestamp.I have also tried using the def conjunction method too but it gives the same error. – Ruffy26 Aug 11 '16 at 04:14
  • Google, Google, Google! Google for "pandas convert string to datetime"; see the table on this page: https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior Google for "pandas between"; "pandas apply"... – Kartik Aug 11 '16 at 04:17
  • @Kartik By this logic you can suggest this to anyone who posts a question here. Anyone can just Google anything then read the entire documentation so they don't need stackoverflow. That doesn't make sense. There is nothing wrong with posting questions on SO if first you try to solve the problem then post the question when you get stuck. That's what SO is for. – Joe T. Boka Aug 11 '16 at 04:31
  • Key words being, "first you try to solve the problem then post the question when you get stuck". If you are just starting, like OP, most of the things will already be mentioned in questions and docs. I am not saying that to everyone, only people who need to do it. Tell me one thing in this question that cannot be solved by a bit of googling. – Kartik Aug 11 '16 at 04:53

1 Answers1

3

You can first convert columns from strings to_datetime and then filter by dates with dt.date and boolean indexing:

import pandas as pd

df = pd.DataFrame({'Date 1':['2015-01-04','2015-01-05','2015-01-05'],
                   'OC_Date':['05-JAN-2015:10:10:20',
                              '05-JAN-2015:11:15:31',
                              '05-JAN-2015:08:05:09'],
                   'Date 2':['2015-01-06','2015-01-08','2015-01-10']})
df['Date 1'] = pd.to_datetime(df['Date 1'])
df['Date 2'] = pd.to_datetime(df['Date 2'])
#http://strftime.org/
df['OC_Date'] = pd.to_datetime(df['OC_Date'], format='%d-%b-%Y:%H:%M:%S')
print (df)
      Date 1     Date 2             OC_Date
0 2015-01-04 2015-01-06 2015-01-05 10:10:20
1 2015-01-05 2015-01-08 2015-01-05 11:15:31
2 2015-01-05 2015-01-10 2015-01-05 08:05:09

print (df.dtypes)
Date 1     datetime64[ns]
Date 2     datetime64[ns]
OC_Date    datetime64[ns]
dtype: object

mask = (df['Date 1'].dt.date < df['OC_Date'].dt.date) & 
       (df['OC_Date'].dt.date < df['Date 2'].dt.date)
print (mask)
0     True
1    False
2    False
dtype: bool

print (df[mask])
      Date 1     Date 2             OC_Date
0 2015-01-04 2015-01-06 2015-01-05 10:10:20
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you for accepting! And small advice for your next questions - I think you can check [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). Then dont forget add some code (or pseudocode or some errors) Then your question will be more attractive and no downvotes. Good luck! – jezrael Aug 11 '16 at 05:56
  • Thanks a lot for the advice. I'm pretty new to coding in general and to SO. So thanks for the tips. Have a good day! :) – Ruffy26 Aug 11 '16 at 06:08