
I would like to know how to filter dates at all the different time levels, i.e. find dates by year, month, day, hour, minute and/or second. For example, how do I find all dates that happened in 2014, or in 2014 in the month of January, or only on 2nd January 2014, ... down to the second?

So I have my date and time dataframe, generated with `pd.to_datetime`:

df
    timeStamp
0   2014-01-02 21:03:04
1   2014-02-02 21:03:05
2   2016-02-04 18:03:10
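
For reference, a minimal way to rebuild this frame (values taken from the printout above):

import pandas as pd

# Rebuild the example frame shown above
df = pd.DataFrame({'timeStamp': pd.to_datetime(['2014-01-02 21:03:04',
                                                '2014-02-02 21:03:05',
                                                '2016-02-04 18:03:10'])})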

So if I filter by the year 2014 then I would have as output:

    timeStamp
0   2014-01-02 21:03:04
1   2014-02-02 21:03:05

Or, as a different example, I want to know the dates that happened in 2014 and on the 2nd of each month. This would also result in:

    timeStamp
0   2014-01-02 21:03:04
1   2014-02-02 21:03:05

But if I asked for dates that happened on the 2nd of January 2014, I would only get:

    timeStamp
0   2014-01-02 21:03:04

How can I achieve this at all the different levels?

Also how do you compare dates at these different levels to create an array of boolean indices?

Federico Garza

3 Answers


You can filter your dataframe via boolean indexing like so:

df.loc[df['timeStamp'].dt.year == 2014]    # everything in 2014
df.loc[df['timeStamp'].dt.month == 5]      # everything in May, any year
df.loc[df['timeStamp'].dt.second == 4]     # everything whose seconds value is 4
df.loc[df['timeStamp'] == '2014-01-02']    # exact match against 2014-01-02 00:00:00
df.loc[pd.to_datetime(df['timeStamp'].dt.date) == '2014-01-02']  # any time on 2014-01-02

... and so on and so forth.
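
These masks can also be combined with `&` / `|`; for example, a sketch for the "2014 and on the 2nd of each month" case from the question:

mask = (df['timeStamp'].dt.year == 2014) & (df['timeStamp'].dt.day == 2)
df.loc[mask]   # rows from 2014 that fall on the 2nd of a month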

Andrew L
  • Thanks for the response. The first three work, but not the fourth; that generates an empty dataframe. That's the problem I stumbled upon, but I wanted to generalise it to every part of the date and time, and all different possible combinations. – Federico Garza May 24 '17 at 15:38
  • The empty df indicates there are no records with that date. Did you try switching it out with the one you're looking for? – Andrew L May 24 '17 at 16:22
  • Yes, I tried and it didn't work; did it work for you? I think it may be because they are not *exactly* the same date. One has a certain hour:minute:second associated and the other doesn't, but I'm not sure if this is true. – Federico Garza May 24 '17 at 16:24
  • Ah, sorry, I wasn't thinking. `df['timeStamp'].dt.date` will return an `object`, so you'll need to convert back to `datetime64`. See edits above. Please accept as answer if this solves it! – Andrew L May 24 '17 at 17:27
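
To illustrate the dtype point from the comments above, a small sketch (using the example frame from the question):

import pandas as pd

df = pd.DataFrame({'timeStamp': pd.to_datetime(['2014-01-02 21:03:04',
                                                '2014-02-02 21:03:05',
                                                '2016-02-04 18:03:10'])})

# .dt.date yields plain datetime.date objects (object dtype), so comparing
# against a string never matches; converting back to datetime64 does.
print((df['timeStamp'].dt.date == '2014-01-02').any())                  # False
print((pd.to_datetime(df['timeStamp'].dt.date) == '2014-01-02').any())  # True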

If you set `timeStamp` as the index, with a datetime dtype, so that you get a `DatetimeIndex`, then you can use the following Partial String Indexing syntax:

df['2014'] # gets all 2014
df['2014-01'] # gets all Jan 2014
df['01-02-2014'] # gets all Jan 2, 2014
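
A sketch of that setup; note that on recent pandas versions the row-wise partial string lookup goes through `.loc`, and the `value` column here is just a hypothetical payload so the filtered rows have something to show:

import pandas as pd

df = pd.DataFrame({
    'timeStamp': pd.to_datetime(['2014-01-02 21:03:04',
                                 '2014-02-02 21:03:05',
                                 '2016-02-04 18:03:10']),
    'value': [1, 2, 3],  # hypothetical payload column
})

dfi = df.set_index('timeStamp')   # DatetimeIndex enables partial string indexing

print(dfi.loc['2014'])            # all of 2014
print(dfi.loc['2014-01'])         # all of Jan 2014
print(dfi.loc['2014-01-02'])      # all of Jan 2, 2014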
Scott Boston

I would just create a string Series, then use `str.contains()` with regex wildcards (each `.` matches any single character). That will give you whatever granularity you're looking for.

s = df['timeStamp'].map(lambda x: x.strftime('%Y-%m-%d %H:%M:%S'))

print(df[s.str.contains('2014-..-.. ..:..:..')])
print(df[s.str.contains('2014-..-02 ..:..:..')])
print(df[s.str.contains('....-02-.. ..:..:..')])
print(df[s.str.contains('....-..-.. 18:03:10')])

Output:

        timeStamp
0 2014-01-02 21:03:04
1 2014-02-02 21:03:05
        timeStamp
0 2014-01-02 21:03:04
1 2014-02-02 21:03:05
        timeStamp
1 2014-02-02 21:03:05
2 2016-02-04 18:03:10
        timeStamp
2 2016-02-04 18:03:10

I think this also solves your question about boolean indices:

print(s.str.contains('....-..-.. 18:03:10'))

Output:

0    False
1    False
2     True
Name: timeStamp, dtype: bool
Troy D