6

I have a table with a number of dates (some dates will be NaN) and I need to find the oldest date so a row may have DATE_MODIFIED, WITHDRAWN_DATE, SOLD_DATE, STATUS_DATE etc..

So for each row there will be a date in one or more of the fields I want to find the oldest of those and make a new column in the dataframe.

Something like this, if I just do one , eg DATE MODIFIED I get a result but when I add the second as below

table['END_DATE']=min([table['DATE_MODIFIED']],[table['SOLD_DATE']])

I get:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

For that matter will this construct work to find the min date, assuming I create correct date columns initially?

dartdog
  • 10,432
  • 21
  • 72
  • 121

3 Answers3

10

Just apply the min function along the axis=1.

In [1]: import pandas as pd 
In [2]: df = pd.read_csv('test.cvs', parse_dates=['d1', 'd2', 'd3'])
In [3]: df.ix[2, 'd1'] = None
In [4]: df.ix[1, 'd2'] = None
In [5]: df.ix[4, 'd3'] = None
In [6]: df
Out[6]:
                   d1                  d2                  d3
0 2013-02-07 00:00:00 2013-03-08 00:00:00 2013-05-21 00:00:00
1 2013-02-07 00:00:00                 NaT 2013-05-21 00:00:00
2                 NaT 2013-03-02 00:00:00 2013-05-21 00:00:00
3 2013-02-04 00:00:00 2013-03-08 00:00:00 2013-01-04 00:00:00
4 2013-02-01 00:00:00 2013-03-06 00:00:00                 NaT
In [7]: df.min(axis=1)
Out[7]:
0   2013-02-07 00:00:00
1   2013-02-07 00:00:00
2   2013-03-02 00:00:00
3   2013-01-04 00:00:00
4   2013-02-01 00:00:00
dtype: datetime64[ns]
Viktor Kerkez
  • 45,070
  • 12
  • 104
  • 85
  • fyi if you tried to use ``np.max`` with a ``NaT`` in the frame this will fail better to use ``df.min()/max()`` which handle the missing values correctly – Jeff Aug 08 '13 at 20:16
  • Since I have a lot of other fields in the DF how to create a axis argument? like:table['END_DATE'] = table.apply(np.min, [[table['DATE_MODIFIED']],[table['SOLD_DATE']]]) ?? – dartdog Aug 08 '13 at 20:29
  • 1
    If you have other fields, just use `df[['d1', 'd2', 'd3']].min(axis=1)` – Viktor Kerkez Aug 08 '13 at 20:31
  • Really appreciate the help, part of my hard learning exercises... Almost! It seems to fail if one of the fields is NaN (gives inf) works well when there are dates in the fields? suggestions? – dartdog Aug 08 '13 at 20:49
  • If you have a `NaN` then the particular column is not a `datetime64` `dtype`. How are you getting and parsing your data? And what are the dtypes of the columns? You should first make sure that all your columns are `datetime64` either by providing a custom date parser function when loading data of by cleaning up a data frame before doing the min... – Viktor Kerkez Aug 08 '13 at 20:55
  • Thank you, was just getting to realizing that since I was also getting some odd mins.. about to add the conversions – dartdog Aug 08 '13 at 21:06
  • Beautiful, with properly converted date-times, Thank you! – dartdog Aug 08 '13 at 21:24
  • One last item don't need to import numpy to get min, pandas has it – dartdog Aug 08 '13 at 22:00
  • Removed the unused import. – Viktor Kerkez Aug 08 '13 at 22:03
6

If tableis your DataFrame, then use its min method on the relevant columns:

table['END_DATE'] = table[['DATE_MODIFIED','SOLD_DATE']].min(axis=1)
Felix Zumstein
  • 6,737
  • 1
  • 30
  • 62
  • yes, that is the format that Viktor Kerkez supplied above, in the comments, Thank you! – dartdog Aug 08 '13 at 21:26
  • my answer was first though ;) – Felix Zumstein Aug 08 '13 at 21:37
  • Any idea how to also include a single vaue field in the comparison? eg somethng like table['END_DATE'] = table[['DATE_MODIFIED','SOLD_DATE',end-date]].min(axis=1) where end-date is a regular python varible, not part of the table? – dartdog Sep 11 '13 at 21:58
  • You would just add another line of code: `table['end-date'] = end_date` , assuming `end_date` is your variable. – Felix Zumstein Sep 11 '13 at 22:26
  • Thanks I just populated table['END_DATE'] 1st with end_date bsicaly the same solution.. – dartdog Sep 11 '13 at 22:36
  • Nice solution, but the min operation is converting the dates to float64, even when I start with datetime64[ns]. This is very annoying! – fccoelho Jul 15 '15 at 13:23
1

A slight variation over Felix Zumstein's

table['END_DATE'] = table[['DATE_MODIFIED','SOLD_DATE']].min(axis=1).astype('datetime64[ns]')

The astype('datetime64[ns]') is necessary in the current version of pandas (july 2015) to avoid getting a float64 representation of the dates.

fccoelho
  • 6,012
  • 10
  • 55
  • 67