I have the following dataset:

import pandas as pd
import numpy as np
import time

the_df = pd.DataFrame(
    {
        "id": [21, 22, 23, 24, 15, 26, 27, 28, 29, 30],
        "amount": [10, 30, 12, 60, 0.5, 0.2, np.nan, 5, np.nan, 3],
        "date": [
            '2022-01-01','2022-08-01',None,'2022-12-01','2022-02-01',
            None,'2022-02-01','2022-01-01','2022-11-01','2022-01-01'
        ]
    }
)
the_df
    id  amount  date
0   21  10.0    2022-01-01
1   22  30.0    2022-08-01
2   23  12.0    None
3   24  60.0    2022-12-01
4   15  0.5     2022-02-01
5   26  0.2     None
6   27  NaN     2022-02-01
7   28  5.0     2022-01-01
8   29  NaN     2022-11-01
9   30  3.0     2022-01-01

And I want to convert the column date into a float. My approach was to convert the date column to a pandas datetime, and then convert that column to a float using a lambda function. My problem is checking for NaT so that I can skip the conversion when the value is NaT. This is what I tried:

the_df['date'] = pd.to_datetime(the_df['date'])
the_df['date'] = the_df['date'].\
            apply(lambda x: float(time.mktime(x.timetuple())) if x.notnull() else x)

But I get this error:

AttributeError: 'Timestamp' object has no attribute 'notnull'

I tried this second option:

the_df['date'] = pd.to_datetime(the_df['date'])
the_df['date'] = the_df['date'].\
            apply(lambda x: float(time.mktime(x.timetuple())) if x is not None else x)

But I get this error:

ValueError: NaTType does not support timetuple

Please, could you point out what I am doing wrong? Why does timetuple() still get called on the value even though the condition is supposed to skip it when it is None?

Alexis
  • Does this answer your question? [Convert a column of datetimes to epoch in Python](https://stackoverflow.com/questions/35630098/convert-a-column-of-datetimes-to-epoch-in-python) – sushanth Oct 03 '22 at 02:22
  • Hi @sushanth, thanks for the suggestion, but my question is about evaluating in a lambda function a NaT type. – Alexis Oct 03 '22 at 02:50

3 Answers

You can test for the existence of an NaT value specifically:

the_df['date'] = pd.to_datetime(the_df['date'])
the_df['date'] = the_df['date'].\
            apply(lambda x: float(time.mktime(x.timetuple())) if x is not pd.NaT else x)
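As a quick illustration of why this identity check works (a minimal sketch, not part of the original answer): pd.to_datetime turns None into the NaT singleton, so `is` comparisons against pd.NaT are reliable, while the values are no longer None at that point.

import pandas as pd

s = pd.to_datetime(pd.Series(['2022-01-01', None]))
missing = s.iloc[1]

print(missing is pd.NaT)   # True  - the missing date is the NaT singleton
print(missing is None)     # False - which is why an `is not None` check never skips it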
sj95126
  • Hello @sj95126, thank you! I didn't know that I could evaluate `pd.NaT` (and I guess I can evaluate `pd.NaN` too, can't I?) Have a great day! – Alexis Oct 03 '22 at 16:51

After changing the column to datetime, you can use df.loc to select the values that are not NaT and then do the apply only on those rows (the column will then hold a mix of floats and NaT).

the_df['date'] = pd.to_datetime(the_df['date'])
melt = ~pd.isna(the_df['date']), 'date'   # non-NaT rows, paired with the column label

the_df.loc[melt] = the_df.loc[melt].apply(lambda x: float(time.mktime(x.timetuple())))

   id  amount          date
0  21    10.0  1640966400.0
1  22    30.0  1659283200.0
2  23    12.0           NaT
3  24    60.0  1669824000.0
4  15     0.5  1643644800.0
5  26     0.2           NaT
6  27     NaN  1643644800.0
7  28     5.0  1640966400.0
8  29     NaN  1667232000.0
9  30     3.0  1640966400.0
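As a side note (a sketch, not from the answer above): if the goal is simply epoch seconds as floats, the masking can be skipped entirely, because datetime arithmetic propagates NaT to NaN on its own. Keep in mind that this treats the naive timestamps as UTC, while time.mktime interprets them in the machine's local timezone, so the numbers can differ by the local UTC offset.

import pandas as pd

s = pd.to_datetime(pd.Series(['2022-01-01', None, '2022-11-01']))

# difference from the Unix epoch, divided by one second, gives float seconds; NaT becomes NaN
epoch_seconds = (s - pd.Timestamp('1970-01-01')) / pd.Timedelta(seconds=1)
print(epoch_seconds)   # 1640995200.0, NaN, 1667260800.0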

The error is resolved if you use pd.notnull in your lambda instead, or if you check whether the value is NaT. Personally, I prefer the first option, as it works with all the different null types.

the_df.date = the_df.date.apply(lambda x: float(time.mktime(x.timetuple())) if pd.notnull(x) else x)
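To see why pd.notnull covers every case (a small added check, not from the original answer): None, np.nan and pd.NaT are all treated as missing by pandas, whereas identity checks such as `is not None` or `is not pd.NaT` each only catch one of them.

import numpy as np
import pandas as pd

for value in (None, np.nan, pd.NaT):
    print(repr(value), pd.notnull(value))
# None False
# nan False
# NaT False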

Your 2nd attempt does not work because the values in column date (after conversion by pd.to_datetime) don't contain any None, only NaT, and therefore the check if x is not None is always True.

Haleemur Ali
  • Hello @Haleemur Ali, I appreciate the explanation on the `pd.notnull` and `None` attempts, which made me investigate more. Your solution worked fine! Thank you very much! – Alexis Oct 03 '22 at 16:50