1

I have a dataframe which includes around 50 features. In my experiment I have a classification problem so I want to train the model by "GradientBoostingClassifier". The dataframe (mydata) is considered as a training set. One of those 50 features (feature20) is a date, and I need to consider this feature in my training set as well, so I have tried to convert the date to datetime64 as follows:

  mydata['feature20']=pd.to_datetime(mydata['feature20'])

Now, when I tried to train the model with the classifier, it gives me the following error:

  float() argument must be a string or a number, not 'Timestamp'

Any idea to solve this problem?

MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
Spedo
  • 355
  • 3
  • 13
  • 1
    This isn't to do with pandas as much as it is with the ML library you are using, which you haven't tagged. That's far more useful information than a "datetime" tag. You also haven't shown how you pass the data to your model – roganjosh Jun 08 '19 at 07:38
  • 1
    you can easily convert your dates to integers: `df["feature20"].astype("int64") // 10**9`. – MaxU - stand with Ukraine Jun 08 '19 at 07:40
  • 1
    But it's not a good idea to leave a datetime feature as it is, unless you are working with the time series. Usually you would want to extract additional information from that datetime - day of week, day of month, week of year, month #, etc. – MaxU - stand with Ukraine Jun 08 '19 at 07:41
  • Yes, this could be an option, but how I can reconvert them to the date time again? – Spedo Jun 08 '19 at 07:47

1 Answers1

1

You can easily convert your dates to integers: df["feature20"].astype("int64") // 10**9.

NOTE: but it's not a good idea to leave a datetime feature as it is, unless you are working with the time series. Usually you would want to extract additional information from that datetime - day of week, day of month, week of year, month #, etc


Demo:

In [9]: df = pd.DataFrame({'feature20':pd.date_range('2010-01-01', periods=10)})

In [10]: df["new"] = df["feature20"].astype("int64") // 10**9

In [11]: df
Out[11]:
   feature20         new
0 2010-01-01  1262304000
1 2010-01-02  1262390400
2 2010-01-03  1262476800
3 2010-01-04  1262563200
4 2010-01-05  1262649600
5 2010-01-06  1262736000
6 2010-01-07  1262822400
7 2010-01-08  1262908800
8 2010-01-09  1262995200
9 2010-01-10  1263081600

In [12]: df["date"] = pd.to_datetime(df["new"], unit="s")

In [13]: df
Out[13]:
   feature20         new       date
0 2010-01-01  1262304000 2010-01-01
1 2010-01-02  1262390400 2010-01-02
2 2010-01-03  1262476800 2010-01-03
3 2010-01-04  1262563200 2010-01-04
4 2010-01-05  1262649600 2010-01-05
5 2010-01-06  1262736000 2010-01-06
6 2010-01-07  1262822400 2010-01-07
7 2010-01-08  1262908800 2010-01-08
8 2010-01-09  1262995200 2010-01-09
9 2010-01-10  1263081600 2010-01-10

if you have microsecond precision:

In [28]: df = pd.DataFrame({'feature20':pd.date_range('2010-01-01 01:01:01.123456', freq="123S", periods=10)})

In [29]: df
Out[29]:
                   feature20
0 2010-01-01 01:01:01.123456
1 2010-01-01 01:03:04.123456
2 2010-01-01 01:05:07.123456
3 2010-01-01 01:07:10.123456
4 2010-01-01 01:09:13.123456
5 2010-01-01 01:11:16.123456
6 2010-01-01 01:13:19.123456
7 2010-01-01 01:15:22.123456
8 2010-01-01 01:17:25.123456
9 2010-01-01 01:19:28.123456

In [30]: df["new"] = df["feature20"].astype("int64") // 10**3

In [31]: df
Out[31]:
                   feature20               new
0 2010-01-01 01:01:01.123456  1262307661123456
1 2010-01-01 01:03:04.123456  1262307784123456
2 2010-01-01 01:05:07.123456  1262307907123456
3 2010-01-01 01:07:10.123456  1262308030123456
4 2010-01-01 01:09:13.123456  1262308153123456
5 2010-01-01 01:11:16.123456  1262308276123456
6 2010-01-01 01:13:19.123456  1262308399123456
7 2010-01-01 01:15:22.123456  1262308522123456
8 2010-01-01 01:17:25.123456  1262308645123456
9 2010-01-01 01:19:28.123456  1262308768123456

In [32]: df["date"] = pd.to_datetime(df["new"], unit="us")

In [33]: df
Out[33]:
                   feature20               new                       date
0 2010-01-01 01:01:01.123456  1262307661123456 2010-01-01 01:01:01.123456
1 2010-01-01 01:03:04.123456  1262307784123456 2010-01-01 01:03:04.123456
2 2010-01-01 01:05:07.123456  1262307907123456 2010-01-01 01:05:07.123456
3 2010-01-01 01:07:10.123456  1262308030123456 2010-01-01 01:07:10.123456
4 2010-01-01 01:09:13.123456  1262308153123456 2010-01-01 01:09:13.123456
5 2010-01-01 01:11:16.123456  1262308276123456 2010-01-01 01:11:16.123456
6 2010-01-01 01:13:19.123456  1262308399123456 2010-01-01 01:13:19.123456
7 2010-01-01 01:15:22.123456  1262308522123456 2010-01-01 01:15:22.123456
8 2010-01-01 01:17:25.123456  1262308645123456 2010-01-01 01:17:25.123456
9 2010-01-01 01:19:28.123456  1262308768123456 2010-01-01 01:19:28.123456
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • With the open-source Python library Feature-engine, we can create features automatically from datetime variables. More details here: https://feature-engine.readthedocs.io/en/latest/user_guide/datetime/DatetimeFeatures.html – Sole Galli Nov 22 '22 at 10:34