0

While working on the date column of panda dataframe which has very large data, I am getting error:

ValueError: cannot convert float NaN to integer

while converting date in day/week/month/year. I have to segregate in day/week/month/format and count the corresponding ids according to that.

Below process i followed:

  1. To convert column the in date format:
    df1['REQUESTTIMESTAMP'] = pd.to_datetime(df['REQUESTTIMESTAMP'],format= '%d-%b-%y %H.%M.%S.%f')

  2. To segregate date in day/week/month/format

    df1['Day/Week/Month/Year'] = df['REQUESTTIMESTAMP'].apply(lambda x: "%d/%d/%d/%d" % (x.day, x.week, x.month, x.year))

while executing this i am getting error:

`File "<pyshell#333>", line 1, in <lambda>
df1['Day/Week/Month/Year'] = df1['REQUESTTIMESTAMP'].apply(lambda x: "%d/%d/%d/%d" % (x.day, x.week, x.month, x.year))
ValueError: cannot convert float NaN to integer`

If the error is fixed I have to group IDs according to week,month,year by using: df1.groupby('Day/Week/Month/Year') ['CONVERSATIONID'].agg(['count', 'nunique'])

Expected o/p:

                     count  nunique
Day/Week/Month/Year                
12/41/10/2018           12       12
24/43/10/2018            2        2
30/44/10/2018            3        3

Original input file:

1050                    12-OCT-18 17.58.09.822000000<<coming correct
1055                    15-OCT-18 17.02.05.512000000<<Nat
1058                    15-OCT-18 17.10.55.264000000<<Nat
1061                    15-OCT-18 17.12.18.139000000<<Nat

Format of date column in my df coming as Nat:

    10                     1046         ...               12/40/10/2018
11                     1050         ...               12/40/10/2018
12                     1055         ...                         NaT
13                     1058         ...                         NaT
14                     1061         ...                         NaT
15                     1064         ...               24/42/10/2018
16                     1067         ...               24/42/10/2018
17                     1070         ...               30/43/10/2018
18                     1073         ...               30/43/10/2018
19                     1076         ...               30/43/10/2018
20                     1078         ...                         NaT
21                     1081         ...                         NaT
22                     1083         ...                         NaT
23                     1086         ...                         NaT
24                     1089         ...                         NaT
25                     1096         ...                         NaT

O/P from using strftime:

df1.groupby('Day/Week/Month/Year') ['CONVERSATIONID'].agg(['count', 'nunique'])
                     count  nunique
Day/Week/Month/Year                
12/40/10/2018           12       12
24/42/10/2018            2        2
30/43/10/2018            3        3
NaT                  39518    15376

Any help will be appreciated on how to fix the error and date coming as NaT after conversion as well?

think-maths
  • 917
  • 2
  • 10
  • 28
  • @jezrael this is not the duplicate question as mentioned, the problem statement is different – think-maths Mar 12 '19 at 11:53
  • OK, so what is expected output? Can you create [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) ? – jezrael Mar 12 '19 at 11:56
  • Have edited the the question with o/p required which is coming when data set is small..need suggestion to correct the error @jezrael – think-maths Mar 12 '19 at 12:05
  • Sorry, so need `df1['Day/Week/Month/Year'] = df1['REQUESTTIMESTAMP'].dt.strftime('%d/%U/%m/%Y')` ? – jezrael Mar 12 '19 at 12:07
  • Error is not coming using .strftime but Nat values are coming however which actually are different dates which are in correct format as well. Have updated the o/p coming from .strftime in my question @jezrael – think-maths Mar 12 '19 at 12:23
  • OK, so can you add your input data, which are incorrect parsed to NaT ? – jezrael Mar 12 '19 at 12:24
  • So what need do from NaT? Remove rows? replace by something? – jezrael Mar 12 '19 at 12:32
  • Just tested, cannot simulate NaT from your data, for me it working correctly. – jezrael Mar 12 '19 at 12:38
  • Have updated the question with input data and location where Nat comes.....Date columns for the maximums entries after conversion is showing Nat for few working correctly. I need to have dates in correct format instead of NaT so that I can use 'aggregate' for counts. Any lead will be helpful – think-maths Mar 12 '19 at 12:40
  • No idea, for me it workiong correctly. – jezrael Mar 12 '19 at 12:40
  • but reopened, maybe somebody find problem... – jezrael Mar 12 '19 at 12:54

0 Answers0