0

This question has been asked in many threads and has worked for others, but not for me. I am trying to convert object data type into int to perform a group by aggregation. Following are what I tried and the errors I got so far, (I am using python 3) According to this link, I tried these two:

df['my_var'] = df['my_var'].astype(str).astype(int)
df['my_var'] = df['my_var'].astype(int)

Same error for both:

ValueError: invalid literal for int() with base 10: '*'

And then I tried,

df['my_var'] = pd.to_numeric(df['my_var'])

I got an error:

ValueError: Unable to parse string "*" at position 6116

This is how dtypes looks like,

print (df.dtypes)
my_var object
dtype: object

I know some of the similar questions are down voted, however, I did not succeed using those answers. Is it a version error? I am finding it difficult to understand this error. Any help or suggestion would be appreciated.

i.n.n.m
  • 2,936
  • 7
  • 27
  • 51
  • Somewhere in your dataframe (actually, at the position 6116) you have a string `"*"` that you attempt to convert to an integer number. Don't do that. – DYZ Jul 18 '17 at 21:24
  • @DYZ I checked for special characters using `df.applymap(np.isreal)` and it was all `flase` is that the correct way to check? – i.n.n.m Jul 18 '17 at 21:26
  • But you try to convert to integers, not reals. They are not the same. – DYZ Jul 18 '17 at 21:27
  • @DYZ yeah I am going to check them again, any idea why `df['my_var'] = df['my_var'].astype(str).astype(int)` doesn't convert to `int`? – i.n.n.m Jul 18 '17 at 21:29
  • 2
    Exactly because `"*"` is _not_ an int. – DYZ Jul 18 '17 at 21:29
  • @DYZ Yup, got it, what is that 'position 6116'? is it the row number? I checked with `df.ix[6116]` and I did not see any special characters in the output. – i.n.n.m Jul 18 '17 at 21:35
  • 1
    Not sure. Check `df[df['my_var']=='*']`. – DYZ Jul 18 '17 at 21:40
  • 1
    `df.loc[pd.to_numeric(df['my_var'], errors='coerce').isnull()]` - should show you all rows where `my_var` couldn't been converted to numeric value... – MaxU - stand with Ukraine Jul 18 '17 at 21:45
  • 1
    Your question can be easily answered as soon as you decide what would you like to do with those values that can't be converted to integer values... – MaxU - stand with Ukraine Jul 18 '17 at 21:50

2 Answers2

1

I used 0 to replace anything that isn't a number but you can use any other value that makes sense to you e.g. -999999 (not a suggested practice obviously but just an example)

pd.to_numeric(df.my_var, errors='coerce').fillna(0).astype(int)
A.Kot
  • 7,615
  • 2
  • 22
  • 24
0

After getting suggestions from #DYZ and #MaxU, it was an error due to the special character * in a row in in my DataFrame. (Error message was obvious)

As suggested, using,

df[df['my_var']=='*']

and

df.loc[pd.to_numeric(df['my_var'], errors='coerce').isnull()]

I found where exactly the special character was. Then used regular expression method to strip off special characters using this thread.

i.n.n.m
  • 2,936
  • 7
  • 27
  • 51