I am trying to convert every column of a DataFrame to a specific dtype.

My data looks like this initially

userToAppend333.head()

Output

         UserID  Rating  GoodreadsID
484969  1397324       0        13617
484970  1397342       5       105576
484971  1397342       4      3320520
484972  1397342       4          865
484973  1397342       3       105578

I am trying to execute this operation

userToAppend333 = userToAppend333.astype(np.int32)

But I get this error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-c6f5e3c74de7> in <module>()
----> 1 userToAppend333 = userToAppend333.astype(np.int32)

5 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in astype(self, dtype, copy, errors, **kwargs)
   5689             # else, only a single dtype is given
   5690             new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
-> 5691                                          **kwargs)
   5692             return self._constructor(new_data).__finalize__(self)
   5693 

/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py in astype(self, dtype, **kwargs)
    529 
    530     def astype(self, dtype, **kwargs):
--> 531         return self.apply('astype', dtype=dtype, **kwargs)
    532 
    533     def convert(self, **kwargs):

/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
    393                                             copy=align_copy)
    394 
--> 395             applied = getattr(b, f)(**kwargs)
    396             result_blocks = _extend_blocks(applied, result_blocks)
    397 

/usr/local/lib/python3.6/dist-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors, values, **kwargs)
    532     def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
    533         return self._astype(dtype, copy=copy, errors=errors, values=values,
--> 534                             **kwargs)
    535 
    536     def _astype(self, dtype, copy=False, errors='raise', values=None,

/usr/local/lib/python3.6/dist-packages/pandas/core/internals/blocks.py in _astype(self, dtype, copy, errors, values, **kwargs)
    631 
    632                     # _astype_nansafe works fine with 1-d only
--> 633                     values = astype_nansafe(values.ravel(), dtype, copy=True)
    634 
    635                 # TODO(extension)

/usr/local/lib/python3.6/dist-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
    681         # work around NumPy brokenness, #1987
    682         if np.issubdtype(dtype.type, np.integer):
--> 683             return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
    684 
    685         # if we have a datetime/timedelta array of objects

pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()

ValueError: invalid literal for int() with base 10: 'UserID'

From my understanding of this error, there is some value in the column 'UserID' which can't be converted to an np.int32 datatype. I want to find out what these values are, but the dataframe is thousands of rows long, so it's not easy to locate the rows with the problematic values.

Is there a way to locate exactly which rows cause a data conversion error?
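
For reference, here is a minimal sketch of the kind of check I have in mind. It runs on a small made-up frame (`df` and its values are hypothetical stand-ins for `userToAppend333`), and it assumes the columns are stored as strings. The error message quotes the literal 'UserID', so my guess is that a stray header row ended up in the data, but that is only an assumption. The idea is to coerce with `pd.to_numeric(..., errors="coerce")` and look at which cells turned into NaN.

```python
import pandas as pd

# Hypothetical stand-in for userToAppend333: string columns with one
# stray header row mixed into the data (an assumption, not confirmed).
df = pd.DataFrame(
    {
        "UserID": ["1397324", "1397342", "UserID", "1397342"],
        "Rating": ["0", "5", "Rating", "4"],
        "GoodreadsID": ["13617", "105576", "GoodreadsID", "865"],
    },
    index=[484969, 484970, 484971, 484972],
)

# Coerce each column to numeric; values that cannot be parsed become NaN.
coerced = df.apply(pd.to_numeric, errors="coerce")

# True where a value existed originally but failed to parse.
bad_mask = coerced.isna() & df.notna()

# Rows that contain at least one unparseable cell.
bad_rows = df[bad_mask.any(axis=1)]
print(bad_rows)

# Once the offending rows are dropped, the cast to int32 should go through.
clean = coerced[~bad_mask.any(axis=1)].astype("int32")
```

Printing `bad_rows` shows the original (unconverted) values together with their index labels, which should make it possible to see where they came from.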

SantoshGupta7