
I have the data below:

grp_m1       grp_m2       grp_m3       grp_m4
$50-$75      $50-$75      $50-$75      $50-$75
$50-$75      $50-$75      $50-$75      $50-$75
$150-$175    $150-$175    $150-$175    $150-$175
$100-$125    $100-$125    $100-$125    $100-$125
$150-$175    $125-$150    $125-$150    $125-$150

These are then converted to dummies. The dtype of these dummies in the pandas dataframe is unsigned int, and when I try to convert the dataframe into an R dataframe using the code below:

from rpy2.robjects import pandas2ri
pandas2ri.activate()
pandas2ri.py2ri(data)

I get the error below:

Error while trying to convert the column "grp_m4_$175-$200". Fall back to string conversion. The error is: Cannot convert numpy array of unsigned values -- R does not have unsigned integers.
  (name, str(e)))
C:\Users\hduser\AppData\Local\Continuum\anaconda3.1\lib\site-packages\rpy2-2.9.1-py3.6-win-amd64.egg\rpy2\robjects\pandas2ri.py:61: UserWarning: Error while trying to convert the column "grp_m4_$200-$225". Fall back to string conversion. The error is: Cannot convert numpy array of unsigned values -- R does not have unsigned integers.
  (name, str(e)))

Can this be fixed, or do I need to remove those columns altogether, e.g. just skip a column whenever this error occurs?

Can someone please help me with this?

Marcus Campbell
Shuvayan Das
  • I don't speak python but I wouldn't be surprised if the solution was converting these values to signed integers. – Roland Apr 12 '18 at 13:13

2 Answers


You can use astype() from pandas in order to convert all of the elements in a pandas dataframe to the desired dtype. In this case we just want to convert your dummy variables into something that R understands. Assuming that your dataframe is still named "data", try this code:

import pandas as pd

# change unsigned integers to integers
n_data = data.astype('int64') # you could also try float64, if you want

# Check data type
type(n_data.iat[0,0])

# Output
# <class 'numpy.int64'>

from rpy2.robjects import pandas2ri
pandas2ri.activate()

# convert the recast dataframe (n_data), not the original one
pandas2ri.py2ri(n_data)
Marcus Campbell

Marcus's answer helped me a lot.

In my case, I think the main cause of this problem is that the values in the pandas DataFrame became numpy.uint8 after being converted to dummy variables by pd.get_dummies().

So I just converted the dataframe to 'int64' with astype() before calling pandas2ri.py2ri(data), and that fixed the bug.
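If the dataframe also holds columns that should keep their dtype (strings, floats), one option is to recast only the unsigned ones. This is a minimal sketch, not part of either answer: the helper name cast_unsigned_to_signed and the tiny sample frame are made up for illustration, and dtype=np.uint8 is passed explicitly because the default dtype of pd.get_dummies() may differ between pandas versions. The rpy2 conversion itself is left out so the snippet runs with pandas alone.

```python
import numpy as np
import pandas as pd


def cast_unsigned_to_signed(df, target="int64"):
    """Return a copy of df with every unsigned-integer column cast to `target`.

    Hypothetical helper for illustration; other columns are left untouched.
    """
    # dtype.kind == "u" marks numpy unsigned integer dtypes (uint8, uint16, ...)
    unsigned_cols = [col for col, dt in df.dtypes.items() if dt.kind == "u"]
    return df.astype({col: target for col in unsigned_cols})


# Small stand-in for the data in the question
raw = pd.DataFrame({"grp_m1": ["$50-$75", "$150-$175", "$100-$125"]})

# Request uint8 explicitly to reproduce the situation described above
dummies = pd.get_dummies(raw, dtype=np.uint8)

fixed = cast_unsigned_to_signed(dummies)
print(dummies.dtypes)
print(fixed.dtypes)
```

The resulting frame (fixed here) is the one to pass to the pandas2ri conversion in place of the original.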

andrewxli