rpy2/Rmagic: can't read csv data file

Question

I have a pretty standard csv data set that I'm trying to read in IPython Notebook using rpy2/Rmagic:

# R code
%load_ext rmagic
%R my.data <- read.csv("/Users/xxx/Documents/data.csv")

I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-844400cf68c6> in <module>()
     25 ####Chunk 1: Inputting and checking the data
---> 27 get_ipython().magic(u'R my.data <- read.csv("/Users/xxx/Documents/data.csv")')
     28 get_ipython().magic(u'R summary(my.data)')

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in magic(self, arg_s)
   2162         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2163         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2164         return self.run_line_magic(magic_name, magic_arg_s)
   2165 
   2166     #-------------------------------------------------------------------------

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
   2088                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2089             with self.builtin_trap:
-> 2090                 result = fn(*args,**kwargs)
   2091             return result
   2092 

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/IPython/extensions/rmagic.pyc in R(self, line, cell, local_ns)

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
    189     # but it's overkill for just that one bit of state.
    190     def magic_deco(arg):
--> 191         call = lambda f, *a, **k: f(*a, **k)
    192 
    193         if callable(arg):

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/IPython/extensions/rmagic.pyc in R(self, line, cell, local_ns)
    579         if return_output and not args.noreturn:
    580             if result != ri.NULL:
--> 581                 return self.Rconverter(result, dataframe=False)
    582 
    583 __doc__ = __doc__.format(

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/IPython/extensions/rmagic.pyc in Rconverter(Robj, dataframe)
    113             return np.asarray(Robj)
    114         Robj = np.rec.fromarrays(Robj, names = names)
--> 115     return np.asarray(Robj)
    116 
    117 @magics_class

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
    233 
    234     """
--> 235     return array(a, dtype, copy=False, order=order)
    236 
    237 def asanyarray(a, dtype=None, order=None):

TypeError: __float__ returned non-float (type rpy2.rinterface.NAIntegerType)

I'm guessing this has something to do with the missing values in my csv data. I don't put anything in those fields; they are just blank entries (e.g. 1,,3,4).

I tried replacing the blank entries with NA, a space, 0, etc.; I always get the same error. What am I doing wrong?

Edit: I tried doing it with pure rpy2 (without making any changes to my data set):

import rpy2.robjects as robjects
myData = robjects.r['read.csv']("/Users/xxx/Documents/data.csv")
print(robjects.r['summary'](myData))

and it works fine! So this must be something with IPython/Rmagic.

score 3 · Accepted Answer · answered Sep 21 '12 at 00:48

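The error occurs because %R in IPython converts the value returned by the R code into a NumPy array: Rconverter is called with dataframe=False, which simply calls np.asarray on the whole data frame and so tries to turn the entire csv file into a single homogeneous float array. The NA value in the integer column (rpy2's NAIntegerType) cannot be converted to a float (its __float__ returns a non-float), so NumPy raises the TypeError.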

For example:

>>> import rpy2.robjects as ro
>>> import numpy as np
>>> myData = ro.r['read.csv']('data.csv')
>>> np.asarray(myData)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/core/numeric.py", line 235, in asarray
    return array(a, dtype, copy=False, order=order)
TypeError: __float__ returned non-float (type rpy2.rinterface.NAIntegerType)
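The same failure can be reproduced without R. The sketch below uses a hypothetical stand-in class, FakeNAInteger, which is assumed to mimic rpy2's NAIntegerType only in the respect named in the traceback: its __float__ returns something that is not a float. Forcing a mixed row into one float64 array then fails in the same way.

```python
import numpy as np


class FakeNAInteger:
    """Hypothetical stand-in for rpy2's NAIntegerType.

    Only one behaviour is copied: __float__ returns a non-float.
    """

    def __float__(self):
        return -2147483648  # an int, so float() rejects it


def to_float_array(values):
    """Force every value into one homogeneous float64 array."""
    return np.asarray(values, dtype=float)


def raises_type_error(fn):
    """Return True if calling fn() raises TypeError."""
    try:
        fn()
    except TypeError:
        return True
    return False


row = [1, 25, 0.590334, FakeNAInteger()]  # int, int, float, "NA"

print(raises_type_error(lambda: float(FakeNAInteger())))
print(raises_type_error(lambda: to_float_array(row)))
```

Without the fake NA, the same row converts cleanly; the single non-convertible element is what breaks the homogeneous conversion.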

A simple fix is to use the --dataframe / -d flag in %R, which converts the named R object into a NumPy structured array (one field per column) and pushes it into the Python namespace. Note that we also need the --noreturn / -n flag to ensure that %R does not try to convert the return value into a plain array (which would trigger the error again). [Alternatively, we could have put a semicolon at the end of the command.]

For example:

In [1]: %load_ext rmagic

In [2]: %R -n -d myData myData <- read.csv('data.csv')

In [3]: myData
Out[3]: 
array([(1, 1, 1, 25, 0.590334, 0.4991572, 0.2189781, 9),
       (1, 1, 1, 25, 0.5504164, 0.5007439, 0.2136691, 13),
       (1, 1, 1, 25, 0.588486, 0.4879058, 0.2105431, 11),
       (1, 1, 1, 25, 0.5882244, 0.5148501, 0.2105431, -2147483648),
       (1, 2, 1, 25, nan, 0.489045, 0.2025757, 12)], 
      dtype=[('replicate', '<i4'), ('line', '<i4'), ('genotype', '<i4'), ('temp', '<i4'), ('femur', '<f8'), ('tibia', '<f8'), ('tarsus', '<f8'), ('SCT', '<i4')])
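Why does -d avoid the error? The Rconverter frame in the traceback shows that the converter can also build a record array with np.rec.fromarrays(Robj, names = names); presumably that is the path taken for -d, so each column is converted on its own and keeps its own dtype. A plain-NumPy sketch of the difference, using a few values copied from the output above and assuming the columns arrive as int32 and float64 arrays (the R side is omitted):

```python
import numpy as np

# Two columns copied from the output above: an integer column and a
# numeric column whose last entry is missing.
replicate = np.array([1, 1, 1, 1, 1], dtype="<i4")
femur = np.array([0.590334, 0.5504164, 0.588486, 0.5882244, np.nan], dtype="<f8")

# Homogeneous conversion: everything is promoted to a single float64 array.
homogeneous = np.asarray([replicate, femur])

# Column-wise conversion: a record array where each field keeps its dtype.
rec = np.rec.fromarrays([replicate, femur], names=["replicate", "femur"])

print(homogeneous.dtype)
print(rec.dtype)
print(rec["femur"])  # the missing femur value stays NaN in its float field
```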

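Beware that, in the myData output, the NA in the integer SCT column was converted to -2147483648 (which is equal to numpy.iinfo('<i4').min): R stores NA_integer_ as the smallest 32-bit integer, and that sentinel is carried over unchanged into the '<i4' field. The NA in the numeric femur column, by contrast, came through as nan.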
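If you keep working with the structured array, the sentinel has to be dealt with before computing anything on that column. One option, sketched below with plain NumPy, is to mask it with numpy.ma or to turn it into NaN in a float copy. The SCT values are copied from the output above; the helper names mask_r_na_int and r_na_int_to_nan are hypothetical.

```python
import numpy as np

# R represents NA_integer_ as the smallest 32-bit integer, which is the
# value that appeared in the int32 SCT field.
R_NA_INT = np.iinfo("<i4").min  # -2147483648


def mask_r_na_int(column):
    """Hypothetical helper: mask entries equal to R's integer NA sentinel."""
    return np.ma.masked_equal(column, R_NA_INT)


def r_na_int_to_nan(column):
    """Hypothetical helper: copy to float64 and replace the sentinel with NaN."""
    as_float = column.astype("<f8")
    as_float[column == R_NA_INT] = np.nan
    return as_float


sct = np.array([9, 13, 11, -2147483648, 12], dtype="<i4")  # SCT values above

masked = mask_r_na_int(sct)
print(masked.mean())                     # 11.25, from the four real values
print(np.nanmean(r_na_int_to_nan(sct)))  # same result via NaN
```

Masking keeps the integer dtype; the NaN route needs a float copy because int32 has no NaN.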

The assumption that a csv file is homogeneous in type looks a bit bold. Maybe it would be more intuitive to make what is currently the *--dataframe* option the default, and to add a new option, e.g. "--homogeneous"? – lgautier, Sep 21 '12 at 11:46

score 1 · Answer 2 · answered Sep 20 '12 at 17:08


I am guessing from the traceback that somewhere the type of a column is guessed wrong (it thinks the column holds Python floats while the NA is an integer NA). From the traceback alone I can't tell whether this is an issue with IPython or with rpy2 (you'd have to try with rpy2 alone). If the column with the NA holds numerical values that look like integers, add .0 to them and see if that solves the problem.

lgautier

Adding the .0 didn't change anything. I just edited my main post with the code for a pure rpy2 test; it looks like it's something to do with IPython. – Randy Olson Sep 20 '12 at 17:39
That's an IPython problem, then. File a bug report with them so it gets fixed. – lgautier Sep 20 '12 at 18:09
