I'm trying to import a data frame from R saved as RData to a pandas data frame. How can I do so? I unsuccessfully tried using rpy2 as follows:

import pandas as pd
from rpy2.robjects import r
from rpy2.robjects import pandas2ri
pandas2ri.activate()

# I use iris for convenience but I could have done r.load('my_data.RData')
print(r.data('iris'))
print(r['iris'].head())
print(type(r.data('iris')))

print(pandas2ri.ri2py_dataframe(r.data('iris')))
print(pandas2ri.ri2py(r.data('iris')))
print(pd.DataFrame(r.data('iris')))

outputs:

[1] "iris"

   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
1           5.1          3.5           1.4          0.2  setosa
2           4.9          3.0           1.4          0.2  setosa
3           4.7          3.2           1.3          0.2  setosa
4           4.6          3.1           1.5          0.2  setosa
5           5.0          3.6           1.4          0.2  setosa
<class 'rpy2.robjects.vectors.StrVector'>
   0  1  2  3
0  i  r  i  s
['iris']

I use pandas 0.20.1 + python 3.6 x64 + Windows 7.

Franck Dernoncourt
  • can perhaps use `rpy2` to open the `Rdata` and save each object as `csv`? – s_baldur Jul 20 '17 at 19:40
  • What is the error or undesired results? You may be showing it but not explaining. Do note `r.data() != .RData`. The former is a function that pulls in R's built-in datasets like iris, mtcars, etc. and the other is a binary file type. – Parfait Jul 20 '17 at 19:52
  • @snoram I would prefer to avoid csv and stay in binary formats. The best alternative outside rpy2 that I'm aware of is feather. – Franck Dernoncourt Jul 20 '17 at 20:09
  • @Parfait I would like a pandas data frame with the same content and column names as the R data frame. – Franck Dernoncourt Jul 20 '17 at 20:10
  • Again, *I unsuccessfully tried using rpy2* is not specific ... what is unsuccessful? What's wrong with first print? Is the type not correct? – Parfait Jul 20 '17 at 21:33

1 Answer

Generalized conversion of data frames turns out to be an expensive operation, as copying is required for some types of columns. A local conversion rule could be better:

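from rpy2.robjects import r
from rpy2.robjects import pandas2ri
from rpy2.robjects import default_converter
from rpy2.robjects.conversion import localconverter

# make the built-in iris dataset available in the R session
print(r.data('iris'))
# the pandas conversion rules apply only inside this block
with localconverter(default_converter + pandas2ri.converter) as cv:
    pd_iris = r('iris')
# this is a pandas DataFrame
pd_iris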

Otherwise, the following is "just working" on this end (Linux, head of branch default for rpy2):

import pandas as pd
from rpy2.robjects import r
from rpy2.robjects import pandas2ri
pandas2ri.activate()

pd_iris = r('iris')
pd_iris

If it does not work for you, there might be an issue with rpy2 on Windows (yet another one; rpy2 is not fully supported on Windows).
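
Since the question is ultimately about an `.RData` file rather than a built-in dataset, the same local-conversion idea could be applied after R's `load()`. The following is only a minimal sketch, not something tested on Windows: `load_rdata_as_pandas` is a hypothetical helper (not part of rpy2), and it assumes a recent rpy2 (3.x) in which lookups in `globalenv` go through the active conversion rules.

```python
def load_rdata_as_pandas(path):
    """Load the objects saved in an .RData file and return them in a dict.

    Hypothetical helper, not part of rpy2. R data frames should come back
    as pandas DataFrames; other objects follow the same conversion rules.
    """
    # Imported lazily so the function can be defined without R installed.
    from rpy2.robjects import r, globalenv, default_converter, pandas2ri
    from rpy2.robjects.conversion import localconverter

    # R's load() restores the saved objects into the global environment
    # and returns their names as a character vector.
    names = [str(name) for name in r['load'](path)]

    converted = {}
    # The pandas conversion rules apply only inside this block.
    with localconverter(default_converter + pandas2ri.converter):
        for name in names:
            # Assumption: rpy2 3.x converts the result of this lookup.
            converted[name] = globalenv[name]
    return converted
```

With the file name from the question, `load_rdata_as_pandas('my_data.RData')` would return a dict keyed by the R object names stored in the file. Note that `load()` places those objects in R's global environment, so they can overwrite existing objects of the same name in the R session.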

lgautier