0

I am trying to use rpy2 to access some R functionality from Python. Here is a simple regression I want to run: I create a pandas data frame, convert it to an R data frame, and then try to call R's lm on it. But R cannot find the data frame (see below). Where should I look to troubleshoot?

FYI, I am using Python 2.7.3, rpy2-2.3.2, pandas 0.10.1, and R 2.15.3.

>>> import rpy2
>>> import pandas as pd
>>> import pandas.rpy.common as com
>>> datframe = pd.DataFrame({'a' : [1, 2, 3], 'b' : [3, 4, 5]})
>>> r_df = com.convert_to_r_dataframe(datframe)
>>> r_df
<DataFrame - Python:0x32547e8 / R:0x345d640>
[IntVector, IntVector]
  a: <class 'rpy2.robjects.vectors.IntVector'>
  <IntVector - Python:0x3254e18 / R:0x345d608>
[       1,        2,        3]
  b: <class 'rpy2.robjects.vectors.IntVector'>
  <IntVector - Python:0x3254e60 / R:0x345d5d0>
[       3,        4,        5]
>>> print type(r_df)
<class 'rpy2.robjects.vectors.DataFrame'>
>>> from rpy2.robjects import r
>>> r('lmout <- lm(r_df$a ~ r_df$b)')

Error in eval(expr, envir, enclos) : object 'r_df' not found
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    r('lmout <- lm(r_df$a ~ r_df$b)')
  File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/__init__.py", line 236, in __call__
    res = self.eval(p)
  File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 86, in __call__
    return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/rpy2/robjects/functions.py", line 35, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
RRuntimeError: Error in eval(expr, envir, enclos) : object 'r_df' not found
Danica
user2133151

3 Answers

2

When calling

r('lmout <- lm(r_df$a ~ r_df$b)')

the embedded R will look for a variable r_df, and no such variable is made visible to R in your code example.

When doing

r_df = com.convert_to_r_dataframe(datframe)

you are creating the variable r_df on the Python side only. Although the actual data is now in R, no symbol (name) known to R is associated with it, so the data structure remains anonymous. (By the way, you may want to use the automagic conversion of pandas data frames that ships with rpy2-2.3.3.)

To create a variable name in R's "global environment", add this:

from rpy2.robjects import globalenv
globalenv['r_df'] = r_df

Now your lm() call should work.

lgautier
  • Thank you. I tried this approach as it seemed easiest to implement and it worked. Now working on getting information out of lmout! – user2133151 Mar 06 '13 at 16:31
  • r('print(summary(lmout))') works just fine. print(r('print(summary(lmout))') ) works even better. – user2133151 Mar 06 '13 at 17:02
  • Did you check the documentation for rpy2? http://rpy.sourceforge.net/rpy2/doc-dev/html/introduction.html#linear-models – lgautier Mar 06 '13 at 17:02
  • Of course I did :-) (I know you wrote your comment just a little before mine appeared...) Thanks. – user2133151 Mar 06 '13 at 17:03
0

Try this (I am not sure which import does the magic, though):

import rpy2.robjects as robjects
from rpy2.robjects import DataFrame, Formula
import rpy2.robjects.numpy2ri as npr
import numpy as np
from rpy2.robjects.packages import importr


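def my_linear_fit_using_r(X,Y,verbose=True):
   # ## FITTINGS:   RPy implementation ###
   # Anonymous R functions, wrapped as Python callables
   r_correlation = robjects.r('function(x,y) cor.test(x,y)')  # defined but not used below
   # r_quadfit = robjects.r('function(x,y) lm(y~I(x)+I(x^2))')
   r_linfit = robjects.r('function(x,y) lm(y~x)')
   r_get_r2=robjects.r('function(x) summary(x)$r.squared')
   # Pass the data as function arguments, so no R-side variable name is needed
   lin=r_linfit(robjects.FloatVector(X),robjects.FloatVector(Y))
   coef_lin=robjects.r.coef(lin)
   a=coef_lin[0]  # intercept
   b=coef_lin[1]  # slope
   r2=r_get_r2(lin)
   # confint returns a 2x2 matrix stored column-major:
   # [lower intercept, lower slope, upper intercept, upper slope]
   ci=robjects.r.confint(lin) # confidence intervals
   lwr_a=ci[0]
   lwr_b=ci[1]
   upr_a=ci[2]
   upr_b=ci[3]
   if verbose:
      # Parenthesised form behaves the same as the print statement in Python 2
      print(robjects.r.summary(lin))
      # print robjects.r.summary(quad)
   return (a,b,r2[0],lwr_a,upr_a,lwr_b,upr_b)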
  • Thank you for your suggestion. It looks worth trying but I went the short and easy lazy route outlined in the next suggestion. – user2133151 Mar 06 '13 at 16:32
0

Just a remark: for simple regressions you can do everything in Python, using ols from statsmodels:

from statsmodels.formula.api import ols

lmout = ols('a ~ b', datframe).fit()
lmout.summary()
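
To cross-check the coefficients that either lm() or ols reports for the question's data, a simple regression can also be computed by hand from the closed-form least-squares formulas. The following is only a minimal sketch using the standard library; the helper name simple_ols is hypothetical and is not part of statsmodels or rpy2.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x.

    Hypothetical helper for illustration only; not a statsmodels or rpy2 function.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / float(n)
    mean_y = sum(y) / float(n)
    # Sum of squared deviations of x, and sum of cross deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Data from the question; 'a ~ b' means a is the response and b the predictor.
a = [1, 2, 3]
b = [3, 4, 5]
intercept, slope = simple_ols(b, a)
```

In this data a equals b - 2 exactly, so the fitted intercept is -2 and the slope is 1; both the R and the statsmodels fits should agree with that up to floating-point error.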
herrfz