7

I have a data frame in rpy2 in Python and I want to pull out columns from it. What is the rpy2 equivalent of this R code?

df[,c("colA", "colC")]

This works to get the first column:

mydf.rx(1)

But how can I pull a set of columns, e.g. the 1st, 3rd and 5th?

mydf.rx([1,3,5])

does not work. Neither does:

mydf.rx(rpy2.robjects.r.c([1,3,5]))

Roman Luštrik
lgd

3 Answers

5

Alternatively, you can convert the R data frame to a pandas data frame and select the 1st, 3rd and 5th columns there:

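#!/usr/bin/python
import rpy2
import rpy2.robjects as ro
import pandas as pd
# NOTE: pandas.rpy was removed in pandas 0.20; on newer versions use
# rpy2.robjects.pandas2ri for the conversion instead.
import pandas.rpy.common as com

# SOURCE R SCRIPT INSIDE PYTHON (raw string so the backslashes are kept literally)
ro.r.source(r'C:\Path\To\R script.R')

# DEFINE PYTHON DF AS R DF
pydf = com.load_data('rdf')
# pandas positions are 0-based, so R columns 1, 3, 5 are positions 0, 2, 4 here
cols = pydf.iloc[:, [0, 2, 4]]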
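
One thing to watch when moving between the two: R counts columns from 1, while pandas positional indexing (`.iloc`) counts from 0. A minimal sketch of the conversion follows; `r_positions_to_iloc` is a hypothetical helper name made up for this illustration, not part of pandas or rpy2.

```python
def r_positions_to_iloc(r_positions):
    """Convert 1-based R column positions to 0-based positions for DataFrame.iloc."""
    py_positions = []
    for pos in r_positions:
        # R positions start at 1; anything lower has no pandas equivalent here.
        if pos < 1:
            raise ValueError(f"R positions are 1-based, got {pos}")
        py_positions.append(pos - 1)
    return py_positions
```

With a converted data frame this would be used as `pydf.iloc[:, r_positions_to_iloc([1, 3, 5])]`.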
Parfait
3

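I think the answer is to build an R integer vector and pass it as the column index:

import rpy2.robjects

# cols to select (R positions are 1-based)
c = rpy2.robjects.IntVector((1,3))
# selection from df, equivalent to mydf[, c] in R
mydf.rx(True, c)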
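
The same pattern should also cover selection by name, as in the `df[,c("colA", "colC")]` from the question, by passing a `StrVector` instead of an `IntVector`. Here is a rough sketch that wraps both cases; `select_columns` is a hypothetical helper name, and it assumes `rdf` is an rpy2 data frame whose `rx` behaves as shown above.

```python
def select_columns(rdf, columns):
    """Return the equivalent of R's rdf[, columns] for an rpy2 data frame.

    columns: all ints (1-based R positions) or all strs (column names).
    """
    # Imported lazily so this module still loads where rpy2 is not installed.
    import rpy2.robjects as ro

    columns = list(columns)
    if not columns:
        raise ValueError("no columns requested")
    if all(isinstance(c, str) for c in columns):
        index = ro.StrVector(columns)   # like c("colA", "colC") in R
    elif all(isinstance(c, int) and not isinstance(c, bool) for c in columns):
        index = ro.IntVector(columns)   # like c(1L, 3L, 5L) in R
    else:
        raise TypeError("columns must be all int positions or all str names")
    # True in the row slot selects all rows, like the empty slot in R.
    return rdf.rx(True, index)
```

Usage would look like `select_columns(mydf, [1, 3, 5])` or `select_columns(mydf, ["colA", "colC"])`. Building the vector explicitly matters because a plain Python list does not appear to be converted to an R integer vector, which is probably why the attempts in the question fail.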
lgd
1

The simplest way I found to get the column names is to call R's colnames through the base package:

from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
import rpy2.robjects as robjects
import pandas as pd

dataframe = robjects.r('data.frame')
df_rpy2 = dataframe([1,2],[5,6])
df_pd = pd.DataFrame({'A': [1,2], 'B': [5,6]})

base = importr('base') # Imports R's base package
pandas2ri.activate() # Enables automatic conversion of pandas dataframes to their R equivalent

base.colnames(df_rpy2) # Finds the column names of the dataframe df_rpy2
base.colnames(df_pd) # Finds the column names of the dataframe df_pd

The output is:

R object with classes: ('character',) mapped to:
<StrVector - Python:0x7fa3504d3048 / R:0x10f65ac0>
['X1L', 'X2L', 'X5L', 'X6L']

R object with classes: ('character',) mapped to:
<StrVector - Python:0x7fa352493548 / R:0x103b6e40>
['A', 'B']

This works for both the dataframes created using pandas & rpy2. Hope this helps!
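
To tie this back to the question: once the column names are available as a plain sequence of strings (for example via `list(base.colnames(df_rpy2))`, assuming the returned vector iterates as strings), they can be mapped to the 1-based positions that R indexing expects. A small sketch; `r_positions_of` is a hypothetical helper name, not an rpy2 function.

```python
def r_positions_of(colnames, wanted):
    """Return the 1-based positions of the wanted names within colnames."""
    lookup = {}
    for position, name in enumerate(colnames, start=1):
        # Keep the first occurrence if a name is repeated.
        lookup.setdefault(name, position)
    missing = [name for name in wanted if name not in lookup]
    if missing:
        raise KeyError(f"unknown column(s): {missing}")
    return [lookup[name] for name in wanted]
```

For instance, `r_positions_of(["A", "B", "C"], ["A", "C"])` gives `[1, 3]`, which can then be wrapped in an `IntVector` for the column index.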