I'm trying to use R's Random Forest implementation from Python, via the rpy2 module. Here is a simple example with randomly generated data:

import numpy as np
from rpy2.robjects.numpy2ri import numpy2ri
from rpy2.robjects.packages import importr
from rpy2 import robjects as ro

# load the R randomForest package
rf = importr("randomForest")

# create data
X = np.random.rand(30, 100)
# create y-values
y = np.random.randint(2, size=30)
X = numpy2ri(X)
y = ro.FactorVector(numpy2ri(y))
# build RF
model = rf.randomForest(X, y)

Now, how can I access all the fields of my model from Python? How can I get the error rate or the variable importance? In R it's really simple:

model$importance[,"MeanDecreaseGini"]

How can this be done using rpy2? How do I access all the fields of my model object?

cel
Titus Pullo

2 Answers

You can access the fields with .rx:

>>> model.rx('importance')[0]
  <Matrix - Python:0x1126137e8 / R:0x10a292290>
[0.259480, 0.076463, 0.240162, ..., 0.049585, 0.249498, 0.043696]
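
Note that `.rx` behaves like R's single-bracket `[`, so it returns a one-element list (hence the `[0]`), while `.rx2` behaves like `[[` and returns the element itself. To look at every field at once, something along these lines may help. It is only a minimal sketch: it assumes `model` is the list-like rpy2 object returned by `randomForest`, and `fields_as_dict` is a name made up for this example, not part of rpy2:

```python
def fields_as_dict(model):
    """Map each field name of an R list to its value.

    Assumes `model` exposes `.names` and `.rx2()`, as rpy2's list
    vectors do; `fields_as_dict` itself is a hypothetical helper.
    """
    # model.names mirrors R's names(model); rx2(name) mirrors model[[name]]
    return {str(name): model.rx2(name) for name in model.names}
```

With that, `sorted(fields_as_dict(model))` would list the available field names, and `fields_as_dict(model)["importance"]` should correspond to `model$importance` in R.
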
elyase
    Thanks. This gives me access only to the values. How can I get a two-column matrix, as in R, where the first column holds the feature and the second the value? – Titus Pullo Jan 06 '15 at 16:19

Using pandas you can assign column names; then, through the rpy2 / R interface, convert the DataFrame to an R object that retains the field names.

import pandas as pd
import rpy2.robjects as robjects
# note: pandas.rpy was removed in later pandas releases; rpy2's pandas2ri is its replacement
import pandas.rpy.common as com
import numpy as np
r = robjects.r

r.library("randomForest")

# generate a pandas dataframe with random numbers
df = pd.DataFrame(data=np.random.rand(100, 30), columns=["a{}".format(i) for i in range(30)])
df["b"] = np.random.randint(2, size=100)

# create r objects
X = com.convert_to_r_dataframe(df.drop("b", axis=1))
Y = robjects.FactorVector(df.b)

# build rf model
rf = r.randomForest(X, Y)

# print Mean Decrease Gini and field names
print(rf.rx("importance"))
print(r.dimnames(rf[8]))  # rf[8] is the "importance" element (0-based index)

This returns

randomForest 4.6-7
Type rfNews() to see new features/changes/bug fixes.
$importance
    MeanDecreaseGini
a0          3.264841
a1          1.889741
a2          1.836287
a3          1.397774
a4          2.004300
a5          1.973436
a6          1.282584
a7          1.834799
a8          1.891645
a9          1.607779
a10         1.926996
a11         1.431277
a12         1.605571
a13         2.372562
a14         1.342930
a15         1.596201
a16         1.402425
a17         1.161261
a18         1.423914
a19         1.532494
a20         1.182701
a21         1.328816
a22         1.654255
a23         1.437174
a24         1.312123
a25         1.698160
a26         1.545838
a27         2.169778
a28         1.314767
a29         1.048250

...and the field names as you would see them in R
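
If you want the feature/value pairs on the Python side rather than as printed R output, a small helper along these lines might do it. This is only a sketch: it assumes `rf` is the model built above and that its importance matrix carries row names; `importance_pairs` is a hypothetical name, not part of rpy2 or randomForest:

```python
def importance_pairs(model, column="MeanDecreaseGini"):
    """Return [(feature_name, importance_value), ...] for one column.

    Assumes `model` is the rpy2 object returned by randomForest and
    that model$importance is a matrix with row names.
    """
    importance = model.rx2("importance")   # R: model$importance
    values = importance.rx(True, column)   # R: importance[, column]
    names = importance.rownames            # R: rownames(importance)
    return [(str(n), float(v)) for n, v in zip(names, values)]
```

`importance_pairs(rf)` should then give a list of `(feature, value)` tuples that you can sort or hand to `pd.DataFrame`.
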

kungphil