I am using the rpy2 package to bring some R functionality to Python. The R functions I call need a data.frame object, and by building an rlike.TaggedList and passing it to robjects.DataFrame I am able to make this work.
However, it is noticeably slower than running the exact same R functions on the exact same data directly in R, which led me to try the rpy2 low-level interface, as mentioned here: http://rpy.sourceforge.net/rpy2/doc-2.3/html/performances.html
So far I have tried:
- Using TaggedList with FloatSexpVector objects instead of numpy arrays, and the DataFrame object.
- Dropping the TaggedList and DataFrame classes and using a dictionary instead, like this:
d = dict((var_name, var_sexp_vector) for ...)
dataframe = robjects.r('data.frame')(**d)
Neither gave me any noticeable speedup.
I have noticed that the DataFrame constructor accepts an rinterface.SexpVector, so I thought of creating such a named vector, but I have no idea how to set the names (in R I know it's just names(vec) = c('a','b'...)).
How do I do that? Is there another way? And is there an easy way to profile rpy itself, so I could know where the bottleneck is?
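To make the profiling part of the question concrete, here is a minimal sketch of the kind of measurement I have in mind, using Python's standard cProfile and pstats modules (Python 3 syntax). The function build_columns is a hypothetical stand-in for the step that converts Python data into R vectors; it is not part of rpy2. My assumption is that cProfile only sees Python-level calls, so time spent inside R itself would show up lumped under whichever rpy2 call triggered it.

```python
import cProfile
import io
import pstats


def build_columns(n_cols, n_rows):
    # Hypothetical placeholder for the Python -> R conversion step;
    # in real code this is where the rpy2 vectors would be created.
    return dict(
        ("v%d" % i, [float(j) for j in range(n_rows)])
        for i in range(n_cols)
    )


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the top 10 entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


result, report = profile_call(build_columns, 5, 1000)
print(report)
```

Swapping build_columns for the real conversion function and for the R call would at least show whether the time goes into building the vectors or into the R function itself.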
EDIT:
The following code seems to work great (4x faster) on a newer rpy2 (2.2.3):
data = ro.r('list')([ri.FloatSexpVector(x) for x in vectors])[0]
data.names = ri.StrSexpVector(vector_names)
However, it doesn't work on version 2.0.8 (the last one supported on Windows), since R doesn't seem to be able to use the names: "Error in eval(expr, envir, enclos) : object 'y' not found"
Ideas?
EDIT #2: Someone did the fine job of building an rpy2 2.3 binary for Windows (Python 2.7); the code mentioned above works great with it (almost 6x faster for my code).
link: https://bitbucket.org/breisfeld/rpy2_w32_fix/issue/1/binary-installer-for-win32