-1

I am trying to run some R code from within a python script. To do this I am using rpy2 but having difficulty (could also just call an R script but I can't get that to work either). Below is the R script code that does what I want it to do:

library(ggplot2)
setwd("/dir/")
allelefreqshort <- read.table("allelefreqsshort.txt", header = TRUE)
hist(log10(allelefreqshort$AlleleFreq), xlim = c(-15,0), breaks=20)

This is my rpy2 code, and it plots the data but not as log10 and also the x axis is too small.

import rpy2.robjects as ro
from rpy2.robjects.packages import importr
r = ro.r
outputDir = '/dir'
r.setwd(outputDir)
f = r('read.table("allelefreqs.txt", header = FALSE)')
grdevices = importr('grDevices')
grdevices.png(file="alleleFreq.png", width=800, height=500)
r.hist(f[0], breaks=100, main = '5 Reads', xlab='Variant Freq', ylab='# Vars', log10='x')
grdevices.dev_off()
The Nightman
  • 5,609
  • 13
  • 41
  • 74

2 Answers2

1

Here f is an R data frame, which essentially means an R list in which all elements are R vectors (each one is a "column" in your table) and all these vectors have the same length.

Doing f[0] will return a list of length 1, because this is what R would do (R has [ and [[ - to this day I am not sure about whether [ on the Python side should behave like R's [ or [[ but that for an other thread). f.rx2(1) would return what you want (see the doc for rpy2 and data frames at http://rpy2.readthedocs.org/en/version_2.7.x/vector.html#extracting-elements and mind that R sequences are 1-offset while Python vectors are 0-offset).

lgautier
  • 11,363
  • 29
  • 42
1

You can keep the R commands pretty much intact but handle the objects between Python differently. Consider the following rpy2 and Rscript command line solutions:

RPY2

import os
import rpy2
import rpy2.robjects as ro
from rpy2.robjects.packages import importr

# CURRENT DIRECTORY OF SCRIPT
cd = os.path.dirname(os.path.abspath(__file__))

# READ IN DATA
allelefreqshort_py = ro.r['read.table'](os.path.join(cd, "allelefreqs.txt"), header=False)

# PASSING PYTHON DF TO R DF
ro.globalenv['allelefreqshort'] = allelefreqshort_py

# OUTPUT PLOT
grdevices = importr('grDevices')
grdevices.png(file="alleleFreq.png", width=800, height=500)
p = ro.r('hist(log10(allelefreqshort$AlleleFreq), xlim = c(-15,0), breaks=20)')
grdevices.dev_off()

RScript

Alternatively, you can run a subprocess and call the R script via command line using R's automated executable RScript.exe. You can even pass arguments into R script for R to use with commandArgs().

import subprocess

# CURRENT DIRECTORY OF SCRIPT (ASSUMING R SCRIPT IN SAME DIRECTORY)
cd = os.path.dirname(os.path.abspath(__file__))

# COMMAND LINE ARGUMENTS (IF RSCRIPT.EXE IS PATH VARIABLE, LEAVE OUT DIRECTORY)
cmd = ["path/to/RScript", os.path.join(cd, "HistPlotScriptName.R")]

# SUBPROCESS CALL
a = subprocess.Popen(cmd, shell=True, stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output,error = a.communicate()

# R CONSOLE OUTPUT PRINTED TO PYTHON CONSOLE
if a.returncode == 0:               # SUBPROCESS SUCCESSFUL
    print(output.decode("utf-8"))
else:                               # SUBPROCESS FAILED
    print(error.decode("utf-8"))
Parfait
  • 104,375
  • 17
  • 94
  • 125