6

I am trying to run DEseq2 from Python using rpy2. How should I pass the design matrix? My script is as follows:

from numpy import *
from numpy.random import multinomial, random
from rpy2 import robjects
import rpy2.robjects.numpy2ri
robjects.numpy2ri.activate()
from rpy2.robjects.packages import importr
deseq = importr('DESeq2')

# Generate some data. 1000 genes, 10 samples
n = 1000
probabilities = random(n)
probabilities /= sum(probabilities)
data = zeros((n,10), int)
for i in range(10):
    data[:,i] = multinomial(1000000, probabilities)

# Make the data frame
d = {}
categories = ('1','2') * 5
d["key_1"] = robjects.IntVector(categories)
dataframe = robjects.DataFrame(d)

# Create the design matrix, and run DESeqDataSetFromMatrix
design = "~ key_1" # <--- I guess this is wrong
dds = deseq.DESeqDataSetFromMatrix(countData=data, colData=dataframe,design=design)

The error I am getting is

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/rpy2-2.8.5-py3.6-macosx-10.11-x86_64.egg/rpy2/rinterface/__init__.py:186: RRuntimeWarning: Error: $ operator is invalid for atomic vectors

warnings.warn(x, RRuntimeWarning)
Traceback (most recent call last):
File "testrpy.py", line 23, in <module>
dds = deseq.DESeqDataSetFromMatrix(countData=data, colData=dataf,design=design)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/rpy2-2.8.5-py3.6-macosx-10.11-x86_64.egg/rpy2/robjects/functions.py", line 178, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/rpy2-2.8.5-py3.6-macosx-10.11-x86_64.egg/rpy2/robjects/functions.py", line 106, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error: $ operator is invalid for atomic vectors

My guess is that the design argument is not correct. Does anybody have an example of running DEseq via rpy2?

Thanks.

1 Answers1

5

Ah ! You were almost there:

# Create the design matrix, and run DESeqDataSetFromMatrix
design = "~ key_1" # <--- I guess this is wrong

design is a string, but I guess that it should be a formula. Formulae are language objects in R.

Try with:

from rpy2.robjects import Formula
design = Formula("~ key_1")
lgautier
  • 11,363
  • 29
  • 42