Subset by samples for an ExpressionSet object

Question

I have an ExpressionSet object with 100 samples:

> length(sampleNames(eset1))
100

I also have a vector of the names of 75 samples (not the data itself):

> length(vecOf75)
75

How can I subset eset1 (and save it) according to the 75 sample names? That is, I want to disregard those samples in eset1 whose names are not listed in vecOf75. Bear in mind that some of the samples corresponding to the 75 sample names may not be in eset1. Thus,

> length(sampleNames(eset1))

should now give a value of at most 75 (fewer if some of the names are missing from eset1).

score 9 · Accepted Answer · answered Feb 10 '12 at 20:41


An ExpressionSet can be subset like a matrix, so maybe

eset2 = eset1[, sampleNames(eset1) %in% vecOf75]

or if all(vecOf75 %in% sampleNames(eset1)) then just

eset1[, vecOf75]
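As a minimal sketch of both forms, assuming the Biobase package is installed: the toy object below stands in for eset1 (six made-up samples instead of 100), and vecOf75 is a made-up vector of three names, one of which is absent from the object. The intersect() step is an addition for the case where not all names are present.

```r
library(Biobase)

# Toy stand-in for eset1: 4 features x 6 samples (all names are made up)
m <- matrix(seq_len(24), nrow = 4,
            dimnames = list(paste0("gene", 1:4), paste0("S", 1:6)))
eset1 <- ExpressionSet(assayData = m)

# Toy stand-in for vecOf75; "S9" is not a sample in eset1
vecOf75 <- c("S5", "S2", "S9")

# Logical index: keeps the matching samples in eset1's own column order
eset2 <- eset1[, sampleNames(eset1) %in% vecOf75]
sampleNames(eset2)

# Character index: only safe when every name is present,
# so drop the missing names first with intersect()
eset3 <- eset1[, intersect(vecOf75, sampleNames(eset1))]
sampleNames(eset3)
```

The logical form returns the samples in the order they have in eset1, while the character form follows the order of vecOf75. To update eset1 itself rather than create a new object, assign the result back to eset1.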

Not sure what 'save' means; either save(eset2, file = "some_file.rda"), or extract the components with exprs(eset2), pData(eset2), etc., and write them out with write.table and other standard R functions.
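A hedged sketch of the two options, again on a made-up toy object and assuming Biobase is installed. The file names are placeholders written to a temporary directory, and the "group" column is invented for illustration.

```r
library(Biobase)

# Toy object standing in for the subset eset2 (values and names are made up)
m <- matrix(rnorm(12), nrow = 4,
            dimnames = list(paste0("gene", 1:4), paste0("S", 1:3)))
pd <- AnnotatedDataFrame(data.frame(group = c("a", "b", "a"),
                                    row.names = colnames(m)))
eset2 <- ExpressionSet(assayData = m, phenoData = pd)

# Option 1: serialize the whole object; note the named 'file' argument
rda_path <- file.path(tempdir(), "some_file.rda")
save(eset2, file = rda_path)

# Option 2: export the components as tab-separated text
expr_path <- file.path(tempdir(), "exprs.txt")
write.table(exprs(eset2), file = expr_path, sep = "\t", quote = FALSE)
pheno_path <- file.path(tempdir(), "pdata.txt")
write.table(pData(eset2), file = pheno_path, sep = "\t", quote = FALSE)

# load() restores the object under its original name, eset2
rm(eset2)
load(rda_path)
```

The .rda file keeps the expression matrix and the sample annotation together in one object, whereas the text files are easier to open outside R.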

answered Feb 10 '12 at 20:41

Martin Morgan


By 'save', I meant to update `eset1` with its subset rather than create a new RData object such as `eset2`. Thanks. – user1202664 Feb 10 '12 at 21:30

Brandon Bertelsen · Answer 2 · 2012-02-10T18:50:03.510

2

eset1 <- vecOf75[vecOf75 %in% eset1]

This says: save to eset1 those elements of vecOf75 that are also in eset1.

A trivial example using numbers:

eset1 <- sample(1:100)
vecOf75 <- sample(1:200,75)
eset1 <- vecOf75[vecOf75 %in% eset1]

Alternatively, you could use subset(), but getting used to subsetting via '[' is much more useful programmatically.

subset(vecOf75, vecOf75 %in% eset1)
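Applied to sample names rather than numbers, the same %in% filter returns only the shared names, which then have to be used as a column index on the data. A base R sketch, with a plain matrix standing in for the expression data (all names and values are made up):

```r
# Plain matrix as a stand-in for the expression data: 3 features x 5 samples
dat <- matrix(1:15, nrow = 3,
              dimnames = list(paste0("gene", 1:3), paste0("S", 1:5)))

# Names to keep; "S8" does not occur in dat
wanted <- c("S4", "S1", "S8")

# Filtering the name vector gives names only, not data
shared <- wanted[wanted %in% colnames(dat)]

# Using the shared names as a column index gives the data subset
sub <- dat[, shared, drop = FALSE]
dim(sub)
```

Here drop = FALSE keeps the result a matrix even if only one name matches.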

edited Feb 10 '12 at 18:50

answered Feb 10 '12 at 18:44

Brandon Bertelsen


That won't work with ExpressionSet objects (Bioconductor package). – user1202664 Feb 10 '12 at 19:10
If you look at the object in more depth with `str(obj)`, I'm sure you can find where the character names are stored and adjust the code above to vecOf75[vecOf75$storage %in% eset1$storage] or something of that nature. – Brandon Bertelsen Feb 10 '12 at 19:14
Also, `vecOf75` only contains the names of the columns (or samples for an ExpressionSet object). It does not contain the actual data. – user1202664 Feb 10 '12 at 20:02
Sorry, I'm not familiar with the object. However, the method is the same. Inside each of those variables is some set of elements that you want to compare. You have to delve into the object and decide which variable you want to subset by. The method x <- x[x %in% y] will remain the same. – Brandon Bertelsen Feb 10 '12 at 21:56
Pretty much what they've done here, as an example: http://stackoverflow.com/questions/5593931/expressionset-subsetting – Brandon Bertelsen Feb 10 '12 at 22:01
