Reshape a very large OTU (abundance) table from wide to long format (400,000 observations)

Question

I have a very large OTU (abundance) table. There are over 100 samples and 4000 observations per sample (4000 taxa).

An example of the OTU table is here:

#OTUID  1   2   3   4   5   6   7   8
OTU1    0   0   0   0   0   3   0   0
OTU2    0   0   0   0   0   0   13  0
OTU3    5   99  0   0   0   0   0   0
OTU4    0   0   0   0   0   0   0   0
OTU5    0   0   0   0   0   0   0   2
OTU6    0   0   19  0   9   236 59  2
OTU7    0   55  0   2   4   2   3   0
OTU8    0   44  10  5   0   0   7   0
OTU9    6   0   13  2   0   0   17  6
OTU10   0   100 0   0   0   3   0   0
OTU11   4   13  0   0   2   1   2   0
OTU12   0   0   0   0   0   101 1   0

I would like to get this table into long format so I can perform some pairwise tests between samples on another table. I am only interested in the count data; if I could also keep the sample and the OTU each count belongs to, I'll take it, but it is not necessary. The data should look like this:

COUNT OTUID SAMPLEID
0     OTU1   1
0     OTU2   1
5     OTU3   1
0     OTU4   1
0     OTU5   1
0     OTU6   1
0     OTU7   1
0     OTU8   1
6     OTU9   1
0     OTU10  1
4     OTU11  1
0     OTU12  1
0     OTU1   2
0     OTU2   2
99    OTU3   2
0     OTU4   2
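A minimal base-R sketch of this reshaping (no reshape2 needed), assuming the OTU IDs are row names and every other column is one sample, as in the example table. The matrix simply re-enters that example; `counts` and `long` are illustrative names. Because R stores a matrix column by column, `as.vector()` already lists all OTUs of sample 1, then all OTUs of sample 2, and so on, which is the order shown above.

```r
# Example data re-entered from the table above: rows are OTUs, columns are samples.
counts <- matrix(c(
  0,   0,  0, 0, 0,   3,  0, 0,
  0,   0,  0, 0, 0,   0, 13, 0,
  5,  99,  0, 0, 0,   0,  0, 0,
  0,   0,  0, 0, 0,   0,  0, 0,
  0,   0,  0, 0, 0,   0,  0, 2,
  0,   0, 19, 0, 9, 236, 59, 2,
  0,  55,  0, 2, 4,   2,  3, 0,
  0,  44, 10, 5, 0,   0,  7, 0,
  6,   0, 13, 2, 0,   0, 17, 6,
  0, 100,  0, 0, 0,   3,  0, 0,
  4,  13,  0, 0, 2,   1,  2, 0,
  0,   0,  0, 0, 0, 101,  1, 0
), nrow = 12, byrow = TRUE,
dimnames = list(paste0("OTU", 1:12), as.character(1:8)))

# Long format: as.vector() reads the matrix column by column (sample by sample),
# so the OTU IDs cycle within each sample and each sample ID repeats nrow times.
long <- data.frame(
  COUNT    = as.vector(counts),
  OTUID    = rep(rownames(counts), times = ncol(counts)),
  SAMPLEID = rep(colnames(counts), each = nrow(counts))
)

head(long, 16)
```

For the real file, `counts <- as.matrix(read.csv("test_otu.csv", row.names = 1, check.names = FALSE))` should give the same shape (`check.names = FALSE` keeps the sample names `1`, `2`, ... instead of `X1`, `X2`, ...). With over 100 samples and 4,000 OTUs, the result has one row per OTU-sample pair, i.e. over 400,000 rows.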

My script seems to work: although I get the "No ID variables" message, it still runs. If anyone has an idea how to fix that, I would greatly appreciate it.

library(reshape2)
# row.names = 1 turns the OTU IDs into row names, so only numeric sample columns remain
test <- read.csv("test_otu.csv", sep = ",", row.names = 1)
test2 <- melt(test)
#> No ID variables; using all as measure variables
test2

Please help!

Maybe it is because you need to set an ID variable? This is not an error message but a notice telling you what you probably should do. Maybe try `melt(test, 1)`. Also, I can't reproduce this: when I run your code, it sets `OTUID` as an id by itself. Please read `?melt.data.frame`. - David Arenburg, Aug 17 '16 at 21:24
I don't quite get what is wrong. You say you get a warning message, but that looks like it is just to inform you that you have no id variable. There doesn't seem to be a problem here. - polka, Aug 17 '16 at 21:24
Actually, how do I get the `melt` function to also output the OTUIDs? The output has two columns: 1) `variable`, which is just the sample IDs, and 2) `value`, the observed abundances. - user3105519, Aug 17 '16 at 21:41
If you keep the OTUIDs as a column in your dataset when you read it in, rather than making them row names, you should get your desired output. If they must be row names for some reason, use `melt(as.matrix(test))`. - aosmith, Aug 17 '16 at 22:31
`melt(test, 1)` gives the OTUIDs too, simply because the OTUIDs are the ids. - David Arenburg, Aug 18 '16 at 08:57
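A sketch of the two fixes suggested in the comments, using reshape2 on a hypothetical three-OTU, three-sample subset of the example table. With OTUID kept as a column, declaring it as the id variable (which is what `melt(test, 1)` does when OTUID is the first column) keeps the OTU IDs and stops `melt` from guessing, so the "No id variables" message no longer appears. If the IDs must stay as row names, melting the matrix turns its row and column names into the two id columns. The `variable.name`, `varnames` and `value.name` arguments only rename the output columns to match the desired COUNT/OTUID/SAMPLEID layout.

```r
library(reshape2)

# Hypothetical input: OTU1-OTU3 and samples 1-3 of the example table, with
# OTUID kept as an ordinary column (as if read in WITHOUT row.names = 1).
test <- data.frame(
  OTUID = c("OTU1", "OTU2", "OTU3"),
  `1`   = c(0, 0, 5),
  `2`   = c(0, 0, 99),
  `3`   = c(0, 0, 0),
  check.names = FALSE
)

# Fix 1: declare the id column. melt() no longer has to guess, so the
# "No id variables" message is not printed, and OTUID survives in the output.
long1 <- melt(test, id.vars = "OTUID",
              variable.name = "SAMPLEID", value.name = "COUNT")

# Fix 2: if the OTU IDs must be row names, melt the matrix instead; its row
# and column names become the OTUID and SAMPLEID columns.
counts <- as.matrix(test[, -1])
rownames(counts) <- test$OTUID
long2 <- melt(counts, varnames = c("OTUID", "SAMPLEID"), value.name = "COUNT")

long1
long2
```

On the real file, Fix 1 corresponds to reading the CSV without `row.names = 1`. Since R may rename the `#OTUID` header when reading it (for example to `X.OTUID`), `melt(test, 1)`, which picks the id by position, avoids depending on the exact column name.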