
I earlier asked "How to display two columns as binary (presence/absence) matrix?", which received two excellent answers. I would now like to take this a step further and add a third column to the original plot and species columns, reflecting the biomass of each species in each plot.

Column 1 (plot) holds the codes for ~200 plots, column 2 (species) holds the codes for ~1200 species, and column 3 (biomass) holds the dry weight. Each plot has more than one species and each species can occur in more than one plot. The total number of rows is ~2700.

> head(dissim)
    plot species biomass
1 a1f56r  jactom 20.2
2 a1f56r  zinunk 10.3
3 a1f56r  mikcor 0.4
4 a1f56r  rubcle 1.3
5 a1f56r  sphoos 12.4
6 a1f56r nepbis1 8.2

> tail(dissim)
           plot species biomass
2707 og100m562r  selcup 4.7
2708 og100m562r  pip139 30.5
2709 og100m562r  stasum 0.1
2710 og100m562r  artani 3.4
2711 og100m562r  annunk 20.7
2712 og100m562r  rubunk 22.6

I would like to create a plot by species matrix that displays the biomass of each species in each plot (rather than a binary presence/absence matrix), something of the form:

        jactom  rubcle  chrodo  uncgla
a1f56r     1.3     0      10.3     0
a1f17r     0      22.3     0       4
a1m5r      3.2     0       3.7     9.7
a1m5r      1       0       0      20.1
a1m17r     5.4     6.9     0       1

Any advice on how to add this additional level of complexity would be very much appreciated.

tabtimm
  • use: `reshape::cast(dissim, plot ~ species, value = 'biomass', fun = mean)` as in here: http://stackoverflow.com/questions/6798327/calculating-the-mean-of-values-in-tables-using-formulae-r – Tim Dec 09 '14 at 14:53
  • In the future consider starting a new question if you have additional questions. However, in this case searching the archive would be enough. – Tim Dec 09 '14 at 17:02
  • Would there be an alternative option using with() and table() on the 3-column table as below for the 2-column site by species table? matrixBiom <- with(data, table(plot,species)) – tabtimm Dec 09 '14 at 17:55
  • Yes but only for counts, not for means. – Tim Dec 09 '14 at 18:02
  • Reason for asking is that currently I can only apply the function 'metaMDS' error-free to a matrix (based on the 2-column table) generated by with() and table(). When I use tapply or reshape I get this error: 'Error in if (any(autotransform, noshare > 0, wascores) && any(comm < 0)) { : missing value where TRUE/FALSE needed' – tabtimm Dec 09 '14 at 18:14
  • Check out my answer below. – Tim Dec 09 '14 at 18:15

3 Answers

With sample data

set.seed(15)
dd <- data.frame(
    a = sample(letters[1:5], 30, replace = TRUE),
    b = sample(letters[6:10], 30, replace = TRUE)
)

If you know each occurrence appears only once, you can do

with(dd, table(a,b))

#    b
# a   f g h i j
#   a 0 1 0 2 3
#   b 0 0 2 1 0
#   c 0 3 0 0 1
#   d 2 2 2 1 1
#   e 1 1 2 4 1

If occurrences are potentially duplicated and you only want to track presence/absence, you can do

with(unique(dd), table(a,b))
# or 
with(dd, (table(a,b)>0)+0)

#    b
# a   f g h i j
#   a 0 1 0 1 1
#   b 0 0 1 1 0
#   c 0 1 0 0 1
#   d 1 1 1 1 1
#   e 1 1 1 1 1
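As a quick sanity check, here is a minimal sketch showing that the two presence/absence approaches above mark the same cells. The names `pa_unique` and `pa_thresh` are made up for this illustration, and the exact counts depend on your R version's random number generator.

```r
set.seed(15)
dd <- data.frame(
    a = sample(letters[1:5], 30, replace = TRUE),
    b = sample(letters[6:10], 30, replace = TRUE)
)

# Approach 1: drop duplicated (a, b) pairs before tabulating
pa_unique <- with(unique(dd), table(a, b))

# Approach 2: tabulate everything, then threshold the counts to 0/1
pa_thresh <- with(dd, (table(a, b) > 0) + 0)

# Both tables should flag exactly the same cells as present
all(pa_unique == pa_thresh)
```

The thresholding form is handy when you already have the count table and want the binary version without rebuilding it.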
MrFlick

Both xtabs and tapply return matrix-like results: xtabs gives a contingency table (class "xtabs"/"table") and tapply gives a plain matrix:

# Using MrFlick's example
> xtabs(~a+b,dd)
   b
a   f g h i j
  a 0 1 0 2 3
  b 0 0 2 1 0
  c 0 3 0 0 1
  d 2 2 2 1 1
  e 1 1 2 4 1

# --- the tapply solution is a bit less elegant
> dd$one <- 1
> with(dd, tapply(one, list(a,b), sum))
   f  g  h  i  j
a NA  1 NA  2  3
b NA NA  2  1 NA
c NA  3 NA NA  1
d  2  2  2  1  1
e  1  1  2  4  1

# If you want to make the NA's become zeros then:

> tbl <- with(dd, tapply(one, list(a,b), sum))
> tbl[is.na(tbl)] <- 0
> tbl
  f g h i j
a 0 1 0 2 3
b 0 0 2 1 0
c 0 3 0 0 1
d 2 2 2 1 1
e 1 1 2 4 1
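For the three-column case in the question, xtabs can also sum a numeric column if it is placed on the left-hand side of the formula. The sketch below uses a small invented data frame, `toy`, shaped like the question's `dissim`; the plot codes and biomass values are made up.

```r
# Toy data in the same shape as the question's `dissim` (values are invented)
toy <- data.frame(
    plot    = c("p1", "p1", "p2", "p2", "p3"),
    species = c("jactom", "rubcle", "jactom", "uncgla", "rubcle"),
    biomass = c(1.3, 0.4, 3.2, 9.7, 6.9)
)

# A left-hand side in the formula makes xtabs() sum that column per cell;
# plot/species combinations that never occur are filled with 0, not NA
biom <- xtabs(biomass ~ plot + species, data = toy)
biom

# Strip the xtabs/table classes if a plain numeric matrix is needed
biom_mat <- unclass(biom)
attr(biom_mat, "call") <- NULL
```

Because absent combinations come back as 0 rather than NA, no extra clean-up step is needed. Note that this sums duplicated plot/species rows rather than averaging them.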
IRTFM

You also asked about a solution for three variables. Below are two options.

First, let's set up the data:

set.seed(15)
dd <- data.frame(
  a = sample(letters[1:5], 30, replace = TRUE),
  b = sample(letters[6:10], 30, replace = TRUE),
  c = sample(letters[1:3], 30, replace = TRUE)
)

If you have three discrete variables and only want to count the occurrences, here is a version of the solution by @MrFlick that produces one table per level of the third variable:

by(dd, dd$c, function(x) with(x, table(a, b)))

And if the third variable is numeric and you want its average in each cell (mean() returns NA for a character column such as c above), you can use this solution:

reshape::cast(dd, a ~ b, value = 'c', fun = mean)
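If you would rather stay in base R, a rough equivalent can be sketched with tapply. The data frame `toy` below is invented and only mimics the shape of the question's `dissim`. Absent combinations come back as NA from tapply, which may well be what triggers the 'missing value where TRUE/FALSE needed' error mentioned in the comments, although that is an assumption. Replacing them with 0 gives a complete matrix.

```r
# Toy data shaped like `dissim`; p1/jactom appears twice to show the averaging
toy <- data.frame(
    plot    = c("p1", "p1", "p1", "p2", "p3"),
    species = c("jactom", "jactom", "rubcle", "uncgla", "rubcle"),
    biomass = c(1.0, 3.0, 0.4, 9.7, 6.9)
)

# Mean biomass per plot x species cell; absent combinations come back as NA
biom_mean <- with(toy, tapply(biomass, list(plot, species), mean))

# Replace NA (species absent from that plot) with 0
biom_mean[is.na(biom_mean)] <- 0
biom_mean
```

Whether 0 is the right fill value depends on the analysis. For a biomass matrix, "species not recorded in this plot" is usually treated as zero biomass.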
Tim
  • That works great, thanks. However, the next step of conducting an NMDS using function metaMDS on the matrix results in an error message, I have therefore posted an additional question http://stackoverflow.com/questions/27392653/how-to-use-dissimilarity-matrix-with-function-metamds – tabtimm Dec 10 '14 at 03:09
  • @tabtimm take a tour (http://stackoverflow.com/tour) on how SO works - it would be helpful in getting the best of the site and community. Remember to vote up or vote down if you find an answer helpful or misleading and to accept an answer that you find that answers your question so that others know that the issue is resolved. – Tim Dec 10 '14 at 07:27