14

I have a site by species matrix. The dimensions are 375 x 360. Each value represents the frequency of a species in samples of that site.

I am trying to convert this matrix from frequencies to relative abundances at each site.

I've tried a few ways to achieve this and the only one that has worked is using a for loop. However, this takes an incredibly long time or simply never finishes.

Is there a function or a vectorised method of achieving this? I've included my for-loop as an example of what I am trying to do.

relative_abundance <- matrix(0, nrow= nrow(data_wide),
ncol=ncol(data), dimnames = dimnames(data))

i=0
j=0

for(i in 1:nrow(relative_abundance)){
  for(j in 1:ncol(relative_abundance)){
    species_freq <- data[i,j]
    row_sum <- sum(data[i,])
    relative_abundance[i,j] <- species_freq/row_sum
 }
}
Paul Roub
  • 36,322
  • 27
  • 84
  • 93
Zane.Lazare
  • 155
  • 1
  • 1
  • 4

4 Answers4

11

You could do this using apply, but scale in this case makes things even simplier. Assuming you want to divide columns by their sums:

set.seed(0)
relative_abundance <- matrix(sample(1:10, 360*375, TRUE), nrow= 375)

freqs <- scale(relative_abundance, center = FALSE, 
               scale = colSums(relative_abundance))

The matrix is too big to output here, but here's how it shoud look like:

> head(freqs[, 1:5])
            [,1]         [,2]        [,3]        [,4]         [,5]
[1,] 0.004409603 0.0014231499 0.003439803 0.004052685 0.0024026910
[2,] 0.001469868 0.0023719165 0.002457002 0.005065856 0.0004805382
[3,] 0.001959824 0.0018975332 0.004914005 0.001519757 0.0043248438
[4,] 0.002939735 0.0042694497 0.002948403 0.002532928 0.0009610764
[5,] 0.004899559 0.0009487666 0.000982801 0.001519757 0.0028832292
[6,] 0.001469868 0.0023719165 0.002457002 0.002026342 0.0009610764

And a sanity check:

> head(colSums(freqs))
[1] 1 1 1 1 1 1

Using apply:

freqs2 <- apply(relative_abundance, 2, function(i) i/sum(i))

This has the advatange of being easly changed to run by rows, but the results will be joined as columns anyway, so you'd have to transpose it.

Molx
  • 6,816
  • 2
  • 31
  • 47
  • This works like a charm, but I do not understand how `apply()` knows how to divide each cell '`i`' by the sum of the row . Because in my logic `i` references a cell and thus `sum(i)` would return the sum of the cell rather than the sum of the row. – Zane.Lazare Mar 01 '16 at 18:51
  • @Zane.Lazare `apply` works on rows or columns, depending on the second argument, which I set to `2`, so columns. `i` represents a column, so the whole column will be divided by it's sum, and a vector is returned. `apply` automatically binds the columns back together, so you get a matrix as an output. – Molx Mar 01 '16 at 22:54
  • `sweep` is usually canon for this – MichaelChirico Feb 10 '18 at 18:38
3

Using some simple linear algebra we can produce faster results. Simply multiply on the left by a diagonal matrix with the scaling factors you need, like this:

library(Matrix)
set.seed(0)
relative_abundance <- matrix(sample(1:10, 360*375, TRUE), nrow= 375)
Diagonal_Matrix <- diag(1/rowSums(relative_abundance))

And then we multiply from the left:

row_normalized_matrix <- Diagonal_Matrix %*% relative_abundance

If you want to normalize columnwise simply make:

Diagonal_Matrix <- diag(1/colSums(relative_abundance))

and multiply from the right.

2

Firstly, you could just do

relative_abundance[i,j] <- data[i,j]/sum(data[i,])

so you dont create the variables...

But to vectorise it, I suggest: compute the row sums with rowsum function(fast) and then you can just use apply by columns and each of that divide by the rowsums:

 relative_freq<-apply(data,2,function(x) data[,x]/rowsum(data)) 
Jan Sila
  • 1,554
  • 3
  • 17
  • 36
  • 1
    `relative_freq<-apply(data,2,function(x) data[,x]/rowsum(data))` does not work as it `rowsum(data)` throws the error `"Error in rowsum.default(data) : argument "group" is missing, with no default".` If I define `grouping = row.names()`, and set `reorder = FALSE`, I end up with another error `"Error in Ops.data.frame(data[, x], rowsum(data, group = row.names(data), : ‘/’ only defined for equally-sized data frames"` – Zane.Lazare Mar 01 '16 at 18:31
1

You can do something like this

relative_abundance <- matrix(sample(1:10, 360*375, TRUE), nrow= 375)
datnorm <- relative_abundance/rowSums(relative_abundance) 

this will be faster if relative_abundance is a matrix rather than a data.frame

Sarwan Ali
  • 151
  • 1
  • 1
  • 11