
I have a data frame (df) as follows:

Year  PlotNo  HabitatType  Sp1  Sp2  Sp3  Sp4
2000       1  GH             0    1    2    3
1988       3  KL             2    3    4    5

where Sp stands for species, and each Sp column holds abundance values.

I'm trying to compute Simpson's diversity index for each row of the data frame. I have tried the following for loop:

require(vegan)
y <- for(i in 1:nrow(df)) {
  row <- df[i,4:50] #Assuming 50 columns
  diversity(row, "simp")
}

However, I keep getting the following error:

Error in sum(x) : invalid 'type' (character) of argument

Any ideas on how to fix this error, or an alternative way of going about this?

Ronak Shah
biogeek
  • As a side note: don't call a data.frame `df`. That is the name of a function in R and will lead to confusing error messages in case of a syntax error. – Bernhard Feb 14 '17 at 11:28
  • And could you make your example reproducible? Without that we are left guessing what is going wrong exactly. – Paul Hiemstra Feb 14 '17 at 11:55
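For reference, a reproducible version of the two example rows can be built directly with data.frame(). The object name `abund` is a hypothetical choice (it also follows the advice not to call it `df`), and the column types are assumed from the question; running dput() on the real data is the usual way to share the exact object.

```r
# Hypothetical reproducible copy of the two example rows from the question
abund <- data.frame(
  Year        = c(2000L, 1988L),
  PlotNo      = c(1L, 3L),
  HabitatType = factor(c("GH", "KL")),
  Sp1 = c(0, 2),
  Sp2 = c(1, 3),
  Sp3 = c(2, 4),
  Sp4 = c(3, 5)
)

str(abund)   # shows the type of every column
dput(abund)  # prints R code that recreates the object exactly
```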

3 Answers


diversity indeed needs numerical data, and this may be your problem. What do you get from sum(df[,4:50])?
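A quick way to answer that is to test each species column with is.numeric() and list the ones that fail. A minimal sketch, assuming the species columns run from column 4 to the last column; the stand-in data frame `abund`, with a deliberately character-typed Sp2, is hypothetical.

```r
# Hypothetical stand-in data: Sp2 was read in as character by mistake
abund <- data.frame(
  Year = c(2000L, 1988L), PlotNo = c(1L, 3L),
  HabitatType = c("GH", "KL"),
  Sp1 = c(0, 2), Sp2 = c("1", "3"), Sp3 = c(2, 4), Sp4 = c(3, 5),
  stringsAsFactors = FALSE
)

species <- abund[, 4:ncol(abund)]                  # the abundance columns
is_num  <- vapply(species, is.numeric, logical(1)) # TRUE/FALSE per column
names(is_num)[!is_num]                             # columns that would break diversity()
```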

Another issue is that you do not need a for() loop: when given a data frame or a matrix, diversity will calculate the index for each row (or column if you set argument MARGIN = 2). So diversity(df[,4:50]) should do, provided that your data are numeric.
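A minimal sketch of the loop-free call on hypothetical numeric data, assuming vegan is installed. It also cross-checks the row values against the Gini-Simpson formula 1 - sum(p_i^2), with p_i the proportional abundance of species i in the row, which is what index = "simpson" computes.

```r
library(vegan)

# Hypothetical numeric abundance table (rows = plots, columns = species)
abund <- data.frame(Sp1 = c(0, 2), Sp2 = c(1, 3), Sp3 = c(2, 4), Sp4 = c(3, 5))

# One Simpson value per row (MARGIN = 1 is the default); no loop needed
simp_rows <- diversity(abund, index = "simpson")
simp_rows  # 22/36 and 142/196, i.e. about 0.611 and 0.724

# MARGIN = 2 computes the index for each column (species) instead
simp_cols <- diversity(abund, index = "simpson", MARGIN = 2)

# Cross-check the row values by hand: 1 - sum(p^2) with p = n / sum(n)
by_hand <- apply(abund, 1, function(n) { p <- n / sum(n); 1 - sum(p^2) })
all.equal(unname(simp_rows), unname(by_hand))
```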

Jari Oksanen
  • Aha, I did not realise that. I thought I'd have to loop diversity across each row to get the result. That solved it! – biogeek Feb 14 '17 at 12:29

We can use:

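library(vegan)
library(data.table)
setDT(mydf)  # convert the data.frame to a data.table by reference
# add column div by reference; .SD holds the species columns 4:7
mydf[, div := diversity(.SD, "simpson"), .SDcols = 4:7]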

This adds a column div holding the result of the diversity function for each row.
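Without data.table, the same column can be added to a plain data frame in base R. A minimal sketch with a hypothetical copy of the example data, assuming vegan is installed and columns 4:7 are the numeric species columns:

```r
library(vegan)

# Hypothetical example data with the same layout as the question
mydf <- data.frame(
  Year = c(2000L, 1988L), PlotNo = c(1L, 3L),
  HabitatType = factor(c("GH", "KL")),
  Sp1 = c(0, 2), Sp2 = c(1, 3), Sp3 = c(2, 4), Sp4 = c(3, 5)
)

# Compute the index for all rows at once and store it as a new column
mydf$div <- diversity(mydf[, 4:7], index = "simpson")
mydf
```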

GGamba

The diversity function can only deal with numerical data. Probably df[i,4:50] contains non-numeric elements. My guess is that some of the columns are character or factor. However, without a reproducible example I can't confirm that this is the case.
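If str() does reveal character or factor abundance columns, they need converting before calling diversity(). A minimal base-R sketch with hypothetical data: a factor has to go through as.character() first, because as.numeric() on a factor returns the internal level codes rather than the printed values. Only the non-numeric columns are touched, so genuinely numeric columns are left exactly as they were.

```r
# Hypothetical data: Sp1 came in as a factor, Sp2 as character
abund <- data.frame(
  Sp1 = factor(c("10", "2")),
  Sp2 = c("1", "3"),
  Sp3 = c(2, 4),
  stringsAsFactors = FALSE
)

as.numeric(abund$Sp1)  # level codes (1 and 2), not the abundances 10 and 2

# Convert only the non-numeric columns, going through character first
to_fix <- !vapply(abund, is.numeric, logical(1))
abund[to_fix] <- lapply(abund[to_fix], function(col) as.numeric(as.character(col)))
str(abund)
```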

Paul Hiemstra
  • I checked this using str(df). The year and plot_id columns are int, plot_type is a factor, and the rest of the columns are num, as they should be. When I run summary(df), all the columns from 4:50 have a mean, median, etc. Also, I have now looked only at columns 4:7, and the output is NULL – biogeek Feb 14 '17 at 12:21
  • The output of your original command will always be `NULL`, since you have `y <- for() {}` and `for()` returns `NULL`. If you want to collect the result in a loop, you must do assignment within the loop: `y <- numeric(nrow(df)); for(i in 1:nrow(df)) {...; y[i] <- diversity(row, "simp")}`. However, with `diversity` you don't need `for()`. – Jari Oksanen Feb 14 '17 at 12:29
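The loop fix from the last comment, written out as a self-contained sketch on hypothetical numeric data (assuming vegan is installed). The key changes are preallocating `y` and assigning inside the loop; seq_len() is used instead of 1:nrow() so that a zero-row table skips the loop rather than iterating over c(1, 0). In practice the single vectorized diversity() call gives the same values.

```r
library(vegan)

# Hypothetical numeric abundance table (rows = plots, columns = species)
abund <- data.frame(Sp1 = c(0, 2), Sp2 = c(1, 3), Sp3 = c(2, 4), Sp4 = c(3, 5))

y <- numeric(nrow(abund))             # preallocate one slot per row
for (i in seq_len(nrow(abund))) {
  row  <- abund[i, ]                  # a one-row data frame
  y[i] <- diversity(row, "simpson")   # store the value; for() itself returns NULL
}
y
```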