In R, I have a large data frame (23,344 rows × 89 columns) with sampling locations as columns and entries as rows.
A value of 1 means the object was found at that sampling location; a value of 0 means it was not found there.
To calculate the degree (number of connections) of each sampling location (node), I want to compute, for each row, the row sum minus 1
(this is the number of other locations that entry connects to) and replace the 1s in that row with this value.
Afterwards I can take colSums()
to get the total degree per sampling location.
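Written as a formula, the quantity described above is the following (a sketch in my own notation: $m_{rj}$ is the 0/1 entry in row $r$ and column $j$, and $d_j$ is the total degree of location $j$):

```latex
d_j = \sum_{r} m_{rj} \left( \sum_{k} m_{rk} - 1 \right)
```

For loc1 in the example below, rows 1 and 3 contain a 1, giving $d_1 = (3 - 1) + (2 - 1) = 3$.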
A reproducible example of my dataframe:
loc1 <- c(1,0,1)
loc2 <- c(0,1,1)
loc3 <- c(1,1,0)
loc4 <- c(1,1,0)
loc5 <- c(0,1,0)
df <- data.frame(loc1, loc2, loc3, loc4, loc5)
#   loc1 loc2 loc3 loc4 loc5
# 1    1    0    1    1    0
# 2    0    1    1    1    1
# 3    1    1    0    0    0
The desired output looks like this:
#   loc1 loc2 loc3 loc4 loc5
# 1    2    0    2    2    0   # row sum = 3, so change the 1s to 2
# 2    0    3    3    3    3   # row sum = 4, so change the 1s to 3
# 3    1    1    0    0    0   # row sum = 2, so the 1s stay 1
I have code that works, but it is slow (it uses a for loop). Is there a better or faster way to do this? I'm aware of the rowSums() function,
which may be part of the solution.
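Because every entry is 0 or 1, replacing the 1s in a row with (row sum - 1) is the same as multiplying the whole row by (row sum - 1), so rowSums() plus one elementwise multiplication should be enough. A minimal sketch, assuming the data frame holds only numeric 0/1 columns (the names `m`, `weighted` and `weighted_df` are just illustrative):

```r
# Example data from the question (assumed to contain only 0/1 values)
df <- data.frame(
  loc1 = c(1, 0, 1),
  loc2 = c(0, 1, 1),
  loc3 = c(1, 1, 0),
  loc4 = c(1, 1, 0),
  loc5 = c(0, 1, 0)
)

m <- as.matrix(df)

# rowSums(m) - 1 has one value per row. Multiplying a matrix by a vector of
# length nrow(m) recycles it down each column, so row r is scaled by the
# r-th value: 1s become (row sum - 1) and 0s stay 0.
weighted <- m * (rowSums(m) - 1)

# Convert back to a data frame if that format is needed downstream
weighted_df <- as.data.frame(weighted)
print(weighted_df)
```

Working on a matrix instead of indexing the data frame row by row avoids thousands of `[<-.data.frame` calls, which is usually where a loop like this spends its time.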
My current code is as follows:
# For each row, replace the 1s with (row sum - 1)
for (r in seq_len(nrow(df))) {
  df[r, df[r, ] == 1] <- sum(df[r, ]) - 1
}
# Total degree per sampling location
degrees_per_sample <- colSums(df)
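Putting it together, a sketch of a loop-free version, again assuming a 0/1 data frame. It also shows that the column sums can be taken in one step with crossprod(), which computes t(m) %*% v and skips building the intermediate weighted matrix. Both results are checked against the loop above; `row_degree`, `degrees_a` and `degrees_b` are illustrative names.

```r
# Example data from the question (assumed to contain only 0/1 values)
df <- data.frame(
  loc1 = c(1, 0, 1),
  loc2 = c(0, 1, 1),
  loc3 = c(1, 1, 0),
  loc4 = c(1, 1, 0),
  loc5 = c(0, 1, 0)
)
m <- as.matrix(df)
row_degree <- rowSums(m) - 1  # degree contributed by each entry (row)

# Option 1: weight each row, then sum per column
degrees_a <- colSums(m * row_degree)

# Option 2: the same column sums as a matrix product, t(m) %*% row_degree
degrees_b <- drop(crossprod(m, row_degree))

# Reference result from the loop in the question
df_loop <- df
for (r in seq_len(nrow(df_loop))) {
  df_loop[r, df_loop[r, ] == 1] <- sum(df_loop[r, ]) - 1
}
degrees_loop <- colSums(df_loop)

stopifnot(isTRUE(all.equal(degrees_a, degrees_loop)),
          isTRUE(all.equal(degrees_b, degrees_loop)))
print(degrees_a)
# loc1 loc2 loc3 loc4 loc5 
#    3    4    5    5    3 
```

On the full 23,344 × 89 table, both options make only a few vectorized passes over the data instead of one data-frame subassignment per row, so they should be considerably faster; actual timings will depend on the machine.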