I am having some issues trying to sum a bunch of columns in R. I am analyzing a huge dataset so I am reproducing a sample. of fake data.
Here's how the data looks like (I have 800 columns).
library(data.table)
dataset <- data.table(name = c("A", "B", "C", "D"), a1 = 1:4, a2 = c(1,2,NaN,5), a3 = 1:4, a4 = 1:4, a5 = c(1,2,NA,5), a6 = 1:4, a8 = 1:4)
dataset
What I want to do is sum the columns in buckets of 100 columns so, for example, all the values in the first row between the first column and the column 100, all the values in the first row between the column 1 and the column 200, all the values in the second row between the first column and the column 100, etc.
Using the sample data I've come with this solution using rowSums
.
dataset %>%
mutate_if(~!is.numeric(.x), as.numeric) %>%
mutate_all(funs(replace_na(., 0))) %>%
mutate(sum = rowSums(.[,paste("a", 1:3, sep="")])) %>%
mutate(sum1 = rowSums(.[,paste("a", 4:5, sep="")])) %>%
mutate(sum2 = rowSums(.[,paste("a", 6:8, sep="")]))
but I am getting the following error:
Error in `[.data.frame`(., , paste("a", 6:8, sep = "")) : undefined columns selected
as the data does not include column a7.
The original data is missing a bunch of columns between a1 and a800 so solving this would be key to make it work.
What would it be the best way to approach and solve this error?
Also, I have a few more questions regarding the code I've written:
Is there a smarter way to select the column a1 and a100 instead of using this approach
.[,paste("a", 1:3, sep="")]
? I am interested in selected the column by name. I do not want to select it by the position of the column because sometimes a100 does not mean that is the column 100.Also, I am converting the NAs and the NaNs to 0 in order to be able to sum the rows. I am doing it this way
mutate_all(funs(replace_na(., 0)))
, losing my first row than contains the names of the values. What would it be the best way to replace NA and NaN without mutating the string values of the first row to 0?The type of the columns I am adding is integer as I converted them beforehand
mutate_if(~!is.numeric(.x), as.numeric)
. Should I follow the same approach in case I have dbl?
Thank you!