Generalize column-range in R code for unknown number of columns

Question

I would like to run this code in a generalised way. I have TN-type variables in columns 12 to 25 of my df. I would like to keep all my data and, for each row, count the valid (non-missing) chr values of the TN-type variables in a new variable called sumTN. Where data is missing in these variables, I have NA.

df$sumTN <- as.vector(rowSums(!is.na(df[, c(12:25)])))

I would like to use this code for other datasets where the TN-type variables (which I want to count by row) could be in different columns with different ranges.

I tried two different ways to do this, but in both cases I received the message "'x' must be an array of at least two dimensions". I understand why, but I cannot figure out how to solve the problem. Here is the code that I tried:

firstcol = which(colnames(df)=="TN_1")
lastcol = which(colnames(df)=="TN_14")
df$sumTN <- as.vector(rowSums(!is.na(df[, c(firstcol:lastcol)])))

df$sumTN <- as.vector(rowSums(!is.na(df[, c(grep("^TN_[0-9]+$", colnames(df)))])))

Any solution would be appreciated, thanks.

Does `grep("^TN_[0-9]+$", colnames(df))` return a valid result? – R.S. Jun 14 '19 at 14:23
No, because it returns an int [1:31], and that is not valid here inside the code. The same goes for the other code. I should probably approach this from a completely different angle, but I have no idea how. – Malna Jun 14 '19 at 14:42

score 0 · Answer 1 · answered Jun 14 '19 at 17:19

That's what I thought. The code looks fine, but for some data frames the selection must be returning a single column, and a single-column selection is converted to a vector. You can use drop=FALSE to suppress this behavior. Also, do the subsetting on the logical matrix returned by !is.na(df).

Try this:

rowSums( (!is.na(df))[, c(grep("^TN_[0-9]+$", colnames(df))), drop=FALSE] )
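
A minimal sketch of why the error can show up and how `drop = FALSE` avoids it. The `toy` data frame and its `id` and `TN_1` columns are made up for illustration and are not the asker's data:

```r
# Hypothetical toy data with a single TN_ column, so the pattern matches one column only
toy <- data.frame(id = 1:3, TN_1 = c("a", NA, "c"), stringsAsFactors = FALSE)
idx <- grep("^TN_[0-9]+$", colnames(toy))

# Without drop = FALSE, a one-column selection collapses to a plain vector,
# and rowSums() then fails with "'x' must be an array of at least two dimensions"
collapsed <- toy[, idx]
failed <- inherits(try(rowSums(!is.na(collapsed)), silent = TRUE), "try-error")

# With drop = FALSE the selection stays two-dimensional, so rowSums() works
sum_tn <- rowSums((!is.na(toy))[, idx, drop = FALSE])
```

Here `failed` is `TRUE` and `sum_tn` holds one count per row, so the same call should work whether the pattern matches one column or many.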

score 0 · Accepted Answer · answered Jun 17 '19 at 11:54

I finally figured out how to solve the problem. After loading dplyr with library(dplyr), the code is:

df$sumTN <- as.vector(rowSums(!is.na(select_if(df, grepl("^TN_[0-9]+$", colnames(df))==T))))

Malna

Nice to know it got sorted out, though I could not really grasp what was going on there. You would probably need to make sure it can handle the case where select_if returns nothing (as far as I know, it returns an empty data frame in such cases). – R.S. Jun 17 '19 at 12:15
I wanted to generalise this call: df[, c(12:25)], so I could use the code for another dataset where the positions of the TN-type variables (which columns they are in) are unknown. The code grepl("^TN_[0-9]+$", colnames(df)) returns a logical vector, so I figured out that I need to use select_if, which works with logical vectors. – Malna Jun 17 '19 at 13:53
OK. I actually learnt about dplyr's `select_if` from you :-) Thanks for that. BTW, I think `==T` can be dropped. The `grepl` result is already logical, so comparing it with `==T` adds nothing new. – R.S. Jun 17 '19 at 17:16
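
For comparison, base R's `[` also accepts a logical vector, so the points raised in these comments (logical selection, a single match, no match at all) can be sketched without dplyr. The `count_valid` helper and the `toy` data are hypothetical names introduced here for illustration, not code from the thread:

```r
# Hypothetical helper (not from the thread): count non-NA values per row
# across all columns whose names match `pattern`
count_valid <- function(data, pattern = "^TN_[0-9]+$") {
  keep <- grepl(pattern, colnames(data))        # logical vector, one entry per column
  rowSums(!is.na(data[, keep, drop = FALSE]))   # stays a data frame even for 0 or 1 matches
}

toy <- data.frame(id = 1:3,
                  TN_1 = c("a", NA, "c"),
                  TN_2 = c(NA, NA, "z"),
                  stringsAsFactors = FALSE)

toy$sumTN <- count_valid(toy)                    # 1, 0, 2
no_match  <- count_valid(toy, pattern = "^XX_")  # 0, 0, 0: no matching columns
```

With no matching columns the helper returns a zero for every row instead of raising an error; whether that or an explicit error is preferable depends on the dataset.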
