
I would like to run this code in a more general way. My data frame df has TN-type variables in columns 12 to 25. I want to keep all my data and, for each row, count the valid (non-missing) chr values across the TN-type variables, storing the result in a new variable called sumTN. Where data is missing in these variables I have NA.

df$sumTN <- as.vector(rowSums(!is.na(df[, c(12:25)])))
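For readers following along, here is a minimal sketch of what that line computes, run on a made-up data frame (the name toy and its values are assumptions for illustration, not the asker's data). It shows that the result is a per-row count of non-missing entries rather than an arithmetic sum.

```r
# Toy data frame; column names and values are invented for illustration.
toy <- data.frame(
  id   = 1:3,
  TN_1 = c("a", NA, NA),
  TN_2 = c("b", "c", NA),
  stringsAsFactors = FALSE
)

# !is.na() turns the selected columns into a TRUE/FALSE matrix;
# rowSums() then counts the TRUE values (non-missing entries) per row.
toy$sumTN <- as.vector(rowSums(!is.na(toy[, c(2:3)])))

toy$sumTN  # counts per row: 2, 1, 0
```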

I would like this code to work for other datasets, where the TN-type variables (whose valid values I want to count by row) may sit in different columns and span different ranges.

I tried two different ways to get this, but in both cases I received the message "'x' must be an array of at least two dimensions". I understand why, but I cannot figure out how to solve the problem. Here is the code I tried:

firstcol = which(colnames(df)=="TN_1")
lastcol = which(colnames(df)=="TN_14")
df$sumTN <- as.vector(rowSums(!is.na(df[, c(firstcol:lastcol)])))

df$sumTN <- as.vector(rowSums(!is.na(df[, c(grep("^TN_[0-9]+$", colnames(df)))])))

Any solution would be appreciated, thanks.

Malna
  • Does `grep("^TN_[0-9]+$", colnames(df))` return a valid result? – R.S. Jun 14 '19 at 14:23
  • No, because it returns an int [1:31], and that is not valid here inside the code. The same goes for the other code. I should probably approach this from a completely different angle, but I have no idea how. – Malna Jun 14 '19 at 14:42

2 Answers


That's what I thought. The code looks fine, but for some data frames the selection must be returning a single column, and a single-column selection is converted to a vector, which rowSums cannot handle. You can use drop=FALSE to suppress this behavior. Also, do the subsetting on the logical matrix returned by !is.na(df) rather than on df itself.

Try this:

rowSums( (!is.na(df))[, c(grep("^TN_[0-9]+$", colnames(df))), drop=FALSE] )
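As a hedged illustration of the single-column explanation above, the sketch below reproduces the failure on an invented data frame (one, idx, res and fixed are names made up here) and then applies the drop=FALSE form. It assumes the pattern matches exactly one column, which may or may not be what happened in the asker's data.

```r
# Toy data with exactly one matching column (names are hypothetical).
one <- data.frame(id = 1:3, TN_1 = c("a", NA, "c"), stringsAsFactors = FALSE)
idx <- grep("^TN_[0-9]+$", colnames(one))

# With a single column, `[` drops the result to a plain vector and
# rowSums() rejects it; tryCatch() captures the error message as text.
res <- tryCatch(
  rowSums(!is.na(one[, idx])),
  error = function(e) conditionMessage(e)
)
res

# Subsetting the logical matrix with drop = FALSE keeps two dimensions,
# so rowSums() works even for a single column.
fixed <- rowSums((!is.na(one))[, idx, drop = FALSE])
fixed  # counts per row: 1, 0, 1
```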
R.S.

I finally figured out how to solve the problem. After loading dplyr with library(dplyr), the code is:

df$sumTN <- as.vector(rowSums(!is.na(select_if(df, grepl("^TN_[0-9]+$", colnames(df))==T))))
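For comparison, a dependency-free sketch: the same logical vector from grepl can index the data frame directly. Single-bracket indexing without a comma treats the data frame as a list of columns and returns a data frame even when only one column matches. The names toy and keep are assumptions for illustration.

```r
# Toy data; names and values are invented for illustration.
toy <- data.frame(
  id   = 1:2,
  TN_1 = c("a", NA),
  TN_2 = c(NA, NA),
  stringsAsFactors = FALSE
)

# Logical vector marking the TN columns (computed before sumTN is added).
keep <- grepl("^TN_[0-9]+$", colnames(toy))

# toy[keep] selects columns list-style, so the result stays a data frame
# and rowSums() always receives a two-dimensional object.
toy$sumTN <- as.vector(rowSums(!is.na(toy[keep])))

toy$sumTN  # counts per row: 1, 0
```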
Malna
  • Nice to know it's sorted out, though I could not really grasp what was going on there. You should probably make sure the code can handle the case where select_if matches nothing (as far as I know, it returns an empty data frame in that case). – R.S. Jun 17 '19 at 12:15
  • I wanted to generalise this call: df[, c(12:25)], so I could use the code for another dataset where the positions of the TN-type variables (which columns they are in) are unknown. The code grepl("^TN_[0-9]+$", colnames(df)) returns a logical vector, so I figured out I need to use select_if, which works with logical vectors. – Malna Jun 17 '19 at 13:53
  • OK. I actually learnt about dplyr's `select_if` from you :-) Thanks for that. BTW, I guess `==T` can certainly be dropped. The values are already logical, so `==TRUE` adds nothing new. – R.S. Jun 17 '19 at 17:16
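The two points raised in these comments (dropping `==T` and guarding against an empty match) could be combined in a small helper along the following lines. This is only a sketch: count_non_missing is a hypothetical name, not a function from any package, and returning NA with a warning when nothing matches is one possible design choice among several (returning zeros would be another).

```r
# count_non_missing() is a hypothetical helper, not part of any package.
count_non_missing <- function(data, pattern = "^TN_[0-9]+$") {
  # grepl() already returns a logical vector, so no `== T` is needed.
  keep <- grepl(pattern, colnames(data))

  # Assumed policy: if no column matches, warn and return NA for every row.
  if (!any(keep)) {
    warning("no columns match pattern: ", pattern)
    return(rep(NA_real_, nrow(data)))
  }

  # drop = FALSE keeps a data frame even when a single column matches.
  as.vector(rowSums(!is.na(data[, keep, drop = FALSE])))
}

# Usage on invented data.
toy <- data.frame(
  id   = 1:2,
  TN_1 = c("a", NA),
  TN_2 = c("b", NA),
  stringsAsFactors = FALSE
)
toy$sumTN <- count_non_missing(toy)
toy$sumTN  # counts per row: 2, 0
```

Calling the helper on a data frame with no TN columns gives a warning and a vector of NA values, which makes the empty case visible instead of silently producing zeros.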