
I'm hoping to get some advice from the community about functions that need a selection of both rows and columns. I have a very messy database (real-world data from a central database) and I need to sum sub-scale items into a total score. To make matters more complicated, some rows have the total but no raw data (no individual data points for each question), and other rows have the individual data points but no total. For example:

Q1  Q2  Q3  Q4  Q5  TOTAL
2   3   0   1   NA  3      (individual data points and total both provided; total = sum of Q2, Q3, Q5)
NA  NA  NA  NA  NA  9      (no raw data points, only the total score provided)
1   2   4   2   1   NA     (raw data points provided, but no total score)

If I tell R to ignore the NAs (`na.rm=TRUE`), it effectively treats each NA as 0 and returns a total score. However, that overwrites the total in the second row above with 0, because all of its individual data points are NA. I've tried various functions such as `apply`, `rowSums` and `cbind`, but I can't find a solution. Basically I want to run the following code, or an equivalent, but tell R to skip certain rows. This is what I've been using:
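A minimal sketch of why the all-NA row ends up as 0, rebuilding only the three example rows above. It assumes the sub-scale items are `Q2`, `Q3` and `Q5` (as in the first row's note); in the real data these would be columns 7, 10 and 13.

```r
# The three example rows from the question (assumed column names Q1..Q5, TOTAL)
dat <- data.frame(
  Q1 = c(2, NA, 1),
  Q2 = c(3, NA, 2),
  Q3 = c(0, NA, 4),
  Q4 = c(1, NA, 2),
  Q5 = c(NA, NA, 1),
  TOTAL = c(3, 9, NA)
)

# sum() with na.rm = TRUE drops the NAs; if nothing is left, the result is 0
all_na_sum <- sum(c(NA, NA), na.rm = TRUE)

# rowSums() behaves the same way, so the all-NA second row gets 0 instead of NA
naive <- rowSums(dat[, c("Q2", "Q3", "Q5")], na.rm = TRUE)
print(naive)
```

If that vector is then assigned to `dat$TOTAL`, the supplied total of 9 in the second row is replaced by 0.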

`rowSums(dat[, c(7, 10, 13)], na.rm=TRUE)` (where 7, 10 and 13 are the column numbers), but if I try to restrict it to certain rows, e.g. `rowSums(dat[1:30, c(7, 10, 13)], na.rm=TRUE)`, I get the error 'replacement has 30 rows, data has 1651'. I've also tried `rowSums(dat[c(1:30, 7, 10, 13)], na.rm=TRUE)`, but that gives the error 'undefined columns selected'.
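A sketch of where the two errors come from, on a small hypothetical data frame. With a single index, `dat[...]` selects columns, so `dat[c(1:30, 7, 10, 13)]` asks for columns 1 to 30 (hence 'undefined columns selected'); rows have to go before the comma. The 'replacement' error appears when a shorter result is assigned to a whole column. The name `rows` and the chosen row numbers are illustrative assumptions.

```r
# Hypothetical small data set: the item columns plus TOTAL
dat <- data.frame(
  Q2 = c(3, NA, 2),
  Q3 = c(0, NA, 4),
  Q5 = c(NA, NA, 1),
  TOTAL = c(3, 9, NA)
)
item_cols <- c("Q2", "Q3", "Q5")
rows <- c(1, 3)  # hypothetical: the rows you want to (re)calculate

# Rows before the comma, columns after it
row_sums <- rowSums(dat[rows, item_cols], na.rm = TRUE)

# Assigning 2 values to the whole 3-row column raises the "replacement" error
msg <- tryCatch({
  dat$TOTAL <- row_sums
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Indexing the left-hand side with the same rows makes the lengths match
dat$TOTAL[rows] <- row_sums
print(dat$TOTAL)
```

Because only rows 1 and 3 are written, the supplied total of 9 in row 2 is left alone.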

Is there a way of telling R which rows to include and which to ignore when selecting columns? I want the result to sum the individual sub-scores and leave alone the rows where they are not provided. I'm very new to R, so an 'R for dummies' style answer would be appreciated. Thank you.
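One way to express "only sum rows that have item data, and keep totals that were supplied" is to build a logical row condition and use it on both sides of the assignment. This is a minimal sketch on the example rows, assuming the items are `Q2`, `Q3` and `Q5`; `item_cols`, `has_items`, `needs_total` and `fill` are illustrative names, not anything from the original data.

```r
dat <- data.frame(
  Q1 = c(2, NA, 1),
  Q2 = c(3, NA, 2),
  Q3 = c(0, NA, 4),
  Q4 = c(1, NA, 2),
  Q5 = c(NA, NA, 1),
  TOTAL = c(3, 9, NA)
)
item_cols <- c("Q2", "Q3", "Q5")  # assumption: the sub-scale items

# TRUE for rows with at least one non-NA item score
has_items <- rowSums(!is.na(dat[, item_cols])) > 0
# TRUE for rows where no total was supplied
needs_total <- is.na(dat$TOTAL)
fill <- has_items & needs_total

# Only the selected rows are summed, and only those rows of TOTAL are overwritten;
# drop = FALSE keeps a data frame even if item_cols has a single column
dat$TOTAL[fill] <- rowSums(dat[fill, item_cols, drop = FALSE], na.rm = TRUE)
print(dat)
```

To recompute the total for every row that has item data (overwriting supplied totals), `fill <- has_items` would be used instead.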

Laura
  • The 'replacement' error occurs when you assign. You need to use the same row index on the left-hand side, i.e. `dat$TOTAL[1:30] <- rowSums(dat[1:30, c(7, 10, 13)], na.rm=TRUE)` – akrun Mar 15 '22 at 16:31
  • Hi, thank you for your response. Unfortunately I'm getting the same output, where the totals of the all-NA rows are being replaced by 0 – Laura Mar 15 '22 at 17:05
  • That is because you have `na.rm = TRUE`, which returns 0 when all elements are NA, i.e. `sum(c(NA, NA), na.rm = TRUE)` gives `[1] 0`. You could replace those values afterwards, or use a condition to subset the rows with at least one non-NA value, i.e. `i2 <- rowSums(is.na(dat[1:30, c(7, 10, 13)])) < 3` – akrun Mar 15 '22 at 17:07
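The "replace it later" route from the comments can be sketched like this: sum every row, turn the 0s of all-NA rows back into NA (the same "all three items missing" test as `i2`, inverted), then keep a supplied total wherever one exists. The example data and the names `sums` and `all_missing` are illustrative assumptions.

```r
dat <- data.frame(
  Q2 = c(3, NA, 2),
  Q3 = c(0, NA, 4),
  Q5 = c(NA, NA, 1),
  TOTAL = c(3, 9, NA)
)
item_cols <- c("Q2", "Q3", "Q5")  # assumption: the sub-scale items

# Sum every row, then set rows where all items are NA back to NA
sums <- rowSums(dat[, item_cols], na.rm = TRUE)
all_missing <- rowSums(is.na(dat[, item_cols])) == length(item_cols)
sums[all_missing] <- NA

# Keep a supplied total where there is one, otherwise use the new sum
dat$TOTAL <- ifelse(is.na(dat$TOTAL), sums, dat$TOTAL)
print(dat$TOTAL)
```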
