Selecting specific rows and columns in R

Question

I'm hoping to get some advice from the community about functions that require a selection of rows and columns. I have a very messy database (real-world data from a central database) and I need to sum subscales for a total score. To make matters more complicated, I have some rows where the total has been provided but no raw data (so no individual data points for each question) and other rows where I have the individual data points and no total. For example:

Q1  Q2  Q3  Q4  Q5  TOTAL
 2   3   0   1  NA      3   (raw data and total both provided; TOTAL = Q2 + Q3 + Q5)
NA  NA  NA  NA  NA      9   (no raw data, only the total provided)
 1   2   4   2   1     NA   (raw data provided, but no total)

If I tell R to ignore the NAs, it effectively treats them as 0 and returns a total score. However, for a row where every item is NA the total comes out as 0, so the provided total in the second row above is overwritten with 0. I've tried various functions such as `apply()`, `rowSums()` and `cbind()`, but I can't find a solution. I basically want to run the following code, or an equivalent, but tell R to ignore certain rows. I've been using the following:

`rowSums(dat[, c(7, 10, 13)], na.rm = TRUE)` (where 7, 10 and 13 are the column numbers), but if I try to add row numbers, `rowSums(dat[1:30, c(7, 10, 13)], na.rm = TRUE)`, it tells me 'replacement has 30 rows, data has 1651'. I've also tried `rowSums(dat[c(1:30, 7, 10, 13)], na.rm = TRUE)`, but I get the error 'undefined columns selected'.
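The 'undefined columns selected' error comes from how a data frame is indexed. A minimal sketch on a hypothetical toy data frame (its values copy the example rows, and column positions 2, 3 and 5 stand in for 7, 10 and 13) shows the `dat[rows, columns]` form and why the single-index form fails, presumably because the real data has fewer than 30 columns:

```r
# Hypothetical toy data mirroring the three example rows above.
dat <- data.frame(
  Q1 = c(2, NA, 1), Q2 = c(3, NA, 2), Q3 = c(0, NA, 4),
  Q4 = c(1, NA, 2), Q5 = c(NA, NA, 1), TOTAL = c(3, 9, NA)
)
items <- c(2, 3, 5)   # column positions, playing the role of c(7, 10, 13)

# dat[rows, columns]: rows go before the comma, columns after it
dat[1:2, items]                          # rows 1-2, the three item columns
rowSums(dat[1:2, items], na.rm = TRUE)   # one sum per selected row

# With no comma, a data frame is indexed like a list of columns, so
# c(1:30, items) asks for columns 1 to 30, but this data has only 6.
msg <- tryCatch(dat[c(1:30, items)], error = function(e) conditionMessage(e))
msg   # "undefined columns selected"
```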

Is there a way of telling R which rows to include and which to ignore when selecting columns? I want a total that sums the individual sub-scores and ignores the rows where they are not provided. I'm very new to R, so an 'R for dummies' style answer would be appreciated. Thank you.
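One way to get that result, sketched on a hypothetical toy copy of the example rows (the column names Q2, Q3 and Q5 are assumptions standing in for columns 7, 10 and 13), is to compute the item sum for every row but write it only into rows whose TOTAL is missing and that have at least one answered item:

```r
# Hypothetical toy data mirroring the three example rows.
dat <- data.frame(
  Q1 = c(2, NA, 1), Q2 = c(3, NA, 2), Q3 = c(0, NA, 4),
  Q4 = c(1, NA, 2), Q5 = c(NA, NA, 1), TOTAL = c(3, 9, NA)
)
items <- c("Q2", "Q3", "Q5")   # in the real data: c(7, 10, 13)

item_sum  <- rowSums(dat[, items], na.rm = TRUE)   # NA treated as "not answered"
has_items <- rowSums(!is.na(dat[, items])) > 0     # at least one answered item
fill      <- is.na(dat$TOTAL) & has_items          # total missing, items present

# Only the rows flagged in `fill` are changed; the provided total of 9 survives
dat$TOTAL[fill] <- item_sum[fill]
dat$TOTAL   # 3 9 7
```

Rows that already have a total keep it, even if it disagrees with the items; removing `is.na(dat$TOTAL) &` from `fill` would recompute those rows as well.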

The replacement error occurs when you are assigning, i.e. you need to index the left-hand side with the same rows, e.g. `dat$TOTAL[1:30] <- rowSums(dat[1:30, c(7, 10, 13)], na.rm = TRUE)` - akrun, Mar 15 '22 at 16:31
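A minimal sketch of that point on a hypothetical 5-row toy data frame (names and values are assumptions): assigning 3 sums to a whole 5-row column fails with the same kind of message as in the question, while indexing the left-hand side with the same rows works.

```r
# Hypothetical toy data: 5 rows, only rows 1-3 get a computed total.
dat <- data.frame(
  Q2    = 1:5,
  Q3    = 6:10,
  Q5    = c(NA, 1, 2, 3, 4),
  TOTAL = NA_real_
)
items <- c("Q2", "Q3", "Q5")
rows  <- 1:3

# Wrong: the right-hand side has 3 values, but the whole column needs 5
msg <- tryCatch(
  dat$TOTAL <- rowSums(dat[rows, items], na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
msg   # "replacement has 3 rows, data has 5"

# Right: index the left-hand side with the same rows as the right-hand side
dat$TOTAL[rows] <- rowSums(dat[rows, items], na.rm = TRUE)
dat$TOTAL   # 7 10 13 NA NA
```

Note that data frames recycle a shorter replacement when the row count divides evenly (e.g. 2 values into 4 rows), so the error only appears when it does not; indexing the left-hand side avoids relying on that.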
Hi, thank you for your response. Unfortunately I'm getting the same output: the totals for the rows with no item data are being replaced by 0 - Laura, Mar 15 '22 at 17:05
That is because you have `na.rm = TRUE`, which returns 0 when all elements are NA, e.g. `sum(c(NA, NA), na.rm = TRUE)` gives `[1] 0`. You could replace those zeros afterwards, or use a condition to select only the rows with at least one non-NA value, e.g. `i2 <- rowSums(is.na(dat[1:30, c(7, 10, 13)])) < 3` - akrun, Mar 15 '22 at 17:07
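Both suggestions can be sketched on a hypothetical toy data frame (Q2, Q3 and Q5 are assumed stand-ins for columns 7, 10 and 13, and `length(items)` replaces the hard-coded 3):

```r
# Hypothetical toy data: row 2 has no item data but a provided total of 9.
dat <- data.frame(
  Q2    = c(3, NA, 2),
  Q3    = c(0, NA, 4),
  Q5    = c(NA, NA, 1),
  TOTAL = c(3, 9, NA)
)
items <- c("Q2", "Q3", "Q5")

# Root cause: removing every NA leaves nothing to add, and an empty sum is 0
sum(c(NA, NA), na.rm = TRUE)   # 0

# Option 1: sum as usual, then set rows with no item data back to NA
s <- rowSums(dat[, items], na.rm = TRUE)
s[rowSums(!is.na(dat[, items])) == 0] <- NA
unname(s)   # 3 NA 7

# Option 2: i2 is TRUE where fewer than all item columns are NA
# (at least one answered item); only those rows are overwritten
i2 <- rowSums(is.na(dat[, items])) < length(items)
dat$TOTAL[i2] <- rowSums(dat[i2, items], na.rm = TRUE)
dat$TOTAL   # 3 9 7
```

Option 2 also recomputes row 1's total from its items, which matches the provided value only when the data are consistent.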

