I have a dataset df
of which the structure looks similar to the example below:
nr countrycode questionA questionB questionC WeightquestionA WeightquestionB WeightquestionC
1 NLD 2 1 4 0.6 0.2 0.2
2 NLD NA 4 NA 0.4 0.4 0.2
3 NLD 4 4 1 0.2 0.2 0.6
4 BLG 1 NA 1 0.1 0.5 0.4
5 BLG 5 3 5 0.2 0.2 0.6
The questions A, B and C relate to a similar topic and as a result I would like to create an average score for all questions, taking into account the importance of each question (WeightquestionA WeightquestionB WeightquestionC
).
Currently I have manually calculated the average score.
(questionA*WeightquestionA) + (questionB*WeightquestionB) + (questionC*WeightquestionC)
This would not be an insurmountable problem were it not for the NA's (for which: no they cannot be removed). As a result I would like to automate the process.
I am currently thinking of using sum(!is.na())
for counting the non-NA's in each question (A,B,C) for each row (1 through 5) and putting that value into a new column.
With data.table I however always have trouble getting the syntax right. I believe it should be something like:
df[, NonNA:=sum(!is.na(questionA + questionB + questionC))]
But this sums all NA's in the column, instead of for each row. How should I write the syntax to calculate per row?
I would like to refer to the columns separately by name, because they are not next to each other in the actual df.
Desired output:
nr countrycode qA qB qC WeightquestionA WeightquestionB WeightquestionC NonNA
1 NLD 2 1 4 0.6 0.2 0.2 3
2 NLD NA 4 NA 0.4 0.4 0.2 1
3 NLD 4 4 1 0.2 0.2 0.6 3
4 BLG 1 NA 1 0.1 0.5 0.4 2
5 BLG 5 3 5 0.2 0.2 0.6 3