Objective

Given a data frame ds, compute a new variable ds$h1 from ds$raw1 and ds$raw2 according to the harmonization rule specified in the object hrule.

The reproducible example contains the responses of 10 individuals on 2 measures, raw1 and raw2:

>ds
   id raw1 raw2
1   1    1    1
2   2    1    0
3   3    0    1
4   4    0    0
5   5   NA    1
6   6   NA    0
7   7    1   NA
8   8    0   NA
9   9   NA   NA
10 10    1    1

These two variables need to be combined into a single harmonized variable according to a rule that was developed qualitatively. The harmonization rules are encoded in the object hrule, with one row per combination of raw1 and raw2:

>hrule
  raw1 raw2  h1
1    0    0   0
2    0    1   1
3    0   NA   0
4    1    0   1
5    1    1   1
6    1   NA   1
7   NA    0   0
8   NA    1   1
9   NA   NA   NA

Thus, row 1 of the rule should be read as:

if a respondent provides a value of 0 on raw1 and a value of 0 on raw2, then the value of h1 should be 0.

Functional objective

Develop a function that takes ds, hrule, the names of the raw variables as a character vector (c("raw1","raw2")), and the name of the harmonized variable ("h1"), and outputs a new harmonized variable (ds$h1).

Starter code

library(magrittr) # provides the %>% pipe
(ds <- data.frame("id" = 1:10,
                  "raw1" = c(1,1,0,0,NA,NA,1 ,0 ,NA,1),
                  "raw2" = c(1,0,1,0,1 ,0 ,NA,NA,NA,1)))
(response_profile <- ds %>% dplyr::group_by(raw1, raw2) %>% dplyr::summarize(count=dplyr::n()))
(hrule <- cbind(response_profile, "h1" = c(0,1,0,1,1,1,0,1,NA)))
new_function <- function(ds, hrule,
                         variable_names, # e.g. c("raw1","raw2"); the number of variables will vary
                         harmony_name # e.g. "h1"; there might also be an "h2"
){

}

Thanks in advance for your ideas!

andrey

1 Answer

Here's the full solution, suggested by @Symbolix

rm(list=ls(all=TRUE)) #Clear the memory of variables from previous run. This is not called by knitr, because it's above the first chunk.
cat("\f")
library(magrittr)

(ds <- data.frame("id" = 1:10,
                  "raw1" = c(1,1,0,0,NA,NA,1 ,0 ,NA,1),
                  "raw2" = c(1,0,1,0,1 ,0 ,NA,NA,NA,1)))
response_profile <- ds %>% dplyr::group_by(raw1, raw2) %>% dplyr::summarize(count=dplyr::n()) %>% dplyr::select(-count)
(hrule <- cbind(response_profile, 
                "h1" = c(0,1,0 ,1,1,1 ,0 ,1 ,NA), # at least one 1 to produce 1
                "h2"=  c(0,0,NA,0,1,NA,NA,NA,NA) # both must be 1
                )) 
recode_from_meta <- function(ds, hrule, variable_names, harmony_name){
  # Left-join the rule onto the data; NA response patterns are matched to the NA rows of hrule
  merge(ds, hrule[, c(variable_names, harmony_name)], by=variable_names, all.x=TRUE)
}

> hrule
  raw1 raw2 h1 h2
1    0    0  0  0
2    0    1  1  0
3    0   NA  0 NA
4    1    0  1  0
5    1    1  1  1
6    1   NA  1 NA
7   NA    0  0 NA
8   NA    1  1 NA
9   NA   NA NA NA


> (d <- recode_from_meta(ds, hrule,variable_names=c("raw1", "raw2"), harmony_name="h1"))
   raw1 raw2 id h1
1     0    0  4  0
2     0    1  3  1
3     0   NA  8  0
4     1    0  2  1
5     1    1  1  1
6     1    1 10  1
7     1   NA  7  1
8    NA    0  6  0
9    NA    1  5  1
10   NA   NA  9 NA
> (d <- recode_from_meta(ds, hrule,variable_names=c("raw1", "raw2"), harmony_name="h2"))
   raw1 raw2 id h2
1     0    0  4  0
2     0    1  3  0
3     0   NA  8 NA
4     1    0  2  0
5     1    1  1  1
6     1    1 10  1
7     1   NA  7 NA
8    NA    0  6 NA
9    NA    1  5 NA
10   NA   NA  9 NA
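
Note that merge() sorts the result by the key columns, so the rows above are no longer in the order of ds, and the key columns are moved to the front. If the goal is literally a new column ds$h1 in the original row order, one possible alternative is to look each row up in hrule by a pasted key. This is only a sketch: add_harmonized and make_key are hypothetical names, not part of the answer above, and the approach assumes each response pattern appears exactly once in hrule. Here hrule is built by hand so the example does not depend on dplyr.

```r
# Hypothetical helper (not from the original answer): add the harmonized
# column to ds without changing its row order.
add_harmonized <- function(ds, hrule, variable_names, harmony_name) {
  # Build one text key per row. paste() turns NA into the string "NA",
  # so missing responses are matched like any other response pattern.
  make_key <- function(d) {
    do.call(paste, c(unname(as.list(d[variable_names])), sep = "\r"))
  }
  # For every row of ds, find the position of its pattern in hrule
  idx <- match(make_key(ds), make_key(hrule))
  ds[[harmony_name]] <- hrule[[harmony_name]][idx]
  ds
}

ds <- data.frame(id   = 1:10,
                 raw1 = c(1, 1, 0, 0, NA, NA, 1, 0, NA, 1),
                 raw2 = c(1, 0, 1, 0, 1, 0, NA, NA, NA, 1))
hrule <- data.frame(raw1 = c(0, 0, 0, 1, 1, 1, NA, NA, NA),
                    raw2 = c(0, 1, NA, 0, 1, NA, 0, 1, NA),
                    h1   = c(0, 1, 0, 1, 1, 1, 0, 1, NA))

ds <- add_harmonized(ds, hrule, c("raw1", "raw2"), "h1")
ds$h1
```

A response pattern that is missing from hrule simply gets NA here, the same as with all.x=TRUE in merge(), so it may be worth checking for unmatched patterns separately.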
andrey