I read this article (https://journal.r-project.org/archive/2021/RJ-2021-073/RJ-2021-073.pdf) about multiple imputation and propensity score matching - here is the code from this article:
# code from "MatchThem:: Matching and Weighting after Multiple Imputation", Pishgar et al, The R Journal Vol. XX/YY, AAAA 20ZZ:
library(MatchThem)
data('osteoarthritis')
summary(osteoarthritis)
library(mice)
imputed.datasets <- mice(osteoarthritis, m = 5)
matched.datasets <- matchthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
datasets = imputed.datasets,
approach = 'within',
method = 'nearest',
caliper = 0.05,
ratio = 2)
weighted.datasets <- weightthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
datasets = imputed.datasets,
approach = 'across',
method = 'ps',
estimand = 'ATM')
library(cobalt)
bal.tab(matched.datasets, stats = c('m', 'ks'),
imp.fun = 'max')
bal.tab(weighted.datasets, stats = c('m', 'ks'),
imp.fun = 'max')
library(survey)
matched.models <- with(matched.datasets,
svyglm(KOA ~ OSP, family = quasibinomial()),
cluster = TRUE)
weighted.models <- with(weighted.datasets,
svyglm(KOA ~ OSP, family = quasibinomial()))
matched.results <- pool(matched.models)
summary(matched.results, conf.int = TRUE)
As far as I understand the author first uses multiple imputation with mice (m = 5) and continues with the matching procedure with MatchThem - in the end MatchThem gives back a "mimids-object" called "matched.datasets" which contains the 5 different dataset of multiple imputation.
There is the "complete" function which can extract one of the datasets, f.e.
newdataset <- complete(matched.datasets, 2) # extracts the second dataset.
So newdataset is a data frame without NAs (because imputed) and can be used for any further tests.
Now, I would like to use a dataset as a dataframe (like after using complete), but this dataset should be some kind of a "mean" of all datasets - because how could I decide, which of the 5 datasets I use for my further analyses? Is there a way of doing something like this:
meanofdatasets <- complete(matched.datasets, meanofall5datasets) # extracts a dataset which contains something like the mean values of all datasets
In my data, for which I want to use this method, I would like to use an imputed and matched dataset of my original about 500 rows to do further tests, f.e. cox regression, kaplan meier plots or competing risk analyses as well as simple descriptive statistics with plots about the matched population. But on which of the 5 datasets do I have to append my tests? For those tests I need a real data frame, don't I?
Thank you for any help!