I am trying to use MatchIt to perform propensity score matching (PSM) on panel data that follows a group of participants (participant_uuid) from 12 months before treatment to 12 months after treatment, i.e. we have a complete set of 24 observations per participant. The matching is meant to prepare a dataset for later difference-in-differences (DiD) models. Since I want to use DiD to find how the reaction to treatment varies between groups, I am matching on the 12 months prior to treatment. My code currently looks like this:

    match.nearestneighbour <- matchit(
      grouping_variable ~ characteristic1 + characteristic2 + characteristic3,
      data = dataset_12months_pre_treatment,
      distance = "glm",
      method = "nearest",
      m.order = "largest",
      replace = TRUE,
      exact = c("month_relative_to_treatment")
    )

I realized this code matches at the level of individual observations, i.e. for each treatment-group observation it selects the best-matching control participant_uuid within the same month_relative_to_treatment. How can I change the R code so that the nearest neighbor is found not per month/observation but per participant, i.e. the control-group participant_uuid with the smallest distance to a treatment-group participant_uuid, aggregated across the 12 months considered? Any hints are much appreciated.

1 Answer

You need to transform your dataset so that it is wide, i.e., so that there is a single row for each participant and each column contains the value of one variable in one given month. Then you include all the month-specific variables in the matching formula to estimate the propensity score. This will attempt to create pairs of participants that are similar across all 12 months.
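As a rough illustration of the reshaping step, here is a minimal sketch using base R `reshape()` and simulated data. The simulated values, the `period` labels (`pre12` to `pre1`), the choice of `characteristic1` as time-varying and `characteristic2` as time-invariant, and the final merge back to the long panel are all assumptions made for the example, not part of the original data.

```r
library(MatchIt)

set.seed(1)
n_participants <- 200

# Hypothetical long-format pre-treatment panel: one row per participant-month
long <- expand.grid(participant_uuid = seq_len(n_participants),
                    month_relative_to_treatment = -12:-1)
long$grouping_variable <- as.integer(long$participant_uuid <= 60)
# characteristic1 varies by month; characteristic2 is constant per participant
long$characteristic1 <- rnorm(nrow(long), mean = 0.3 * long$grouping_variable)
baseline <- rnorm(n_participants)
long$characteristic2 <- baseline[long$participant_uuid]

# Syntactic month labels so the wide column names work in a formula
long$period <- paste0("pre", abs(long$month_relative_to_treatment))

# Reshape to wide: one row per participant, one column per variable-month
wide <- reshape(
  long[, c("participant_uuid", "grouping_variable", "characteristic2",
           "period", "characteristic1")],
  idvar = "participant_uuid",
  timevar = "period",
  v.names = "characteristic1",
  direction = "wide",
  sep = "_"
)

# Build the formula from all month-specific columns plus the constant covariate
month_vars <- grep("^characteristic1_pre", names(wide), value = TRUE)
f <- reformulate(c(month_vars, "characteristic2"),
                 response = "grouping_variable")

# Match participants (not participant-months) on the propensity score
m <- matchit(f, data = wide, method = "nearest", distance = "glm",
             replace = TRUE)

# Matched participants with their matching weights
matched <- match.data(m)

# Carry the matched sample and weights back to the long panel
matched_panel <- merge(long, matched[, c("participant_uuid", "weights")],
                       by = "participant_uuid")
```

In a real analysis the merge would target the full 24-month panel rather than only the pre-treatment rows, so that the matched participants and their weights can be used in the later difference-in-differences model.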

Noah
  • Hi Noah, many thanks for your reply. Transforming to wide results in two challenges: 1) Including the variables I wanted across all 24 months I am considering seems to overstrain my model (no result can be obtained). 2) I need to weight variables to ensure that time-independent variables are given sufficient consideration. What would you recommend? Are you aware of prior papers and/or methodological discussions of PSM for panel data? Thanks in advance! – Alexanderg Apr 13 '23 at 20:43
  • Maybe use a model that penalizes the coefficients, like ridge regression, or avoid a model altogether and use the scaled Euclidean or Mahalanobis distance. I think you only need to match in the pre-period, so don't seek matches using all 24 months (if I understand your analysis correctly). I don't know of any specific papers, but I do think there are some out there. You might look into generalized synthetic control. – Noah Apr 13 '23 at 21:07
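
The model-free option from the last comment could look roughly like the sketch below. It assumes a recent MatchIt version in which `distance` accepts `"mahalanobis"` and `"scaled_euclidean"`; the simulated wide data and the column names (`characteristic1_pre12` to `characteristic1_pre1`) are hypothetical. The ridge alternative would, as far as I know, be requested with `distance = "ridge"`, which relies on the glmnet package and is not run here.

```r
library(MatchIt)

set.seed(2)
n <- 150
treated <- as.integer(seq_len(n) <= 40)

# Hypothetical wide pre-period data: one row per participant,
# one column per pre-treatment month of characteristic1
pre <- matrix(rnorm(n * 12, mean = 0.3 * treated), nrow = n,
              dimnames = list(NULL, paste0("characteristic1_pre", 12:1)))
wide <- data.frame(participant_uuid = seq_len(n),
                   grouping_variable = treated,
                   pre)

f <- reformulate(colnames(pre), response = "grouping_variable")

# Nearest-neighbor matching on the Mahalanobis distance (no propensity model)
m_maha <- matchit(f, data = wide, method = "nearest",
                  distance = "mahalanobis", replace = TRUE)

# Same, but on the scaled Euclidean distance
m_se <- matchit(f, data = wide, method = "nearest",
                distance = "scaled_euclidean", replace = TRUE)

# Compare covariate balance of the two matched samples
summary(m_maha)
summary(m_se)
```

Because no propensity score is estimated, these distances avoid the convergence problems of a logistic model with many month-specific covariates, at the cost of treating every covariate as equally important after scaling.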