There are many possible reasons for the differences you observe. Given that you have not supplied a minimal reproducible example or any output, we can only speculate. I am the author of `MatchIt` and `cobalt`, so I can explain the choices there (which are the same) and how I justify them.
For continuous variables, the SMD after matching is the difference in means (weighted by the matching weights) divided by a scaling factor computed in the original sample. I have justified the choice to compute the standardization factor in the original sample here and elsewhere. The standardization factor depends on the chosen target population, but it can be changed by supplying an argument to `s.d.denom`. By default, when matching for the ATT (the default in `MatchIt`), the standardization factor is the standard deviation of the variable in the treated group (again, computed prior to matching). When matching for the ATE, it is the square root of the average of the variances in the two treatment groups. The defaults and allowable arguments are explained in `help("col_w_smd")`.
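To make that arithmetic concrete, here is a small Python sketch (toy data and made-up weights; this is not `cobalt`'s code): the weighted mean difference is computed in the matched sample, while the standardization factor comes from the sample prior to matching.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
treat = rng.integers(0, 2, n)              # 1 = treated, 0 = control
x = rng.normal(loc=treat * 0.5, size=n)    # covariate, imbalanced by design
w = rng.uniform(0.2, 1.0, n)               # stand-in for matching weights

# Weighted means in the (hypothetically) matched sample
m_t = np.average(x[treat == 1], weights=w[treat == 1])
m_c = np.average(x[treat == 0], weights=w[treat == 0])

# ATT default: SD of the treated group, computed BEFORE matching (unweighted)
sd_att = x[treat == 1].std(ddof=1)

# ATE default: square root of the average of the two pre-matching variances
sd_ate = np.sqrt((x[treat == 1].var(ddof=1) + x[treat == 0].var(ddof=1)) / 2)

smd_att = (m_t - m_c) / sd_att
smd_ate = (m_t - m_c) / sd_ate
```

The key point of the sketch is that changing the target estimand changes only the denominator, not the weighted mean difference in the numerator.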
For categorical variables, `cobalt` first splits them into dummy variables, one per category, and then treats each dummy as a separate variable. By default, `cobalt::bal.tab()` produces unstandardized mean differences (i.e., raw differences in proportion) for binary and categorical variables. If you want standardized mean differences, you need to set `binary = "std"`. I explain in the documentation why I think standardized mean differences don't make sense for binary variables. `cobalt` uses a special formula for the variance of binary variables (`smd` does as well), so be sure to take that into consideration when trying to replicate `cobalt`'s results manually.
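A hedged Python sketch of that handling (toy data; my own variable names, not `cobalt`'s internals): split the factor into one dummy per level and take the raw difference in proportions for each. The `p * (1 - p)` variance mentioned in the comment is my understanding of the "special formula" for binary variables, so verify it against the documentation before relying on it.

```python
import numpy as np

treat = np.array([1, 1, 1, 1, 0, 0, 0, 0])                # toy treatment indicator
cat = np.array(["a", "b", "a", "c", "b", "b", "c", "a"])  # 3-level factor

results = {}
for level in np.unique(cat):
    d = (cat == level).astype(float)   # one dummy per category
    p_t = d[treat == 1].mean()
    p_c = d[treat == 0].mean()
    results[level] = p_t - p_c         # default: raw difference in proportions

# With binary = "std", the denominator would instead use a binary variance
# of the form p * (1 - p) (my understanding of the "special formula"),
# not the usual sample variance of the dummy.
```

Note that each level gets its own balance statistic, rather than one statistic for the whole factor.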
I am not sure exactly what `smd` (which is the basis for the calculations in `gtsummary`) does, because its documentation is somewhat sparse and its code (which uses an R6 architecture) is hard for me to read (though, admittedly, so is `cobalt`'s). It seems that `smd` computes the standardization factor in the matched sample when matching weights are supplied (or when only the matched sample is passed to it), and that it always computes the standardization factor as the square root of the average of the variances in the treatment groups. For categorical variables, it computes a single standardized mean difference for the whole variable using the formula described in Yang & Dalton (2012) rather than splitting the variable into separate dummy variables. I explain here why I don't think this is a great idea.
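For reference, here is my reading of the Yang & Dalton (2012) formula as a Python sketch (not `smd`'s actual code, so treat it as an approximation): the proportions of K − 1 of the categories are compared via a Mahalanobis-type distance using the average of the two groups' multinomial covariance matrices.

```python
import numpy as np

def yang_dalton_smd(p_t, p_c):
    """Single SMD for a K-level categorical variable (Yang & Dalton, 2012).

    p_t, p_c: proportions of each level in the treated/control group.
    Only K-1 levels are used, to keep the covariance matrix invertible.
    """
    p_t, p_c = np.asarray(p_t)[:-1], np.asarray(p_c)[:-1]  # drop one level
    diff = p_t - p_c
    # Multinomial covariance: p_k(1 - p_k) on the diagonal, -p_k p_l off it,
    # averaged across the two groups
    S = ((np.diag(p_t) - np.outer(p_t, p_t)) +
         (np.diag(p_c) - np.outer(p_c, p_c))) / 2
    return float(np.sqrt(diff @ np.linalg.solve(S, diff)))
```

For a binary variable (K = 2), this reduces to the familiar difference in proportions divided by the square root of the average of the two groups' `p(1 - p)` variances.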
Hopefully this sheds some light on these differences. I would encourage you to use `cobalt` rather than `gtsummary` for producing balance tables because of the amount of research that went into choosing these settings; they represent what are, in my opinion, best practices. `cobalt` also gives you the flexibility to supply your own choices if you disagree, and by making those choices yourself, you get to know exactly how each value is calculated. I have also worked hard to ensure `cobalt` is thoroughly documented to help users understand exactly what is going on. Everything I described about `cobalt`'s functionality is explained in the documentation.