Accounting for covariates (site effects) with mlr3 machine learning (pipelines)

Question

I have gone through the entire mlr3 book but cannot find a way to address site effects in my data, which come from a multicenter study. I know about leave-site-out CV in theory, but because the number of participants per site is very heterogeneous (e.g. 4 from one site, 60 from another), my supervisor wants me to account for site effects during modeling.

He suggested using the ComBat function (which I am not familiar with, so I would be happy about any help!), but I was looking for an implementation within mlr3, ideally as a PipeOp, to prevent data leakage from the test set into the training set. I would then adjust for site effects after PipeOps such as scaling and imputation.

I would be more than happy for any suggestions on how to deal with this! Thank you so much in advance :)

EDIT:

My goal is to predict treatment response, a binary variable (responder vs. nonresponder). It is based on a linear change score (pre-post) in a continuous outcome variable representing psychopathology: participants showing a reduction >= 25% from pre to post in the primary outcome variable are classified as responders.
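As a minimal sketch of the responder rule described above: it assumes that lower scores mean less psychopathology and that the 25% is taken relative to the baseline (pre) score. The function name `is_responder` and the example scores are hypothetical and only serve as illustration.

```python
def is_responder(pre: float, post: float, threshold: float = 0.25) -> bool:
    """Return True if the relative reduction from pre to post reaches the threshold.

    Assumption: lower scores mean fewer symptoms, and the reduction is
    expressed relative to the baseline (pre) score.
    """
    if pre <= 0:
        raise ValueError("baseline score must be positive to compute a relative reduction")
    reduction = (pre - post) / pre
    return reduction >= threshold


# Hypothetical scores, for illustration only
print(is_responder(pre=40, post=30))  # reduction of 10/40
print(is_responder(pre=40, post=34))  # reduction of 6/40
```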

My features are clinical/behavioral data at baseline (pre) of the therapy (e.g. symptom scores, demographic variables, level of functioning).

So, all in all, I want to use baseline variables to predict the response to treatment with data coming from an RCT. The data were collected at multiple sites, i.e. participants attended the therapy at different sites, which leads to a clustered data structure. In a regression model I would account for this with, for example, a random intercept (or by including site as a covariate in the analysis).

However, I am unsure how to deal with this in an ML analysis, as I don't want "site" to be a "feature" along with the other baseline data but would rather adjust the data for any clustering effects. (The same would apply to the experimental group in the RCT design.)

I hope it is now a little more understandable! Thank you!

You can always implement your own PipeOp, a tutorial on how to do this can be found in the extending chapter of the mlr3book: https://mlr3book.mlr-org.com/extending.html#extending-pipeops — Sebastian, Feb 28 '23 at 16:06
Can you give us a little bit more background please? What are your features, what do you want to predict, how do sites affect this? — Lars Kotthoff, Feb 28 '23 at 20:16
Thanks for your answers! I edited the post and hope this helps :) I know about the implementation of PipeOps, however I don't have a single clue which Pipe I could even create for my problem... — Hanna, Mar 02 '23 at 06:33
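
One possible shape for such a step, given only as a language-agnostic sketch in Python and not as mlr3 code: learn per-site feature means on the training rows only, then subtract them at prediction time. This is plain per-site mean centering, a much simpler stand-in than ComBat (which also adjusts scale and uses empirical Bayes shrinkage). The class name `SiteCenterer`, the fallback to pooled means for unseen sites, and the toy data are all assumptions made for illustration. In a custom PipeOp, `fit` would correspond to what happens during training and `transform` to what happens during prediction.

```python
from collections import defaultdict
from statistics import fmean


class SiteCenterer:
    """Subtract per-site feature means that were learned on training rows only."""

    def fit(self, rows, sites):
        # rows: list of equal-length numeric feature lists; sites: one label per row
        grouped = defaultdict(list)
        for row, site in zip(rows, sites):
            grouped[site].append(row)
        # Per-site column means, estimated from the training data only
        self.site_means_ = {
            site: [fmean(col) for col in zip(*members)]
            for site, members in grouped.items()
        }
        # Pooled column means: fallback for sites not seen during training
        self.global_means_ = [fmean(col) for col in zip(*rows)]
        return self

    def transform(self, rows, sites):
        adjusted = []
        for row, site in zip(rows, sites):
            means = self.site_means_.get(site, self.global_means_)
            adjusted.append([x - m for x, m in zip(row, means)])
        return adjusted


# Hypothetical toy data: two features, two training sites
train_rows = [[10.0, 1.0], [12.0, 3.0], [20.0, 5.0], [24.0, 7.0]]
train_sites = ["A", "A", "B", "B"]
test_rows = [[11.0, 2.0], [30.0, 9.0]]
test_sites = ["A", "C"]  # site "C" never appears in the training data

centerer = SiteCenterer().fit(train_rows, train_sites)
train_adj = centerer.transform(train_rows, train_sites)
test_adj = centerer.transform(test_rows, test_sites)
print(train_adj)
print(test_adj)
```

Because the site means are stored at fit time and only looked up at transform time, no information from test rows enters the adjustment. With very small sites (such as the one with 4 participants) the estimated means are noisy, which is one reason shrinkage-based methods like ComBat exist.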
