
I'm using the H2O package for machine learning. The goal is to estimate a predicted age for each participant in my dataset. To set up, I removed all unnecessary columns (sex, race, other demographics, etc.) and kept only age and the biomarkers I'd like to compare it with.

My first attempt went well because I made the dataset cross-sectional (one entry per participant) before removing columns, so I didn't need to worry about grouping. However, I also want to test it longitudinally. My original dataset (before making it cross-sectional) has a column called "ID", and several rows share the same ID when a participant in our study visited multiple times. I'd like to factor this into the machine learning, but I'm not sure how.

Below is my code for the cross-sectional analysis. I'm not sure whether I need to adjust anything here, or earlier, where I convert columns to factors or call as.h2o.

aml <- h2o.automl(y = "age", 
               training_frame = train_long2, 
               max_models = 300, 
               stopping_metric = "MSE", 
               nfolds = 5, 
               seed = 1, 
               include_algos = "GBM", 
               sort_metric = "MSE", 
               keep_cross_validation_predictions = TRUE)
anonymous

1 Answer


Let's call your original frame train_log2.

If you want to select the rows of an H2OFrame that satisfy a condition such as ID == 10, you can do the following:

train_log2_ID10 <- train_log2[train_log2$ID==10,]

Now you can build your model as before, passing train_log2_ID10 as the training frame, so that only the rows with ID 10 are used. You can also combine conditions logically, for example train_log2$ID==10 | train_log2$ID==20.
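As a rough illustration of the row selection described above, here is a minimal sketch on a plain base-R data.frame, so it runs without an H2O cluster. The frame train_df and its biomarker column are made up for the example; only ID and age come from the question. The `%in%` form is ordinary base R and is shown only as a convenience for longer ID lists.

```r
# Toy stand-in for the longitudinal data (hypothetical values).
# Several rows share an ID because a participant visited more than once.
train_df <- data.frame(
  ID        = c(10, 10, 20, 30, 30, 30),
  age       = c(61, 63, 55, 70, 71, 73),
  biomarker = c(1.2, 1.4, 0.9, 2.1, 2.0, 2.3)
)

# All visits for participant 10
visits_10 <- train_df[train_df$ID == 10, ]

# Visits for participant 10 or 20; `|` compares element-wise, one result per row
visits_10_or_20 <- train_df[train_df$ID == 10 | train_df$ID == 20, ]

# Same selection, easier to extend to many IDs
visits_in_set <- train_df[train_df$ID %in% c(10, 20), ]

print(nrow(visits_10))
print(nrow(visits_10_or_20))
```

Each subset keeps every visit for the selected participants, so repeated IDs stay together in whatever frame is passed on for modelling.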

Wendy