
I have built an XGBoost model on my local machine that trains on a training dataset and validates against a testing dataset. However, I have hard-coded the date values, because the training data is created monthly: the training data gets created based on the date parameter I pass, e.g. jan = dt(2021, 1, 1).

I now have to automate the process, as the model has to be deployed on AWS and should run monthly without editing the code. How should I pass the date parameter to Data Wrangler so that the process is automated and the code executes once every month on a new dataset?

John Rotenstein

1 Answer


One approach is to export the Data Wrangler flow to a SageMaker Pipeline (this can be done from the Data Wrangler UI). Assuming your dataset is in S3, the export generates a notebook that defines a SageMaker Pipeline which takes an S3 URI as input and runs it through the Data Wrangler steps. You can then run the pipeline on a schedule and pass the new S3 URI for each execution through the pipeline's execution parameters.
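As a rough sketch of what the scheduled caller could look like: the pipeline name, the `InputDataUrl` parameter name, and the `s3://bucket/training/YYYY-MM/data.csv` layout below are all assumptions for illustration; they would need to match whatever your exported pipeline actually defines.

```python
from datetime import date


def monthly_input_uri(bucket: str, run_date: date) -> str:
    """Build the S3 URI for run_date's month, so nothing is hard-coded.

    The training/YYYY-MM/data.csv layout is a hypothetical convention.
    """
    return f"s3://{bucket}/training/{run_date:%Y-%m}/data.csv"


def start_monthly_run(pipeline_name: str, bucket: str) -> None:
    """Kick off one pipeline execution for the current month's dataset."""
    # boto3 call needs AWS credentials, so it is kept out of the pure helper.
    import boto3

    sm = boto3.client("sagemaker")
    sm.start_pipeline_execution(
        PipelineName=pipeline_name,
        PipelineParameters=[
            # "InputDataUrl" is an assumed parameter name; use whatever
            # ParameterString the exported pipeline notebook declares.
            {"Name": "InputDataUrl",
             "Value": monthly_input_uri(bucket, date.today())},
        ],
    )


if __name__ == "__main__":
    start_monthly_run("my-data-wrangler-pipeline", "my-bucket")
```

Scheduling this function (for example from an EventBridge cron rule) gives you a monthly run where only the computed S3 URI changes, not the code.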

An alternative approach is to trigger the flow from a Lambda function on a schedule, as described in this AWS blog post: https://aws.amazon.com/blogs/machine-learning/schedule-an-amazon-sagemaker-data-wrangler-flow-to-process-new-data-periodically-using-aws-lambda-functions/.
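Either way, the key change on your side is to derive the date parameter at run time instead of hard-coding `dt(2021, 1, 1)`. A minimal sketch of a Lambda handler doing that (the EventBridge cron expression and the return shape are illustrative assumptions, not part of the blog post):

```python
import datetime as dt


def current_month_start(today: dt.date) -> dt.date:
    """Replace the hard-coded dt(2021, 1, 1) with the first day
    of whichever month the run happens in."""
    return today.replace(day=1)


def lambda_handler(event, context):
    # Scheduled monthly by an EventBridge rule, e.g. cron(0 6 1 * ? *).
    run_date = current_month_start(dt.date.today())
    # ... start the Data Wrangler processing job / pipeline here,
    # passing run_date.isoformat() as the date parameter ...
    return {"date_parameter": run_date.isoformat()}
```

With this, each monthly invocation computes its own date parameter, so the same deployed code handles January, February, and so on without edits.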

I work at AWS and my opinions are my own.

Kirit Thadaka