
I have to save my MLflow artifacts (using Databricks Unified Analytics) to an S3 bucket, with server-side encryption using a KMS key.

My instances are in AWS account A; my S3 bucket and my KMS key are in account B. I can't have my KMS key in account A.

I don't want to use DBFS to mount S3 buckets, for security reasons (buckets can contain sensitive data and I don't want to share them between users).

I have to assume an IAM role in order to access the bucket, as I already do to access it through s3a (with the spark.hadoop.fs.s3a.credentialsType and spark.hadoop.fs.s3a.stsAssumeRole.arn parameters).
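
For reference, this is roughly what my cluster Spark configuration looks like for the s3a access; the account id and role name below are placeholders, not my real values:

spark.hadoop.fs.s3a.credentialsType AssumeRole
spark.hadoop.fs.s3a.stsAssumeRole.arn arn:aws:iam::account_b_id:role/bucket-access-role

With this in place, reading and writing s3a:// paths from Spark works fine against the bucket in account B; it is only the MLflow artifact upload that fails.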

When I create an experiment with an S3 artifact location and try to log a model like this:

import mlflow
import mlflow.sklearn
from sklearn import tree

# X_train, y_train and X_test come from my own dataset
id_exp = mlflow.create_experiment("/Users/first.last@company.org/Experiment", 's3://s3-bucket-name/')
with mlflow.start_run(experiment_id=id_exp):
  clf_mlf = tree.DecisionTreeClassifier()
  clf_mlf = clf_mlf.fit(X_train, y_train)
  y_pred = clf_mlf.predict(X_test)
  mlflow.sklearn.log_model(clf_mlf, "model", serialization_format='pickle')

I get this error:

S3UploadFailedError: Failed to upload /tmp/tmp2yl2olhi/model/conda.yaml to s3-bucket-name//05c17a33a33d46a5ad3cc811a9faf35a/artifacts/model/conda.yaml: An error occurred (KMS.NotFoundException) when calling the PutObject operation: Key 'arn:aws:kms:eu-central-1:account_a_id:key/key_id' does not exist

How can I tell MLflow to assume a role before accessing S3?
