I have to save my MLflow artifacts (using Databricks Unified Analytics) to an S3 bucket, with server-side encryption using a KMS key.
My instances are in AWS account A, while my S3 bucket and my KMS key are in account B. I cannot move the KMS key into account A.
I don't want to use DBFS to mount the S3 buckets, for security reasons (the buckets can contain sensitive data and I don't want to share them between users).
I have to assume an IAM role in order to access the bucket, as I already do for direct s3a access (via the spark.hadoop.fs.s3a.credentialsType and spark.hadoop.fs.s3a.stsAssumeRole.arn parameters).
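For reference, the s3a access is configured at the cluster level with Spark configuration entries along these lines (the role ARN is a placeholder for the cross-account role in account B, and I believe AssumeRole is the expected value for credentialsType):

spark.hadoop.fs.s3a.credentialsType AssumeRole
spark.hadoop.fs.s3a.stsAssumeRole.arn arn:aws:iam::<account_b_id>:role/<cross-account-role>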
When I create an experiment with an s3:// artifact location and try to log a model like this:
import mlflow
import mlflow.sklearn
from sklearn import tree

# Experiment whose artifacts should go to the S3 bucket in account B
id_exp = mlflow.create_experiment("/Users/first.last@company.org/Experiment", 's3://s3-bucket-name/')

with mlflow.start_run(experiment_id=id_exp):
    # X_train, y_train and X_test are defined earlier in the notebook
    clf_mlf = tree.DecisionTreeClassifier()
    clf_mlf = clf_mlf.fit(X_train, y_train)
    y_pred = clf_mlf.predict(X_test)
    mlflow.sklearn.log_model(clf_mlf, "model", serialization_format='pickle')
I get this error:
S3UploadFailedError: Failed to upload /tmp/tmp2yl2olhi/model/conda.yaml to s3-bucket-name//05c17a33a33d46a5ad3cc811a9faf35a/artifacts/model/conda.yaml: An error occurred (KMS.NotFoundException) when calling the PutObject operation: Key 'arn:aws:kms:eu-central-1:account_a_id:key/key_id' does not exist
How can I tell MLflow to assume a role before accessing S3?
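Ideally I would like to do something like the sketch below before logging the model: assume the cross-account role with STS and expose the temporary credentials through the environment variables that boto3 (which MLflow uses for S3 artifact uploads) reads. The role ARN and session name are placeholders, and I am not sure this is the intended approach:

import os
import boto3

# Assume the cross-account role in account B that is allowed to use the KMS key (placeholder ARN)
sts = boto3.client('sts')
response = sts.assume_role(
    RoleArn='arn:aws:iam::<account_b_id>:role/<cross-account-role>',
    RoleSessionName='mlflow-artifact-upload'
)
creds = response['Credentials']

# boto3's default credential chain (used by MLflow's S3 artifact store) picks these up
os.environ['AWS_ACCESS_KEY_ID'] = creds['AccessKeyId']
os.environ['AWS_SECRET_ACCESS_KEY'] = creds['SecretAccessKey']
os.environ['AWS_SESSION_TOKEN'] = creds['SessionToken']

Is this the right direction, or does MLflow/Databricks provide a cleaner mechanism for this?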