
Could anyone tell me the reason for this error:

botocore.exceptions.ParamValidationError: Parameter validation failed: Invalid bucket name "": Bucket name must match the regex "^[a-zA-Z0-9.-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).:(s3|s3-object-lambda):[a-z-0-9]:[0-9]{12}:accesspoint[/:][a-zA-Z0-9-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9-]{1,63}$"

I try to use mlflow with docker. .env file contains:

AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_S3_BUCKET=vla...rts
MLFLOW_S3_ENDPOINT_URL=http://localhost:9000
MLFLOW_TRACKING_URI=http://127.0.0.1:5000
POSTGRES_USER=...
POSTGRES_PASSWORD=...
POSTGRES_DB=test_db

Also tried to use:

AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_S3_BUCKET=vla...rts
MLFLOW_S3_ENDPOINT_URL=http://localhost:9000
MLFLOW_TRACKING_URI=http://localhost:5000
POSTGRES_USER=...
POSTGRES_PASSWORD=...
POSTGRES_DB=test_db

docker-compose contains:

... 
   mlflow:
        restart: always
        image: mlflow_server
        container_name: mlflow_server
        ports:
          - "5000:5000"
        networks:
          - postgres
          - s3
        environment:
          - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
          - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
          - MLFLOW_S3_ENDPOINT_URL=http://nginx:9000
        command: mlflow server --backend-store-uri postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db/${POSTGRES_DB} --default-artifact-root s3://${AWS_S3_BUCKET}/ --host 0.0.0.0
...
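One quick way to check that (a hedged suggestion; it assumes Compose is reading the .env file that sits next to docker-compose.yml) is to print the compose file with all ${...} variables resolved. If AWS_S3_BUCKET is not being picked up, the artifact root shows up as s3:///:

docker-compose config | grep default-artifact-root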

As I understand it, I get the exception because the bucket name is empty (""). But in the .env file I set the bucket name to vla...rts.
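A minimal sketch of the failure mode (hedged; parse_s3_uri in MLflow's s3_artifact_repo.py does roughly the same thing with urllib.parse): if ${AWS_S3_BUCKET} expands to nothing, the artifact root becomes s3:/// and the parsed bucket is the empty string that botocore then rejects.

from urllib.parse import urlparse

# What "s3://${AWS_S3_BUCKET}/" expands to when the variable is empty
artifact_root = "s3:///"

parsed = urlparse(artifact_root)
bucket = parsed.netloc   # "" -- the empty bucket name from the error
print(repr(bucket))      # ''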

  • I've noticed that the problem arises in s3_artifact_repo.py. If, after (bucket, dest_path) = data.parse_s3_uri(self.artifact_uri), we add bucket="arts", everything goes OK and dvc repro finishes successfully. But, unfortunately, the artifacts in the MLflow experiments are empty. – Vladimir May 22 '22 at 18:06
  • Have you set a remote name in DVC configuration? It should be in the .dvc/config file. See also https://dvc.org/doc/command-reference/remote `add` and `modify`. – Jorge Orpinel Pérez May 23 '22 at 20:34
  • Omg, I guess you're right. I'm trying to teach how to build an MLOps pipeline, so at first I used a single S3 bucket. But when I moved to Docker, I decided to use a MinIO S3 bucket instead. I'll try to modify the DVC config and tell you about the results =) – Vladimir May 24 '22 at 00:12
  • p.s. I'm not sure `AWS_S3_BUCKET` is an env var that the AWS CLI (thus botocore, thus DVC) looks at :) – Jorge Orpinel Pérez May 24 '22 at 06:12

2 Answers


You are probably missing the DVC remote URL in the DVC config (.dvc/config file) (ref). In this case it should be something like s3://vla...rts.

This is usually set with dvc remote add, but it can also be edited manually or changed with dvc remote modify (ref).
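For example (a hedged sketch; the remote name myremote is just a placeholder, and the endpointurl line is only needed because the bucket lives on MinIO rather than AWS):

dvc remote add -d myremote s3://vla...rts
dvc remote modify myremote endpointurl http://localhost:9000

which should leave .dvc/config looking roughly like:

[core]
    remote = myremote
['remote "myremote"']
    url = s3://vla...rts
    endpointurl = http://localhost:9000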


OMG, it was a very strange issue. It wasn't a problem with DVC; everything there was configured correctly. I was still getting the exception: (screenshot of the exception)

I changed only one thing: the experiment name in the train stage, mlflow.set_experiment("xgboost") -> mlflow.set_experiment("xgboost_2"). And that's all!
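A likely explanation (not verified here): MLflow stores an artifact_location per experiment at creation time, so the old xgboost experiment kept the broken artifact root it was created with, while xgboost_2 was created after --default-artifact-root was fixed. A small sketch to check this (mlflow.search_experiments is the MLflow 2.x API; older versions expose a similar listing on MlflowClient):

import mlflow

mlflow.set_tracking_uri("http://localhost:5000")

# Each experiment keeps the artifact location it was created with;
# one created while the bucket variable was empty will show s3:///
for exp in mlflow.search_experiments():
    print(exp.name, "->", exp.artifact_location)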
