I would like to run MLflow "entirely offline" using Docker (i.e. with no cloud storage like S3 or blob storage). So I followed this guide and tried to set the artifact store to the atmoz/sftp server running in another Docker container. As suggested in the MLflow docs, I try to authenticate with host keys; however, when I try to register my artifact, I receive the following error:

pysftp.exceptions.CredentialException: No password or key specified.
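For context, my understanding is that MLflow's SFTP artifact store hands the connection off to pysftp, which needs either a password or a private key it can locate (as far as I can tell it only falls back to ~/.ssh/id_rsa or ~/.ssh/id_dsa before raising exactly this exception). So I would expect roughly the following standalone connection test to work from inside the mlflow_server container; this is just a sketch, and the host name, user, and key path are assumptions taken from my compose setup below:

import pysftp

# Minimal connection sketch (my assumption, not taken from the MLflow source):
# pysftp needs either password= or private_key=, otherwise it raises
# CredentialException("No password or key specified.").
cnopts = pysftp.CnOpts(knownhosts="/root/.ssh/known_hosts")  # filled by ssh-keyscan
with pysftp.Connection(
    "mlflow-sftp",                              # service name from my compose file
    username="foo",
    port=22,
    private_key="/root/.ssh/ssh_host_rsa_key",  # the key I mount into the web container
    cnopts=cnopts,
) as sftp:
    print(sftp.listdir("storage"))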
I guess there's something wrong with my host key setup. I also tried to follow this guide (mentioned in this question), but unfortunately it didn't have enough detail for my (admittedly limited) knowledge of containers, SFTP servers, and public/private key setups. My docker-compose.yml looks like this:
services:
  db:
    restart: always
    image: mysql/mysql-server:5.7.28
    container_name: mlflow_db
    expose:
      - "3306"
    networks:
      - backend
    environment:
      - MYSQL_DATABASE=${MYSQL_DATABASE}
      - MYSQL_USER=${MYSQL_USER}
      - MYSQL_PASSWORD=${MYSQL_PASSWORD}
      - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
    volumes:
      - dbdata:/var/lib/mysql

  mlflow-sftp:
    image: atmoz/sftp
    container_name: mlflow-sftp
    ports:
      - "2222:22"
    volumes:
      - ./storage/sftp:/home/foo/storage
      - ./ssh_host_ed25519_key:/home/foo/.ssh/ssh_host_ed25519_key.pub:ro
      - ./ssh_host_rsa_key:/home/foo/.ssh/ssh_host_rsa_key.pub:ro
    command: foo::1001
    networks:
      - backend

  web:
    restart: always
    build: ./mlflow
    depends_on:
      - mlflow-sftp
    image: mlflow_server
    container_name: mlflow_server
    expose:
      - "5000"
    networks:
      - frontend
      - backend
    volumes:
      - ./ssh_host_ed25519_key:/root/.ssh/ssh_host_ed25519_key:ro
      - ./ssh_host_rsa_key:/root/.ssh/ssh_host_rsa_key:ro
    command: >
      bash -c "sleep 3
      && ssh-keyscan -H mlflow-sftp >> ~/.ssh/known_hosts
      && mlflow server --backend-store-uri mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@db:3306/${MYSQL_DATABASE} --default-artifact-root sftp://foo@localhost:2222/storage --host 0.0.0.0"

  nginx:
    restart: always
    build: ./nginx
    image: mlflow_nginx
    container_name: mlflow_nginx
    ports:
      - "80:80"
    networks:
      - frontend
    depends_on:
      - web

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge

volumes:
  dbdata:
... and in my Python script I create a new MLflow experiment as follows:
import mlflow
import mlflow.sklearn
from urllib.parse import urlparse
from sklearn.linear_model import ElasticNet

remote_server_uri = "http://localhost:80"
mlflow.set_tracking_uri(remote_server_uri)

EXPERIMENT_NAME = "test43"
mlflow.create_experiment(EXPERIMENT_NAME)  # , artifact_location=ARTIFACT_URI)
mlflow.set_experiment(EXPERIMENT_NAME)
with mlflow.start_run():
    print(mlflow.get_artifact_uri())
    print(mlflow.get_registry_uri())

    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    predicted_qualities = lr.predict(test_x)
    (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

    print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme
    if tracking_url_type_store != "file":
        mlflow.sklearn.log_model(lr, "model", registered_model_name="ElasticnetWineModel")
    else:
        mlflow.sklearn.log_model(lr, "model")
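Since I suspect the key files might simply not be where pysftp expects them, a quick check inside the mlflow_server container along these lines should show which key files actually exist (just a diagnostic sketch; the ~/.ssh default locations are my assumption about where the client falls back to, and the other two paths come from my volume mounts):

import os

# List the key paths I think are relevant and report whether each exists.
for path in ("~/.ssh/id_rsa",
             "~/.ssh/id_dsa",
             "/root/.ssh/ssh_host_rsa_key",
             "/root/.ssh/ssh_host_ed25519_key"):
    full = os.path.expanduser(path)
    print(full, "->", "exists" if os.path.exists(full) else "missing")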
I haven't modified the Dockerfiles from the first guide I mentioned, i.e. you can see them here. My guess is that I messed something up with the host keys, maybe put them in the wrong directory, but after several hours of brute-force experimenting I'm hoping someone can give me a pointer in the right direction. Let me know if anything is missing to reproduce the error.