I would like to run MLflow "entirely offline" using Docker (i.e. with no cloud storage like S3 or blob storage). So I followed this guide and tried to set the artifact store to the atmoz/sftp server running in another Docker container. As suggested in the MLflow docs, I try to authenticate with host keys; however, when I try to register my artifact, I receive the following error:

pysftp.exceptions.CredentialException: No password or key specified.
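For context, my understanding is that MLflow's SFTP artifact store hands the connection off to pysftp, which needs either a password or a private key it can locate (as far as I can tell it only falls back to ~/.ssh/id_rsa or ~/.ssh/id_dsa before raising exactly this exception). So I would expect roughly the following standalone connection test to work from inside the mlflow_server container; this is just a sketch, and the host name, user, and key path are assumptions taken from my compose setup below:

import pysftp

# Minimal connection sketch (my assumption, not taken from the MLflow source):
# pysftp needs either password= or private_key=, otherwise it raises
# CredentialException("No password or key specified.").
cnopts = pysftp.CnOpts(knownhosts="/root/.ssh/known_hosts")  # filled by ssh-keyscan
with pysftp.Connection(
    "mlflow-sftp",                              # service name from my compose file
    username="foo",
    port=22,
    private_key="/root/.ssh/ssh_host_rsa_key",  # the key I mount into the web container
    cnopts=cnopts,
) as sftp:
    print(sftp.listdir("storage"))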
I guess there's something wrong with my host key setup. I also tried to follow this guide (mentioned in this question), but unfortunately it didn't have enough detail for my (admittedly limited) knowledge of containers, SFTP servers, and public/private key setups. My docker-compose.yml looks like this:
services:
  db:
    restart: always
    image: mysql/mysql-server:5.7.28
    container_name: mlflow_db
    expose:
      - "3306"
    networks:
      - backend
    environment:
      - MYSQL_DATABASE=${MYSQL_DATABASE}
      - MYSQL_USER=${MYSQL_USER}
      - MYSQL_PASSWORD=${MYSQL_PASSWORD}
      - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
    volumes:
      - dbdata:/var/lib/mysql

  mlflow-sftp:
    image: atmoz/sftp
    container_name: mlflow-sftp
    ports:
      - "2222:22"
    volumes:
      - ./storage/sftp:/home/foo/storage
      - ./ssh_host_ed25519_key:/home/foo/.ssh/ssh_host_ed25519_key.pub:ro
      - ./ssh_host_rsa_key:/home/foo/.ssh/ssh_host_rsa_key.pub:ro
    command: foo::1001
    networks:
      - backend

  web:
    restart: always
    build: ./mlflow
    depends_on:
      - mlflow-sftp
    image: mlflow_server
    container_name: mlflow_server
    expose:
      - "5000"
    networks:
      - frontend
      - backend
    volumes:
      - ./ssh_host_ed25519_key:/root/.ssh/ssh_host_ed25519_key:ro
      - ./ssh_host_rsa_key:/root/.ssh/ssh_host_rsa_key:ro
    command: >
      bash -c "sleep 3
      && ssh-keyscan -H mlflow-sftp >> ~/.ssh/known_hosts
      && mlflow server --backend-store-uri mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@db:3306/${MYSQL_DATABASE} --default-artifact-root sftp://foo@localhost:2222/storage --host 0.0.0.0"

  nginx:
    restart: always
    build: ./nginx
    image: mlflow_nginx
    container_name: mlflow_nginx
    ports:
      - "80:80"
    networks:
      - frontend
    depends_on:
      - web

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge

volumes:
  dbdata:
... and in my Python script I create a new MLflow experiment as follows:
import mlflow
import mlflow.sklearn
from urllib.parse import urlparse
from sklearn.linear_model import ElasticNet

remote_server_uri = "http://localhost:80"
mlflow.set_tracking_uri(remote_server_uri)

EXPERIMENT_NAME = "test43"
mlflow.create_experiment(EXPERIMENT_NAME)  # , artifact_location=ARTIFACT_URI)
mlflow.set_experiment(EXPERIMENT_NAME)
with mlflow.start_run():
    print(mlflow.get_artifact_uri())
    print(mlflow.get_registry_uri())

    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    predicted_qualities = lr.predict(test_x)
    (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

    print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)

    tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme
    if tracking_url_type_store != "file":
        mlflow.sklearn.log_model(lr, "model", registered_model_name="ElasticnetWineModel")
    else:
        mlflow.sklearn.log_model(lr, "model")
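Since I suspect the key files might simply not be where pysftp expects them, a quick check inside the mlflow_server container along these lines should show which key files actually exist (just a diagnostic sketch; the ~/.ssh default locations are my assumption about where the client falls back to, and the other two paths come from my volume mounts):

import os

# List the key paths I think are relevant and report whether each exists.
for path in ("~/.ssh/id_rsa",
             "~/.ssh/id_dsa",
             "/root/.ssh/ssh_host_rsa_key",
             "/root/.ssh/ssh_host_ed25519_key"):
    full = os.path.expanduser(path)
    print(full, "->", "exists" if os.path.exists(full) else "missing")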
I haven't modified the Dockerfiles from the first guide I mentioned, i.e. you can see them here. My guess is that I messed something up with the host keys, maybe put them in the wrong directory, but after several hours of brute-force experimenting I'm hoping someone can give me a pointer in the right direction. Let me know if anything is missing to reproduce the error.