MLflow 1.23 introduced a --serve-artifacts option (via this pull request), along with some example code. As I understand it, this should let me simplify the rollout of a server for data scientists: they would only need one URL for the tracking server, rather than a tracking-server URI, an artifacts-server URI, and a username/password for the artifacts server.

A complication that I have is that I need to use podman instead of docker for my containers (and without relying on podman-compose). I ask that you keep those requirements in mind; I'm aware that this is an odd situation.

What I did before this update (for MLflow 1.22) was to create a Kubernetes pod YAML config. I was able to issue a podman play kube ... command to start a pod and then, from a different machine, successfully run an experiment and save artifacts after setting the appropriate four environment variables. I've been struggling to get things working with the newest version.

I am following the docker-compose example provided here, but trying a (hopefully) simpler approach. The following is my Kubernetes play file defining the pod.

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-01-14T19:07:15Z"
  labels:
    app: mlflowpod
  name: mlflowpod
spec:
  containers:
  - name: minio
    image: quay.io/minio/minio:latest
    ports:
    - containerPort: 9001
      hostPort: 9001
    - containerPort: 9000
      hostPort: 9000
    resources: {}
    tty: true
    volumeMounts:
    - mountPath: /data
      name: minio-data
    args:
    - server
    - /data
    - --console-address
    - :9001

  - name: mlflow-tracking
    image: localhost/mlflow:latest
    ports:
    - containerPort: 80
      hostPort: 8090
    resources: {}
    tty: true
    env:
      - name: MLFLOW_S3_ENDPOINT_URL
        value: http://127.0.0.1:9000
      - name: AWS_ACCESS_KEY_ID
        value: minioadmin
      - name: AWS_SECRET_ACCESS_KEY
        value: minioadmin
    command: ["mlflow"]
    args:
      - server
      - -p
      - "80"
      - --host
      - 0.0.0.0
      - --backend-store-uri
      - sqlite:///root/store.db
      - --serve-artifacts
      - --artifacts-destination
      - s3://mlflow
      - --default-artifact-root
      - mlflow-artifacts:/
#      - http://127.0.0.1:80/api/2.0/mlflow-artifacts/artifacts/experiments
      - --gunicorn-opts
      - "--log-level debug"
    volumeMounts:
    - mountPath: /root
      name: mlflow-data  

  volumes:
  - hostPath:
      path: ./minio
      type: Directory
    name: minio-data
  - hostPath:
      path: ./mlflow
      type: Directory
    name: mlflow-data
status: {}

I start this with podman play kube mlflowpod.yaml. On the same machine (or a different one; it doesn't matter), I have cloned and installed mlflow into a virtual environment. From that virtual environment, I set the environment variable MLFLOW_TRACKING_URI to <name-of-server>:8090. I then run the example.py file in the mlflow_artifacts example directory and get the following response:

....
botocore.exceptions.NoCredentialsError: Unable to locate credentials

It seems like the client needs the server's credentials for MinIO, which I thought the proxy was supposed to take care of.
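For context, this is roughly the client-side test I am running. The host name is a placeholder for my server, and only the single tracking URI should be needed if the proxy works as advertised (the tracking_uri helper is just mine, for illustration):

```python
# Sketch of the client-side test against the proxied tracking server.
# "name-of-server" is a placeholder; replace it with your actual host.
import os


def tracking_uri(host: str, port: int = 8090) -> str:
    """Build the single tracking URI that proxied artifact access should require."""
    return f"http://{host}:{port}"


if __name__ == "__main__":
    # With --serve-artifacts, this should be the ONLY variable the client needs:
    os.environ["MLFLOW_TRACKING_URI"] = tracking_uri("name-of-server")

    import mlflow  # requires the mlflow package

    with mlflow.start_run():
        mlflow.log_param("alpha", 0.5)
        # Artifact upload should go through the tracking server's proxy,
        # not directly to MinIO:
        mlflow.log_text("hello", "artifacts-test.txt")
```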

If I also set the environment variables

$env:MLFLOW_S3_ENDPOINT_URL="http://<name-of-server>:9000/" 
$env:AWS_ACCESS_KEY_ID="minioadmin"
$env:AWS_SECRET_ACCESS_KEY="minioadmin"

then things work. But that kind of defeats the purpose of the proxy...

What is it about the proxy setup via the Kubernetes play YAML and podman that is going wrong?


1 Answer


Just in case anyone stumbles upon this: I had the same issue based on your description. The problem on my side was that I tried to test this with a preexisting experiment (the default one) rather than creating a new one, so the old artifact-location setting carried over. As a result, MLflow kept trying to use S3 with credentials instead of going through the HTTP proxy.

Hope this helps at least some of you out there.
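To sketch why this happens: an experiment's artifact_location is fixed when the experiment is created, so experiments created before --serve-artifacts was enabled keep their old s3:// location, while a fresh experiment inherits the server's mlflow-artifacts:/ default. The host name below is a placeholder and uses_artifact_proxy is just an illustrative helper, not part of MLflow:

```python
# Sketch: check whether an experiment will upload artifacts through the
# tracking server's proxy. The artifact_location scheme is set at creation
# time and does not change when the server configuration changes.


def uses_artifact_proxy(artifact_location: str) -> bool:
    """True if runs in this experiment upload via the tracking server proxy."""
    return artifact_location.startswith("mlflow-artifacts:")


if __name__ == "__main__":
    from mlflow.tracking import MlflowClient  # requires the mlflow package

    client = MlflowClient("http://name-of-server:8090")  # placeholder host

    # A newly created experiment inherits the server's --default-artifact-root:
    exp_id = client.create_experiment("proxied-exp")
    exp = client.get_experiment(exp_id)
    print(exp.artifact_location, uses_artifact_proxy(exp.artifact_location))

    # The preexisting default experiment may still carry an s3:// location:
    default_exp = client.get_experiment("0")
    print(default_exp.artifact_location,
          uses_artifact_proxy(default_exp.artifact_location))
```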

Juraj
    Totally helpful. Please bear in mind that if you deploy without proxied artifact storage access, and then add it, the old experiments will still NOT use the proxy. I don't know how to reset this. – DavidS1992 Jul 13 '22 at 12:13
  • Does this mean we would have to: 1) Recreate a new model 2) Upload the experiments for this to work? – Calvin Raveenthran Jul 20 '22 at 19:35