
I am trying to implement CI/CD for my ML model, and I am using DVC for that. This is my GitHub Actions workflow YAML file:

name: train-model
on:
  push:
    paths:
      - "data/**"
      - "src/**"
      - "params.yaml"
      - "dvc.*"
jobs:
  train-model:
    runs-on: ubuntu-latest
    environment: cloud
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - uses: iterative/setup-cml@v1
      - uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - name: SetupGitUser
        run: cml ci
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
      - name: TrainModel
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
        run: |
          pip install -r requirements.txt
          dvc pull
          dvc repro
          dvc push

However, I keep getting this error:

ERROR: failed to pull data from the cloud - Checkout failed for following targets:
/home/runner/work/open-source-mlops-e2e/open-source-mlops-e2e/data/raw
/home/runner/work/open-source-mlops-e2e/open-source-mlops-e2e/data/processed
/home/runner/work/open-source-mlops-e2e/open-source-mlops-e2e/models/clf-model.joblib
Is your cache up to date?
<https://error.dvc.org/missing-files>
Error: Process completed with exit code 1.

When I run dvc pull -v, it works fine:

2023-01-27 15:11:16,824 DEBUG: Preparing to transfer data from '/workspace/open-source-mlops-e2e/dvc' to '/workspace/open-source-mlops-e2e/.dvc/cache'
2023-01-27 15:11:16,825 DEBUG: Preparing to collect status from '/workspace/open-source-mlops-e2e/.dvc/cache'
2023-01-27 15:11:16,825 DEBUG: Collecting status from '/workspace/open-source-mlops-e2e/.dvc/cache'
2023-01-27 15:11:16,841 DEBUG: built tree 'object 6920135c1a76a56a030a224fb82afb28.dir'
2023-01-27 15:11:16,893 DEBUG: built tree 'object 9f384869826bdf146e6ff572c85d0d1e.dir'
Everything is up to date.
2023-01-27 15:11:16,900 DEBUG: Analytics is enabled.
2023-01-27 15:11:16,958 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp03jb3nq0']'
2023-01-27 15:11:16,959 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp03jb3nq0']'

My data is stored in my own Gitpod instance, so I am not sure why it is not working. This is my DVC config:

[core]
    remote = myremote
['remote "myremote"']
    url = /workspace/open-source-mlops-e2e/dvc

Can anyone suggest any pointers?

Jorge Orpinel Pérez
Gops
  • Is the text at the error URL useful? https://error.dvc.org/missing-files basically – Jorge Orpinel Pérez Feb 01 '23 at 16:45
  • re "When I do dvc pull -v It works fine" but you are running that manually from your local environment right? Not from the CI job. The "remote" is configured to be local to the env where DVC runs (so not very remote :). – Jorge Orpinel Pérez Feb 01 '23 at 16:46

1 Answer


ERROR: failed to pull data from the cloud - Checkout failed for following targets

This means that the data cannot be found on the configured remote (url = /workspace/open-source-mlops-e2e/dvc). Are you sure that the data has been pushed to the gitpod instance at the given location?

Typically, you first need to dvc add ... the data, possibly run dvc repro, and then dvc push it to the configured remote (/workspace/...). After that, you will be able to pull it.
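
As the comments on the question point out, the configured remote is a plain filesystem path, so it only exists in the environment where it was created. The GitHub Actions runner has no /workspace/open-source-mlops-e2e/dvc directory, which would explain why the same dvc pull succeeds in Gitpod and fails in CI. A minimal sketch of a config that the runner could reach, assuming an S3 bucket is used as the remote (the bucket name and path below are hypothetical, and the CI job would also need credentials for it):

```ini
[core]
    remote = myremote
['remote "myremote"']
    # Hypothetical bucket and prefix; replace with storage both Gitpod and CI can access
    url = s3://my-bucket/dvc-store
```

With a remote like this, the data would have to be pushed once with dvc push from the environment that already has it, before dvc pull in the workflow can find anything.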

dtrifiro