
How should the path to a (.csv) file be expressed in an MLTable file that is created in a local folder but then uploaded as part of a pipeline job?

I'm following the Jupyter notebook automl-forecasting-task-energy-demand-advance from the azureml-examples repo (article and notebook). This example has an MLTable file, shown below, referencing a .csv file with a relative path. In the pipeline, the MLTable is then uploaded so it is accessible to a remote compute (a few things are omitted for brevity):

from azure.ai.ml import MLClient, Input, automl
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import AmlCompute

my_training_data_input = Input(
    type=AssetTypes.MLTABLE, path="./data/training-mltable-folder"
)

compute = AmlCompute(
    name=compute_name, size="STANDARD_D2_V2", min_instances=0, max_instances=4
)

forecasting_job = automl.forecasting(
    compute=compute_name, # name of the compute target we created above
    # name="dpv2-forecasting-job-02",
    experiment_name=exp_name,
    training_data=my_training_data_input,
    # validation_data = my_validation_data_input,
    target_column_name="demand",
    primary_metric="NormalizedRootMeanSquaredError",
    n_cross_validations="auto",
    enable_model_explainability=True,
    tags={"my_custom_tag": "My custom value"},
)

returned_job = ml_client.jobs.create_or_update(
    forecasting_job
)

ml_client.jobs.stream(returned_job.name)

But running this gives the error:

Error message: Encountered user error while fetching data from Dataset.
Error: UserErrorException:
Message: MLTable yaml schema is invalid:
Error Code: Validation
Validation Error Code: Invalid MLTable
Validation Target: MLTableToDataflow
Error Message: Failed to convert a MLTable to dataflow
uri path is not a valid datastore uri path
| session_id=857bd9a1-097b-4df6-aa1c-8871f89580d8

The MLTable file it refers to is:

paths:
  - file: ./nyc_energy_training_clean.csv
transformations:
  - read_delimited:
      delimiter: ','
      encoding: 'ascii'
  - convert_column_types:
      - columns: demand
        column_type: float
      - columns: precip
        column_type: float
      - columns: temp
        column_type: float

How am I supposed to run this? Thanks in advance!

MrFranzén

1 Answer


For a remote path, you can point the MLTable at a datastore URI instead of a local relative path; see the documentation on how to create data assets.
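As a sketch, an MLTable whose data already lives in a workspace datastore can reference it with an `azureml://` datastore URI rather than a local relative path. The datastore name and file path below are placeholders (`workspaceblobstore` is the default blob datastore of a workspace), not values from the original notebook:

```yaml
# Hypothetical MLTable referencing a file in a workspace datastore.
paths:
  - file: azureml://datastores/workspaceblobstore/paths/data/nyc_energy_training_clean.csv
transformations:
  - read_delimited:
      delimiter: ','
      encoding: 'ascii'
```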

It's important to note that the path specified in the MLTable file must be a valid path in the cloud, not just a valid path on your local machine.
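To make the "valid datastore uri path" part of the error concrete: the short-form datastore URI follows the pattern `azureml://datastores/<datastore_name>/paths/<path_in_datastore>`. A tiny helper that assembles such a URI (this function is purely illustrative, not part of the azure-ai-ml SDK):

```python
def datastore_uri(datastore: str, relative_path: str) -> str:
    """Compose a short-form AzureML datastore URI.

    Hypothetical helper for illustration; not part of the azure-ai-ml SDK.
    """
    # Strip any leading slash so the path segment joins cleanly.
    return f"azureml://datastores/{datastore}/paths/{relative_path.lstrip('/')}"


uri = datastore_uri("workspaceblobstore", "data/nyc_energy_training_clean.csv")
print(uri)
# azureml://datastores/workspaceblobstore/paths/data/nyc_energy_training_clean.csv
```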


Ram
  • Thank you. But what is the workflow supposed to be? Working from a local Jupyter notebook, you define the MLTable locally _and thus the path to the data_; then this MLTable, along with the data, is uploaded when you run `ml_client.jobs.create_or_update(my_automl_job)`. But now this path to the data (in the MLTable file) should point to its location in a datastore, which doesn't exist when you define the file? How am I supposed to do this? – MrFranzén Feb 04 '23 at 13:23