How to add missing columns to an Azure ML Web Service Input when using a variable number of columns in my dataset

Question

I'm working with Azure ML for the first time, so please excuse any newbie mistakes!

My training pipeline takes a dataset generated by an ADF data flow, which uses the Pivot transformation to turn rows into columns (the source dataset is a list of projects and their corresponding technologies).

e.g.

Project	Technology
project1	tech1
project1	tech2
project2	tech1
project2	tech3
project3	tech4

The ADF data flow transforms the data to:

Project	tech1	tech2	tech3	tech4
project1	true	true	false	false
project2	true	false	true	false
project3	false	false	false	true
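
For readers who want to reproduce the reshaping outside ADF, the pivot above can be approximated in plain Python. This is only a sketch of the same transformation, not the ADF data flow itself; the function name `pivot_to_flags` is made up for illustration.

```python
from collections import defaultdict

# Long-format rows, mirroring the example above: (Project, Technology).
rows = [
    ("project1", "tech1"),
    ("project1", "tech2"),
    ("project2", "tech1"),
    ("project2", "tech3"),
    ("project3", "tech4"),
]

def pivot_to_flags(rows):
    """Return one dict per project with a boolean entry per technology."""
    techs_by_project = defaultdict(set)
    for project, tech in rows:
        techs_by_project[project].add(tech)

    # The column set is whatever technologies appear in the data,
    # which is why the number of columns varies between runs.
    all_techs = sorted({tech for _, tech in rows})

    return [
        {"Project": project, **{tech: tech in techs for tech in all_techs}}
        for project, techs in sorted(techs_by_project.items())
    ]

wide = pivot_to_flags(rows)
for record in wide:
    print(record)
```

The column set depends entirely on which technologies occur in the source rows, which is what makes the schema hard to pin down for a web service input.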

Extra columns are added, and the transformed data is then written to ADLS Gen2, from where it's ingested into Azure ML. I then created a training pipeline in Azure ML which runs a linear regression model on the data, scoring my label column.

Training pipeline

From here I was able to create a real-time inference pipeline with a web service input and output.

Inference pipeline

I was able to deploy the endpoint and test it using the test tool on the Endpoint detail page. My issue is that when I remove features from the input JSON (e.g. passing only tech1 and tech2 as booleans), I hit the error: Input Data Error. Input data are inconsistent with schema

This makes sense, since the inference pipeline obviously expects features that match the training data. Since the UI calling the ML endpoint won't necessarily know all the available features (read: technologies), I need to find a way to add any missing columns dynamically. The list of technologies is long, so they can't be added manually. I think the solution is to join to my source dataset, adding any missing columns (features) to the web service payload.

I tried this, but the endpoint failed to deploy with an error saying that the adf-sink datasource is unsupported.

How do I go about fixing this? Thank you!
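
To make the goal concrete, here is a rough sketch of the padding step itself, written in plain Python rather than as an Azure ML component. It assumes the full feature list can be saved at training time and read by whatever sits between the UI and the endpoint; `ALL_FEATURES` and `fill_missing_features` are hypothetical names, not part of any Azure ML API.

```python
# Hypothetical: the complete, ordered feature list captured at training time.
ALL_FEATURES = ["tech1", "tech2", "tech3", "tech4"]

def fill_missing_features(payload, all_features, default=False):
    """Return a payload containing every expected feature.

    Features absent from the incoming payload are set to `default`.
    Features the model has never seen are rejected instead of being
    silently dropped.
    """
    unknown = sorted(set(payload) - set(all_features))
    if unknown:
        raise ValueError(f"Unknown features: {unknown}")
    return {name: payload.get(name, default) for name in all_features}

# The UI only knows about two technologies.
partial = {"tech1": True, "tech2": True}
complete = fill_missing_features(partial, ALL_FEATURES)
print(complete)
```

Whether this logic can live inside the inference pipeline or has to run in front of the endpoint is exactly the part I'm unsure about.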

UPDATE: 4/18/24

I've since found that a better way of tackling this is to join the rows into a single space-delimited column, which I then process using the "Extract N-gram features from Text" component.

My input dataset generated from ADF now looks like:

Project	Technology
project1	tech1 tech2
project2	tech1 tech3
project3	tech4
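
The same reshaping can be sketched in plain Python, along with a rough stand-in for unigram features over a fixed vocabulary. This is an illustration only: it is neither the ADF data flow nor the actual implementation of the N-gram component, and the function names are made up.

```python
# Long-format rows: (Project, Technology).
rows = [
    ("project1", "tech1"),
    ("project1", "tech2"),
    ("project2", "tech1"),
    ("project2", "tech3"),
    ("project3", "tech4"),
]

def join_technologies(rows):
    """Collapse each project's technologies into one space-delimited string."""
    grouped = {}
    for project, tech in rows:
        grouped.setdefault(project, []).append(tech)
    return [
        {"Project": project, "Technology": " ".join(techs)}
        for project, techs in grouped.items()
    ]

def unigram_flags(text, vocabulary):
    """Mark which vocabulary terms occur in the text (simplified unigrams)."""
    tokens = set(text.split())
    return {term: term in tokens for term in vocabulary}

joined = join_technologies(rows)
vocabulary = ["tech1", "tech2", "tech3", "tech4"]
flags = unigram_flags(joined[1]["Technology"], vocabulary)
print(joined)
print(flags)
```

The appeal of this shape is that the web service input always has the same two columns, however many technologies a project lists.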

The next problem I hit was that my inference pipeline always returns an empty dataset, but I have started a separate thread for that here.
