I am trying to adapt the following ML.NET F# Product Recommender example to my own use case: https://github.com/dotnet/machinelearning-samples/tree/master/samples/fsharp/getting-started/MatrixFactorization_ProductRecommendation

However, my dataset doesn't have two numeric ids. Instead, I have a UserId (numeric) and a ProductId (string). Because key values apparently have to be numeric, I tried mapping the string column with the MapValueToKey function. However, I'm still getting the following error:

Unhandled Exception: System.InvalidOperationException: Column 'UserId' with role MatrixColumnIndex should be a known cardinality U4 key, but is instead 'UInt32'
   at Microsoft.ML.Recommender.RecommenderUtils.CheckRowColumnType(RoleMappedData data, ColumnRole role, Column& col, Boolean isDecode)
   at Microsoft.ML.Recommender.RecommenderUtils.CheckAndGetMatrixIndexColumns(RoleMappedData data, Column& matrixColumnIndexColumn, Column& matrixRowIndexColumn, Boolean isDecode)
   at Microsoft.ML.Trainers.MatrixFactorizationTrainer.TrainCore(IChannel ch, RoleMappedData data, RoleMappedData validData)
   at Microsoft.ML.Trainers.MatrixFactorizationTrainer.Fit(IDataView trainData, IDataView validationData)
   at Microsoft.ML.Trainers.MatrixFactorizationTrainer.Fit(IDataView input)
   at <StartupCode$Recommender>.$Program.main@() in /Users/nat/Projects/Recommender/Recommender/Program.fs:line 75

The schema of my data is similar to the following:

UserId,ProductId
1,test-product-id

Here's the code that's failing, adapted from the linked example:

open Microsoft.ML
open Microsoft.ML.Data
open System
open Microsoft.ML.Trainers

[<CLIMutable>]
type ProductEntry = 
    {
        [<LoadColumn(0); KeyType(count=6248UL)>]
        UserId : uint32
        [<LoadColumn(1)>]
        ProductId : string
    }

[<CLIMutable>]
type Prediction = {Score : float32}

let trainDataPath = "/path/to/user_product_prediction.csv"

let mlContext = MLContext()

let pipeline = 
    mlContext.Transforms.Conversion.MapValueToKey(inputColumnName="ProductId",outputColumnName="ProductIdEncoded")

let traindata = mlContext.Data.LoadFromTextFile<ProductEntry>(trainDataPath, hasHeader=true, separatorChar=',')

let mappedDataView = pipeline.Fit(traindata).Transform(traindata)

let options = MatrixFactorizationTrainer.Options(MatrixColumnIndexColumnName = "UserId", 
                                                 MatrixRowIndexColumnName = "ProductIdEncoded",
                                                 LossFunction = MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass,
                                                 LabelColumnName = "ProductId",
                                                 Alpha = 0.01,
                                                 Lambda = 0.025)

let est = mlContext.Recommendation().Trainers.MatrixFactorization(options)

let model = est.Fit(mappedDataView)

let predictionengine = mlContext.Model.CreatePredictionEngine<ProductEntry, Prediction>(model)
let prediction = predictionengine.Predict {ProductId = "test-product-id"; UserId = 13854u}

printfn ""
printfn "For ProductId = 'test-product-id' and UserId = 13854 the predicted score is %f" prediction.Score
printf "=============== End of process, hit any key to finish ==============="
Console.ReadKey() |> ignore

The other link I've been using as guidance is https://medium.com/machinelearningadvantage/build-a-product-recommender-using-c-and-ml-net-machine-learning-ab890b802d25

I have been trying to get this to work for hours. What the heck am I doing wrong?


Update

I've managed to get a little further, by making my program more similar to the official .NET sample. What I've got now is:

open Microsoft.ML
open Microsoft.ML.Data
open System
open Microsoft.ML.Trainers

[<CLIMutable>]
type ProductEntry = 
    {
        [<LoadColumn(0); KeyType(count=6248UL)>]
        UserId : uint32
        [<LoadColumn(1)>]
        ProductId : string
        [<NoColumn>]
        Label : float32
    }

[<CLIMutable>]
type Prediction = {Score : float32}

let trainDataPath = "/Users/nat/Downloads/user_product_prediction.csv"

let mlContext = MLContext()

let pipeline = 
    EstimatorChain().Append(
        mlContext.Transforms.Conversion
            .MapValueToKey(inputColumnName="UserId",outputColumnName="UserIdEncoded"))
        .Append(
            mlContext.Transforms.Conversion
                .MapValueToKey(inputColumnName="ProductId",outputColumnName="ProductIdEncoded"))


let traindata =
    let columns = 
        [|
            TextLoader.Column("Label", DataKind.Single, 0)
            TextLoader.Column("UserId", DataKind.UInt32, source = [|TextLoader.Range(0)|], keyCount = KeyCount 6248UL) 
            TextLoader.Column("ProductId", DataKind.String, source = [|TextLoader.Range(1)|]) 
        |]
    mlContext.Data.LoadFromTextFile(trainDataPath, columns, hasHeader=true, separatorChar=',')

let mappedDataView = pipeline.Fit(traindata).Transform(traindata)

let options = MatrixFactorizationTrainer.Options(MatrixColumnIndexColumnName = "UserIdEncoded", 
                                                 MatrixRowIndexColumnName = "ProductIdEncoded",
                                                 LossFunction = MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass,
                                                 LabelColumnName = "Label",
                                                 Alpha = 0.01,
                                                 Lambda = 0.025)

let est = mlContext.Recommendation().Trainers.MatrixFactorization(options)

let model = est.Fit(mappedDataView)

let predictionengine = mlContext.Model.CreatePredictionEngine<ProductEntry, Prediction>(model)
let prediction = predictionengine.Predict {ProductId = "farfetch-13470673"; UserId = (uint32 13854); Label = 0.f}

printfn ""
printfn "For ProductId = 'farfetch-13470673' and UserId = 13854 the predicted score is %f" prediction.Score
printf "=============== End of process, hit any key to finish ==============="
Console.ReadKey() |> ignore

Where it fails now is at this line: let predictionengine = mlContext.Model.CreatePredictionEngine<ProductEntry, Prediction>(model)

with the error

Unhandled Exception: System.ArgumentOutOfRangeException: UserIdEncoded column 'MatrixColumnIndex' not found
Parameter name: schema
   at Microsoft.ML.Data.RoleMappedSchema.MapFromNames(DataViewSchema schema, IEnumerable`1 roles, Boolean opt)
   at Microsoft.ML.Data.RoleMappedSchema..ctor(DataViewSchema schema, IEnumerable`1 roles, Boolean opt)
   at Microsoft.ML.Data.GenericScorer.Bindings.Create(IHostEnvironment env, ISchemaBindableMapper bindable, DataViewSchema input, IEnumerable`1 roles, String suffix, Boolean user)
   at Microsoft.ML.Data.GenericScorer.Bindings.ApplyToSchema(IHostEnvironment env, DataViewSchema input)
   at Microsoft.ML.Data.GenericScorer..ctor(IHostEnvironment env, GenericScorer transform, IDataView data)
   at Microsoft.ML.Data.GenericScorer.ApplyToDataCore(IHostEnvironment env, IDataView newSource)
   at Microsoft.ML.Data.RowToRowScorerBase.ApplyToData(IHostEnvironment env, IDataView newSource)
   at Microsoft.ML.Data.PredictionTransformerBase`1.Microsoft.ML.ITransformer.GetRowToRowMapper(DataViewSchema inputSchema)
   at Microsoft.ML.PredictionEngineBase`2..ctor(IHostEnvironment env, ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.PredictionEngine`2..ctor(IHostEnvironment env, ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.PredictionEngineExtensions.CreatePredictionEngine[TSrc,TDst](ITransformer transformer, IHostEnvironment env, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.ModelOperationsCatalog.CreatePredictionEngine[TSrc,TDst](ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
Nathaniel Elkins
  • Oh try changing the UserID property to be float32 – Jon Aug 25 '19 at 22:24
  • @Jon Unfortunately that didn't seem to work. Also, if you look at the most up-to-date sample (published 19 days ago as of the time of this writing), they are using uint32: https://github.com/dotnet/machinelearning-samples/blob/master/samples/fsharp/getting-started/MatrixFactorization_ProductRecommendation/ProductRecommender/Program.fs – Nathaniel Elkins Aug 26 '19 at 03:29
  • This annotation- and reflection-based approach is fundamentally unsound. It bypasses using the type system for the API, leading to runtime errors instead of compile-time errors as you have found. The alternative is IDataView but someone needs to be the first to try it. https://github.com/dotnet/machinelearning/issues/1991#issuecomment-461507560 – Charles Roddie Aug 26 '19 at 19:10

1 Answer

I believe you are past the original hurdle: you trained the model successfully, and now you need to assemble all the trained assets into the prediction engine.

Note that you have 'trained' two transformers: the pre-processing pipeline (the result of the call to pipeline.Fit(traindata)) and the recommender itself (the result of the call to est.Fit(mappedDataView)).

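However, the prediction engine that you're creating takes only the second transformer, so it will only work if it is given the output of the first transformer. A raw ProductEntry has no UserIdEncoded column, which is why the engine reports that the column is not found.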

A better approach is to form one estimator with both the pre-processing and the recommender (I apologize for possible mistakes, F# is not my native language):

let pipeline = 
    EstimatorChain().Append(
        mlContext.Transforms.Conversion
            .MapValueToKey(inputColumnName="UserId",outputColumnName="UserIdEncoded"))
        .Append(
            mlContext.Transforms.Conversion
                .MapValueToKey(inputColumnName="ProductId",outputColumnName="ProductIdEncoded"))


let traindata =
    let columns = 
        [|
            TextLoader.Column("Label", DataKind.Single, 0)
            TextLoader.Column("UserId", DataKind.UInt32, source = [|TextLoader.Range(0)|], keyCount = KeyCount 6248UL) 
            TextLoader.Column("ProductId", DataKind.String, source = [|TextLoader.Range(1)|]) 
        |]
    mlContext.Data.LoadFromTextFile(trainDataPath, columns, hasHeader=true, separatorChar=',')

// No need to do this any more:
// let mappedDataView = pipeline.Fit(traindata).Transform(traindata)

let options = MatrixFactorizationTrainer.Options(MatrixColumnIndexColumnName = "UserIdEncoded", 
                                                 MatrixRowIndexColumnName = "ProductIdEncoded",
                                                 LossFunction = MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass,
                                                 LabelColumnName = "Label",
                                                 Alpha = 0.01,
                                                 Lambda = 0.025)

// Rather than this:
// let est = mlContext.Recommendation().Trainers.MatrixFactorization(options)
// Do this:
let est = pipeline.Append(mlContext.Recommendation().Trainers.MatrixFactorization(options))

// Now train the whole pipeline.
let model = est.Fit(traindata)

// The rest should now work.
let predictionengine = mlContext.Model.CreatePredictionEngine<ProductEntry, Prediction>(model)
let prediction = predictionengine.Predict {ProductId = "farfetch-13470673"; UserId = (uint32 13854); Label = 0.f}

Zruty