I'm having a problem building an ML.NET pipeline. I've read through A LOT of Microsoft documentation, but I think the problem is that I just don't understand it. I was wondering if I could get some help from this community.

What I'm trying to do is predict when a train will be called. I have gathered a lot of data and put it into a CSV file. The first column is when the train is predicted to be called. The second column is when the train was actually called. Both columns are Unix timestamps in seconds. (I can put the data into C# DateTime format if that's easier.)

Here is a sample of the data:

1682556540,1682571900
1682760480,1682786700
1683057540,1683056460
1683269880,1683274500
1683456840,1683445500
1683612960,1683814800
1684001940,1683975900
1684194420,1684203600
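
For reference, here is a minimal sketch of how one row of this file could be read in plain C#, outside ML.NET. The names `CallRecord` and `CallRecordParser` are hypothetical and not part of the code in this post, and the sketch assumes every row has exactly two comma-separated integer timestamps in seconds. It parses the values as `long`, since a 32-bit `float` only carries about seven significant digits and cannot hold a ten-digit timestamp exactly.

```csharp
using System;
using System.Globalization;

// Hypothetical type for one CSV row; not part of the original program.
public readonly struct CallRecord
{
    public long PredictedUnix { get; }
    public long ActualUnix { get; }

    public CallRecord(long predictedUnix, long actualUnix)
    {
        PredictedUnix = predictedUnix;
        ActualUnix = actualUnix;
    }

    // Positive when the train was called later than predicted.
    public long DelaySeconds => ActualUnix - PredictedUnix;

    // Readable UTC forms of the two timestamps.
    public DateTimeOffset PredictedUtc => DateTimeOffset.FromUnixTimeSeconds(PredictedUnix);
    public DateTimeOffset ActualUtc => DateTimeOffset.FromUnixTimeSeconds(ActualUnix);
}

public static class CallRecordParser
{
    // Parses a line such as "1682556540,1682571900".
    public static CallRecord ParseRow(string line)
    {
        string[] parts = line.Split(',');
        if (parts.Length != 2)
        {
            throw new FormatException("Expected two comma-separated timestamps.");
        }

        return new CallRecord(
            long.Parse(parts[0], CultureInfo.InvariantCulture),
            long.Parse(parts[1], CultureInfo.InvariantCulture));
    }
}
```

For the first sample row, `CallRecordParser.ParseRow("1682556540,1682571900").DelaySeconds` is 15360, meaning the train was called 4 hours 16 minutes later than predicted.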

This is the code I have so far. All of this code I have copied from various code samples and tutorials I've been looking at. I've been going through the Microsoft documentation to TRY to understand each line. Like I said, the pipeline has me stumped right now.

using System;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace TrainPrediction
{
    class TrainData
    {
        [LoadColumn(0)]
        public float PredictedTime;

        [LoadColumn(1)]
        public float ActualTime;
    }

    class Prediction
    {
        [ColumnName("Score")]
        public float PredictedTime;
    }

    class Program
    {
        static void Main(string[] args)
        {
            var mlContext = new MLContext();

            // Load the data
            var dataPath = @"d:\temp\aiengine-601.csv";
            var dataView = mlContext.Data.LoadFromTextFile<TrainData>(dataPath, separatorChar: ',');

            // Define the pipeline
            var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label")
                .Append(mlContext.Transforms.Concatenate("Features", "PredictedTime"))
                .Append(mlContext.Transforms.NormalizeMinMax("Features"))
                .Append(mlContext.Transforms.Conversion.MapKeyToValue("Label"))
                .Append(mlContext.Regression.Trainers.FastTree());

            // Train the model
            var model = pipeline.Fit(dataView);

            // Create a prediction engine
            var predictionEngine = mlContext.Model.CreatePredictionEngine<TrainData, Prediction>(model);

            // Prompt the user for a prediction time
            Console.Write("Enter a prediction time (Unix timestamp): ");
            if (float.TryParse(Console.ReadLine(), out float inputTime))
            {
                var inputData = new TrainData { PredictedTime = inputTime };
                var prediction = predictionEngine.Predict(inputData);

                // Convert the predicted time back to Unix timestamp
                var predictedTime = Math.Round(prediction.PredictedTime);

                Console.WriteLine($"ML.NET predicts the train will be called at: {predictedTime}");
            }
            else
            {
                Console.WriteLine("Invalid input!");
            }
        }
    }
}

When I run this code, I get an error when I train the model (.Fit). It states: System.ArgumentOutOfRangeException: 'Could not find input column 'Label' (Parameter 'inputSchema')'

I believe I'm getting this error because my pipeline is not correct.

What I'm asking is if anyone could help me get the correct pipeline, and if you feel really frisky, explain the details of the pipeline.

I'm currently looking online for a "Dummies guide to pipelines" type of explanation.

jason835
  • The information in your input is just two times for each data point, so the real underlying information is solely the delta between those two times. You have a single-dimensional array of information, and you want to predict the future? – Steve May 18 '23 at 22:36
  • The first column is the prediction in Unix timestamp format. Let's say it's 05/01/2023 1325 in readable form. The second column is the actual call time in Unix timestamp format. Let's say it's 05/01/2023 1543. With tons of this data, if I enter a prediction time of 05/18/2023 1734, couldn't I get an ML.NET prediction of when that train should actually be called? – jason835 May 18 '23 at 23:35
  • So, you have two times. The important thing is the amount between these two times ... that's it. Are you expecting some kind of seasonal change or something? – Steve May 19 '23 at 17:07
  • With the exception of Amtrak, railroads are very unpredictable with the calling of their trains. I work for the "Big Orange" Class 1 railroad. Shortly after I clock out, a prediction pops up for when I'll go to work. This is rarely correct. I've been collecting data on when people are predicted to go to work and when they actually go to work. My thought was that I could then use AI to more accurately predict when I'll go to work. Also, yes, there are seasonal changes. – jason835 May 21 '23 at 01:03

1 Answer

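The pipeline expects a column named Label in your data. Both MapValueToKey("Label") and the FastTree trainer, whose default label column is Label, look it up, which is why Fit throws. The Label column should hold the thing to be predicted (the ground truth). In your case this would be ActualTime, which you want to predict from PredictedTime.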

An easy fix would be to change the name "ActualTime" to "Label" in your TrainData class. If your .csv file has a header, change the name of the column there as well.
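
A minimal sketch of one way this could look, assuming the Microsoft.ML and Microsoft.ML.FastTree NuGet packages are referenced. Instead of renaming the field, it uses the `ColumnName` attribute to expose `ActualTime` to the pipeline as `Label`. The `PipelineSketch.Build` helper is a hypothetical name, not from the question. The sketch also leaves out the `MapValueToKey` and `MapKeyToValue` steps, on the assumption that they are not needed here: key mapping is meant for categorical labels, while a regression trainer takes a `float` label directly.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

namespace TrainPrediction
{
    class TrainData
    {
        [LoadColumn(0)]
        public float PredictedTime;

        // Loaded from the second CSV column, but exposed to the pipeline
        // under the name "Label" so the trainer can find it.
        [LoadColumn(1), ColumnName("Label")]
        public float ActualTime;
    }

    static class PipelineSketch
    {
        // Hypothetical helper: builds a regression pipeline that predicts
        // Label (ActualTime) from PredictedTime.
        public static IEstimator<ITransformer> Build(MLContext mlContext)
        {
            return mlContext.Transforms
                // Pack the single input column into the "Features" vector.
                .Concatenate("Features", nameof(TrainData.PredictedTime))
                // Rescale the features using the minimum and maximum seen in training.
                .Append(mlContext.Transforms.NormalizeMinMax("Features"))
                // Train a boosted-tree regressor; the column names are spelled out
                // here even though they match the defaults.
                .Append(mlContext.Regression.Trainers.FastTree(
                    labelColumnName: "Label",
                    featureColumnName: "Features"));
        }
    }
}
```

In `Main`, `var pipeline = PipelineSketch.Build(mlContext);` would replace the existing pipeline definition, and the rest of the program can stay as it is. An alternative that leaves `TrainData` untouched is to pass `labelColumnName: "ActualTime"` to `FastTree` instead.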

Nooby-Noob