
I am trying ML.NET for basic sentiment analysis, following the tutorial at https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/sentiment-analysis .
I followed it step by step and used the same training files provided there. I also went through all the comments and answers on a similar question from another user:
ml.net sentiment analysis warning about format errors & bad values

I am still getting the errors shown below. Most of the training data is reported as bad (for example, "Processed 860 rows with 818 bad values"). This should not happen, since both the data and the code come from the official Microsoft site (the first link above). The log output and the code are pasted below.

Has the data on the Microsoft site changed without the tutorial being updated to match?

Not adding a normalizer.
Making per-feature arrays
Changing data from row-wise to column-wise
  Bad value at line 8 in column Label
  Bad value at line 112 in column Label
  Bad value at line 187 in column Label
  Bad value at line 9 in column Label
  Bad value at line 10 in column Label
  Bad value at line 11 in column Label
  Bad value at line 12 in column Label
  Bad value at line 188 in column Label
  Bad value at line 190 in column Label
  Bad value at line 113 in column Label
  Suppressing further bad value messages
Processed 1773 rows with 1731 bad values and 0 format errors
Processed 42 instances
Binning and forming Feature objects
Reserved memory for tree learner: 1188 bytes
Starting to train ...
Warning: 50 of the boosting iterations failed to grow a tree. This is commonly because the minimum documents in leaf hyperparameter was set too high for this dataset.
Not training a calibrator because it is not needed.
  Bad value at line 7 in column Label
  Bad value at line 186 in column Label
  Bad value at line 111 in column Label
  Bad value at line 8 in column Label
  Bad value at line 9 in column Label
  Bad value at line 10 in column Label
  Bad value at line 11 in column Label
  Bad value at line 12 in column Label
  Suppressing further bad value messages
  Bad value at line 112 in column Label
  Bad value at line 187 in column Label
Processed 860 rows with 818 bad values and 0 format errors

Below is the code:

    public class SentimentData
    {
        [Column(ordinal: "0", name: "Label")]
        public float Sentiment;
        [Column(ordinal: "1")]
        public string SentimentText;
    }

    public class SentimentPrediction
    {
        [ColumnName("PredictedLabel")]
        public bool Sentiment;
    }




    class Program
    {
        //https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/sentiment-analysis

        static readonly string _dataPath = Path.Combine(Environment.CurrentDirectory, "Data", "wikipedia-detox-250-line-data.tsv");
        static readonly string _testDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "wikipedia-detox-250-line-test.tsv");
        static readonly string _modelpath = Path.Combine(Environment.CurrentDirectory, "Data", "Model.zip");



        static async Task Main(string[] args)
        {
            //Microsoft.ML.Legacy.Transforms.SentimentAnalyzer sentimentAnalyzer = new SentimentAnalyzer();
            //sentimentAnalyzer.Data;

            Console.WriteLine("---------------Training ------------------------------------");
            var model = await Train();
            Evaluate(model);
            Console.WriteLine("---------------Training Over------------------------------------");
            Console.WriteLine("Type END to exit");
            string s = "";
            while (s.ToLower() != "end")
            {
                s = Console.ReadLine();
                Console.WriteLine("Sentiment: {0}",(Predict(model, s).Sentiment ? "Negative" : "Positive"));
            }
        }

        public static async Task<PredictionModel<SentimentData, SentimentPrediction>> Train()
        {
            var pipeline = new LearningPipeline();
            TextLoader textLoader = new TextLoader(_dataPath).CreateFrom<SentimentData>(useHeader: true, allowQuotedStrings: true, supportSparse: true, trimWhitespace: true);
            pipeline.Add(textLoader);
            pipeline.Add(new TextFeaturizer("Features", "SentimentText"));
            //pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 50, NumTrees = 50, MinDocumentsInLeafs = 20 });
            pipeline.Add(new LogisticRegressionBinaryClassifier() { });
            PredictionModel<SentimentData, SentimentPrediction> model = pipeline.Train<SentimentData, SentimentPrediction>();
            await model.WriteAsync(_modelpath);
            return model;
        }

        public static void Evaluate(PredictionModel<SentimentData, SentimentPrediction> model)
        {
            var testData = new TextLoader(_testDataPath).CreateFrom<SentimentData>();
            var evaluator = new BinaryClassificationEvaluator();
            BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);
            Console.WriteLine();
            Console.WriteLine("PredictionModel quality metrics evaluation");
            Console.WriteLine("------------------------------------------");
            Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
            Console.WriteLine($"Auc: {metrics.Auc:P2}");
            Console.WriteLine($"F1Score: {metrics.F1Score:P2}");
        }

        public static SentimentPrediction Predict(PredictionModel<SentimentData, SentimentPrediction> model, string sentence)
        {
            return model.Predict(new SentimentData { SentimentText = sentence });
        }
    }
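
A quick way to narrow this down might be to check the Label column of the downloaded file directly, outside ML.NET. The sketch below is not from the tutorial: LabelCheck and FindBadLabelLines are hypothetical names, and it assumes the file is tab-separated with a header row and that a valid label is anything float.TryParse accepts, which only approximates what TextLoader does. It is also worth noting that the log above reports 1773 rows for a file named wikipedia-detox-250-line-data.tsv.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of ML.NET or the tutorial.
public static class LabelCheck
{
    // Returns the 1-based numbers of the lines whose first tab-separated
    // field does not parse as a float. The first line is assumed to be a
    // header row and is skipped.
    public static List<int> FindBadLabelLines(IEnumerable<string> lines)
    {
        var bad = new List<int>();
        int lineNumber = 0;
        foreach (string line in lines)
        {
            lineNumber++;
            if (lineNumber == 1)
            {
                continue; // header row (assumption)
            }

            // Split always returns at least one element, so [0] is safe.
            string label = line.Split('\t')[0].Trim();
            bool ok = float.TryParse(
                label, NumberStyles.Float, CultureInfo.InvariantCulture, out _);
            if (!ok)
            {
                bad.Add(lineNumber);
            }
        }
        return bad;
    }
}
```

Calling LabelCheck.FindBadLabelLines(File.ReadAllLines(_dataPath)) from Main and printing the first few offending lines would show whether those rows contain tutorial data at all, or something else entirely.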
Sand T
  • Can you show the code you have? I think the issue may be in the class where you define the data. – Jon Oct 16 '18 at 16:30
  • Thanks for reply. I have added the code above – Sand T Oct 17 '18 at 05:17
  • I have posted the issue on GitHub: https://github.com/dotnet/machinelearning/issues/1354 – Sand T Oct 24 '18 at 09:35
  • example worked for me... how did you download the data? you must have something wrong in your file – c-chavez Nov 20 '18 at 15:21

0 Answers