I am using the TextClassificationEngine. The example at https://docs.prediction.io/demo/textclassification/ offers two algorithms for text classification: Naive Bayes and Logistic Regression. Naive Bayes works fine, but when I switch to Logistic Regression I get an error.

{
  "id": "default",
  "description": "Default settings",
  "engineFactory": "org.template.textclassification.TextClassificationEngine",
  "datasource": {
    "params": {
      "appName": "Logistic"
    }
  },
  "preparator": {
    "params": {
      "nGram": 2
    }
  },
  "algorithms": [
    {
      "name": "regParam",
      "params": {
        "regParam":2,5
      }
    }
  ]
} 

The code above is taken from the site. It has an error at regParam, apparently because the value looks like an array. Can anyone please explain what regParam means in engine.json, and what the final engine.json for Logistic Regression should look like?

Ahmed Ashour
Kishore
The error is because you have `"regParam":2,5` with a comma, not `"regParam":2.5` with a period. @EmergentOrder explains how it's used. – pferrel Sep 30 '15 at 20:17

1 Answer

This is a mistake in the docs; I'll submit a pull request to fix it.

`regParam` expects a double, so use something like `"regParam": 0.1`.
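To see why the documented value fails, here is a small check using Python's standard `json` module. This is only a stand-in: PredictionIO reads engine.json with its own parser, so the exact error message will differ, but a comma inside a number is invalid JSON for any conforming parser. The two fragments are cut-down copies of the `algorithms` entry from the question, and `0.1` is just an example value.

```python
import json

# Cut-down copy of the documented algorithm entry, with the comma typo.
broken = '{"name": "regParam", "params": {"regParam":2,5}}'
# Same entry with a single double value (0.1 is only an example).
fixed = '{"name": "regParam", "params": {"regParam": 0.1}}'

try:
    json.loads(broken)
    broken_parses = True
except json.JSONDecodeError as exc:
    # After "2," the parser expects a new key, not the bare number 5.
    broken_parses = False
    print("broken fragment rejected:", exc.msg)

reg_param = json.loads(fixed)["params"]["regParam"]
print(type(reg_param).__name__, reg_param)
```

The fixed fragment yields a single floating-point number, which is the shape a double-typed parameter needs.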

Specifically, this param is used (in Spark ML) for L2 regularization, which reduces the chance of overfitting [where the model does not generalize well] by penalizing models for being too complex.
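As a rough sketch of what that penalty looks like, assuming the template leaves Spark ML's elastic-net mixing at its default (pure L2), the training objective has the form below. The symbols are my own notation: $\lambda$ stands for `regParam`, $w$ for the model weights, and $\ell$ for the logistic loss on one example.

```latex
\min_{w}\; \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(w;\, x_i, y_i\bigr)
\;+\; \lambda \cdot \frac{1}{2}\,\lVert w \rVert_2^2,
\qquad \lambda = \texttt{regParam} \ge 0
```

A larger $\lambda$ pushes the weights toward zero (a simpler model), while $\lambda = 0$ disables the penalty entirely.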

The exact value to use is an empirical question. If you are concerned about the value, I would suggest searching over a range of values for regParam, using a held-out validation set (or k-fold cross-validation) to determine the winner.
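The search described above can be sketched in plain Python. This is not PredictionIO or Spark code: the tiny gradient-descent trainer, the synthetic data, and the candidate list are all assumptions made for illustration, and the best value found here will not transfer to your data. It only shows the loop: train once per candidate, score on held-out rows, keep the candidate with the lowest validation loss.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(rows, reg_param, steps=200, lr=0.5):
    """Gradient descent on mean log loss + reg_param * 0.5 * ||w||^2."""
    dim, n = len(rows[0][0]), len(rows)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        # The L2 penalty adds reg_param * w to the gradient; the bias is not penalized.
        w = [wi - lr * (gw + reg_param * wi) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def log_loss(rows, model):
    # Mean negative log-likelihood, with probabilities clipped away from 0 and 1.
    w, b = model
    total = 0.0
    for x, y in rows:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        p = min(max(p, 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)

def make_row():
    # Hypothetical data: 5 features, only the first two carry signal.
    x = [random.gauss(0, 1) for _ in range(5)]
    y = 1 if x[0] + 0.5 * x[1] + random.gauss(0, 1) > 0 else 0
    return x, y

random.seed(0)
data = [make_row() for _ in range(300)]
train_rows, valid_rows = data[:200], data[200:]  # held-out validation split

candidates = [0.0001, 0.001, 0.01, 0.1, 1.0]  # log-spaced grid, an arbitrary choice
scores = {r: log_loss(valid_rows, train(train_rows, r)) for r in candidates}
best_reg_param = min(scores, key=scores.get)

print(scores)
print("best regParam on the validation set:", best_reg_param)
```

With PredictionIO itself, the equivalent would be retraining the engine once per candidate value in engine.json and comparing an evaluation metric on data the model did not see during training.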