Issue with Spark ML Logistic Regression creating high number of stages

Asked Apr 27 '17 at 11:11

Active May 01 '17 at 09:37

Viewed 99 times

We are using Spark ML for logistic regression. When we run the code in Spark on 1 GB of input data, the logistic regression step creates a large number of stages, and each stage reads approximately 2.8 GB of input, which brings the total input to approximately 700 GB. Below is the sample code that calls the Spark ML logistic regression API:

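import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}

var lRModel: LogisticRegressionModel = null

try {
  val logisticRegression = new LogisticRegression()

  // The iteration cap is currently commented out, so the default maxIter applies
  // logisticRegression.setMaxIter(10)

  lRModel = logisticRegression.fit(logisticRegressionInputDF)
} catch {
  case ex: Exception =>
    // Wrap any failure in the application's own exception type
    throw new ModellingUJTransformationException(
      "Exception while fitting logistic regression model on a LR input dataframe --" + ex.getMessage, ex)
}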
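For reference, here is a minimal sketch of a variation that might be worth comparing against, not a confirmed fix. It assumes the stage count is driven by the iterative optimizer: as far as I understand, `LogisticRegression` defaults to `maxIter = 100`, and each optimizer iteration makes at least one distributed pass over the training data. The object name, the tiny in-memory DataFrame, the `local[*]` master, and the chosen `maxIter`/`tol` values are all hypothetical stand-ins for the real job.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object BoundedLogisticRegressionSketch {

  /** Fits a model with an explicit iteration cap and returns the
    * iteration count reported by the training summary. */
  def fitBounded(maxIter: Int): Int = {
    val spark = SparkSession.builder()
      .appName("bounded-lr-sketch")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for logisticRegressionInputDF.
    val inputDF = Seq(
      (0.0, Vectors.dense(0.0, 1.1)),
      (1.0, Vectors.dense(2.0, 1.0)),
      (0.0, Vectors.dense(0.1, 1.3)),
      (1.0, Vectors.dense(1.9, 0.8))
    ).toDF("label", "features")

    // Persist the input so repeated passes can reuse materialized rows
    // instead of recomputing the upstream lineage.
    inputDF.persist(StorageLevel.MEMORY_AND_DISK)

    try {
      val lr = new LogisticRegression()
        .setMaxIter(maxIter) // explicit cap on optimizer iterations
        .setTol(1e-4)        // assumed tolerance; looser than the default

      val model = lr.fit(inputDF)
      model.summary.totalIterations
    } finally {
      inputDF.unpersist()
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    println(s"Iterations reported: ${fitBounded(10)}")
  }
}
```

Comparing the number of stages in the Spark UI with and without the iteration cap would show whether the stage count tracks the optimizer's iterations in this job.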

Also, find attached the DAG at that stage:

[Image: DAG visualization of the logistic regression stage]

edited May 01 '17 at 09:37

Tatsuyuki Ishi


asked Apr 27 '17 at 11:11

Neha Jain

Please share your code, it would be easier to help you. Also the question should be more precise – T. Gawęda Apr 27 '17 at 12:23
I have updated the details. Please let me know in case you need more details. – Neha Jain May 01 '17 at 03:41

0 Answers