
We are using Spark ML for Logistic Regression. While executing the code in Spark on 1 GB of input data, the Logistic Regression step creates a very large number of stages, and each stage reads approximately 2.8 GB of input, so the total input across all stages reaches approximately 700 GB. Below is the sample code that calls the Logistic Regression Spark ML API:

import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}

var lRModel: LogisticRegressionModel = null

try {
  // Fit with default parameters (maxIter defaults to 100).
  val logisticRegression = new LogisticRegression()

  //logisticRegression.setMaxIter(10)

  lRModel = logisticRegression.fit(logisticRegressionInputDF)
} catch {
  case ex: Exception =>
    throw new ModellingUJTransformationException("Exception while fitting logistic regression model on a LR input dataframe -- " + ex.getMessage, ex)
}
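For context, the preparation of logisticRegressionInputDF is not shown above. The sketch below is a minimal, self-contained version of the same fit call, assuming the input is assembled with a VectorAssembler into the label/features columns that Spark ML expects; the file path, column names, cache() placement, and explicit setMaxIter/setRegParam values are hypothetical, shown only to illustrate the knobs that bound how much data the iterative optimization stages re-read.

import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("LRStageRepro").getOrCreate()

// Hypothetical raw input; the path and column names are assumptions for illustration.
val rawDF = spark.read.parquet("/path/to/lr_input")

// Assemble individual feature columns into the single "features" vector column used by Spark ML.
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2", "f3"))   // hypothetical feature columns
  .setOutputCol("features")

val logisticRegressionInputDF = assembler.transform(rawDF)
  .select("label", "features")
  .persist(StorageLevel.MEMORY_AND_DISK)   // cache so each iteration does not recompute the full lineage

val lr = new LogisticRegression()
  .setMaxIter(10)        // each iteration adds stages; capping iterations bounds the total input read
  .setRegParam(0.01)

val lRModel = lr.fit(logisticRegressionInputDF)

logisticRegressionInputDF.unpersist()

Because Spark ML's logistic regression is iterative, every iteration scans the training DataFrame again, which is why the cumulative "input" reported across stages can be many multiples of the raw data size.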

The DAG at that stage is attached below:

[Spark UI DAG screenshots]
