
Hi, I am new to Spark MLlib. I already have an R model, and I am trying to build the same model with Spark MLlib. Here is the R model code.

R code:

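# Read the tab-delimited training data; empty fields become NA
delhi <- read.delim("UItrain.txt", na.strings = "")
# Model the log of price rather than the raw price
delhi$lnprice <- log(delhi$price)
heddel <- lm(lnprice ~ bedrooms + bathrooms + area, data = delhi)
# Predict log price for the test set
deltest <- read.delim("UItest.txt", na.strings = "")
predict(heddel, deltest)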

I am trying to reproduce the same R code in Spark MLlib with Java:

SparkConf conf = new SparkConf().setAppName("Linear Regression Example");
JavaSparkContext sc = new JavaSparkContext(conf);
String path = "UItrain.txt";
JavaRDD<String> data = sc.textFile(path);
JavaRDD<LabeledPoint> parsedData = data.map(
  new Function<String, LabeledPoint>() {
    public LabeledPoint call(String line) {
      String[] parts = line.split("\t");
      String[] features = parts[1].split("\t");
      double[] v = new double[features.length];
      for (int i = 0; i < features.length - 1; i++)
        v[i] = Double.parseDouble(features[i]);
      return new LabeledPoint(Double.parseDouble(parts[0]), Vectors.dense(v));
    }
  }
  );
 parsedData.cache();

// Building the model
 String input = "UItrain.txt";
 int data2 = "UItest.txt";
int numIterations = 100;
final LinearRegressionModel model =
  LinearRegressionWithSGD.train(JavaRDD.toRDD(parsedData), data2);

// Evaluate model on training examples and compute training error
JavaRDD<Tuple2<Double, Double>> valuesAndPreds = parsedData.map(
  new Function<LabeledPoint, Tuple2<Double, Double>>() {
    public Tuple2<Double, Double> call(LabeledPoint point) {
      double prediction = model.predict(point.features());
      return new Tuple2<Double, Double>(prediction, point.label());
    }
  }
);
double MSE = new JavaDoubleRDD(valuesAndPreds.map(
  new Function<Tuple2<Double, Double>, Object>() {
    public Object call(Tuple2<Double, Double> pair) {
      return Math.pow(pair._1() - pair._2(), 2.0);
    }
  }
).rdd()).mean();
System.out.println("training Mean Squared Error = " + MSE);

I am getting an error while building the model. Any help will be appreciated.

zero323
arun abimaniyu

1 Answer


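I think your error is in data2 here:

final LinearRegressionModel model =
  LinearRegressionWithSGD.train(JavaRDD.toRDD(parsedData), data2);

The train method expects the number of iterations (an int) as its second argument, but it is receiving a file name instead:

 int data2 = "UItest.txt";

That assignment will not even compile, because a String cannot be assigned to an int. You already declare numIterations, so pass that as the second argument instead of data2.

If this is not the cause, please edit the question and include the error message.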
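Separately, the parsing in the question splits on a tab twice and never takes the log of the price, so even once it compiles it will not match the R model. Here is a minimal sketch in plain Java (no Spark dependency) of how one line could be turned into a label and a feature array. The class name TrainingLineParser, the Row holder, and the column order (price, bedrooms, bathrooms, area) are assumptions for illustration, since the layout of UItrain.txt is not shown. Note also that read.delim treats the first line as a header by default, so a header row would have to be skipped before parsing.

```java
import java.util.Arrays;

public class TrainingLineParser {

    /** One parsed row: the label (log of price) and the feature values. */
    public static final class Row {
        public final double label;
        public final double[] features;

        Row(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /**
     * Parses one tab-separated line.
     * ASSUMPTION: column 0 is price and the remaining columns are the
     * numeric features (bedrooms, bathrooms, area). Adjust to the real file.
     */
    public static Row parse(String line) {
        String[] parts = line.split("\t");
        // Mirror the R code: the model is fitted on log(price), not price
        double label = Math.log(Double.parseDouble(parts[0]));
        double[] features = new double[parts.length - 1];
        // Copy every remaining column, including the last one
        for (int i = 1; i < parts.length; i++) {
            features[i - 1] = Double.parseDouble(parts[i]);
        }
        return new Row(label, features);
    }

    public static void main(String[] args) {
        Row row = parse("100.0\t3\t2\t1500");
        System.out.println(row.label + " " + Arrays.toString(row.features));
    }
}
```

Inside the Spark map function, the same values could then be wrapped as new LabeledPoint(row.label, Vectors.dense(row.features)), which are the calls the question already uses.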

Dr VComas