
I am new to Spark and I am using spark-2.1.0-bin-hadoop2.7.

I have checked its WordCount sample and it works fine, but JavaLDAExample does not.

I checked their source code here. WordCount takes a URL as a parameter for its data, and I have created my data on HDFS (Hadoop), so I pass a path like hdfs://master:9000/input/data/test.txt.

But JavaLDAExample uses a static path, Dataset<Row> dataset = spark.read().format("libsvm").load("data/mllib/sample_lda_libsvm_data.txt");, and I don't know where that path points, i.e. where I should move my files.

I got this error (lines 51, 59). Can you help me solve this?

2 Answers


From your logs I can see that Spark is looking for the data in /home/unique/spark-2.1.0-bin-hadoop2.7/work/driver-20170130015037-0017/data/mllib/sample_lda_libsvm_data.txt.

I think your best bet is to modify that path and build a new jar with the modified path. Then you can point it to HDFS if you like.
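
If you do point it at HDFS, a minimal sketch of what that could look like (assuming your file is already in libsvm format; the hdfs://master:9000/input/data/lda_data.txt path and the class name below are just placeholders):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    // Hypothetical class name; only the load() argument differs from the bundled example.
    public class LoadLdaDataFromHdfs {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("LoadLdaDataFromHdfs")
            .getOrCreate();

        // Load a libsvm-formatted file from HDFS instead of the hard-coded relative path.
        Dataset<Row> dataset = spark.read()
            .format("libsvm")
            .load("hdfs://master:9000/input/data/lda_data.txt"); // placeholder path

        // Quick sanity check that the data was found and parsed.
        dataset.show(5);

        spark.stop();
      }
    }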

If you don't want to do that and just want to quickly test things and play around, then you should be able to run that code in your local Spark shell. When you download and extract Spark, the Spark directory contains another directory called data, and the example data can be found there. So you can load it in the Spark shell with a relative path:

scala> spark.read.format("libsvm").load("data/mllib/sample_lda_libsvm_data.txt");

Menth
  • Thanks so much. Our problem is that we are not familiar enough with Java IDEs and how to recompile the code. What is the command for compiling our Java code again? Thanks in advance. – amir golkar Jan 30 '17 at 12:35

This is sample code provided with some sample datasets, and the sample dataset for LDA is available locally at $SPARK_HOME/data/mllib/.

Here the sample data is static and provided with the package.
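
One thing to keep in mind if you swap in your own file: the "libsvm" reader expects each line to be a label followed by 1-based index:value pairs, which is the layout sample_lda_libsvm_data.txt uses. A made-up two-line illustration of that format:

    0 1:1 2:2 3:6 4:2 5:3
    1 1:1 3:1 4:3 6:2 7:1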

In your case, since you want to supply your own dataset, you have three options:

  1. Write your own LDA application along similar lines and change the code at line 45 from Dataset<Row> dataset = spark.read().format("libsvm").load("data/mllib/sample_lda_libsvm_data.txt"); to Dataset<Row> dataset = spark.read().format("libsvm").load("YOUR/DATA_SETS/LOCATION"); (see the sketch after this list).
  2. Use the same code and change only the path at line 45, again from load("data/mllib/sample_lda_libsvm_data.txt") to load("YOUR/DATA_SETS/LOCATION").

  3. You can put your file locally at $SPARK_HOME/data/mllib and then rename the file to sample_lda_libsvm_data.txt.
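
For the first two options, a rough sketch of what the modified program could look like (the class name and YOUR/DATA_SETS/LOCATION are placeholders, and the k/maxIter values are only illustrative; the rest follows the same flow as the bundled JavaLDAExample):

    import org.apache.spark.ml.clustering.LDA;
    import org.apache.spark.ml.clustering.LDAModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    // Hypothetical class name for your own application.
    public class MyLDAExample {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("MyLDAExample")
            .getOrCreate();

        // Changed line: load your own libsvm-formatted data instead of the bundled sample.
        Dataset<Row> dataset = spark.read()
            .format("libsvm")
            .load("YOUR/DATA_SETS/LOCATION");

        // Fit an LDA model (k and maxIter are illustrative values).
        LDA lda = new LDA().setK(10).setMaxIter(10);
        LDAModel model = lda.fit(dataset);

        System.out.println("log likelihood: " + model.logLikelihood(dataset));
        System.out.println("log perplexity: " + model.logPerplexity(dataset));

        // Show the top terms per topic that the model found.
        Dataset<Row> topics = model.describeTopics(3);
        topics.show(false);

        spark.stop();
      }
    }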

In the first two cases, you will have to build a new jar of your code and then use that new jar to execute it.
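
As a rough sketch of that build-and-run step, assuming a Maven project and a standalone master at spark://master:7077 (the class name, jar name, and master URL below are placeholders; adjust them to your project):

    # Build the jar from your project (Maven example).
    mvn clean package

    # Submit the rebuilt jar; the class, master URL and jar path are placeholders.
    ./bin/spark-submit \
      --class com.example.MyLDAExample \
      --master spark://master:7077 \
      /path/to/your-lda-example.jar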

Please let me know if you have any further issues.

Regards,

~Kedar Dixit

  • Thanks for your response. We have tried the third one, but it raises the same error! And for the first two options, we know how to change the code, but we don't know how to recompile the edited code. Can you elaborate on that for us? (I am a C++ and web developer and I am not very familiar with Java compilers and IDEs.) – amir golkar Jan 30 '17 at 12:21
  • Or, in short: what is the command for compiling our Java code again? – amir golkar Jan 30 '17 at 12:36
  • It depends: if you are using Maven, then you will have to run "mvn clean install" to get the jar. If you are using Eclipse, all you need to do is right-click on the project -> Export -> JAR. You can see http://stackoverflow.com/questions/24346053/how-to-create-jar-file for reference. Hope this helps. – Kedar S. Dixit Feb 01 '17 at 10:05