
I am doing a simple word count example in Apache Spark in Java (following examples from the Internet), and I am getting the error Caused by: java.net.UnknownHostException: my.txt.
You can see my code below for reference.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class MyCount {

    public static void main(String[] args) {
        String file = "hdfs://my.txt";
        JavaSparkContext sc = new JavaSparkContext("local", "Simple App");
        JavaRDD<String> lines = sc.textFile(file);
        long nums = lines.count();
        System.out.println(nums);
    }
}

3 Answers


Can you try String file = "hdfs://localhost/my.txt"?

PS: make sure the file my.txt actually exists in HDFS. If you don't have the file in HDFS, use the command below to copy it there from a local directory.

hadoop fs -copyFromLocal /home/training/my.txt hadoop/
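With the file in HDFS, the OP's code then just needs a fully qualified URI. A minimal sketch, assuming the NameNode runs on localhost with the default port 8020 and that the copy above landed under the training user's home directory (adjust host, port, and path to your cluster). Note that "hdfs://my.txt" fails because Spark parses "my.txt" as the hostname:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class MyCount {
    public static void main(String[] args) {
        // the authority part of the URI must name the NameNode, not the file
        String file = "hdfs://localhost:8020/user/training/hadoop/my.txt";
        JavaSparkContext sc = new JavaSparkContext("local", "Simple App");
        JavaRDD<String> lines = sc.textFile(file);
        System.out.println(lines.count());
        sc.close();
    }
}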


Old question, but an answer was never accepted. The mistake, as I read it, is mixing Spark's "local" master concept with the hostname "localhost".

Using this constructor: JavaSparkContext(java.lang.String master, java.lang.String appName), you would want to use:

JavaSparkContext sc = new JavaSparkContext("localhost", "Simple App");

but the question was using "local". Further, the HDFS filename didn't specify a hostname; it should have the form "hdfs://SomeNameNode:9000/foo/bar/" or, generically, "hdfs://host:port/absolute-path".

As of 1.6.2, the Javadoc for JavaSparkContext does not show any constructor that lets you specify the cluster type directly:

http://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/api/java/JavaSparkContext.html

The preferred constructor for JavaSparkContext takes a SparkConf object. For something more readable, build the SparkConf object first and then pass it to JavaSparkContext. Here's an example that sets the app name, specifies the Kryo serializer, and sets the master:

    SparkConf sparkConf = new SparkConf().setAppName("Threshold")
            //.setMaster("local[4]");
            .setMaster(getMasterString(masterName))
            .set("spark.serializer",   "org.apache.spark.serializer.KryoSerializer")
            .registerKryoClasses(kryoClassArray);

    // create the JavaSparkContext now:
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);

NOTE: the alternate .setMaster("local[4]") would use local mode, which may be what the OP was trying to do.
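For completeness: getMasterString(masterName) and kryoClassArray in the snippet above are the answerer's own names, not Spark API. A minimal sketch of what they might look like inside the same class (the local-mode fallback and the registered class are illustrative assumptions):

    // hypothetical helper: fall back to local mode when no master name is supplied
    private static String getMasterString(String masterName) {
        return (masterName == null || masterName.isEmpty()) ? "local[4]" : masterName;
    }

    // hypothetical array of classes to register with Kryo; list your own
    // frequently-serialized classes here
    private static final Class<?>[] kryoClassArray = new Class<?>[] { scala.Tuple2.class };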

I have a more extended answer here that addresses using hostnames vs. IP addresses, and a lot more about setting up your SparkConf.


You can try this simple word count program:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

public class First {
    public static void main(String[] args) {

        SparkConf sf = new SparkConf().setMaster("local[3]").setAppName("parth");
        JavaSparkContext sc = new JavaSparkContext(sf);
        JavaRDD<String> textFile = sc.textFile("input file path");

        // split each line into words
        JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
            public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }
        });

        // pair each word with a count of 1
        JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
            public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); }
        });

        // sum the counts for each word
        JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
            public Integer call(Integer a, Integer b) { return a + b; }
        });

        counts.saveAsTextFile("outputfile-path");
    }
}
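If you are on Java 8, the same job reads more cleanly with lambdas. A sketch under the assumption of Spark 1.6 (in Spark 2.x the flatMap lambda must return an Iterator rather than an Iterable); the input and output paths are the same placeholders as above:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class WordCountLambda {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local[3]").setAppName("wordcount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaPairRDD<String, Integer> counts = sc.textFile("input file path")
                .flatMap(line -> Arrays.asList(line.split(" ")))  // line -> words
                .mapToPair(word -> new Tuple2<>(word, 1))         // word -> (word, 1)
                .reduceByKey((a, b) -> a + b);                    // sum counts per word

        counts.saveAsTextFile("outputfile-path");
        sc.close();
    }
}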