
I am doing a simple word count example in Apache Spark in Java (following examples from the Internet), and I am getting the error Caused by: java.net.UnknownHostException: my.txt.
You can see my code below for reference.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class MyCount {

    public static void main(String[] args) {
        String file = "hdfs://my.txt";
        JavaSparkContext sc = new JavaSparkContext("local", "Simple App");
        JavaRDD<String> lines = sc.textFile(file);
        long nums = lines.count();
        System.out.println(nums);
    }
}

3 Answers


Can you try String file = "hdfs://localhost/my.txt"?

PS: make sure the file my.txt actually exists in HDFS. If you don't have the file in HDFS, use the command below to copy it there from a local directory.

hadoop fs -copyFromLocal /home/training/my.txt hadoop/
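With the file in HDFS, the OP's code then just needs a fully qualified URI. A minimal sketch, assuming the NameNode runs on localhost with the default port 8020 and that the copy above landed under the training user's home directory (adjust host, port, and path to your cluster). Note that "hdfs://my.txt" fails because Spark parses "my.txt" as the hostname:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class MyCount {
    public static void main(String[] args) {
        // the authority part of the URI must name the NameNode, not the file
        String file = "hdfs://localhost:8020/user/training/hadoop/my.txt";
        JavaSparkContext sc = new JavaSparkContext("local", "Simple App");
        JavaRDD<String> lines = sc.textFile(file);
        System.out.println(lines.count());
        sc.close();
    }
}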


Old question, but an answer was never accepted. The mistake, as I read it, is mixing Spark's "local" master concept with the hostname "localhost".

Using this constructor: JavaSparkContext(java.lang.String master, java.lang.String appName), you would want to use:

JavaSparkContext sc = new JavaSparkContext("localhost", "Simple App");

but the question was using "local". Further, the HDFS filename didn't specify a hostname; it should have the form "hdfs://SomeNameNode:9000/foo/bar/" or, generically, "hdfs://host:port/absolute-path".

As of 1.6.2, the Javadoc for JavaSparkContext does not show any constructor that lets you specify the cluster type directly:

http://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/api/java/JavaSparkContext.html

The preferred constructor for JavaSparkContext takes a SparkConf object. For something more readable, build the SparkConf object first and then pass it to JavaSparkContext. Here's an example that sets the app name, specifies the Kryo serializer, and sets the master:

    SparkConf sparkConf = new SparkConf().setAppName("Threshold")
            //.setMaster("local[4]");
            .setMaster(getMasterString(masterName))
            .set("spark.serializer",   "org.apache.spark.serializer.KryoSerializer")
            .registerKryoClasses(kryoClassArray);

    // create the JavaSparkContext now:
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);

NOTE: the alternate .setMaster("local[4]") would use local mode, which may be what the OP was trying to do.
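For completeness: getMasterString(masterName) and kryoClassArray in the snippet above are the answerer's own names, not Spark API. A minimal sketch of what they might look like inside the same class (the local-mode fallback and the registered class are illustrative assumptions):

    // hypothetical helper: fall back to local mode when no master name is supplied
    private static String getMasterString(String masterName) {
        return (masterName == null || masterName.isEmpty()) ? "local[4]" : masterName;
    }

    // hypothetical array of classes to register with Kryo; list your own
    // frequently-serialized classes here
    private static final Class<?>[] kryoClassArray = new Class<?>[] { scala.Tuple2.class };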

I have a more extended answer here that addresses using hostnames vs. IP addresses, and a lot more about setting up your SparkConf.


You can try this simple word count program:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

public class First {
    public static void main(String[] args) {

        SparkConf sf = new SparkConf().setMaster("local[3]").setAppName("parth");
        JavaSparkContext sc = new JavaSparkContext(sf);
        JavaRDD<String> textFile = sc.textFile("input file path");

        // split each line into words
        JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
            public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }
        });

        // pair each word with a count of 1
        JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
            public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); }
        });

        // sum the counts for each word
        JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
            public Integer call(Integer a, Integer b) { return a + b; }
        });

        counts.saveAsTextFile("outputfile-path");
    }
}
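If you are on Java 8, the same job reads more cleanly with lambdas. A sketch under the assumption of Spark 1.6 (in Spark 2.x the flatMap lambda must return an Iterator rather than an Iterable); the input and output paths are the same placeholders as above:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class WordCountLambda {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local[3]").setAppName("wordcount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaPairRDD<String, Integer> counts = sc.textFile("input file path")
                .flatMap(line -> Arrays.asList(line.split(" ")))  // line -> words
                .mapToPair(word -> new Tuple2<>(word, 1))         // word -> (word, 1)
                .reduceByKey((a, b) -> a + b);                    // sum counts per word

        counts.saveAsTextFile("outputfile-path");
        sc.close();
    }
}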