I have just installed the MapR 5.1 sandbox virtual machine, running in VirtualBox with bridged networking. What I am trying to do is access Hive and HDFS from a local Spark application (the same thing I did successfully with the HDP 2.4 sandbox), but without success.
I have installed a MapR Client on my machine (using the hadoop fs -ls command I can reach an hdfs URL; an example of this check is shown right after the error below). I also have a Java/Scala project with a main application that I tried to run, but it gives me the following error:
Failed on local exception: java.io.IOException: An existing connection was forcibly closed by the remote host; Host Details : local host is: "DESKTOP-J9DMAUG/192.168.1.133"; destination host is: "maprdemo":7222
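The client check I mention above looks roughly like this (the URL and path are only illustrative, not the exact command I typed):

hadoop fs -ls hdfs://maprdemo:7222/user/mapr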
Here are the details about the project:
pom.xml
<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<encoding>UTF-8</encoding>
<scala.tools.version>2.10</scala.tools.version>
<scala.version>2.10.4</scala.version>
<spark.version>1.4.1</spark.version>
</properties>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.tools.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.tools.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_${scala.tools.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
</dependency>
The main class:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object MainApp {
def main(args: Array[String]) {
val conf = new SparkConf()
.setAppName("SampleSparkApp")
.setMaster("local[*]")
val sc = new SparkContext(conf)
val rdd = sc.textFile("/user/mapr/aas/sample.txt")
println(s"count is: ${rdd.count()}")
rdd.foreach(println(_))
val sqlContext = new HiveContext(sc)
val df = sqlContext.sql("select * from default.agg_w_cause_f_cdr_datamart_fac")
df.show(10)
sc.stop()
}
}
On the classpath, as resources, I also have core-site.xml and hive-site.xml:
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://maprdemo:7222</value>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
</configuration>
hive-site.xml
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://maprdemo:9083</value>
</property>
</configuration>
If you need any other details, please let me know.
It is worth mentioning that submitting the same code as a jar, using the spark-submit command on the MapR machine, runs fine.
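For reference, on the MapR machine I submit it roughly like this (the jar name is just an example, and the master setting comes from the setMaster call in the code):

spark-submit --class MainApp sample-spark-app-1.0.jar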