I have just installed the MapR 5.1 sandbox virtual machine, running in VirtualBox with bridged networking. What I am trying to do is access Hive and HDFS from a local Spark application (the same thing I did successfully with the HDP 2.4 sandbox), but without success.
I have installed a MapR Client on my machine (using the hadoop fs -ls command I can reach an hdfs URL; an example of this check is shown right after the error below). I also have a Java/Scala project with a main application that I tried to run, but it gives me the following error:
Failed on local exception: java.io.IOException: An existing connection was forcibly closed by the remote host; Host Details : local host is: "DESKTOP-J9DMAUG/192.168.1.133"; destination host is: "maprdemo":7222
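The client check I mention above looks roughly like this (the URL and path are only illustrative, not the exact command I typed):

hadoop fs -ls hdfs://maprdemo:7222/user/mapr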
Here are the details about the project:
pom.xml
<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<encoding>UTF-8</encoding>
<scala.tools.version>2.10</scala.tools.version>
<scala.version>2.10.4</scala.version>
<spark.version>1.4.1</spark.version>
</properties>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.tools.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.tools.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_${scala.tools.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
</dependency>
The main class:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object MainApp {
def main(args: Array[String]) {
val conf = new SparkConf()
.setAppName("SampleSparkApp")
.setMaster("local[*]")
val sc = new SparkContext(conf)
val rdd = sc.textFile("/user/mapr/aas/sample.txt")
println(s"count is: ${rdd.count()}")
rdd.foreach(println(_))
val sqlContext = new HiveContext(sc)
val df = sqlContext.sql("select * from default.agg_w_cause_f_cdr_datamart_fac")
df.show(10)
sc.stop()
}
}
On the classpath, as resources, I also have core-site.xml and hive-site.xml:
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://maprdemo:7222</value>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
</configuration>
hive-site.xml
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://maprdemo:9083</value>
</property>
</configuration>
If you need any other details, please let me know.
It is worth mentioning that submitting the same code as a jar, using the spark-submit command on the MapR machine, runs fine.
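For reference, on the MapR machine I submit it roughly like this (the jar name is just an example, and the master setting comes from the setMaster call in the code):

spark-submit --class MainApp sample-spark-app-1.0.jar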