Questions tagged [cloudera-cdh]

For questions specifically about Cloudera's Distribution of Apache Hadoop (CDH). Please look at https://community.cloudera.com/ before posting questions.

From cloudera.com - CDH Components:

CDH is Cloudera’s 100% open source platform distribution, including Apache Hadoop and built specifically to meet enterprise demands. CDH delivers everything you need for enterprise use right out of the box. By integrating Hadoop with more than a dozen other critical open source projects, Cloudera has created a functionally advanced system that helps you perform end-to-end Big Data workflows.

Key Projects:

  • Apache Hadoop (Core)
  • Apache Accumulo
  • Apache Flume
  • Apache HBase
  • Apache Hive
  • Hue
  • Apache Impala (incubating)
  • Apache Kafka
  • Apache Pig
  • Apache Sentry
  • Cloudera Search
  • Apache Spark
  • Apache Sqoop

Resources:

  • CDH5 - archives - CDH5 packages and parcels
  • Documentation - official documentation
  • Blogs - engineering blogs with useful tutorials and in-depth explanations of Hadoop functionality
  • Community Forums - questions and answers from the CDH community

1018 questions
6 votes, 4 answers

How to use Scala implicit class in Java

I have a Scala implicit class from the RecordService API, which I want to use in a Java file. package object spark { implicit class RecordServiceContext(ctx: SparkContext) { def recordServiceTextFile(path: String) : RDD[String] = { new…
Shankar
6 votes, 0 answers

Failed redirect for container for log

We recently upgraded to YARN with CDH 5 (version: 2.3.0 cdh5.1.3, r8e266e052e423af592871e2dfe09d54c03f6a0e8). I was trying to access the logs of a failed job from the Resource Manager by clicking logs on the ApplicationMaster, but I got the following…
roy
6 votes, 2 answers

Hive always run mapred jobs in local mode

We are testing a multi-node Hadoop cluster (2.4.0) with Hive (0.13.0). The cluster works fine, but when we run a query in Hive, the mapred jobs are always executed locally. For example: Without hive-site.xml (in fact, without any configuration file…
user2591846
5 votes, 2 answers

Is there any alternative for Cloudera Manager? (CDH)

According to Cloudera's official blog, there is no free version of CDH from 6.3.3 onward. They said they would make Cloudera Manager open source, but have not done so yet. Is there any other project like Cloudera Manager that can manage Hadoop components through a web UI, especially…
York Huang
5 votes, 0 answers

CDH 6.2 Hive cannot execute queries neither on Spark nor MapReduce

I'm trying to run a simple select count(*) from table query on Hive, but it fails with the following error: FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark…
5 votes, 1 answer

Spark 2.x + Tika: java.lang.NoSuchMethodError: org.apache.commons.compress.archivers.ArchiveStreamFactory.detect

I am trying to resolve a spark-submit classpath runtime issue for an Apache Tika (>v 1.14) parsing job. The problem seems to involve spark-submit classpath vs my uber-jar. Platforms: CDH 5.15 (Spark 2.3 added via CDH docs) and CDH 6 (Spark 2.2…
5 votes, 1 answer

spark kinesis failing on cloudera with java.lang.AbstractMethodError

Below is my POM file. I am writing a Spark Streaming application with AWS Kinesis: org.apache.spark spark-streaming_2.10 1.6.0
5 votes, 3 answers

Scala + SBT - How to configure reference.conf for a shaded Akka library

TL;DR I am trying to shade a version of the akka library and bundle it with my application (to be able to run a spray-can server on the CDH 5.7 version of Spark 1.6). The shading process messes up akka's default configuration, and after manually…
Johan Hirsch
5 votes, 1 answer

Access spark-shell from different Spark versions

TL;DR: Is it absolutely necessary that the Spark installation running a spark-shell (driver) have exactly the same version as the Spark master? I am using Spark 1.5.0 to connect to Spark 1.5.0-cdh5.5.0 via spark-shell: spark-shell --master…
5 votes, 3 answers

Weird behaviour with spark-submit

I am running the following code in pyspark: In [14]: conf = SparkConf() In [15]: conf.getAll() [(u'spark.eventLog.enabled', u'true'), (u'spark.eventLog.dir', u'hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationHistory'), …
nanounanue
5 votes, 3 answers

Flag -useHCatalog not working

I installed CDH5.4 on a single node following the instructions here. I also put the hive-metastore in local mode using these instructions, and everything works perfectly, except when I try to connect Pig with the metastore: ➜ ~ pig…
nanounanue
5 votes, 1 answer

Hadoop installation directory

In which directory is Hadoop installed in the Cloudera distribution? Is it in /usr/bin/hadoop? [cloudera@quickstart opt]$ which hadoop /usr/bin/hadoop I know that software packages are supposed to be installed inside the /opt/ directory. What does Apache recommend?
Rio mario
5 votes, 1 answer

Connection refused in Hbase Shell while Connecting HBase to HDFS

I am trying to connect my HBase to HDFS. I have my HDFS namenode (bin/hdfs namenode) and datanode (/bin/hdfs datanode) running. I can also start my HBase (sudo ./bin/start-hbase.sh) and local region servers (sudo ./bin/local-regionservers.sh start 1…
anon
5 votes, 2 answers

Compare data in two RDD in spark

I am able to print the data in two RDDs with the code below. usersRDD.foreach(println) empRDD.foreach(println) I need to compare the data in the two RDDs. How can I iterate and compare field data in one RDD with field data in another RDD? E.g., iterate the…
Ramakrishna
5 votes, 2 answers

stop cloudera CDH5 cluster command line

I would like to know the command line for stopping and starting a Cloudera CDH5.2 cluster. The reason is that I am writing an automation script to run some benchmark tests and want to stop and start the cluster before each benchmark test. I have seen…
rational