Questions tagged [cloudera-cdh]

For questions specifically about Cloudera's Distribution of Apache Hadoop (CDH). Please look at https://community.cloudera.com/ before posting questions.

From cloudera.com - CDH Components:

CDH is Cloudera’s 100% open source platform distribution, including Apache Hadoop and built specifically to meet enterprise demands. CDH delivers everything you need for enterprise use right out of the box. By integrating Hadoop with more than a dozen other critical open source projects, Cloudera has created a functionally advanced system that helps you perform end-to-end Big Data workflows.

Key Projects:

Apache Hadoop (Core)

Apache Accumulo

Apache Flume

Apache HBase

Apache Hive

Hue

Apache Impala (incubating)

Apache Kafka

Apache Pig

Apache Sentry

Cloudera Search

Apache Spark

Apache Sqoop

RESOURCES

CDH5 - archives - CDH5 packages and parcels
Documentation - official documentation
Blogs - engineering blogs with useful tutorials and in-depth explanations of Hadoop functionality
Community Forums - questions and answers from the CDH community

1018 questions

4 answers

How to use Scala implicit class in Java

I have a Scala implicit class from the RecordService API that I want to use in a Java file. package object spark { implicit class RecordServiceContext(ctx: SparkContext) { def recordServiceTextFile(path: String) : RDD[String] = { new…

java scala cloudera-cdh

asked Apr 08 '16 at 10:58

Shankar

0 answers

Failed redirect for container for log

Recently we upgraded to YARN with CDH 5 (version: 2.3.0 cdh5.1.3, r8e266e052e423af592871e2dfe09d54c03f6a0e8). I was trying to access the logs of a failed job from the Resource Manager by clicking logs on the ApplicationMaster, but I got the following…

hadoop hadoop-yarn hadoop2 cloudera-cdh

asked Oct 06 '14 at 19:06

roy

2 answers

Hive always runs mapred jobs in local mode

We are testing a multi-node Hadoop cluster (2.4.0) with Hive (0.13.0). The cluster works fine, but when we run a query in Hive, the mapred jobs are always executed locally. For example: Without hive-site.xml (in fact, without any configuration file…

hive cloudera cloudera-cdh

asked Apr 29 '14 at 16:02

user2591846

2 answers

Is there any alternative for Cloudera Manager? (CDH)

According to Cloudera's official blog, there is no free version of CDH from 6.3.3 onward; they said they would make Cloudera Manager open source, but have not done so yet. Is there any other project like Cloudera Manager that can manage Hadoop components through a web UI, especially…

hadoop cloudera-cdh cloudera-manager

asked May 08 '20 at 03:08

York Huang

0 answers

CDH 6.2 Hive cannot execute queries on either Spark or MapReduce

I'm trying to run a simple select count(*) from table query on Hive, but it fails with the following error: FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark…

apache-spark hive mapreduce cloudera-cdh

asked Apr 29 '19 at 16:40

Enrico Gallinucci

1 answer

Spark 2.x + Tika: java.lang.NoSuchMethodError: org.apache.commons.compress.archivers.ArchiveStreamFactory.detect

I am trying to resolve a spark-submit classpath runtime issue for an Apache Tika (>v 1.14) parsing job. The problem seems to involve spark-submit classpath vs my uber-jar. Platforms: CDH 5.15 (Spark 2.3 added via CDH docs) and CDH 6 (Spark 2.2…

apache-spark apache-tika cloudera-cdh

asked Sep 25 '18 at 19:51

WouldRatherBeSwimming

1 answer

spark kinesis failing on cloudera with java.lang.AbstractMethodError

Below is my POM file. I am writing a Spark Streaming application with AWS Kinesis: org.apache.spark spark-streaming_2.10 1.6.0 …

apache-spark spark-streaming cloudera-cdh amazon-kinesis amazon-kinesis-kpl

asked Apr 27 '17 at 05:53

Karn_way

3 answers

Scala + SBT - How to configure reference.conf for a shaded Akka library

TL;DR I am trying to shade a version of the akka library and bundle it with my application (to be able to run a spray-can server on the CDH 5.7 version of Spark 1.6). The shading process messes up akka's default configuration, and after manually…

apache-spark akka cloudera-cdh sbt-assembly shading

asked Nov 21 '16 at 12:22

Johan Hirsch

1 answer

Access spark-shell from different Spark versions

TL;DR: Is it absolutely necessary that the Spark installation running a spark-shell (driver) have exactly the same version as the Spark master? I am using Spark 1.5.0 to connect to Spark 1.5.0-cdh5.5.0 via spark-shell: spark-shell --master…

apache-spark apache-spark-sql cloudera-cdh apache-spark-standalone

asked May 10 '16 at 16:19

matheusr

3 answers

Weird behaviour with spark-submit

I am running the following code in pyspark: In [14]: conf = SparkConf() In [15]: conf.getAll() [(u'spark.eventLog.enabled', u'true'), (u'spark.eventLog.dir', u'hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationHistory'), …

apache-spark hive cloudera-cdh apache-spark-sql

asked Jul 01 '15 at 21:13

nanounanue

3 answers

Flag -useHCatalog not working

I installed CDH 5.4 on a single node following the instructions here. I also put the hive-metastore in local mode using these instructions, and everything works perfectly, except when I try to connect Pig with the metastore: ➜ ~ pig…

hadoop apache-pig cloudera-cdh

asked May 01 '15 at 15:48

nanounanue

1 answer

Hadoop installation directory

In which directory is Hadoop installed in the Cloudera distribution? Is it /usr/bin/hadoop? [cloudera@quickstart opt]$ which hadoop /usr/bin/hadoop I know that software packages are meant to be installed inside the /opt/ directory. What does Apache recommend?

java linux hadoop cloudera cloudera-cdh

asked Apr 07 '15 at 21:47

Rio mario

1 answer

Connection refused in Hbase Shell while Connecting HBase to HDFS

I am trying to connect my HBase to HDFS. I have my HDFS namenode (bin/hdfs namenode) and datanode (/bin/hdfs datanode) running. I can also start my HBase (sudo ./bin/start-hbase.sh) and local region servers (sudo ./bin/local-regionservers.sh start 1…

hadoop hbase hdfs hadoop2 cloudera-cdh

asked Jan 19 '15 at 18:48

anon

2 answers

Compare data in two RDDs in Spark

I am able to print the data in two RDDs with the code below. usersRDD.foreach(println) empRDD.foreach(println) I need to compare the data in the two RDDs. How can I iterate and compare field data in one RDD with field data in another RDD? E.g.: iterate the…

apache-spark scala-2.10 cloudera-cdh rdd

asked Jan 05 '15 at 15:45

Ramakrishna

2 answers

stop cloudera CDH5 cluster command line

I would like to know the command line for stopping and starting a Cloudera CDH 5.2 cluster. The reason is that I am writing an automation script to run some benchmark tests and want to stop and start the cluster before each benchmark test. I have seen…

cloudera cloudera-cdh haddock

asked Nov 14 '14 at 12:33

rational
