Questions tagged [cloudera]

Cloudera Inc. is a Palo Alto-based enterprise software company which provides Apache Hadoop-based software and services.

Cloudera, the commercial Hadoop company, develops and distributes Hadoop, the open source software that powers the data processing engines of the world’s largest and most popular websites.

Cloudera's Distribution including Apache Hadoop (CDH) is a free package built from the powerful, flexible, scalable Apache Hadoop software. To help you learn about Hadoop and how to use it, Cloudera offers public and private training, certification and online courseware.

2533 questions

2 answers

How to set JAVA_HOME Cloudera quickstart for Kafka and Zookeeper

I have added the Kafka service to my Cloudera cluster, and when I try to start it, it fails with the following error: Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/kafka/common/utils/KafkaThread : Unsupported major.minor…

hadoop apache-kafka cloudera cloudera-manager cloudera-quickstart-vm

asked Dec 23 '19 at 16:22

Justin Syrus

0 answers

Spark Job fails after Cloudera upgrade to 5.16.1

I have a very simple example Spark job, which computes 2+2, compiled with Spark 1.6. I run spark-submit in the following way: spark-submit --master yarn --deploy-mode cluster --executor-memory 2G --driver-memory 1G --conf…

apache-spark hadoop hadoop-yarn cloudera

asked Dec 08 '19 at 15:33

danny.lesnik

2 answers

How to access s3 files using on prem hadoop cluster?

I have a Cloudera VM and was able to set up the AWS CLI and configure the keys. However, I am not able to read or access S3 files using hadoop fs -ls s3://gft-ri or any other hadoop command. I can see the directory and files using the AWS CLI. Snapshot of the…

hadoop amazon-s3 cloudera

asked Nov 27 '19 at 13:33

user3858193

1 answer

Failed to read environment Variables in Scala using $ symbol

I am adding a property to the Scala environment properties: val sysProps = System.getProperties sysProps.setProperty("current.date.time", LocalDateTime.now().toString()) I'm able to save this property. I tried accessing this property (current.date.time) in…

scala apache-spark logging log4j cloudera

asked Nov 27 '19 at 12:09

Venkatesh

1 answer

Cloudera node /etc/krb5.conf replaced at every reboot

Why are my Cloudera nodes replacing the file /etc/krb5.conf at every reboot? I'm trying to make modifications, but whenever someone issues a reboot, the file is replaced again by the old config file.

hadoop kerberos cloudera mit-kerberos kdc

asked Nov 22 '19 at 16:23

Flechoide

1 answer

Hue service error: Could not connect to quickstart.cloudera:21050

I have installed cloudera-quickstart-vm-5.13.0-0-virtualbox in VirtualBox. Configuration details: CPU: 3, memory: 9000 MB. Now, when I launch Cloudera Express from the terminal using the command sudo /home/cloudera/cloudera-manager --force --express, then…

cloudera impala cloudera-cdh hue cloudera-quickstart-vm

asked Nov 21 '19 at 11:33

Programmer

2 answers

Sqoop fails with password-file argument

I have a Sqoop script which ingests data from SAP HANA into Hive. The script runs fine when I give the password as the argument "--password Password$$", but to secure the password, I put it in a file called sap.password and used…

hadoop hive hdfs sqoop cloudera

asked Nov 14 '19 at 10:58

John Thomas

1 answer

Impala not supporting Unicode characters

A SELECT statement returns bad characters on Impala. The first image shows the result from Hive and the second the result from Impala. It is a managed table created in Hive; the source table is external.

sql hive cloudera impala

asked Nov 06 '19 at 14:41

Asad ch

1 answer

Order is not preserved in PySpark collect_set only for string column

I am using the collect_set method on a DataFrame and adding 3 columns. My df is as below: id acc_no acc_name cust_id 1 111 ABC 88 1 222 XYZ 99 Below is the code snippet: from pyspark.sql import Window import…

apache-spark hadoop pyspark cloudera window-functions

asked Oct 28 '19 at 16:34

Suyog

0 answers

Implements GeoMESA on CDH6

I have a Cloudera cluster with CDH 6.1. I need to implement a solution for geospatial processing based on the GeoMESA library. My solution should read geospatial data from both CSV and GeoJSON files. After some research I found that GeoMESA must have…

bigdata geospatial cloudera cloudera-cdh geomesa

asked Oct 27 '19 at 16:32

Ofir Ofri

1 answer

Running multiple sql queries and testing for pass or fail Spark Scala

I am running 100 queries (test cases) to check data quality in Spark Scala. I am querying data from a Hive table. An empty DataFrame is the expected result for these sample queries: SELECT car_type FROM car_data WHERE car_version is null SELECT…

sql apache-spark hadoop hiveql cloudera

asked Oct 24 '19 at 20:31

Defcon

1 answer

Error creating database in Cloudera Impala (Virtual machine)

I have downloaded and started the Cloudera virtual machine with Impala. When I executed the database creation statement, an error related to the catalog and state-store services appeared. I performed the service update from the console; however, when…

centos cloudera impala hue cloudera-quickstart-vm

asked Oct 23 '19 at 20:43

Daniel Vera

1 answer

How can spark write (create) a table in hive as external in HDP 3.1

The default spark-shell --conf spark.hadoop.metastore.catalog.default=hive val df:DataFrame = ... df.write.saveAsTable("db.table") fails, as it tries to write an internal / managed / transactional table (see How to write a table to hive from spark…

apache-spark hive apache-spark-sql cloudera hdp

asked Oct 16 '19 at 15:19

Georg Heiler

1 answer

Mapreduce Job Failing with "MAX_FAILED_UNIQUE_FETCHES; bailing-out"

The MapReduce job is failing with the following error on the reducer: Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#5 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) at…

hadoop mapreduce cloudera hortonworks-data-platform hdp

asked Oct 10 '19 at 02:51

Sai Kiran Reddy Malikireddy

2 answers

Apache nifi to append year, month and day timestamp to the merged output file

I am creating an end-to-end flow to ingest data into HDFS, using ConsumeKafka for the JSON files received through a Tealium event stream. Currently, I have used ConsumeKafka -> EvaluateJsonPath -> JoltTransformJSON -> MergeContent -> Evaluate…

apache-nifi cloudera hortonworks-data-platform mapr apache-nifi-registry

asked Oct 07 '19 at 13:21

Deepak
