Questions tagged [cloudera]

Cloudera Inc. is a Palo Alto-based enterprise software company which provides Apache Hadoop-based software and services.

Cloudera develops and sells a commercial distribution of Apache Hadoop, the open-source software for distributed storage and processing that powers the data-processing engines of many large, high-traffic websites.

Cloudera's Distribution including Apache Hadoop (CDH) is a free distribution that packages Apache Hadoop together with related projects from the Hadoop ecosystem. To help users learn Hadoop and how to use it, Cloudera offers public and private training, certification, and online courseware.

2533 questions
0 votes, 2 answers

How to set JAVA_HOME on the Cloudera QuickStart VM for Kafka and ZooKeeper

I have added the Kafka service to my Cloudera cluster, and when I try to start it, it fails with the following error: Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/kafka/common/utils/KafkaThread : Unsupported major.minor…
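An "Unsupported major.minor version" error means the classes were compiled for a newer Java release than the JVM that is loading them, which usually points to JAVA_HOME (or whichever JDK the service starts with) being older than the Kafka build requires. As a diagnostic sketch, assuming Python 3 with only the standard library, the code below reads the version fields from a class-file header and maps the major version to a Java SE release (52 is Java 8, 55 is Java 11). The helper names are hypothetical; only the class name comes from the error.

```python
import struct
import zipfile
from typing import Tuple

# A .class file begins with magic (u4 0xCAFEBABE), minor_version (u2), major_version (u2), big-endian.
CLASS_MAGIC = 0xCAFEBABE


def read_class_version(header: bytes) -> Tuple[int, int]:
    """Return (major, minor) parsed from the first 8 bytes of a .class file."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of class-file header")
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a Java class file (bad magic number)")
    return major, minor


def java_release(major: int) -> int:
    """Map a class-file major version to a Java SE release (49 -> 5, 52 -> 8, 55 -> 11)."""
    if major < 49:
        raise ValueError("pre-Java 5 class files are not handled in this sketch")
    return major - 44


def class_version_in_jar(jar_path: str, class_entry: str) -> Tuple[int, int]:
    """Read the header of one class stored inside a jar (a jar is a zip archive)."""
    with zipfile.ZipFile(jar_path) as jar, jar.open(class_entry) as f:
        return read_class_version(f.read(8))


if __name__ == "__main__":
    # Hypothetical header of a class compiled for Java 8 (major version 52 = 0x34).
    major, minor = read_class_version(bytes.fromhex("cafebabe00000034"))
    print(f"major={major} minor={minor} -> Java {java_release(major)}")  # major=52 minor=0 -> Java 8
```

To check the real build, call class_version_in_jar on the jar that contains org/apache/kafka/common/utils/KafkaThread.class (in upstream Kafka, the kafka-clients jar) and compare the result with `java -version` for the JDK that JAVA_HOME points to.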
0 votes, 0 answers

Spark Job fails after Cloudera upgrade to 5.16.1

I have a very simple example Spark job, compiled with Spark 1.6, which computes 2+2. I'm running spark-submit as follows: spark-submit --master yarn --deploy-mode cluster --executor-memory 2G --driver-memory 1G --conf…
danny.lesnik
0 votes, 2 answers

How to access S3 files from an on-prem Hadoop cluster?

I have a Cloudera VM and was able to set up the AWS CLI and configure the keys. However, I am not able to read or access S3 files using hadoop fs -ls s3://gft-ri or any other hadoop command. I can see the directories/files using the AWS CLI. Snapshot of the…
user3858193
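In Apache Hadoop 2.x, on which CDH 5 is based, the s3:// scheme is mapped to a legacy block-based filesystem rather than to ordinary S3 objects, so the usual route on a non-EMR cluster is the S3A connector from hadoop-aws, addressed as s3a://. The Python sketch below only assembles the hadoop fs command line, passing the S3A credential properties fs.s3a.access.key and fs.s3a.secret.key as generic -D options. It assumes hadoop-aws and its AWS SDK dependency are on the classpath; the helper name and reading the keys from the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables are assumptions for illustration.

```python
import os
import shlex
from typing import List


def hadoop_s3a_ls(bucket: str, prefix: str = "", access_key: str = "", secret_key: str = "") -> List[str]:
    """Build argv for `hadoop fs -ls` against an S3 path through the S3A connector."""
    path = f"s3a://{bucket}/{prefix}"
    # Generic options (-D property=value) must precede the FsShell command itself.
    return [
        "hadoop", "fs",
        "-D", f"fs.s3a.access.key={access_key}",
        "-D", f"fs.s3a.secret.key={secret_key}",
        "-ls", path,
    ]


if __name__ == "__main__":
    argv = hadoop_s3a_ls(
        "gft-ri",
        access_key=os.environ.get("AWS_ACCESS_KEY_ID", "<access-key>"),
        secret_key=os.environ.get("AWS_SECRET_ACCESS_KEY", "<secret-key>"),
    )
    # Mask the secret before echoing; subprocess.run(argv) would actually execute the command.
    shown = ["fs.s3a.secret.key=****" if a.startswith("fs.s3a.secret.key=") else a for a in argv]
    print(shlex.join(shown))
```

Putting secrets on the command line exposes them in process listings and shell history; beyond a quick test, the same properties are better kept in core-site.xml or a Hadoop credential provider.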
0 votes, 1 answer

Failed to read environment variables in Scala using the $ symbol

Adding a property to the Scala environment properties: val sysProps = System.getProperties sysProps.setProperty("current.date.time", LocalDateTime.now().toString()) I'm able to save this property. I tried accessing this property (current.date.time) in…
Venkatesh
0 votes, 1 answer

Cloudera node /etc/krb5.conf replaced at every reboot

Why are my Cloudera nodes replacing the file /etc/krb5.conf at every reboot? I'm trying to make modifications, but whenever someone issues a reboot, the file is replaced by the old config file again.
Flechoide
0 votes, 1 answer

Hue service error: Could not connect to quickstart.cloudera:21050

I have installed cloudera-quickstart-vm-5.13.0-0-virtualbox in VirtualBox. Configuration details: CPU: 3, memory: 9000 MB. Now when I launch Cloudera Express from the terminal using the command sudo /home/cloudera/cloudera-manager --force --express. Then…
Programmer
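Port 21050 is the default port on which the Impala daemon accepts HiveServer2-protocol clients such as Hue, so this error generally means impalad is not running or not listening on quickstart.cloudera. A minimal reachability check, sketched in Python with the standard socket module, helps separate "service down" from a Hue configuration problem. The host and port come from the error message; the helper name and the 3-second timeout are arbitrary choices.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and host-name resolution failures.
        return False


if __name__ == "__main__":
    host, port = "quickstart.cloudera", 21050
    state = "reachable" if port_open(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

If the port is not reachable, the next place to look is the Impala service status in Cloudera Manager rather than the Hue settings.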
0 votes, 2 answers

Sqoop fails with password-file argument

I have a Sqoop script which ingests data from SAP HANA into Hive. The script runs fine when I pass the password as an argument, "--password Password$$", but to secure the password, I put it in a file called sap.password and used…
John Thomas
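A frequent cause of --password-file failures is that Sqoop uses the entire content of the file as the password, including the trailing newline that most editors and a plain echo append. The Python sketch below writes the file with no trailing newline and owner-read-only permissions (0400), as the Sqoop user guide recommends. The file name sap.password comes from the question; the helper name and the temporary-directory demo are assumptions.

```python
import os
import stat
import tempfile


def write_sqoop_password_file(path: str, password: str) -> None:
    """Write exactly `password` (no trailing newline) and restrict the file to mode 0400."""
    if os.path.exists(path):
        os.chmod(path, 0o600)  # allow overwriting a file left at 0400 by a previous run
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    # newline="" stops Python from translating newline characters; nothing is appended.
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
        f.write(password)
    os.chmod(path, stat.S_IRUSR)  # 0400: readable by the owner only


if __name__ == "__main__":
    # Demo in a throwaway directory; on the edge node the file would live e.g. in your home directory.
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "sap.password")
        write_sqoop_password_file(p, "Password$$")
        with open(p, "rb") as f:
            print(f.read())                            # b'Password$$'
        print(oct(stat.S_IMODE(os.stat(p).st_mode)))  # 0o400
```

A path given to --password-file without a scheme may be resolved against the cluster's default filesystem (usually HDFS), so a local file typically needs a file:// prefix, e.g. file:///home/<user>/sap.password, where <user> is a placeholder. Also note that a shell expands $$ in an unquoted or double-quoted argument to its process ID, so Password$$ typed on a command line and Password$$ stored in a file are not necessarily the same string.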
0 votes, 1 answer

Impala not supporting Unicode characters

A SELECT statement returns bad characters in Impala. The first image shows the result in Hive and the second the result in Impala. It is a managed table created in Hive; the source table is external.
Asad ch
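One common cause of this symptom, assumed here rather than confirmed by the question, is that the underlying data files are not UTF-8 encoded (for example Latin-1 or Windows-1252). Impala generally treats STRING values as raw bytes, and a client that decodes them as UTF-8 then shows replacement characters. The Python sketch below, using only the standard library and a made-up sample value, reproduces that symptom, checks whether raw bytes are valid UTF-8, and shows the transcoding that would fix such files before they are loaded.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def show_as_utf8(raw: bytes) -> str:
    """Decode as a UTF-8-only reader would, replacing invalid bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def transcode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode text from a known source encoding (e.g. 'latin-1') to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    text = "Caf\u00e9 Z\u00fcrich"      # hypothetical sample value
    latin1 = text.encode("latin-1")     # bytes as stored in a Latin-1 data file
    # Print the count of U+FFFD replacement characters to keep the output ASCII-only.
    print(is_valid_utf8(latin1), show_as_utf8(latin1).count("\ufffd"))  # False 2
    fixed = transcode_to_utf8(latin1, "latin-1")
    print(is_valid_utf8(fixed), show_as_utf8(fixed).count("\ufffd"))    # True 0
```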
0 votes, 1 answer

Order is not preserved in PySpark collect_set only for string column

I am using the collect_set method on a DataFrame and adding 3 columns. My df is as below:

id  acc_no  acc_name  cust_id
1   111     ABC       88
1   222     XYZ       99

Below is the code snippet: from pyspark.sql import Window import…
Suyog
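Spark's documentation describes collect_set (like collect_list) as non-deterministic in element order, so three separate collect_set calls give no guarantee that the i-th elements of the resulting arrays come from the same row. A common approach is to collect one struct per row, for example collect_list(struct(...)), optionally wrapped in sort_array, so the columns travel together. The plain-Python sketch below illustrates that idea without Spark: it groups rows by id, keeps each row's fields together as a tuple, de-duplicates and sorts those tuples, and only then splits them into parallel lists. The function name is hypothetical; the sample rows are the df from the question.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[int, int, str, int]  # (id, acc_no, acc_name, cust_id)


def collect_aligned(rows: Iterable[Row]) -> Dict[int, Dict[str, list]]:
    """Group by id, de-duplicate whole rows, sort them, then split into aligned lists."""
    groups: Dict[int, Set[Tuple[int, str, int]]] = defaultdict(set)
    for id_, acc_no, acc_name, cust_id in rows:
        # Keep the three fields together so they cannot drift apart.
        groups[id_].add((acc_no, acc_name, cust_id))
    result: Dict[int, Dict[str, list]] = {}
    for id_, members in groups.items():
        ordered = sorted(members)  # deterministic order: by acc_no, then acc_name, then cust_id
        result[id_] = {
            "acc_no": [m[0] for m in ordered],
            "acc_name": [m[1] for m in ordered],
            "cust_id": [m[2] for m in ordered],
        }
    return result


if __name__ == "__main__":
    df_rows = [(1, 111, "ABC", 88), (1, 222, "XYZ", 99)]
    print(collect_aligned(df_rows))
    # {1: {'acc_no': [111, 222], 'acc_name': ['ABC', 'XYZ'], 'cust_id': [88, 99]}}
```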
0 votes, 0 answers

Implementing GeoMesa on CDH 6

I have a Cloudera cluster with CDH 6.1. I need to implement a geospatial processing solution based on the GeoMesa library. My solution should read geospatial data from both CSV and GeoJSON files. After some research, I found that GeoMesa must have…
0 votes, 1 answer

Running multiple SQL queries and testing for pass or fail in Spark Scala

I am running 100 queries (test cases) to check for data quality in Spark Scala. I am querying data from a Hive table. An empty DataFrame is the expected result for these sample queries: SELECT car_type FROM car_data WHERE car_version is null SELECT…
Defcon
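The pass/fail rule described here, where a check passes when its query returns no rows, can be sketched generically. The version below uses Python's built-in sqlite3 purely as a stand-in for the Hive table so that it runs anywhere; in Spark Scala the equivalent per-query test would be spark.sql(query).head(1).isEmpty. The sample rows and the second check are made up; only the first query comes from the question.

```python
import sqlite3
from typing import Dict


def run_checks(conn: sqlite3.Connection, checks: Dict[str, str]) -> Dict[str, bool]:
    """Run each check query; a check passes when it returns no rows."""
    results: Dict[str, bool] = {}
    for name, query in checks.items():
        # One returned row is enough to fail the check, so fetch a single row only.
        results[name] = conn.execute(query).fetchone() is None
    return results


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE car_data (car_type TEXT, car_version TEXT)")
    # Hypothetical sample rows standing in for the Hive table.
    conn.executemany("INSERT INTO car_data VALUES (?, ?)", [("sedan", "v1"), ("suv", None)])
    checks = {
        "car_version_not_null": "SELECT car_type FROM car_data WHERE car_version is null",
        "car_type_not_null": "SELECT car_version FROM car_data WHERE car_type is null",  # hypothetical
    }
    for name, passed in run_checks(conn, checks).items():
        print(f"{name}: {'PASS' if passed else 'FAIL'}")
    # car_version_not_null: FAIL
    # car_type_not_null: PASS
```

Fetching a single row per query, rather than counting all matches, keeps each check cheap when a table has many offending rows.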
0 votes, 1 answer

Error creating database in Cloudera Impala (Virtual machine)

I have downloaded and started the Cloudera virtual machine with Impala. When executing the database creation statement, an error related to the catalog and statestore services appeared. I performed the service update from the console; however, when…
Daniel Vera
0 votes, 1 answer

How can Spark write (create) a table in Hive as external in HDP 3.1?

The default spark-shell --conf spark.hadoop.metastore.catalog.default=hive val df:DataFrame = ... df.write.saveAsTable("db.table") fails because it tries to write an internal / managed / transactional table (see How to write a table to hive from spark…
Georg Heiler
0 votes, 1 answer

Mapreduce Job Failing with "MAX_FAILED_UNIQUE_FETCHES; bailing-out"

The MapReduce job fails with the following error on the reducer: Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#5 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) at…
0 votes, 2 answers

Apache NiFi: append year, month and day timestamp to the merged output file

I am creating an end-to-end flow to load data into HDFS by using ConsumeKafka for the JSON files received through the Tealium event stream. Currently, I have used ConsumeKafka -> EvaluateJsonPath -> JoltTransformJSON -> MergeContent -> Evaluate…
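In NiFi the date is usually injected with the Expression Language rather than in code. For example, an UpdateAttribute processor placed after MergeContent could set the filename attribute to something like ${filename}_${now():format('yyyyMMddHHmmss')}.json, and the HDFS target directory could be a dated path such as /data/tealium/${now():format('yyyy/MM/dd')}. The /data/tealium base directory is hypothetical, and which processor properties evaluate expressions should be checked against your NiFi version's processor documentation. The Python sketch below only reproduces the resulting naming layout for a fixed timestamp, so the expected paths are easy to verify.

```python
from datetime import datetime, timezone


def merged_output_path(base_dir: str, name: str, ts: datetime) -> str:
    """Build base_dir/yyyy/MM/dd/name_yyyyMMddHHmmss.json from a timestamp."""
    # Same layout as NiFi's now():format('yyyy/MM/dd') and format('yyyyMMddHHmmss'),
    # expressed with Python strftime codes.
    return f"{base_dir.rstrip('/')}/{ts:%Y/%m/%d}/{name}_{ts:%Y%m%d%H%M%S}.json"


if __name__ == "__main__":
    ts = datetime(2019, 3, 5, 14, 7, 9, tzinfo=timezone.utc)  # fixed example timestamp
    print(merged_output_path("/data/tealium", "merged", ts))
    # /data/tealium/2019/03/05/merged_20190305140709.json
```

Note that NiFi's now() uses the NiFi JVM's clock and time zone, whereas this sketch takes whatever timestamp it is given.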