Questions tagged [cloudera]

Cloudera Inc. is a Palo Alto-based enterprise software company which provides Apache Hadoop-based software and services.

Cloudera, the commercial Hadoop company, develops and distributes Hadoop, the open source software that powers the data processing engines of the world’s largest and most popular websites.

Cloudera's Distribution including Apache Hadoop (CDH) is a free package built from the powerful, flexible, scalable Apache Hadoop software. To help you learn about Hadoop and how to use it, Cloudera offers public and private training, certification and online courseware.

2533 questions
0 votes, 1 answer

How to handle potential data loss when performing comparisons across data types in different groups

Background: Our group is going through a Cloudera upgrade to 6.1.1 and I have been tasked with determining how to handle the loss of the implicit data type conversion across data types. See link below for the relevant Release Note…
J Weezy
0 votes, 0 answers

Apache Spark task gives null pointer exception

In an Apache Spark RDD job, my task does not complete and throws a NullPointerException: Lost task 22.3 in stage 8.0 (TID 19700, 10.64.109.70): java.lang.NullPointerException at…
tarun
0 votes, 2 answers

Hue Filebrowser Search is searching only in the first layer

I installed Hue with Cloudera Manager on AWS and uploaded some directories containing a few files. Under the /user/hdfs path there are directories such as project1 and project2. If I search for project, I get the projects as results.…
madik_atma
0 votes, 2 answers

Beeline splits data row on csv export

My CSV output file has a few rows that are split into two cells because string entries in the Hive table contain the ; symbol, which causes the split. PROPER…
marcin2x4
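
One possible reading of the symptom above: if the rows are written without quoting and the consumer treats ; as a field separator, any ; inside a string adds an extra cell. The Python sketch below illustrates that effect and shows that quoting every field keeps the row intact. The sample rows and helper names are made up for illustration; they are not the asker's data or anything Beeline provides.

```python
import csv
import io

# Hypothetical sample rows; the second field of row 1 contains a ';'.
rows = [["1", "alpha; beta"], ["2", "gamma"]]

def write_quoted(rows):
    """Serialize rows with ';' as delimiter, quoting every field."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

def naive_split(text):
    """Split each line on ';' the way a quote-unaware consumer would."""
    return [line.split(";") for line in text.splitlines()]

def quote_aware_read(text):
    """Parse ';'-delimited text while honoring double quotes."""
    return list(csv.reader(io.StringIO(text), delimiter=";"))

# Unquoted output: fields are simply joined with ';'.
unquoted = "\n".join(";".join(r) for r in rows)
quoted = write_quoted(rows)

print(naive_split(unquoted)[0])     # the first row gains an extra cell
print(quote_aware_read(quoted)[0])  # the first row keeps its two fields
```

Whether this matches the actual export depends on the output format and on the program that opens the file, which the truncated excerpt does not show.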
0 votes, 1 answer

HDFS Space Quota - What if the Parent folder has less quota compared to its child folders

Regarding HDFS space quotas in Cloudera: suppose I have a folder A in HDFS, and inside A there is another folder B. The space quota for A is 100 MB and the space quota for B is 200 MB. If I try to copy a 150 MB file to B, is it going to fail?
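
As far as HDFS quotas are documented, a space quota limits the bytes used by the whole tree rooted at a directory, so a write has to fit within the quota of every ancestor that has one, not only the nearest. The Python model below is a sketch of that rule, not HDFS code. The `quotas` and `used` dictionaries and the `can_write` helper are hypothetical, sizes are in MB, both folders are assumed to be empty, and replication is ignored (on a real cluster each replica counts against the quota, which makes the limit tighter).

```python
from pathlib import PurePosixPath

# Hypothetical quotas taken from the question, in MB; both folders assumed empty.
quotas = {"/A": 100, "/A/B": 200}
used = {"/A": 0, "/A/B": 0}

def can_write(target_dir, size_mb):
    """Return (ok, blocking_dir); the write must fit every ancestor's quota."""
    path = PurePosixPath(target_dir)
    # Check the target directory first, then walk up through its parents.
    for directory in [path, *path.parents]:
        key = str(directory)
        if key in quotas and used.get(key, 0) + size_mb > quotas[key]:
            return False, key
    return True, None

print(can_write("/A/B", 150))  # blocked by the quota on /A in this model
print(can_write("/A/B", 50))   # fits under both quotas in this model
```

Under these assumptions the 150 MB copy is rejected because of the 100 MB quota on A, even though B's own quota would allow it.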
0 votes, 1 answer

How do I fix error with using SAML in Hue application

I have a problem using the SAML function in the Hue application. I did everything required by following this guide: https://docs.gethue.com/latest/administrator/configuration/server/#saml Environment: OS: Ubuntu, Hue: 4.5.0. Step 1: install the packages below: git gcc…
YoungIn Lee
0 votes, 1 answer

Kafka topic is not creating

I was installing Kafka in the Cloudera QuickStart VM using the following link, but when I run the command below kafka-topics --zookeeper quickstart.cloudera:2181 --create --topic test --partitions 1 --replication-factor 1 I get the following…
0 votes, 0 answers

Multiple servers in spark-defaults.conf

I have a Cloudera installation (Cloudera Express 5.4.1 (#197 built by jenkins on 20150509-0041 git: 003e06d761f80834d39c3a42431a266f0aaee736)) with Spark installed on it. I want spark-defaults.conf to have 2 servers defined. How do we carry it…
Divines
0 votes, 0 answers

What are the differences between spark version [2.4.0] and [2.4.0.cloudera2]

I looked for the Spark dependencies on mvnrepository.com. There are many versions of Spark core in both the Central and Cloudera repositories: Central has version 2.4.0, and Cloudera has version 2.4.0.cloudera1 or 2.4.0.cloudera2. For example, in the Spark core dependencies,…
0 votes, 0 answers

Synchronization HBase tables between two clusters in SPARK

I want to write a tool that synchronizes HBase tables between two environments. The tool should read data from the second cluster and update the table based on the timestamp. I use hbase-client version 1.2.0-cdh5.12.1 and Spark version:…
0 votes, 0 answers

repeated number in string impala regex

I need to filter rows with repeated numbers in an id field in a table. For that I use the regular expression \b(\d)\1+\b. Here is an example of the regex: https://regex101.com/r/rJ7hJ6/7 But in Impala this solution doesn't work. I tried select…
Figa17
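
A likely explanation, stated here as an assumption rather than a confirmed diagnosis: Impala's regular-expression functions are built on the RE2 library, and RE2 does not support backreferences such as \1. One possible workaround is to spell out an alternative per digit instead. The Python sketch below only checks that the backreference pattern and a backreference-free pattern agree on a few made-up sample values; it has not been run against Impala, where string-literal escaping may also differ.

```python
import re

# Original pattern from the question; it relies on the backreference \1.
with_backref = re.compile(r"\b(\d)\1+\b")

# Backreference-free variant: one alternative per digit (00+|11+|...|99+).
alternatives = "|".join(f"{d}{d}+" for d in "0123456789")
without_backref = re.compile(rf"\b(?:{alternatives})\b")

# Made-up sample values for comparison.
samples = ["1111", "22", "1212", "7", "id 333 ok", "4445", "000000"]

for value in samples:
    expected = bool(with_backref.search(value))
    actual = bool(without_backref.search(value))
    assert expected == actual  # both patterns agree on every sample
    print(f"{value!r}: repeated-digit match = {actual}")
```

The alternation is longer to read but uses only constructs that backreference-free engines accept.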
0 votes, 0 answers

Installing Impala

I've installed Hadoop and Hive on my Ubuntu 18.1, but I am having difficulty installing Impala. Is there a link for installing Impala on Ubuntu without Cloudera Manager? I couldn't install it with the official link since it appears to need a large amount of memory…
0 votes, 1 answer

Oozie not starting with Mysql in AWS EC2 instance

We are installing a Hadoop cluster on AWS EC2 instances (5 nodes) for POC purposes. Software stack: Hadoop, HDFS, Oozie and MongoDB. We were able to install Hadoop, HDFS and MongoDB successfully, but we are not able to install Oozie with MySQL…
Shash
0 votes, 1 answer

The Cloudera QuickStart VM Sqoop Error in OJDBC driver

I installed Cloudera QuickStart VM 5.13 and I'm using Sqoop. I tried to execute the following command: [cloudera@quickstart ~]$ sqoop list-tables --connect jdbc:oracle:thin:@localhost:1521:xe --username Guest1 --password G147 Then I have the…
HailToTheVictor
0 votes, 1 answer

How to detect in Hadoop cluster if any Datanode drive (Storage) failed

I am trying to detect drive failures on DataNodes in a Hadoop cluster. The Cloudera Manager API doesn't have a specific endpoint for that; the CM APIs relate only to the NameNode or to restarting services. Are there any suggestions? Thanks a lot!