Questions tagged [cloudera]

Cloudera Inc. is a Palo Alto-based enterprise software company which provides Apache Hadoop-based software and services.

Cloudera, the commercial Hadoop company, develops and distributes Hadoop, the open source software that powers the data processing engines of the world’s largest and most popular websites.

Cloudera's Distribution including Apache Hadoop (CDH) is a free package built from the powerful, flexible, scalable Apache Hadoop software. To help you learn about Hadoop and how to use it, Cloudera offers public and private training, certification and online courseware.

2533 questions
12
votes
3 answers

Livy Server on Amazon EMR hangs on Connecting to ResourceManager

I'm trying to deploy a Livy Server on Amazon EMR. First I built the Livy master branch mvn clean package -Pscala-2.11 -Pspark-2.0 Then, I uploaded it to the EMR cluster master. I set the following…
matheusr
  • 567
  • 9
  • 29
12
votes
7 answers

hdfs - ls: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException:

I am trying to use the below to list my dirs in hdfs: ubuntu@ubuntu:~$ hadoop fs -ls hdfs://127.0.0.1:50075/ ls: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match…
Tampa
  • 75,446
  • 119
  • 278
  • 425
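
A note on the question above: one common cause of this protobuf error is pointing the HDFS client at a web or data-transfer port instead of the NameNode RPC port. The sketch below is only an illustration under that assumption. The helper name check_hdfs_uri and the port table are hypothetical; the table lists Hadoop 2.x default ports and a cluster may be configured differently.

```python
from urllib.parse import urlparse

# Hadoop 2.x default ports that do NOT speak the NameNode RPC protocol.
# Assumption: the cluster uses defaults; real clusters may override them.
NON_RPC_PORTS = {
    50070: "NameNode web UI",
    50075: "DataNode web UI",
    50010: "DataNode data transfer",
    50090: "SecondaryNameNode web UI",
}


def check_hdfs_uri(uri):
    """Return a warning string if the URI targets a known non-RPC port, else None."""
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        return "not an hdfs:// URI"
    label = NON_RPC_PORTS.get(parsed.port)
    if label is not None:
        return f"port {parsed.port} is the default {label} port, not the NameNode RPC port"
    return None


# The URI from the question targets the DataNode web port.
print(check_hdfs_uri("hdfs://127.0.0.1:50075/"))
```

If the check returns a warning, the value of fs.defaultFS in core-site.xml is the place to look for the actual NameNode address.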
11
votes
1 answer

Cloudera 5.6: Parquet does not support date. See HIVE-6384

I am currently using Cloudera 5.6 and trying to create a Parquet-format table in Hive based on another table, but I am running into an error. create table sfdc_opportunities_sandbox_parquet like sfdc_opportunities_sandbox STORED AS…
pitchblack408
  • 2,913
  • 4
  • 36
  • 54
11
votes
1 answer

Spark java.io.EOFException: Premature EOF: no length prefix available

I am trying to read a Parquet file, perform some operations on it, and save the result as Parquet on HDFS, using Spark. While doing so I get the following exception. java.io.EOFException: Premature EOF: no length prefix available at…
Aditya Calangutkar
  • 486
  • 1
  • 6
  • 21
11
votes
2 answers

Elaboration on why shuffle write data is way more than input data in Apache Spark

Can anyone explain what exactly Input, Output, Shuffle Read, and Shuffle Write mean in the Spark UI? Also, can someone explain how the input in this job is 25~30% of the shuffle write? As per my understanding, shuffle write is the sum of temporary…
Abhishek Anand
  • 1,940
  • 14
  • 27
11
votes
2 answers

zookeeper client does not provide CLI with "jline support is disabled" message

I just brought up CDH 5.4 and installed zookeeper. I used zkCli successfully many times before. This time the command line launch stops before getting to the prompt with Welcome to ZooKeeper! JLine support is disabled 2015-05-04 18:18:33,936 [myid:]…
bhomass
  • 3,414
  • 8
  • 45
  • 75
11
votes
3 answers

error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol

I'm currently trying to test the implemented changes for achieving security with Encrypted Shuffle in Cloudera Hadoop Environment. I've created the certificates and keystores and kept them in appropriate locations. I'm testing TaskTracker's HTTPS…
Saurabh Gokhale
  • 53,625
  • 36
  • 139
  • 164
10
votes
3 answers

Unable to connect to Hive2 using Python

While connecting to Hive2 using Python with below code: import pyhs2 with pyhs2.connect(host='localhost', port=10000, authMechanism="PLAIN", user='root', password='test', database='default') as…
Vinod
  • 376
  • 2
  • 11
  • 34
10
votes
1 answer

TIMESTAMP format issue in HIVE

I have a Hive table created from a JSON file. CREATE external TABLE logan_test.t1 ( name string, start_time timestamp ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' WITH SERDEPROPERTIES ( "timestamp.formats" =…
logan
  • 7,946
  • 36
  • 114
  • 185
10
votes
4 answers

Invalid URI for NameNode address

I'm trying to set up a Cloudera Hadoop cluster, with a master node containing the namenode, secondarynamenode and jobtracker, and two other nodes containing the datanode and tasktracker. The Cloudera version is 4.6, the OS is Ubuntu Precise x64.…
cybertextron
  • 10,547
  • 28
  • 104
  • 208
10
votes
8 answers

Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses - submitting job to remote cluster

I recently upgraded my cluster from Apache Hadoop 1.0 to CDH4.4.0. I have a WebLogic server on another machine from which I submit jobs to this remote cluster via the MapReduce client. I still want to use MR1 and not YARN. I have compiled my client…
RGC
  • 332
  • 1
  • 2
  • 12
10
votes
2 answers

Running wordcount sample using MRV1 on CDH4.0.1 VM

I downloaded the VM from https://downloads.cloudera.com/demo_vm/vmware/cloudera-demo-vm-cdh4.0.0-vmware.tar.gz and found that the services listed below are running after the system boots. MRV1…
Ujjwal Wadhawan
  • 733
  • 1
  • 8
  • 10
10
votes
3 answers

How to run HBase shell against a remote cluster

I'm running HBase in pseudo-distributed mode on my workstation. We also have HBase running on a cluster. Using the HBase shell, I'd like to access the HBase instance that's running on the cluster from my workstation. I would like to do this…
sangfroid
  • 3,733
  • 11
  • 38
  • 42
9
votes
3 answers

Livy Server: return a dataframe as JSON?

I am executing a statement in Livy Server using HTTP POST call to localhost:8998/sessions/0/statements, with the following body { "code": "spark.sql(\"select * from test_table limit 10\")" } I would like an answer in the following…
matheusr
  • 567
  • 9
  • 29
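
The request described in the question above can be sketched in Python as follows. This is a minimal illustration, not the asker's code: the host, port, session id and SQL string come from the question, while the helper name build_statement_request is hypothetical. The request is only built here, not sent, so the sketch runs without a Livy server.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # host and port as given in the question


def build_statement_request(session_id, code):
    """Build (but do not send) the POST request that submits a statement to a Livy session."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        url=f"{LIVY_URL}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_statement_request(0, 'spark.sql("select * from test_table limit 10")')

# To actually submit it against a running Livy server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Using json.dumps for the body avoids hand-escaping the inner quotes around the SQL string.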
9
votes
3 answers

Apache Spark error : Could not connect to akka.tcp://sparkMaster@

These are our first steps using big data tools like Apache Spark and Hadoop. We have installed Cloudera CDH 5.3. From Cloudera Manager we chose to install Spark. Spark is up and running very well on one of the nodes in the cluster. From my…
Fanooos
  • 2,718
  • 5
  • 31
  • 55