Questions tagged [cloudera]

Cloudera Inc. is a Palo Alto-based enterprise software company which provides Apache Hadoop-based software and services.

Cloudera, the commercial Hadoop company, develops and distributes Hadoop, the open source software that powers the data processing engines of the world’s largest and most popular websites.

Cloudera's Distribution including Apache Hadoop (CDH) is a free package built from the powerful, flexible, scalable Apache Hadoop software. To help you learn about Hadoop and how to use it, Cloudera offers public and private training, certification and online courseware.

2533 questions

3 answers

Livy Server on Amazon EMR hangs on Connecting to ResourceManager

I'm trying to deploy a Livy Server on Amazon EMR. First I built the Livy master branch: mvn clean package -Pscala-2.11 -Pspark-2.0. Then I uploaded it to the EMR cluster master. I set the following…

apache-spark hadoop-yarn cloudera emr

asked Oct 28 '16 at 20:10

matheusr

7 answers

hdfs - ls: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException:

I am trying to use the below to list my dirs in hdfs: ubuntu@ubuntu:~$ hadoop fs -ls hdfs://127.0.0.1:50075/ ls: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match…

hadoop hdfs cloudera

asked May 04 '13 at 09:57

Tampa

1 answer

Cloudera 5.6: Parquet does not support date. See HIVE-6384

I am currently using Cloudera 5.6 and trying to create a Parquet-format table in Hive based on another table, but I am running into an error. create table sfdc_opportunities_sandbox_parquet like sfdc_opportunities_sandbox STORED AS…

hive cloudera parquet

asked May 20 '16 at 22:55

pitchblack408

1 answer

Spark java.io.EOFException: Premature EOF: no length prefix available

I am trying to read a Parquet file, perform some operations on it, and save the result as Parquet on HDFS. I am doing this with Spark. While doing so I get the following exception. java.io.EOFException: Premature EOF: no length prefix available at…

hadoop apache-spark hdfs cloudera

asked Apr 22 '16 at 11:25

Aditya Calangutkar

2 answers

Elaboration on why shuffle write data is way more than input data in Apache Spark

Can anyone elaborate on what exactly Input, Output, Shuffle Read, and Shuffle Write specify in the Spark UI? Also, can someone explain how the input in this job is 25~30% of the shuffle write? As per my understanding, shuffle write is the sum of temporary…

apache-spark hdfs cloudera

asked Mar 29 '16 at 10:41

Abhishek Anand

2 answers

zookeeper client does not provide CLI with "jline support is disabled" message

I just brought up CDH 5.4 and installed zookeeper. I used zkCli successfully many times before. This time the command line launch stops before getting to the prompt with Welcome to ZooKeeper! JLine support is disabled 2015-05-04 18:18:33,936 [myid:]…

java hadoop cloudera apache-zookeeper jline

asked May 05 '15 at 03:43

bhomass

3 answers

error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol

I'm currently trying to test the implemented changes for achieving security with Encrypted Shuffle in Cloudera Hadoop Environment. I've created the certificates and keystores and kept them in appropriate locations. I'm testing TaskTracker's HTTPS…

java ssl hadoop openssl cloudera

asked Jan 15 '14 at 11:01

Saurabh Gokhale

3 answers

Unable to connect to Hive2 using Python

While connecting to Hive2 using Python with below code: import pyhs2 with pyhs2.connect(host='localhost', port=10000, authMechanism="PLAIN", user='root', password='test', database='default') as…

python python-3.x hadoop hive cloudera

asked Aug 27 '17 at 11:30

Vinod

1 answer

TIMESTAMP format issue in HIVE

I have a Hive table created from a JSON file. CREATE external TABLE logan_test.t1 ( name string, start_time timestamp ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' WITH SERDEPROPERTIES ( "timestamp.formats" =…

hadoop hive hiveql cloudera create-table

asked Jun 09 '17 at 22:00

logan

4 answers

Invalid URI for NameNode address

I'm trying to set up a Cloudera Hadoop cluster, with a master node containing the namenode, secondarynamenode and jobtracker, and two other nodes containing the datanode and tasktracker. The Cloudera version is 4.6, the OS is Ubuntu Precise x64.…

hadoop hdfs cloudera

asked May 14 '14 at 05:29

cybertextron

8 answers

Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses (submitting a job to a remote cluster)

I recently upgraded my cluster from Apache Hadoop 1.0 to CDH 4.4.0. I have a WebLogic server on another machine from which I submit jobs to this remote cluster via the MapReduce client. I still want to use MR1 and not YARN. I have compiled my client…

hadoop mapreduce cloudera

asked Sep 27 '13 at 06:21

RGC

2 answers

Running wordcount sample using MRV1 on CDH4.0.1 VM

I downloaded the VM from https://downloads.cloudera.com/demo_vm/vmware/cloudera-demo-vm-cdh4.0.0-vmware.tar.gz and found that the services listed below are running after the system boots. MRV1…

hadoop cloudera

asked Oct 11 '12 at 02:16

Ujjwal Wadhawan

3 answers

How to run HBase shell against a remote cluster

I'm running HBase in pseudo-distributed mode on my workstation. We also have HBase running on a cluster. Using the HBase shell, I'd like to access the HBase instance that's running on the cluster from my workstation. I would like to do this…

configuration hadoop hbase apache-zookeeper cloudera

asked Apr 18 '12 at 23:16

sangfroid

3 answers

Livy Server: return a dataframe as JSON?

I am executing a statement in Livy Server using an HTTP POST call to localhost:8998/sessions/0/statements, with the following body: { "code": "spark.sql(\"select * from test_table limit 10\")" } I would like an answer in the following…

json apache-spark cloudera apache-spark-2.0 livy

asked Dec 13 '16 at 17:23

matheusr

3 answers

Apache Spark error : Could not connect to akka.tcp://sparkMaster@

These are our first steps with big data tools like Apache Spark and Hadoop. We have installed Cloudera CDH 5.3. From Cloudera Manager we chose to install Spark. Spark is up and running very well on one of the nodes in the cluster. From my…

hadoop apache-spark cloudera

asked Feb 11 '15 at 12:00

Fanooos
