Questions tagged [cloudera]

Cloudera Inc. is a Palo Alto-based enterprise software company which provides Apache Hadoop-based software and services.

Cloudera, the commercial Hadoop company, develops and distributes Hadoop, the open source software that powers the data processing engines of the world’s largest and most popular websites.

Cloudera's Distribution including Apache Hadoop (CDH) is a free package built from the powerful, flexible, scalable Apache Hadoop software. To help you learn about Hadoop and how to use it, Cloudera offers public and private training, certification and online courseware.

2533 questions

1 answer

HBase ImportTsv tuning

I'm on Cloudera 5.16 with Hadoop 2.6. I use ImportTsv to load large CSV files into HBase. hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=';' -Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:age mynamespace:mytable…

import hbase cloudera bulkinsert

asked Aug 22 '19 at 15:26

Eric C

1 answer

Cannot connect to WebHDFS on port 14000 in Cloudera Manager

I have a Cloudera cluster (version cdh6.2.0), and every component (HDFS, Hive, etc.) has worked well. However, when I recently tried to connect to WebHDFS, I found that the port (14000) was not listening at all, which I checked by executing the command netstat -antpl|grep 14000 on…

hadoop hdfs cloudera webhdfs

asked Aug 20 '19 at 08:13

DennisLi

1 answer

Running Spark History Server at context path localhost:18080/sparkhistory instead of localhost:18080

I want to run the Spark History Server at localhost:18080/sparkhistory instead of at localhost:18080. The end goal is to access the Spark History Server through a domain name, i.e., domainname/sparkhistory. Are there any hacks or Spark config options for this?

apache-spark apache-spark-sql cloudera databricks apache-spark-2.0

asked Aug 20 '19 at 00:52

Sandish Kumar H N

1 answer

PySpark Unable to read csv from hdfs: HiveExternalCatalog error

I'm new to Spark and I'm stuck trying to debug an error. I'm trying to read multiple files from HDFS. I'm using sparksession.read.csv for this but getting an error: py4j.protocol.Py4JJavaError: An error occurred while calling o64.csv. :…

hive pyspark cloudera

asked Aug 13 '19 at 10:18

abhinavchat

2 answers

IOException during splitting java.util.concurrent.ExecutionException: java.io.FileNotFoundException when loading HFile to HBase

I am trying to bulk load data into HBase using the salted table approach described on this site: https://www.opencore.com/blog/2016/10/efficient-bulk-load-of-hbase-using-spark/. While I am able to insert data, at random times I get ERROR…

scala apache-spark hbase cloudera

asked Aug 09 '19 at 13:52

Kok-Lim Wong

1 answer

HDFS Datanode crashes with OutOfMemoryError

I'm having repeated crashes in my Cloudera cluster HDFS Datanodes due to an OutOfMemoryError: java.lang.OutOfMemoryError: Java heap space Dumping heap to /tmp/hdfs_hdfs-DATANODE-e26e098f77ad7085a5dbf0d369107220_pid18551.hprof ... Heap dump file…

java hadoop hdfs cloudera

asked Aug 07 '19 at 22:33

Victor

1 answer

Is there a way to invalidate metadata and rebuild the index from Python code in CDSW?

I am using Impyla and Python in CDSW to query data in HDFS and use it. The problem is that sometimes, to get all of the data, I have to go in and manually click the "Invalidate all metadata and rebuild index" button in HUE. Is there a way to do…

python cloudera impala impyla

asked Aug 01 '19 at 16:30

sectechguy

0 answers

How to set LIVY_CONF_DIR in Cloudera

I have installed the Livy server in Cloudera under /usr/share. I want to set LIVY_CONF_DIR so that I can manage config files such as log4j.properties. Cloudera says this is possible, but I could not find how to define…

hadoop cloudera cloudera-manager livy

asked Aug 01 '19 at 06:26

Rakesh Bharadwaj

0 answers

Hive is not creating reducers for "Insert overwrite" query. Small files problem?

I am using Hive with MapReduce. I have tried to use a few different configurations (always the same, but using different values). It is creating some mappers, but no reducers. The configurations that I have set are (I have tried the numeric values…

hive mapreduce cloudera

asked Jul 29 '19 at 09:06

Antonio Barroso

0 answers

CDH6.1: GOT Error creating database Manager sqlserver

I can't import data from SQL Server to HDFS using Sqoop. I tried this command: sqoop import --driver com.microsoft.jdbc.sqlserver.SQLServerDrive --connect "jdbc:sqlserver://xxxxx:1433;database=xxxxx" --connection-manager directive --table…

sql-server hadoop sqoop cloudera

asked Jul 25 '19 at 21:05

AndresSan

0 answers

Submit Oozie job using the API against a Kerberized Cloudera cluster

I am trying to submit an Oozie workflow (the workflow.xml definition is present on HDFS) using the Oozie server API against a Kerberized Cloudera cluster. I cannot submit an existing workflow through the API due to auth/config issues. I took inspiration from the…

curl kerberos cloudera oozie

asked Jul 23 '19 at 13:01

Roberto G.

0 answers

FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. com/yammer/metrics/core/MetricsRegistry

We are facing an issue in Beeline when connecting to an HBase table. We have two HiveServer2 instances, and on one of the nodes we get an error like: INFO : Query ID = hive_20190719154444_babd2ce5-4d41-400b-9be5-313acaffc9bf INFO : Total jobs = 1 INFO …

hive hbase cloudera cloudera-cdh cloudera-manager

asked Jul 19 '19 at 14:18

Praveen Prakasan

2 answers

Is it possible to read an Excel file from Apache Zeppelin into a PySpark or Pandas DataFrame?

I have a file in HDFS (/user/username/Project/data/file.xlsx) that I want to read into a DataFrame. (I do not care if it is a PySpark DataFrame or Pandas, but Pandas is preferred.) I am using a Zeppelin notebook to write my code. Is it possible to…

apache-spark pyspark cloudera

asked Jul 18 '19 at 14:55

Antonio Barroso

1 answer

Where is Spark/PySpark saving my Parquet files?

I'm saving a DataFrame in PySpark to a particular location, but cannot see the file or files in the directory. Where are they? How do I get to them outside of PySpark? And how do I delete them? And what is it that I am missing about how Spark works?…

python-3.x apache-spark pyspark cloudera

asked Jul 18 '19 at 08:11

EddyTheB

0 answers

Typing issue in HUE

When I type one letter in the HUE IDE in a Hive notebook, multiple characters are typed automatically, as shown in the screenshot below. Could you please help me with this?

cloudera hue

asked Jul 17 '19 at 15:07

Priyaranjan Swain
