Questions tagged [cloudera]

Cloudera Inc. is a Palo Alto-based enterprise software company which provides Apache Hadoop-based software and services.

Cloudera, a commercial Hadoop company, develops and distributes Hadoop, the open-source software used for large-scale data processing by many of the world's largest websites.

Cloudera's Distribution including Apache Hadoop (CDH) is a free distribution built from Apache Hadoop. To help users learn Hadoop and how to use it, Cloudera offers public and private training, certification, and online courseware.

2533 questions
0 votes, 1 answer

HBase ImportTsv tuning

I'm on Cloudera 5.16 with Hadoop 2.6. I use ImportTsv to load large CSV files into HBase: hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=';' -Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:age mynamespace:mytable…
Eric C
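
The ImportTsv call in this question can also be assembled from a script, which makes it easier to compare tuning variants side by side. A minimal sketch in Python, using the `-Dimporttsv.separator` and `-Dimporttsv.columns` options from the excerpt plus ImportTsv's `-Dimporttsv.bulk.output` option, which makes the job write HFiles for a later bulk load instead of sending Puts to the region servers. The function name `build_importtsv_cmd` and the input path are hypothetical; the real input path is elided in the question.

```python
import shlex
from typing import List, Optional


def build_importtsv_cmd(table: str, input_path: str, columns: List[str],
                        separator: str = ";",
                        bulk_output: Optional[str] = None) -> List[str]:
    """Return an argv list for HBase's ImportTsv MapReduce job.

    `input_path` is a placeholder: the real path is elided in the question.
    """
    args = [
        "hbase", "org.apache.hadoop.hbase.mapreduce.ImportTsv",
        f"-Dimporttsv.separator={separator}",
        # HBASE_ROW_KEY marks which input field becomes the row key.
        "-Dimporttsv.columns=" + ",".join(columns),
    ]
    if bulk_output:
        # Write HFiles to this directory instead of sending Puts to region servers.
        args.append(f"-Dimporttsv.bulk.output={bulk_output}")
    # Positional arguments: target table, then the HDFS input path.
    args += [table, input_path]
    return args


cmd = build_importtsv_cmd("mynamespace:mytable", "/tmp/input.csv",
                          ["HBASE_ROW_KEY", "data:name", "data:age"])
# Shell-quoted form, handy for copying into a terminal.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` sidesteps shell quoting of the `;` separator. Whether bulk output actually speeds up this particular load is an assumption to verify on the cluster.
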
0 votes, 1 answer

Cannot connect to WebHDFS on port 14000 in Cloudera Manager

I have a Cloudera cluster (CDH 6.2.0), and all of its components (HDFS, Hive, etc.) work well. However, when I recently tried to connect to WebHDFS, I found that nothing was listening on port 14000, by executing the command netstat -antpl|grep 14000 on…
DennisLi
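
Port 14000 is the default port of HttpFS, a separate gateway role that serves the WebHDFS REST API; WebHDFS on the NameNode itself listens on the NameNode HTTP port (9870 by default in Hadoop 3, on which CDH 6 is based). A minimal sketch of checking reachability from Python instead of netstat, and of building a WebHDFS-style URL. The helper names are hypothetical, and whether HttpFS is deployed on this cluster is an assumption to check in Cloudera Manager.

```python
import socket
from urllib.parse import urlencode


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def webhdfs_url(host: str, port: int, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS REST URL; HttpFS serves the same /webhdfs/v1 API."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1/{path.lstrip('/')}?{query}"
```

For example, `port_is_open("namenode-host", 14000)` returning False on every host while port 9870 answers would be consistent with no HttpFS role running (or a firewall in the way); the host name is a placeholder.
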
0 votes, 1 answer

Running Spark History Server at context path localhost:18080/sparkhistory instead of at localhost:18080

I want to run the Spark History Server at localhost:18080/sparkhistory instead of at localhost:18080. The end goal is to access the Spark History Server through a domain name, i.e. domainname/sparkhistory. Are there any hacks or Spark config options for this?
0 votes, 1 answer

PySpark unable to read CSV from HDFS: HiveExternalCatalog error

I'm new to Spark and I'm stuck trying to debug an error. I'm trying to read multiple files from HDFS. I'm using sparksession.read.csv for this, but I'm getting an error: py4j.protocol.Py4JJavaError: An error occurred while calling o64.csv. :…
0 votes, 2 answers

IOException during splitting java.util.concurrent.ExecutionException: java.io.FileNotFoundException when loading HFile to HBase

I am trying to bulk load data into HBase using the salted-table approach described on this site: https://www.opencore.com/blog/2016/10/efficient-bulk-load-of-hbase-using-spark/. I am able to insert data, but at random times I get ERROR…
Kok-Lim Wong
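
The salted-table approach referenced in this question prefixes each row key with a bucket number so that writes spread evenly across pre-split regions. A minimal sketch of one common way to compute such a prefix and the matching split keys; the MD5 hash, the zero-padded format, the `|` separator and the helper names are assumptions and may differ from the linked article's scheme. Keeping the split points aligned with the prefixes should mean generated HFiles do not straddle region boundaries, which is when the bulk loader has to split them.

```python
import hashlib
from typing import List


def salted_key(row_key: str, buckets: int) -> str:
    """Prefix row_key with a deterministic bucket id in [0, buckets)."""
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    # Use a stable hash (not Python's per-process randomized hash()) so
    # every writer assigns the same key to the same bucket.
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    width = len(str(buckets - 1))
    # Zero-pad so prefixes sort lexicographically in bucket order.
    return f"{bucket:0{width}d}|{row_key}"


def split_points(buckets: int) -> List[str]:
    """Region split keys matching the prefixes above (one region per bucket)."""
    width = len(str(buckets - 1))
    return [f"{b:0{width}d}" for b in range(1, buckets)]
```

With 16 buckets, `split_points(16)` yields the 15 keys "01" through "15", so bucket "00" lands in the first region and each later bucket starts its own region.
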
0 votes, 1 answer

HDFS Datanode crashes with OutOfMemoryError

I'm having repeated crashes of the HDFS DataNodes in my Cloudera cluster due to an OutOfMemoryError: java.lang.OutOfMemoryError: Java heap space Dumping heap to /tmp/hdfs_hdfs-DATANODE-e26e098f77ad7085a5dbf0d369107220_pid18551.hprof ... Heap dump file…
Victor
0 votes, 1 answer

Is there a way to invalidate metadata and rebuild index from python code in CDSW?

I am using Impyla and Python in CDSW to query data in HDFS. The problem is that sometimes, to get all of the data, I have to go in and manually click the "Invalidate all metadata and rebuild index" button in HUE. Is there a way to do…
sectechguy
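
Impala's `INVALIDATE METADATA` statement (optionally scoped to one table) can be issued over any DB-API connection, including Impyla's. A minimal sketch that takes a cursor so it can be exercised without a cluster; the helper name `invalidate_metadata` is hypothetical, and it covers only the Impala statement, not whatever cache Hue itself refreshes behind that button.

```python
import re
from typing import Optional

# Accepts "table" or "db.table" made of plain identifier characters.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def invalidate_metadata(cursor, table: Optional[str] = None) -> str:
    """Issue INVALIDATE METADATA (optionally for one db.table) via a DB-API cursor."""
    stmt = "INVALIDATE METADATA"
    if table is not None:
        # Identifiers cannot be bound as query parameters, so validate
        # the name before interpolating it into the statement.
        if not _IDENT.fullmatch(table):
            raise ValueError(f"unexpected table name: {table!r}")
        stmt += f" {table}"
    cursor.execute(stmt)
    return stmt
```

With Impyla this might look like `cur = impala.dbapi.connect(host="impalad-host", port=21050).cursor()` followed by `invalidate_metadata(cur, "mydb.mytable")`; the host is a placeholder and 21050 is Impala's usual HiveServer2-protocol port. On a large catalog, the table-scoped form (or `REFRESH db.table` for a table Impala already knows) is lighter than a global invalidate.
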
0 votes, 0 answers

How to set LIVY_CONF_DIR in Cloudera

I have installed the Livy server in Cloudera under /usr/share. I want to set LIVY_CONF_DIR so that I can manage config files such as log4j.properties. Cloudera says this is possible, but I could not find how to define…
0 votes, 0 answers

Hive is not creating reducers for "Insert overwrite" query. Small files problem?

I am using Hive with MapReduce. I have tried a few different configurations (always the same properties, but with different values). It is creating some mappers, but no reducers. The configurations that I have set are (I have tried the numeric values…
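
Two points that may be relevant here. First, an `INSERT OVERWRITE ... SELECT` with no clause that needs a shuffle (for example `GROUP BY`, `ORDER BY`, `DISTRIBUTE BY`, or a join that is not converted to a map join) typically compiles to a map-only job, so reducer settings have no effect; adding `DISTRIBUTE BY` is one common way to force a reduce stage and thereby control the number of output files. Second, when a reduce stage exists and `mapreduce.job.reduces` is not set, Hive estimates the reducer count from input size. A minimal sketch of that estimate, assuming the upstream defaults of `hive.exec.reducers.bytes.per.reducer` (256,000,000) and `hive.exec.reducers.max` (1009); Cloudera Manager may override these, so check the effective values with `SET hive.exec.reducers.bytes.per.reducer;` in Beeline. The function name is hypothetical.

```python
def estimate_reducers(total_input_bytes: int,
                      bytes_per_reducer: int = 256_000_000,
                      max_reducers: int = 1009) -> int:
    """Mirror Hive's size-based reducer estimate (only used when a reduce stage exists)."""
    if bytes_per_reducer <= 0:
        raise ValueError("bytes_per_reducer must be positive")
    # Integer ceiling division: one reducer per bytes_per_reducer of input.
    n = (total_input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # At least one reducer, never more than the hive.exec.reducers.max cap.
    return min(max_reducers, max(1, n))
```

Under these assumptions, 1 GB of input gives 4 reducers, and lowering `bytes_per_reducer` raises the count until the cap is reached.
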
0 votes, 0 answers

CDH 6.1: "Got error creating database manager" with SQL Server

I can't import data from SQL Server to HDFS using Sqoop. I tried this command: sqoop import --driver com.microsoft.jdbc.sqlserver.SQLServerDrive --connect "jdbc:sqlserver://xxxxx:1433;database=xxxxx" --connection-manager directive --table…
AndresSan
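
The driver class in the excerpt, `com.microsoft.jdbc.sqlserver.SQLServerDrive`, differs from the class shipped in Microsoft's JDBC driver, `com.microsoft.sqlserver.jdbc.SQLServerDriver`; a class name that cannot be loaded is one plausible cause of the error in the title. A minimal sketch that assembles a Sqoop import with the corrected class name. The helper name, host, database, table, user and target directory are all hypothetical, and the SQL Server JDBC jar still has to be on Sqoop's classpath. Sqoop also has a SQL Server connection manager selected from the `jdbc:sqlserver:` URL prefix, so `--driver` may not be needed at all; that is worth confirming in the Sqoop user guide.

```python
from typing import List, Optional


def build_sqoop_import(host: str, database: str, table: str, target_dir: str,
                       username: str, port: int = 1433,
                       mappers: Optional[int] = None) -> List[str]:
    """Build a sqoop import argv for SQL Server (all names are placeholders)."""
    connect = f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    args = [
        "sqoop", "import",
        # Class name used by Microsoft's JDBC driver for SQL Server.
        "--driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "--connect", connect,
        "--username", username,
        "-P",  # prompt for the password instead of putting it on the command line
        "--table", table,
        "--target-dir", target_dir,
    ]
    if mappers is not None:
        args += ["--num-mappers", str(mappers)]
    return args
```

The returned list can be passed to `subprocess.run(..., check=True)`, which avoids having to shell-quote the `;` inside the JDBC URL.
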
0 votes, 0 answers

Submit an Oozie job using the API against a Kerberized Cloudera cluster

I am trying to submit an Oozie workflow (the workflow.xml definition is present on HDFS) using the Oozie server API against a Kerberized Cloudera cluster. I cannot submit an existing workflow through the API due to auth/config issues. I took inspiration from the…
Roberto G.
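
The Oozie web services API accepts a job configuration as a Hadoop-style XML document (a `<configuration>` element holding `<property>` name/value pairs) posted to `/oozie/v1/jobs`, with `?action=start` to run it immediately. A minimal sketch that builds that payload with the standard library; the helper name and the property values are hypothetical, and Kerberos authentication for the actual POST is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def oozie_job_config(props: Dict[str, str]) -> bytes:
    """Serialize properties into the XML body the Oozie jobs endpoint expects."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        # ElementTree escapes &, < and > in text content automatically.
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="utf-8")


body = oozie_job_config({
    "user.name": "etl_user",                                 # hypothetical user
    "oozie.wf.application.path": "hdfs:///user/etl_user/wf",  # hypothetical path
})
```

The body would then be POSTed with `Content-Type: application/xml` to `http://<oozie-host>:11000/oozie/v1/jobs?action=start` (11000 is Oozie's usual default port). On a Kerberized cluster the request also needs SPNEGO authentication, for example via the third-party `requests-kerberos` package.
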
0 votes, 0 answers

FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. com/yammer/metrics/core/MetricsRegistry

We are facing an issue in Beeline when connecting via Beeline to an HBase table. We have two HiveServer2 instances, and on one of the nodes we got this error: INFO : Query ID = hive_20190719154444_babd2ce5-4d41-400b-9be5-313acaffc9bf INFO : Total jobs = 1 INFO …
0 votes, 2 answers

Is it possible to read an Excel file in Apache Zeppelin into a PySpark or Pandas DataFrame?

I have a file in HDFS (/user/username/Project/data/file.xlsx) that I want to read into a DataFrame. (I do not care whether it is a PySpark DataFrame or Pandas, but Pandas is preferred.) I am using a Zeppelin notebook to write my code. Is it possible to…
0 votes, 1 answer

Where is spark/pyspark saving my parquet files?

I'm saving a DataFrame in PySpark to a particular location, but cannot see the file or files in the directory. Where are they? How do I get to them outside of PySpark? And how do I delete them? And what am I missing about how Spark works?…
EddyTheB
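
On a cluster whose default filesystem (`fs.defaultFS`) is HDFS, a relative path passed to a DataFrame writer is resolved against the user's HDFS home directory (`/user/<name>`), not the local working directory, and the output is a directory of part files rather than a single file. A minimal sketch that mimics that resolution rule in plain Python; the function name, NameNode address and user name are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def resolve_hadoop_path(path: str, default_fs: str, user: str) -> str:
    """Qualify a path the way Hadoop does when the working dir is /user/<user>."""
    if urlparse(path).scheme:
        # Already fully qualified, e.g. hdfs://... or file:///...
        return path
    if not path.startswith("/"):
        # Relative paths resolve against the HDFS home directory.
        path = posixpath.join("/user", user, path)
    # Collapse "." and ".." segments, then prefix the default filesystem URI.
    return default_fs.rstrip("/") + posixpath.normpath(path)
```

For example, with a default filesystem of `hdfs://nn:8020` and user `alice`, writing to `out.parquet` would land in `hdfs://nn:8020/user/alice/out.parquet`; from a shell, `hdfs dfs -ls /user/alice/out.parquet` lists the part files and `hdfs dfs -rm -r /user/alice/out.parquet` deletes them.
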
0 votes, 0 answers

Typing issue in HUE

When I type one letter in a Hive notebook in the HUE IDE, multiple characters are typed automatically, as shown in the screenshot below. Could you please help me with this?