Questions tagged [cloudera]

Cloudera Inc. is a Palo Alto-based enterprise software company which provides Apache Hadoop-based software and services.

Cloudera, the commercial Hadoop company, develops and distributes Hadoop, the open source software that powers the data processing engines of the world’s largest and most popular websites.

Cloudera's Distribution including Apache Hadoop (CDH) is a free package built from the powerful, flexible, scalable Apache Hadoop software. To help you learn about Hadoop and how to use it, Cloudera offers public and private training, certification and online courseware.

2533 questions
0
votes
0 answers

How long does it take to port data from one Cloudera Impala to another?

I am working on a setup similar to the one in the attached picture. I have 2 Cloudera Impala databases, one on either side of a DMZ. Every month, around 20,000 records are written into the non-DMZ Impala and this incremental load needs to be ported…
Sharanya
  • 45
  • 1
  • 8
0
votes
1 answer

Cloudera manager agent fails to install - nothing provides MySQL-python, python, python-psycopg2 needed by cloudera-manager-agent

Hello, I have a CentOS 8 Linux cluster and I am trying to install Cloudera 6.2.1 on it. As you can see in the attached picture, it is failing on the cloudera-manager-agent installation due to 3 conflicting requests. The first one is MySQL-python. I tried to…
Alin
  • 21
  • 2
0
votes
1 answer

Impala: Get the list of matching partitions

We have an Impala table that is partitioned as year=yyyy/month=mm/day=dd/hour=hh. One of the client applications can send select queries to it with a from and a to date in dd/mm/yyyy format. Now, for example, if the from date is set to, say, 01/11/2019…
user_name
  • 119
  • 1
  • 8
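One way to approach the partition question above is to expand the date range on the client side into the day-level partition specs it covers. The following is only a minimal sketch in Python, assuming both bounds are inclusive and arrive as dd/mm/yyyy strings; the function name `day_partitions` is hypothetical, and hour-level filtering is left out because the excerpt does not say how hours are passed.

```python
from datetime import datetime, timedelta


def day_partitions(from_date, to_date):
    """Return one 'year=yyyy/month=mm/day=dd' string per day in the inclusive range.

    Hypothetical helper: both arguments are assumed to be dd/mm/yyyy strings.
    """
    start = datetime.strptime(from_date, "%d/%m/%Y").date()
    end = datetime.strptime(to_date, "%d/%m/%Y").date()
    if end < start:
        raise ValueError("to_date is earlier than from_date")
    span = (end - start).days
    return [
        f"year={d.year:04d}/month={d.month:02d}/day={d.day:02d}"
        for d in (start + timedelta(days=i) for i in range(span + 1))
    ]


# Example range crossing a month boundary (dates chosen for illustration only)
partitions = day_partitions("30/11/2019", "02/12/2019")
```

Each returned string follows the year=/month=/day= layout from the question, so it can be compared against the table's actual partition list.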
0
votes
0 answers

Configuring the Hadoop Capacity Scheduler for SAP HANA

First, I would like to ask what the function of the Hadoop Capacity Scheduler is in SAP HANA. Why does SAP HANA need it? I ask because I just found the steps to "create a dedicated YARN queue for TempletonControllerJobs" on the SAP Help page when I try to make…
0
votes
0 answers

Describe path in the container - Ansible

I use Ansible to run a benchmark that uses Hadoop. I have installed the Cloudera Hadoop container. How can I define the path to the core-site.xml file in the container? - name: 'Configuring the benchmark' replace: path:…
Malo
  • 141
  • 3
  • 12
0
votes
1 answer

How to download the QuickStart VM 5.x for VirtualBox on Windows 10?

How do I download the QuickStart VM 5.x for VirtualBox on Windows 10? I have installed Oracle VirtualBox, but I cannot find any download source for the Cloudera QuickStart VM. I have searched a lot on Google and YouTube, but the link or site they all refer to is…
0
votes
0 answers

Cloudera spark connection from local machine

from pyspark.sql import SparkSession from pyspark.sql.types import * from pyspark.sql.functions import * sparkdriver=SparkSession.builder.master("spark://:7077").appName("mytryapp").getOrCreate() sparkdriver I'm trying to…
0
votes
1 answer

Impala: Split single row into multiple rows based on Date and time

I want to split a single row into multiple rows based on time. SrNo Employee StartDate EndDate --------------------------------------------------------------------------- 1 emp1 30/03/2020 09:00:00 …
Kaustav
  • 69
  • 3
  • 12
0
votes
3 answers

Is there any difference between High Availability (HA) for the NameNode and for HDFS?

I am getting confused between high availability of HDFS and of the NameNode. Are these two things one and the same, or different?
0
votes
1 answer

Hue Solr Search running slow

While displaying data in Hue from Solr (8000+ columns, 30000+ rows), Hue is running very slowly. It only has 3 users and is consuming about 7 GB of memory. It was installed through Docker. Presumably this is due to the volume of data and not fixable? Thanks…
0
votes
1 answer

Hue Solr Marker Map limited to 25

When displaying data in a Hue Dashboard (Data from a Solr Server - not in cloud mode) the marker map is only displaying 25 "markers". Is there any way to adjust this limit? If I filter the data it will adjust to these filters, but again limited to…
0
votes
0 answers

Hue dashboard not loading

I have a Hue instance running, connected to a Solr Server. The standard dashboard (gridster enabled) works perfectly. However, upon disabling gridster, the dashboard will not load; it just has the loading wheel on it. Are there any fixes for this…
0
votes
1 answer

Hive compute median and average by groups

I have a dataset that has counts by state and county and I would like to calculate the median and average by state and county such as: Have: ID state county count 1 MD aa 2 2 MD aa 4 3 VA bb …
lydias
  • 841
  • 1
  • 14
  • 32
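To make the expected result of the median and average question above concrete, here is a minimal sketch in Python that groups rows by (state, county) and computes both statistics per group. It is an illustration of the calculation, not Hive code; the function name `summarize` is hypothetical, the two MD rows come from the excerpt, and the VA rows are made-up placeholders because the original data is truncated.

```python
from collections import defaultdict
from statistics import mean, median


def summarize(rows):
    """Group (state, county, count) rows and return {(state, county): (median, mean)}."""
    groups = defaultdict(list)
    for state, county, count in rows:
        groups[(state, county)].append(count)
    return {key: (median(values), mean(values)) for key, values in groups.items()}


rows = [
    ("MD", "aa", 2),  # from the question excerpt
    ("MD", "aa", 4),  # from the question excerpt
    ("VA", "bb", 5),  # placeholder value, not from the question
    ("VA", "bb", 7),  # placeholder value, not from the question
    ("VA", "bb", 9),  # placeholder value, not from the question
]
result = summarize(rows)
```

In Hive itself the equivalent would presumably be a GROUP BY on state and county using `avg` together with `percentile(count, 0.5)` for an integer column, but that should be checked against the Hive version in use.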
0
votes
1 answer

How to view the full exception/error stack trace in Cloudera

I am trying to run a query from the Cloudera Hue editor for Hive. However, the query fails with an exception (which I am trying to explore). How do I see the full stack trace?
Saikat
  • 14,222
  • 20
  • 104
  • 125
0
votes
2 answers

SQL group by different levels on the same dataset

I have the following dataset, and I hope to create different groups to count the occurrence of values under name. Have: (county is in string) name state county apple MD 1 apple DC 1 pear VA 1 pear VA 2 pear CA …
lydias
  • 841
  • 1
  • 14
  • 32