Questions tagged [hadoop]

Hadoop is an open-source solution that provides a distributed, replicated file system and a production-grade map-reduce system, along with a series of complementary additions like Hive, Pig, and HBase to get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. Apache documents a more complete list of commercial solutions.

Standard components and complementary additions to Hadoop include:

  • Hadoop Distributed File System (standard)
  • The map-reduce architecture (standard)
  • Hive, which provides a SQL-like interface to the map-reduce architecture
  • HBase, a distributed key-value service

261 questions
1
vote
0 answers

How to debug policy enforcement on YARN queues?

I have an HDP 3.1 cluster, and it seems that either the fair policy isn't behaving as expected or YARN is misconfigured, since some users/applications/jobs are consuming more resources than we expected them to use. So how do I debug/monitor YARN in a way that…
jguilhermemv
  • 111
  • 2
1
vote
0 answers

Backup and Restore strategy in HBase cluster

I have just started with HBase. I have an HBase cluster with 2 master nodes and 4 slave nodes. I have one HBase table into which a huge amount of data is loaded every day, so the disk fills up quickly. I would like to implement a backup and restore…
Juvenik
  • 111
  • 3
1
vote
2 answers

NoNode for HBase master in pseudo-distributed mode

I am using Ubuntu 18.04, Hadoop 3.1.3 and HBase 2.2.1. It seems to me that my Hadoop and HBase are not configured correctly to interact. When I try to create a table through the HBase shell, I get the following error: ERROR: KeeperErrorCode =…
0
votes
1 answer

mkfs + xfs: what is the right mkfs command line to create an XFS file system on a huge disk?

We need to create an XFS file system on a Kafka disk. What is special about the Kafka disk is its size: 20 TB in our case. I am not sure about the following mkfs command, and I need advice on whether the following command line is good enough to…
shalom
  • 461
  • 13
  • 29
0
votes
1 answer

Is it possible to mix different RHEL OS versions in a Hadoop cluster?

We are using the following HDP cluster with Ambari. List of nodes and their RHEL versions: 3 master machines (with NameNode and ResourceManager), installed on RHEL 7.2; 312 DataNode machines, installed on RHEL 7.2; 5 Kafka machines, installed…
shalom
  • 461
  • 13
  • 29
0
votes
0 answers

Any benefits of ZFS over EXT4 for data stream processing on top of HDFS?

I'm working on a data stream processing project in which I will be using Apache Flink and Apache Spark, and I want to use HDFS for storage. The development and testing will be done on a single-node cluster with multiple physical disks. I have already…
HUSMEN
  • 1
  • 2
0
votes
0 answers

Request Time Out / Sessions Stalling through IPTABLE (DNAT)

Scenario: The customer recently migrated clustered HANA DB servers to the Azure cloud platform, but these are physical servers on Azure (offering: Azure HLI). Usually these HLIs (HANA DB servers) in Azure cannot be accessed directly, not even from Azure…
Ram Too
  • 11
  • 2
0
votes
1 answer

transferring data between two hadoop clusters without direct network connectivity

I need to transfer data fairly regularly (on demand, not scripted / streamed) between two independent Hadoop clusters. One of them is deployed in an isolated network and has no direct access to the other. I tried searching the official…
0
votes
1 answer

Hadoop: can we install ZooKeeper servers on Kafka hosts?

We want to dedicate the ZooKeeper servers to the Kafka machines only, so that each Kafka machine includes a ZooKeeper server, and that ZooKeeper server serves only the Kafka host and no other application. Is that OK?
shalom
  • 461
  • 13
  • 29
0
votes
2 answers

High Active(file) Memory Usage in Oracle Linux VMs

I recently searched and read lots of posts and questions about Linux memory management but I can't find my case. For example, there is a question in Unix StackExchange about High memory usage but no process is using it. In this post, the accepted…
0
votes
1 answer

HDFS balancing: how to balance HDFS data?

We have Hadoop version 2.6.4. On the DataNode machine we can see that HDFS data isn’t balanced. Some disks have different used sizes, such as sdb 11G and sdd 17G: /dev/sdd 20G 3.0G 17G 15% /grid/sdd /dev/sdb 20G 11G 9.3G 53% /grid/sdb <-- WHY…
shalom
  • 461
  • 13
  • 29
0
votes
1 answer

Install Nvidia Drivers 9.0 for TensorFlow pip (Debian 9.7)

I installed Nvidia drivers 9.1 on my Debian 9.7 (Dataproc). When I try to run TensorFlow 1.9 via this test script, it fails: Used this guide to install GPU Drivers: https://cloud.google.com/dataproc/docs/concepts/compute/gpus Used pip install…
gogasca
  • 343
  • 2
  • 15
0
votes
1 answer

How to configure Kerberos authentication in browsers that run inside a Citrix session?

We are connecting to our secure client network via Citrix. We are using Chrome to open all quick links, like Ambari etc. They open and we are good there, but other useful links, like the RM and History Server links, do not open, as they need Kerberos…
akash sharma
  • 103
  • 2
0
votes
0 answers

Avoid Kafka disk becoming 100% used by means of a cron job

We want to suggest the following, based on our issues with Kafka disks. We have many HDP clusters (based on Ambari, and all machines are Red Hat version 7.2). Each cluster includes 3 Kafka machines, and each Kafka machine includes a disk of ~15 T. Because we…
shalom
  • 461
  • 13
  • 29
0
votes
1 answer

Free Account Azure version HDInsight and cores issue

I am using a free Azure account, and I am trying to create the resources needed to set up HDInsight. I have done it twice, but in order to save the time/money I have available, I deleted the resource group. Unfortunately now that…
Nicola
  • 1