Questions tagged [hadoop]

Hadoop is an open-source project that provides a distributed, replicated file system (HDFS) and a production-grade MapReduce system, along with a series of complementary projects such as Hive, Pig, and HBase that get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support available from multiple vendors, including Cloudera, Hortonworks, and MapR. Apache documents a more complete list of commercial offerings.

Core components of Hadoop and available complementary additions include:

  • Hadoop Distributed File System, HDFS (standard)
  • The MapReduce architecture (standard)
  • Hive, which provides a SQL-like interface to the MapReduce architecture
  • HBase, a distributed key-value store
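
For readers new to the MapReduce model mentioned above, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It does not use Hadoop itself: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. A real Hadoop job distributes these steps across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    # Map step: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle step: group all emitted values by key, as the framework
    # would do between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce step: collapse the list of values for one key into a total.
    return key, sum(values)


def word_count(lines):
    # Run the three steps in sequence in a single process (illustration only).
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["hadoop stores data", "hadoop processes data"])
print(counts)
```

In this toy version everything runs in memory on one machine. The point is only the shape of the computation: independent map calls, a grouping by key, then independent reduce calls per key.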


261 questions
0
votes
1 answer

How to add multiple hostnames in private DNS zone in Azure to resolve hostnames for VNET?

I have an AKS (Azure Kubernetes cluster) that is on a VNET (Azure Virtual Network) that needs to connect to multiple On-prem hadoop machines to read/write data. I have a private DNS zone connected to VNET to resolve hostnames to IP - I tested with a…
0
votes
1 answer

Hadoop recommissioning datanode

Do I need to delete all data from a datanode before recommissioning it, or does it not matter because the namenode will not pick up stale data from the datanode?
0
votes
1 answer

What is the meaning of "Requested resource=" in the Hadoop web UI?

I want to insert some data into a Hive table, but it is stuck. So I went to the Hadoop web UI and found the following information: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested resource…
0
votes
1 answer

Change HDFS replication factor

I've changed the replication factor from 3 to 2 for some directories with the command: hdfs dfs -setrep -R 2 /path/to/dir but my HDFS free space is still the same. Should I do something else to free up my disks?
0
votes
1 answer

HDFS: how to free one particular disk

I have a cluster with 3 servers. Two of them have 2 TB disks and the other has a 500 GB SSD. I am trying to use the balancer, but I still get 70% usage on the 2 TB disks and 99% on the 500 GB SSD due to non-DFS files. Replication factor = 2. Is it possible to free…
0
votes
1 answer

Hadoop cluster capacity planning: number of disks per data node

We are planning to build a Hadoop cluster with 12 data node machines, with a replication factor of 3 and a DataNode failed disk tolerance of 1. The data node machines include the disks for HDFS. Since we have not found the criteria for how many disks need…
King David
  • 549
  • 6
  • 20
0
votes
1 answer

What is the default password for Hive?

On the local console, typing "hive" launches the console directly without any password. But when I try to connect using dbeaver/beeline, it prompts for a username/password. I tried with hive/"" ""/"" mysql metastore username/password. entries…
Uday Kiran Reddy
  • 119
  • 1
  • 4
  • 14
0
votes
1 answer

Optimal RAID configuration for EC2 instance store used for HDFS

I'm trying to determine if there is any practical advantage to configuring a RAID array on the instance store of 3x d2.2xlarge instances being used for HDFS. Initially I planned to just mount each store and add it as an additional data directory…
John R
  • 383
  • 4
  • 13
0
votes
1 answer

Users for Hadoop deployment

I followed instructions found online to install and configure a "2 name nodes" + "10 data nodes" Hadoop cluster on CentOS 8. I created a wheel user called "hadoop" on all nodes and set up passwordless ssh under this user. The install and…
Root Loop
  • 902
  • 4
  • 24
  • 45
-1
votes
1 answer

Unable to ssh localhost without password despite proper perms, key in authorized_keys

I have a key ~/.ssh/id_rsa and I added the pub key to my authorized keys: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys I also changed my permissions to 600: sudo chmod 600 ~/.ssh/authorized_keys I checked and /etc/ssh/sshd_config is set for…
Don Rhummy
  • 403
  • 4
  • 8
  • 16
-1
votes
1 answer

Click-and-deploy Hadoop cluster on Google Cloud Platform

I am new to Google Cloud. I tried to click-and-deploy a Hadoop cluster and I am always told that my quota is limited. But some days ago, I upgraded my free trial to a paid one! (how can I check this in order to be sure that it was taken into account…
-1
votes
1 answer

Hadoop on OpenStack vs physical servers

I'm new to Hadoop and trying to understand how it should be installed/configured. From the documentation I see that Hadoop normally should be aware of the physical server configuration (e.g. replicating data between racks). So it is not clear to me,…
Pavel
  • 1
  • 1
-1
votes
2 answers

"sudo apt-get remove hadoop" is not removing package

I am trying to uninstall Cloudera Hadoop from my Ubuntu system. For this I tried the sudo apt-get remove hadoop command, but it fails with the following message: ubuntu@ip-10-82-19-71:~/cluster-deployer/src$ sudo apt-get remove hadoop Reading…
Shekhar
  • 107
  • 2
  • 5
-1
votes
2 answers

How do I install pdsh on CentOS 6?

I'm following a tutorial on how to configure a CentOS machine to be a node in a Hadoop cluster for Hortonworks. I'm doing this on a virtual machine on VirtualBox. Sadly, since I am a Linux beginner, I am stuck on some very basic steps: 2.4.…
Alex Gordon
  • 455
  • 3
  • 14
  • 31
-1
votes
2 answers

RHEL: can we improve disk performance by tuning kernel parameters?

We have a Hadoop cluster and we are collecting metrics data in order to investigate slowness in Spark applications. After a long investigation on our Hadoop cluster, we noticed from the Prometheus metrics that node_disk_io_now is…
King David
  • 549
  • 6
  • 20