Questions tagged [hadoop]

Hadoop is an open-source solution that provides a distributed, replicated file system and a production-grade map-reduce system, along with a series of complementary additions like Hive, Pig, and HBase to get more out of a Hadoop-powered cluster.

Hadoop is a project sponsored by the Apache Software Foundation, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. Apache maintains a more complete list of commercial solutions.

Available complementary additions to Hadoop include:

Hadoop Distributed File System (standard)
The map-reduce architecture (standard)
Hive, which provides a SQL-like interface to the map-reduce architecture
HBase, a distributed key-value service

Recommended reference sources:

Hive Language Reference

261 questions

votes

1 answer

How to add multiple hostnames in private DNS zone in Azure to resolve hostnames for VNET?

I have an AKS (Azure Kubernetes cluster) that is on a VNET (Azure Virtual Network) that needs to connect to multiple On-prem hadoop machines to read/write data. I have a private DNS zone connected to VNET to resolve hostnames to IP - I tested with a…

asked Apr 10 '22 at 01:40

Venkatesh Gotimukul

votes

1 answer

Hadoop recommissioning datanode

Do I need to delete all data from a datanode before recommissioning it, or it doesn't matter and the namenode will not pick stale data from the datanode?

hadoop hdfs

asked Feb 19 '21 at 09:36

Guido Aulisi

votes

1 answer

what's the meaning of Requested resource= in hadoop web ui?

I want to insert some data into a Hive table, but it is stuck. So I went to the Hadoop web UI and found the following information: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested resource…

hadoop

asked Dec 18 '20 at 13:05

appleyuchi

votes

1 answer

Change HDFS replication factor

I've changed the replication factor from 3 to 2 for some directories with the command: hdfs dfs -setrep -R 2 /path/to/dir but my HDFS free space is still the same. Should I do something else to free my disks?

filesystems hadoop distributed-filesystems hdfs

asked Sep 29 '20 at 11:16

John Brown

votes

1 answer

HDFS. How to free 1 particular disk

I have a cluster with 3 servers. 2 of them have 2 TB disks and the other one has a 500 GB SSD. I am trying to use the balancer, but I still get 70% usage on the 2 TB disks and 99% on the 500 GB one due to non-DFS files. Replication coefficient=2. Is it possible to free…

disk-space-utilization hadoop hdfs

asked Sep 16 '20 at 08:44

John Brown

votes

1 answer

Hadoop Cluster Capacity Planning of Data Nodes for disks per data node

We are planning to build a Hadoop cluster with 12 data node machines, where the replication factor is 3 and the DataNode failed disk tolerance is 1. The data node machines include the disks for HDFS. Since we have not found the criteria for how many disks need…

redhat hard-drive hadoop hdfs

asked Aug 02 '20 at 20:51

King David

votes

1 answer

What is the default password of hive

With the local console, typing "hive" launches the console directly without any password. But when I try to connect using dbeaver/beeline, it prompts for username/password. I tried with hive/"" ""/"" mysql metastore username/password. entries…

hadoop

asked Jul 13 '20 at 15:30

Uday Kiran Reddy

votes

1 answer

Optimal RAID configuration for EC2 instance store used for HDFS

I'm trying to determine if there is any practical advantage to configuring a RAID array on the instance store of a 3x d2.2xlarge instances being used for HDFS. Initially I planned to just mount each store and add it as an additional data directory…

amazon-ec2 raid hadoop hdfs amazon-ephemeral

asked Jun 25 '20 at 22:49

John R

votes

1 answer

Users for Hadoop deployment

I followed instructions found online to install and configure a "2 name nodes" + "10 data nodes" hadoop cluster on CentOS 8. I created a wheel user called "hadoop" on all nodes and set up passwordless ssh under this user. The install and…

hadoop

asked Jan 28 '20 at 20:59

Root Loop

-1

votes

1 answer

Unable to ssh localhost without password despite proper perms, key in authorized_keys

I have a key ~/.ssh/id_rsa and I added the pub key to my authorized keys: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys I also changed my permissions to 600: sudo chmod 600 ~/.ssh/authorized_keys I checked and /etc/ssh/sshd_config is set for…

linux ssh localhost opensuse hadoop

asked Aug 12 '17 at 23:08

Don Rhummy

-1

votes

1 answer

click and deploy hadoop cluster on google cloud platform

I am new to Google Cloud. I tried to click-and-deploy Hadoop cluster and I am always told that my quota are limited. But some days ago, I upgraded my free trial to a paid one ! (how can I check this in order to be sure that it was taken into account…

hadoop google-cloud-platform google-compute-engine

asked Jul 12 '15 at 13:24

epsilones

-1

votes

1 answer

Hadoop on Openstack vs physical servers

I'm new to Hadoop and trying to understand how it should be installed/configured. From the documentation I see that Hadoop normally should be aware about physical servers configuration (e.g replicating data between racks). So it is not clear for me,…

openstack hadoop

asked Feb 11 '15 at 10:48

Pavel

-1

votes

2 answers

"sudo apt-get remove hadoop" is not removing package

I am trying to uninstall Cloudera Hadoop from my Ubuntu System. For this I tried sudo apt-get remove hadoop command but this command is failing with following message: ubuntu@ip-10-82-19-71:~/cluster-deployer/src$ sudo apt-get remove hadoop Reading…

ubuntu-12.04 apt hadoop uninstall

asked Dec 29 '14 at 09:43

Shekhar

-1

votes

2 answers

how do I install pdsh on centos 6?

I'm following a tutorial on how to configure a centos machine to be a node in a hadoop cluster for HortonWorks. I'm doing this on a virtual machine on VirtualBox. Sadly, since I am a linux beginner, I am stuck on some very basic steps: 2.4.…

centos virtualization bash virtualbox hadoop

asked Jun 17 '13 at 20:40

Alex Gordon

-1

votes

2 answers

RHEL + can we improve disks performance by tuning kernel parameters?

We have a Hadoop cluster and we are collecting metrics data in order to investigate slow behavior of Spark applications. After a long investigation on our Hadoop cluster, we noticed from Prometheus metrics that node_disk_io_now is…

redhat hard-drive kernel hadoop prometheus

asked Jun 15 '22 at 17:05

King David
