Questions tagged [hadoop]

Hadoop is an open-source project that provides a distributed, replicated file system and a production-grade MapReduce system, along with complementary additions such as Hive, Pig, and HBase that get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. Apache maintains a more complete list of commercial offerings.

Hadoop's standard components and complementary additions include:

Hadoop Distributed File System, HDFS (standard)
The MapReduce architecture (standard)
Hive, which provides a SQL-like interface to the MapReduce architecture
HBase, a distributed key-value service

Recommended reference sources:

Hive Language Reference

261 questions

1 answer

What is the ideal instance type for a Hadoop namenode?

For a relatively small, one-terabyte cluster (2TB actual after replication), I was trying to nail down what the namenode's ideal memory/CPU size would be. Having worked with Hadoop off and on as an end user, I can't imagine it being too crazy... but…

hadoop specifications

asked Oct 09 '11 at 14:44

David

1 answer

How do I grant a user permission to use Hadoop via Kerberos?

I've set up Hadoop to use Kerberos (following the Cloudera security guide), but it is unclear how I connect to Hadoop as a regular user (e.g. username=myuser). Currently I have authenticated myself with Kerberos using my Kerberos admin user (via…

security configuration kerberos hadoop

asked Sep 13 '11 at 15:34

Dolan Antenucci

1 answer

Hadoop + NAT scenario

I have a situation where I'd like to run Hadoop spread across 2 clusters. The first cluster (ClusterA) is normal and all nodes are publicly accessible. The second cluster (ClusterB) is behind a NAT. Nodes in ClusterA will be running both Mapred…

nat hadoop

asked Jul 16 '11 at 19:17

BigChief

1 answer

Hadoop streaming job on EC2 stays in "pending" state

Trying to experiment with Hadoop and Streaming using the Cloudera distribution CDH3 on Ubuntu. Have valid data in hdfs:// ready for processing. Wrote a little streaming mapper in Python. When I launch a mapper-only job using: hadoop jar…

amazon-ec2 hadoop

asked Jun 27 '11 at 22:25

liamf

2 answers

Adding smaller nodes to pseudo-distributed nutch/hadoop cluster

I have Nutch/Hadoop running fine in pseudo-distributed mode. I want to add processing capacity by adding new nodes that are smaller than the master (hard disk 3 times smaller) and, of course, cheaper. Since the default HDFS replication factor is 3, after balancing the data…

hadoop nutch

asked May 21 '11 at 09:54

millebii

1 answer

Datanode not showing in web interface

Newbie on Hadoop clusters. I have set up my two-node configuration as described by M. G. Noll here. The datanode has datanode & tasktracker running (the jps command shows them). However, in the web UI I only see one node for the DFS. Live Node : 1 Dead Node :…

hadoop

asked Jun 01 '11 at 06:48

millebii

1 answer

Need help to build a strategy

I am a junior system administrator at one of the engineering schools. One of the professors got a donation of 45 servers (Dell PowerEdge 1690) from Yahoo. His requirements are as follows: Hadoop (MapReduce) on Linux (which flavor of Linux and…

virtualization vmware-esxi cluster hadoop

asked May 06 '11 at 22:33

Anup

2 answers

How to use combined CPU/Memory power of a Windows cluster

I have 5 Windows machines (dual-core, 3GB) on a LAN, all joined to a domain. I have a program which needs 8 cores and 10 GB to run within a given SLA time. What platform/tool can I use to harness the combined CPU/memory and other resources of these…

windows hadoop distributed-computing

asked Apr 08 '11 at 09:15

Munish Goyal

1 answer

Compiling hdfs-fuse bundled with Hadoop

I am trying to compile the hdfs-fuse extension from Hadoop 0.20.2 on a machine running Fedora 14. Below are the packages I have installed: fuse-2.8.5-2.fc14.x86_64 fuse-libs-2.8.5-2.fc14.x86_64 fuse-devel-2.8.5-2.fc14.x86_64 Then, I have…

hadoop fuse hdfs

asked Feb 02 '11 at 19:11

Laurent

1 answer

Can overriding of -Xmx be prevented for hadoop jobs?

I have a shared cluster running Hadoop-0.20.2. Occasionally users don't realize that the default memory settings chosen are based on the amount of available memory. Can I enforce a maximum value for Xmx?

hadoop

asked Dec 06 '10 at 16:34

Dan R

0 answers

ZooKeeper error: unrecognized host name for local configuration

I am using Kylin 4+ and want to run it locally on Windows (without Hadoop). I am following a tutorial on their website, which states that the ZooKeeper config must be set to local like so: kylin.env.zookeeper-is-local=true, which supposes that Kylin won't…

hadoop zookeeper

asked Jul 31 '23 at 21:42

Martin Moore

1 answer

Can a VM replace a physical machine?

We have 254 physical servers, all of them Dell R740 machines. The servers are part of a Hadoop cluster. Most of them hold the HDFS filesystem and run the data node & node manager services; some of them are Kafka machines. The OS installed on the…

redhat virtual-machines dell hadoop hdfs

asked Jul 25 '23 at 15:33

King David

0 answers

How to force a Hadoop daemon or JVM to use a given hostname instead of the node's actual hostname

I have a 5-node Hadoop cluster with different FQDNs under the domain xyz.com, like node1.xyz.com, node2.xyz.com ... node5.xyz.com. The hostnames are configured with this domain, so if we run the hostname command in a Linux terminal it returns…

java reverse-dns hadoop mitkerberos jvm

asked Apr 05 '23 at 19:25

Uddhav Savani

1 answer

Clear RAM cache and buffers on a production Hadoop cluster with HDFS filesystem

We have a Hadoop cluster with 265 Linux RHEL machines. Of the 265 machines, 230 are data nodes with the HDFS filesystem. Total memory on each data node is 128G, and we run many Spark applications on these machines. Last month we added…

redhat memory kernel memcached hadoop

asked Mar 09 '23 at 21:12

King David

1 answer

Hadoop datanodes using "{Hostname}/{IP address}:9000" to try to connect to namenode

I have a cluster of Pis that I'm using to experiment with Hadoop. masternode is set to .190, p1 to .191 ... p4 to .194. All nodes are up and running. start-dfs.sh, stop-all.sh, etc. from the master successfully start and stop the datanodes. However, on…

hadoop

asked Oct 09 '22 at 07:55

Snap E Tom
