Questions tagged [hadoop]

Hadoop is an open-source framework that provides a distributed, replicated file system and a production-grade MapReduce system. It also has a series of complementary additions, such as Hive, Pig, and HBase, for getting more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. Apache documents a more complete list of commercial offerings.

Core components and available complementary additions include:

  • HDFS, the Hadoop Distributed File System (standard)
  • The MapReduce architecture (standard)
  • Hive, which provides a SQL-like interface to the MapReduce architecture
  • HBase, a distributed key-value service
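To make the MapReduce model a little more concrete, here is a toy, single-process Python sketch of the classic word-count job, split into map, shuffle, and reduce steps. It is an illustration under loose assumptions, not how Hadoop is implemented: it uses no Hadoop API, runs on one machine, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. On a real cluster, the framework runs many mappers and reducers in parallel and performs the shuffle across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every whitespace-separated token.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key, which the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Hypothetical driver chaining the three phases together.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(docs))
```

Running the script prints `{'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}`. Because each map call looks at only one line and each reduce call looks at only one key, both phases can be spread across many machines, which is the property Hadoop exploits.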

Recommended reference sources:

261 questions
28 votes, 3 answers

What is Hadoop and what is it used for?

I have been enjoying reading ServerFault for a while and I have come across quite a few topics on Hadoop. I have had a little trouble finding out what it does from a global point of view. So my question is quite simple: What is Hadoop? What does…
Antoine Benkemoun (7,314 reputation)
13 votes, 4 answers

In Hadoop, how can I show the current progress of -copyFromLocal?

I am still new to Hadoop, and this time I was trying to process a 106 GB file. I used -copyFromLocal to copy that big file to my Hadoop DFS, but since the file is big I have to wait for a long time without a clue about the current…
Bang Dao (233 reputation)
11 votes, 2 answers

Moving the Secondary NameNode in a Cloudera HBase Cluster

I deployed the secondary namenode on the same machine as my main namenode. This is wrong for performance and durability reasons (the secondary namenode isn't a hot spare, but it does have a copy of needed metadata). I have found documentation on…
Kyle Brandt (83,619 reputation)
9 votes, 3 answers

Best choice for NTP client configuration

Let's see if someone can shed some light on this subject. I'm doing a server installation in the next few days. My client wants to deploy Hortonworks HDP with 2 master servers and 5 worker servers. One of the requirements for all of…
lgg (141 reputation)
9 votes, 4 answers

DIY Hadoop Cluster: Heat and Dust Issues?

Following are links to my DIY 6-node Hadoop cluster built from i3 machines. What is the best way to protect my design from dust and provide better heat transfer? What should I use to cover the four sides of my rack in order to protect it from dust?
yogesh.panchal (103 reputation)
9 votes, 4 answers

Hadoop JBOD disk configuration on HP Smart Array 410/i disk controller

I'm in an evaluation phase for some hardware that could be used to set up a Hadoop cluster. This hardware is refurbished (HP G6 servers with a Smart Array 410/i controller) and we probably should (or must) use it, although we don't have it yet. I've read that the 410/i controller…
nysalsa (91 reputation)
8 votes, 1 answer

Could not start ZK at requested port of 2181, while export HBASE_MANAGES_ZK=false

Problem: The first aim was to run HBase standalone. Navigating to ip:60010/master-status is successful once HBase has been started. The second aim is to run a distinct ZooKeeper quorum. ZooKeeper has been downloaded and started: netstat…
030 (5,901 reputation)
8 votes, 1 answer

Is it possible to manage 20 TB of data using MySQL?

I am working on a project, and my job is to build a database system to manage about 60,000,000,000 data entries. The background is that I have to store, in real time, a large number of messages read from about 30,000 RFID readers every…
lemuria
7 votes, 1 answer

Set up a Windows 10 Client for a Linux KDC Realm

I set up a KDC server and created a realm EXAMPLE.COM. Here is my krb5.conf file: [libdefaults] renew_lifetime = 7d forwardable = true default_realm = EXAMPLE.COM ticket_lifetime = 24h dns_lookup_realm = false dns_lookup_kdc = false …
D. Müller (251 reputation)
7 votes, 2 answers

Hadoop HDFS Backup & DR Strategy

We are preparing to implement our first Hadoop cluster, so we are starting small with a four-node setup (1 master node and 3 worker nodes). Each node will have 6 TB of storage (6 x 1 TB disks). We went with a SuperMicro 4-node chassis so…
Matt Keller (221 reputation)
7 votes, 1 answer

Can a Hadoop job be paused or suspended?

I'm using hadoop-0.20.2. Looking at hadoop fs. I am able to kill or fail an individual task. Is there any way to pause it so that the map slots are freed up for another task?
Dan R (2,335 reputation)
6 votes, 0 answers

Spark Error: Failed to Send RPC to Datanode

We have quite a few issues with our Spark Thrift server. It is a new Ambari cluster and no Spark jobs are running now. In the log we can see an error message: Failed to send RPC 9053901149358924945 to /DATA NODE MACHINE:50149. Please advise why this…
shalom (461 reputation)
6 votes, 2 answers

Hadoop HDFS: set file block size from the command line?

I need to set the block size of a file when I load it into HDFS to some value lower than the cluster block size. For example, if HDFS is using 64 MB blocks, I may want a large file to be copied in with 32 MB blocks. I've done this before within a…
BigChief (398 reputation)
6 votes, 2 answers

Hadoop disk failure: what do you do?

I would like to know your strategies for what to do when a disk in one of the Hadoop servers fails. Let's say I have multiple (>15) Hadoop servers and 1 namenode, and one of the 6 disks on a slave stops working; the disks are connected via SAS. I don't…
wlk (1,713 reputation)
5 votes, 1 answer

Issues Programmatically Adding ODBC DSN to ODBC Administrator on Windows

I am working on trying to automate some configuration, and as part of that we need to add an ODBC DSN through a script. The driver I'm trying to use is the Cloudera Impala ODBC Connector, downloaded from here. All of the machines this will run on…
Dave McGinnis (153 reputation)