Questions tagged [hadoop]

Hadoop is an open-source solution providing a distributed, replicated file system and a production-grade map-reduce system, along with a series of complementary additions such as Hive, Pig, and HBase to get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. A more complete list of commercial solutions is documented by Apache.

Available complementary additions to Hadoop include:

  • The Hadoop Distributed File System, HDFS (standard)
  • The map-reduce architecture (standard)
  • Hive, which provides a SQL-like interface to the map-reduce architecture (see the sketch after this list)
  • HBase, a distributed key-value store
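For example, Hive lets you express a query in SQL-like syntax and have it compiled into map-reduce jobs. A minimal sketch, assuming a tab-delimited log file already uploaded to HDFS (the table name, columns, and path are hypothetical):

    -- define a table over files already stored in HDFS
    CREATE EXTERNAL TABLE weblogs (host STRING, url STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      LOCATION '/data/weblogs';

    -- Hive compiles this into one or more map-reduce jobs
    SELECT host, COUNT(*) AS hits FROM weblogs GROUP BY host;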


261 questions
0 votes, 3 answers

I/O and RAM limitations are important for Hadoop performance. But is disk speed related to I/O?

Hortonworks says this: "Most often performance of a Hadoop cluster will not be constrained by disk speed – I/O and RAM limitations will be more important." How is disk speed not related to I/O limitations?
Propulsion
  • 158
  • 2
  • 9
0 votes, 1 answer

Store a database on a Hadoop cluster

I'm learning Hadoop and Hive server, and I'm confused about something. Suppose I build a Hadoop cluster with three machines, and I start storing images with a PHP/MySQL script. Now, for the MySQL database, can I install Hive on the same Hadoop server or…
0 votes, 1 answer

Flume: error log while using FileChannel

I am using Flume (flume-ng-1.5.0, with CDH 5.4) to collect logs from many servers and sink them to HDFS. Here is my configuration: #Define Source, Sinks, Channel collector.sources = avro collector.sinks = HadoopOut collector.channels = fileChannel #…
Summer Nguyen
  • 214
  • 3
  • 10
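For reference, a complete single-agent configuration along the lines the excerpt suggests might look like the following; the component names (collector, avro, fileChannel, HadoopOut) come from the excerpt, while the port and directories are assumptions. FileChannel errors often trace back to checkpoint/data directories that the flume user cannot write to.

    # name the components on this agent
    collector.sources  = avro
    collector.sinks    = HadoopOut
    collector.channels = fileChannel

    # Avro source receiving events from upstream agents (port is an assumption)
    collector.sources.avro.type = avro
    collector.sources.avro.bind = 0.0.0.0
    collector.sources.avro.port = 4545
    collector.sources.avro.channels = fileChannel

    # durable file channel; both directories must be writable by the flume user
    collector.channels.fileChannel.type = file
    collector.channels.fileChannel.checkpointDir = /var/lib/flume-ng/checkpoint
    collector.channels.fileChannel.dataDirs = /var/lib/flume-ng/data

    # HDFS sink; the escape sequences require a usable timestamp
    collector.sinks.HadoopOut.type = hdfs
    collector.sinks.HadoopOut.channel = fileChannel
    collector.sinks.HadoopOut.hdfs.path = /flume/events/%Y-%m-%d
    collector.sinks.HadoopOut.hdfs.fileType = DataStream
    collector.sinks.HadoopOut.hdfs.useLocalTimeStamp = true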
0 votes, 4 answers

Big Data: Which HD Parameters are Important?

I work with a lot of datasets that are in the tens of GBs, usually split into several files. Performing any type of dataset-wide operation (grep, sed, search, read/write to/from databases and Hadoop) on these files is of course very slow and time…
Ryan Rosario
  • 225
  • 2
  • 9
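When deciding which drive parameters matter for workloads like these, it can help to measure the disks directly rather than reading spec sheets. A hedged sketch using standard tools (the device and path below are placeholders):

    # buffered sequential read speed of the underlying device
    sudo hdparm -t /dev/sda

    # sequential write throughput, forcing data to disk before dd exits
    dd if=/dev/zero of=/data/ddtest bs=1M count=1024 conv=fdatasync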
0 votes, 1 answer

Setting up a secure Hadoop cluster - Kerberos security

I set up an HDP 2.2 cluster successfully (1 NM, 3 DNs and 1 client). User accounts to access the HDP cluster were created on the client, and I checked that these users can submit jobs by SSHing to the client node and running sample jobs. In the next step I enabled Kerberos…
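Once Kerberos is enabled, every HDFS or YARN operation needs a valid ticket; a quick smoke test from the client node might look like this (the principal and realm are assumptions):

    # obtain a ticket for the submitting user
    kinit user1@EXAMPLE.COM

    # verify the ticket, then run a simple authenticated HDFS command
    klist
    hdfs dfs -ls /user/user1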
0 votes, 1 answer

Hadoop: How to configure failover time for a datanode

I need to re-replicate blocks on my HDFS cluster in case a datanode fails. Actually, this already appears to happen after a period of maybe 10 min. However, I want to decrease this time, but I am wondering how to do so. I tried to set…
frlan
  • 573
  • 1
  • 8
  • 27
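The roughly ten minutes observed here is no accident: in Hadoop 2.x the namenode declares a datanode dead after 2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval, which with the defaults (300000 ms recheck, 3 s heartbeat) works out to 10 minutes 30 seconds. A sketch of shrinking that window in hdfs-site.xml (the values are examples only):

    <property>
      <!-- milliseconds; default 300000 -->
      <name>dfs.namenode.heartbeat.recheck-interval</name>
      <value>45000</value>
    </property>
    <property>
      <!-- seconds; default 3 -->
      <name>dfs.heartbeat.interval</name>
      <value>3</value>
    </property>
    <!-- resulting dead-node timeout: 2 x 45 s + 10 x 3 s = 120 s -->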
0 votes, 1 answer

Ambari/Nagios overwrites hadoop-services.cfg file on startup

When I shut down Nagios from the Ambari Web UI, modify the file hadoop-services.cfg, save it and open it, the new settings are there. However, when I start Nagios again (from the Ambari Web UI) and open hadoop-services.cfg, the changes are…
0 votes, 1 answer

Is Active Directory alone not enough to secure Hadoop?

I am trying to secure a Hadoop environment installed on Windows, so I started by analysing how to secure a Unix-based Hadoop cluster. I have gone through various links related to Kerberos and other Apache add-ons (Knox/Rhino/Sentry). Yet to…
Dinesh Kumar P
  • 163
  • 1
  • 6
0 votes, 1 answer

How to access Hadoop remotely?

I have installed Hadoop on an OpenStack CentOS guest VM. I'm able to open the site (from 192.168.0.10, VM-1): http://localhost:50070 http://192.168.0.10:50070 But I am not able to access the same from a remote machine (my…
Ibrar Ahmed
  • 101
  • 1
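If the web UI works locally but not remotely, a common culprit is the NameNode HTTP server listening only on a local interface, or a firewall in between. One hedged fix, assuming Hadoop 2.x, is to bind the UI to all interfaces in hdfs-site.xml and restart the namenode:

    <!-- hdfs-site.xml: listen on all interfaces for the NameNode web UI -->
    <property>
      <name>dfs.namenode.http-address</name>
      <value>0.0.0.0:50070</value>
    </property>

On an OpenStack guest, the instance's security group and any iptables rules on the VM must also allow inbound TCP on port 50070.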
0 votes, 1 answer

How to know which script or executable is linked with a metric in ganglia?

I have just started to explore Ganglia, and my question is: how do I know which script or executable is linked with a metric in Ganglia? The fact is that I don't know much about Ganglia. I have good experience with Zabbix, and I want to link a graph in…
Rohit
  • 101
  • 4
0 votes, 2 answers

Unable to set up a connection to Amazon EC2 and run Pig

I have made an EC2 key pair and saved it to a location under my home directory on my Mac. I have also changed permissions with 'chmod 600 /path/to/saved/keypair/file.pem'. Now I have followed the following instructions to run Pig on EC2: To set up and…
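With the key permissions already set as described, the connection step itself usually boils down to passing the key explicitly to ssh; the user and host below are placeholders (Amazon Linux AMIs typically use ec2-user, Ubuntu AMIs use ubuntu):

    # connect to the instance using the downloaded key pair
    ssh -i /path/to/saved/keypair/file.pem ec2-user@ec2-xx-xx-xx-xx.compute-1.amazonaws.com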
0 votes, 1 answer

Unable to connect to Amazon EC2 and run Pig

I have made an EC2 key pair and saved it to a location under my home directory on my Mac. I have also changed permissions with 'chmod 600 /path/to/saved/keypair/file.pem'. Now I have followed the following instructions to run Pig jobs on EC2: To set…
0 votes, 0 answers

Distributing the master node's SSH key

For the master node to SSH into the slaves without a password, the master needs to distribute its SSH key to the slaves. Copying the key using ssh-copy-id asks for the user password. If there are hundreds of nodes in the system, it may not be a good idea…
krackoder
  • 151
  • 1
  • 4
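One hedged sketch of a common approach, assuming a slaves.txt host list, a shared password in the SSH_PASS variable, and that the sshpass utility is acceptable in your environment (many shops use a configuration tool such as Ansible instead):

    # on the master, generate a key pair once, without a passphrase
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

    # push the public key to every slave; sshpass supplies the one password
    while read host; do
      sshpass -p "$SSH_PASS" ssh-copy-id -i ~/.ssh/id_rsa.pub "hadoop@$host"
    done < slaves.txt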
0 votes, 2 answers

Error while running any Hadoop HDFS file system command

I am very new to Hadoop and am following the "Hadoop For Dummies" book. I have a VM with the following specs: Hadoop version 2.0.6-alpha (Bigtop), OS CentOS. The problem is that when I run any HDFS file system command I get the following error: hadoop hdfs dfs -ls…
Raj Kumar Rai
  • 3
  • 1
  • 1
  • 3
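One likely contributor is visible in the excerpt itself: hadoop hdfs dfs -ls mixes two launchers. The file system shell is invoked as one of the following:

    # the hdfs launcher ...
    hdfs dfs -ls /

    # ... or the generic hadoop launcher
    hadoop fs -ls /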
0 votes, 0 answers

Hadoop/HBase environment variables

I tried to set up a 4-node Hadoop cluster using CDH 4.7. The cluster is up and running fine, and when I submit a word count MR job it completes successfully, but when I submit an MR job to insert data into HBase it throws a class-not-found…
sunny
  • 1
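A class-not-found error from a MapReduce job that writes to HBase usually means the HBase jars are not on the job's classpath. A common remedy (the jar and class names below are hypothetical) is to prepend the output of hbase classpath at submission time:

    # put HBase's jars on the Hadoop classpath for this job
    export HADOOP_CLASSPATH=$(hbase classpath)
    hadoop jar my-hbase-job.jar com.example.InsertIntoHBase /input/path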