Questions tagged [mapreduce]
13 questions
4
votes
2 answers
hadoop-config.sh in bin/ and libexec/
While setting up Hadoop, I found that the hadoop-config.sh script is present in two directories, bin/ and libexec/. Both files are identical. Looking into the scripts, I found that if hadoop-config.sh is present in libexec, then it gets executed.…

krackoder
- 151
- 1
- 4
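For context on the question above: the libexec-first behaviour comes from the launcher scripts themselves. A rough sketch of that lookup logic, paraphrased from the Hadoop 2.x start scripts rather than copied verbatim:

    bin=$(dirname "$0"); bin=$(cd "$bin"; pwd)
    DEFAULT_LIBEXEC_DIR="$bin"/../libexec
    # if HADOOP_LIBEXEC_DIR is unset, the libexec/ next to bin/ is used, so that copy wins
    HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
    . "$HADOOP_LIBEXEC_DIR"/hadoop-config.sh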
4
votes
1 answer
How do I define the timeout for bootstrap actions on Amazon's Elastic MapReduce?
How do I change the timeout for bootstrap actions on Amazon's Elastic MapReduce?
user76542
3
votes
1 answer
Best practice for administering a (hadoop) cluster
I've recently been playing with Hadoop. I have a six-node cluster up and running with HDFS, and I have run a number of MapReduce jobs. So far, so good. However, I'm now looking to do this more systematically and with a larger number of nodes. Our base…
Alex
2
votes
0 answers
Hadoop Streaming with Python 3.5: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
I'm trying to run my own mapper and reducer Python scripts using Hadoop Streaming on my cluster built on VMware Workstation VMs.
Hadoop version - 2.7, Python - 3.5, OS - CentOS 7.2 on all the VMs.
I have a separate machine which plays the role of a…

alex
- 21
- 1
- 3
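For readers hitting the same error: exit code 127 from PipeMapRed usually means the interpreter named in the script's shebang (e.g. #!/usr/bin/env python3) cannot be found on the worker nodes. A typical Hadoop Streaming invocation for this kind of setup, with the jar version and input/output paths as placeholders:

    # ship the scripts to every node; each must start with a valid shebang and be executable
    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.7.0.jar \
        -files mapper.py,reducer.py \
        -mapper mapper.py -reducer reducer.py \
        -input /user/alex/input -output /user/alex/output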
1
vote
0 answers
Sample output of Rumen or Input to Gridmix
I want to see JobHistory logs that can be fed as input to Rumen. More specifically, I am interested in knowing the input format for Gridmix.
I tried the following two things:
1) I found this file: . What is this file exactly?
Is this…

PHcoDer
- 111
- 2
1
vote
1 answer
Hadoop FileAlreadyExistsException: Output directory hdfs://:9000/input already exists
I have Hadoop set up in fully distributed mode with one master and 3 slaves. I am trying to execute a jar file named Tasks.jar, which takes arg[0] as the input directory and arg[1] as the output directory.
In my Hadoop environment, I have the input files…

Harinarayanan Mohan
- 11
- 1
- 3
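As background for the error above: MapReduce deliberately refuses to overwrite an existing output directory. The usual workaround is to remove it (or pass a fresh path) before re-running; the paths below are illustrative and assume Tasks.jar declares its main class in the manifest:

    hdfs dfs -rm -r -f /output        # clear the previous run's output directory
    hadoop jar Tasks.jar /input /output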
1
vote
2 answers
Updating group without log out or subshell
I'm trying to run Docker on Elastic MapReduce streaming but am running into a permissions issue. In my bootstrap script, I need the "hadoop" user to be part of the "docker" group (as described on the AWS Docker Basics page):
sudo usermod -a…

Max
- 111
- 2
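A workaround often suggested for this situation, sketched below: after usermod, run the Docker command through sg so the new group membership is re-read from the group database without a re-login or an interactive subshell (the docker command shown is just an example):

    sudo usermod -aG docker hadoop
    # sg runs a single command with the "docker" group applied
    sudo -u hadoop sg docker -c "docker ps"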
1
vote
1 answer
MapReduce job is hung after 1 of 5 reducers completed on single-node environment
I have only one DataNode in my dev environment on EC2. I ran a heavy MR job and after 6 hours noticed that 100% of mappers and 20% of reducers had finished (1 reducer shows 100% completion, the others 0%). It looks like the job is hung between 2 reducer…

Marboni
- 111
- 4
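Two things commonly checked in this situation, sketched below for Hadoop 2.x (the application id, jar and class names are placeholders): whether the job still holds containers, and whether the single node simply cannot host the remaining reducers, in which case re-running with fewer reducers helps:

    yarn application -list                          # is the job still holding containers?
    yarn logs -applicationId application_xxx        # pull task logs once the application finishes
    hadoop jar job.jar MyJob -D mapreduce.job.reduces=1 /in /out   # retry with a single reducer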
1
vote
0 answers
How to improve job execution performance on Amazon Elastic MapReduce?
My task is:
Initially, I want to import the data from MS SQL Server into HDFS using Sqoop.
Through Hive I process the data and generate the result in one table.
That result table from Hive is then exported back to MS SQL Server…

Bhavesh Shah
- 111
- 2
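A rough sketch of the Sqoop ends of the pipeline described above; the connection string, credentials, table names and paths are all placeholders:

    sqoop import --connect "jdbc:sqlserver://dbhost:1433;databaseName=sales" \
        --username etl --password-file /user/etl/.sqoop-pwd \
        --table orders --target-dir /data/orders
    # ... Hive processing writes its result into a table, e.g. backed by /user/hive/warehouse/result ...
    sqoop export --connect "jdbc:sqlserver://dbhost:1433;databaseName=sales" \
        --username etl --password-file /user/etl/.sqoop-pwd \
        --table result --export-dir /user/hive/warehouse/result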
1
vote
3 answers
Hadoop Rolling Small files
I am running Hadoop on a project and need a suggestion.
By default, Hadoop has a block size of around 64 MB.
There is also a recommendation not to use many small files.
I currently have very, very small files being put into HDFS due…

Arenstar
- 3,602
- 2
- 25
- 34
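One common mitigation for the small-files problem described above is to pack them into a Hadoop Archive, so they stop consuming a block and a NameNode entry each; the names and paths below are illustrative:

    hadoop archive -archiveName logs.har -p /incoming small-files /archives
    hdfs dfs -ls har:///archives/logs.har          # archived files stay readable via the har: scheme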
0
votes
1 answer
How to view status of recent AppEngine mapreduce jobs?
We recently upgraded our App Engine application to GAE SDK 1.9, and upgraded the older MapReduce library we'd been using to the most recent version hosted on GitHub. We now find that the old MapReduce status page…

JP Lodine
- 101
- 1
0
votes
0 answers
Distributing Master node ssh key
For the master node to SSH into the slaves without a password, the master needs to distribute its SSH key to the slaves. Copying the key with ssh-copy-id asks for the user's password. If there are hundreds of nodes in the system, it may not be a good idea…

krackoder
- 151
- 1
- 4
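One commonly used workaround, sketched below: generate the key once on the master, then push it out with sshpass so the password is supplied non-interactively rather than typed once per node (slaves.txt and the environment variable are assumptions for the example; configuration management or pre-baked images scale better for hundreds of nodes):

    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    while read host; do
        sshpass -p "$HADOOP_USER_PASS" ssh-copy-id "hadoop@$host"
    done < slaves.txt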
0
votes
1 answer
MongoDB Locking - Very, very slow to read
This is the output from db.currentOp():
> db.currentOp()
{
"inprog" : [
{
"opid" : 2153,
"active" : false,
"op" : "update",
"ns" : "",
"query" : {
"name" :…

StuR
- 167
- 2
- 10