Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

452 questions
2 votes, 2 answers

How can I turn off hadoop speculative execution from Java

After reading about Hadoop speculative task execution, I am trying to turn off speculative execution using the new Java API, but it has no effect. This is my Main class: public class Main { public static void main(String[] args) throws Exception { …
Gavriel
2 votes, 1 answer

FAILED: NullPointerException null in Hive query

The following is the Hive query I am using; it also uses a ranking function. I am running this on my local machine. SELECT numeric_id, location, Rank(location), followers_count FROM ( SELECT numeric_id, location, followers_count FROM…
patz
2 votes, 2 answers

How to get the job flow ID and other cluster parameters in a script run via script-runner.jar

I'm starting an Elastic MapReduce cluster with the following command line: $ elastic-mapreduce \ --create \ --num-instances "${INSTANCES}" \ --instance-type m1.medium \ --ami-version 3.0.4 \ --name "${CLUSTER_NAME}" \ --log-uri…
Gavriel
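
One approach sometimes used for the job flow ID question is to read the cluster metadata file that EMR writes onto each node. The sketch below assumes that file is `/mnt/var/lib/info/job-flow.json` and that the ID sits under a top-level `jobFlowId` key; both are assumptions about this AMI generation and should be checked on a running node. The parsing is split from the file read so the logic can be exercised without a cluster.

```python
import json
from pathlib import Path

# ASSUMPTION: location of the cluster metadata file on EMR nodes of this era.
# Verify it exists on your AMI before depending on it.
JOB_FLOW_INFO = Path("/mnt/var/lib/info/job-flow.json")


def parse_job_flow_id(json_text):
    """Extract the job flow ID from the contents of job-flow.json.

    ASSUMPTION: the ID is stored under the top-level "jobFlowId" key.
    """
    return json.loads(json_text)["jobFlowId"]


def read_job_flow_id(info_path=JOB_FLOW_INFO):
    """Read the metadata file from disk and return the job flow ID."""
    return parse_job_flow_id(Path(info_path).read_text(encoding="utf-8"))
```

A script launched through script-runner.jar could call `read_job_flow_id()` and print or export the result; other cluster parameters may live in the same directory, but that is not verified here.
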
2 votes, 2 answers

Cannot SSH into Elastic MapReduce

I'm using elastic-mapreduce to spin up new clusters from the command line. After reading this tutorial, I have: elastic-mapreduce --create --alive \ --instance-type m1.xlarge \ --num-instances 5 \ --supported-product mapr \ --name m7 \ --args…
cybertextron
2 votes, 0 answers

Error: user not authorized to perform: iam:GetInstanceProfile

When trying to create an "Interactive Cluster" using ruby elastic-mapreduce --create --alive --name "Interactive Cluster" --num-instances=1 --master-instance-type=m1.large --hive-interactive, I get the following message printed on the…
user1523292
2 votes, 2 answers

s3distcp error "Argument '--arg' doesn't match"

I'm trying to use s3distcp for an EMR job and got this exception: Exception in thread "main" java.lang.RuntimeException: Argument --arg doesn't match. at emr.hbase.options.Options.parseArguments(Options.java:75) at…
Thi Duong Nguyen
2 votes, 1 answer

Modifying log4j.properties file on AWS Elastic MapReduce

I'm using AWS Elastic MapReduce and I would like to be able to set the logging level. For example, I would like log.isDebugEnabled() to return true. A bit of googling led me to this blog…
Alexander
2 votes, 3 answers

Parse Freebase RDF dump with MapReduce

I downloaded the RDF data dump from Freebase, and I need to extract the English name of every entity in it. Do I have to use Hadoop and MapReduce to do this, and if so, how? Or is there another way to extract the entity names? It would be…
Django Johnson
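
Extracting names is a per-line filter with no aggregation, so it can run as a plain pipe on one machine or as a map-only streaming job. The sketch below assumes each dump line is a tab-separated triple `<subject>\t<predicate>\t<object>\t.` and that names use the predicate `<http://rdf.freebase.com/ns/type.object.name>` with literals like `"Paris"@en`; both the layout and the predicate are assumptions to check against the actual dump.

```python
import sys

# ASSUMPTION: predicate used for entity names in the dump.
NAME_PREDICATE = "<http://rdf.freebase.com/ns/type.object.name>"
LANG_SUFFIX = "@en"


def extract_english_names(lines):
    """Yield (subject, name) for name triples whose literal is tagged @en."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3 or parts[1] != NAME_PREDICATE:
            continue
        literal = parts[2]
        # Assumed literal form: "Some Name"@en (escapes inside are left as-is).
        if literal.startswith('"') and literal.endswith('"' + LANG_SUFFIX):
            yield parts[0], literal[1:-len(LANG_SUFFIX) - 1]


def main(stream=None):
    """Read triples from stdin (or a given stream) and print subject<TAB>name."""
    stream = sys.stdin if stream is None else stream
    for subject, name in extract_english_names(stream):
        print(f"{subject}\t{name}")
```

Calling `main()` from a small script lets the decompressed dump be piped straight through it; the same script could serve as the mapper of a map-only Hadoop streaming job if the dump is too large for one machine.
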
2 votes, 2 answers

How does one make a Hadoop task attempt fail after too many data-fetch failures?

I have a Hadoop reduce task attempt that will never fail or complete unless I fail or kill it manually. The problem surfaces when the task tracker node (due to network issues that I am still investigating) loses connectivity with other task…
scetoaux
2 votes, 1 answer

AWS Elastic MapReduce doesn't seem to be correctly converting the streaming step to a JAR

I have a mapper and reducer that work fine when I run them in the piped version: cat data.csv | ./mapper.py | sort -k1,1 | ./reducer.py I used the Elastic MapReduce wizard and loaded the inputs, outputs, bootstrap, etc. The bootstrap is successful, but…
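
The local pipe works only because `sort -k1,1` hands the reducer its lines grouped by key, which is the same contract Hadoop streaming's shuffle provides: tab-separated `key\tvalue` lines out of the mapper, key-sorted lines into the reducer. The sketch below reproduces that pipe in-process so both halves can be checked before submitting to EMR. The mapper logic (count rows per first CSV field) is hypothetical, since the question does not show the real scripts.

```python
from itertools import groupby


def mapper(lines):
    """Hypothetical mapper: emit '<first CSV field>\t1' for every input row."""
    for line in lines:
        key = line.rstrip("\n").split(",")[0]
        yield f"{key}\t1"


def reducer(lines):
    """Sum values per key; relies on input already being sorted by key."""
    parsed = (line.split("\t", 1) for line in lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_local_pipeline(lines):
    """Mimic `cat data.csv | ./mapper.py | sort -k1,1 | ./reducer.py` in-process."""
    mapped = list(mapper(lines))
    # Stand-in for `sort -k1,1` / Hadoop's shuffle: order lines by the key field.
    shuffled = sorted(mapped, key=lambda kv: kv.split("\t", 1)[0])
    return list(reducer(shuffled))
```

If this in-process version agrees with the shell pipe but EMR still fails, the difference is more likely in packaging (script paths, shebang lines, executable bits, or the step arguments) than in the mapper/reducer logic itself.
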
2 votes, 1 answer

How do I specify an S3 bucket as my input to EMR

Instead of copying over to HDFS, is it possible to just get an array of objects in a bucket in S3 to be processed in EMR? I've tried this and I keep getting either security warnings for not having credentials (even after I add them to the…
Julian
2 votes, 0 answers

NoSuchMethodError with Netty on Amazon EMR

I am attempting to run a MapReduce job on spot instances using Amazon's EMR service. The intent is to read files off S3, process them in an MR job, and emit rows to a Cassandra DB in the reducer. My custom jar runs fine on a single-node Hadoop…
2 votes, 1 answer

What happens if I add the same input path twice to a Hadoop job?

I am using Elastic MapReduce. I wonder what will happen if I use the exact same line twice in my main method: FileInputFormat.addInputPath(job, new Path( "s3n://mybucket/data/lolcat/*")); Will hadoop process the same…
Eastern Monk
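
Whatever Hadoop does with a repeated input path, a driver can sidestep the question by deduplicating path strings before registering them. The helper below is a hypothetical sketch of that logic in Python; in a Java driver the same idea is a `LinkedHashSet` of path strings, each passed once to `FileInputFormat.addInputPath`.

```python
def unique_input_paths(paths):
    """Return paths with exact duplicates removed, keeping first-seen order.

    Only identical strings are collapsed; overlapping globs such as
    "s3n://mybucket/data/*" and "s3n://mybucket/data/lolcat/*" are NOT detected.
    """
    seen = set()
    result = []
    for path in paths:
        if path not in seen:
            seen.add(path)
            result.append(path)
    return result
```

Note the limitation in the docstring: two different patterns that match the same files would still feed those files in twice.
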
2 votes, 4 answers

MapReduce Table Diff

I have two versions (old/new) of a database table with about 100,000,000 records. They are in files trx-old and trx-new. The structure is (id, date, amount, memo), e.g. (1, 5/1, 100, slacks) and (2, 5/1, 50, wine). id is the simple primary key; other fields are…
Bill Burcham
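
A common way to diff two keyed tables in MapReduce is a reduce-side join: the map phase tags each record with its source file and emits it under its `id`, and the reducer sees both versions of one `id` together. The sketch below models that in plain Python; the in-memory `defaultdict` stands in for Hadoop's shuffle, and treating the first field as `id` follows the structure described in the question. The labels `added`, `removed`, `changed` are illustrative names, not part of the question.

```python
from collections import defaultdict


def map_phase(old_rows, new_rows):
    """Emit (id, (source, remaining fields)) for every row of both tables."""
    for source, rows in (("old", old_rows), ("new", new_rows)):
        for row in rows:
            yield row[0], (source, tuple(row[1:]))


def reduce_phase(pairs):
    """Group by id (the shuffle's job in Hadoop), then classify each id."""
    grouped = defaultdict(dict)
    for record_id, (source, values) in pairs:
        grouped[record_id][source] = values
    diff = {}
    for record_id, versions in grouped.items():
        old, new = versions.get("old"), versions.get("new")
        if old is None:
            diff[record_id] = "added"
        elif new is None:
            diff[record_id] = "removed"
        elif old != new:
            diff[record_id] = "changed"
        # ids present and identical in both tables are omitted from the diff
    return diff
```

In a real job each reducer call handles a single `id`, so memory use stays bounded by two records per key rather than by the 100,000,000-row table.
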
2 votes, 1 answer

Pattern match input files for Amazon Elastic MapReduce

I am trying to run a MapReduce streaming job that takes input files from directories in an S3 bucket that match a given pattern. The pattern is something like bucket-name/[date]/product/logs/[hour]/[logfilename]. An example log would be in a while…
Evan
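
Hadoop input paths accept glob patterns that are expanded one path component at a time, so a `*` in the date or hour position does not cross a `/`. Before submitting a job, it can help to check which keys a pattern would select. The sketch below imitates per-component matching with Python's `fnmatch`; it is an approximation, since `fnmatch` lacks Hadoop's `{a,b}` alternation, and the example pattern and key layout are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_path_glob(pattern, key):
    """Match a key against a glob one '/'-separated segment at a time.

    Unlike fnmatch over the whole string, '*' here never spans a '/',
    which is closer to how Hadoop expands globs in input paths.
    """
    pattern_parts = pattern.strip("/").split("/")
    key_parts = key.strip("/").split("/")
    if len(pattern_parts) != len(key_parts):
        return False
    return all(fnmatchcase(k, p) for p, k in zip(pattern_parts, key_parts))
```

For example, `matches_path_glob("bucket-name/2013-0[1-3]-*/product/logs/*/*.log", key)` selects February logs under `product/logs/<hour>/` but rejects keys under other products or with extra directory levels.
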