Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tag: emr

452 questions

2 answers

How can I turn off hadoop speculative execution from Java

After reading "Hadoop speculative task execution" I am trying to turn off speculative execution using the new Java API, but it has no effect. This is my Main class: public class Main { public static void main(String[] args) throws Exception { …

java hadoop elastic-map-reduce speculative-execution

asked Apr 24 '14 at 10:21

Gavriel

1 answer

FAILED: NullPointerException null in HIVE QUERY

The following is the Hive query I am using; it also uses a ranking function. I am running this on my local machine. SELECT numeric_id, location, Rank(location), followers_count FROM ( SELECT numeric_id, location, followers_count FROM…

hadoop mapreduce hive elastic-map-reduce hiveql

asked Apr 17 '14 at 04:18

patz

2 answers

How to know job flow id, other cluster parameters in script running via script-runner.jar

I'm starting an elastic mapreduce cluster with the following command-line: $ elastic-mapreduce \ --create \ --num-instances "${INSTANCES}" \ --instance-type m1.medium \ --ami-version 3.0.4 \ --name "${CLUSTER_NAME}" \ --log-uri…

hadoop elastic-map-reduce

asked Apr 08 '14 at 10:39

Gavriel

2 answers

cannot ssh into Elastic MapReduce

I'm using elastic-mapreduce to spin up new clusters from the command line. After reading this tutorial, I have: elastic-mapreduce --create --alive \ --instance-type m1.xlarge \ --num-instances 5 \ --supported-product mapr \ --name m7 \ --args…

hadoop amazon-web-services ssh amazon-ec2 elastic-map-reduce

asked Jan 30 '14 at 02:01

cybertextron

0 answers

Error: user not authorized to perform: iam:GetInstanceProfile

When trying to create an "Interactive Cluster" using: ruby elastic-mapreduce --create --alive --name "Interactive Cluster" --num-instances=1 --master-instance-type=m1.large --hive-interactive I get the following message printed on the…

hadoop elastic-map-reduce

asked Dec 21 '13 at 18:08

user1523292

2 answers

s3distcp error "Argument '--arg' doesn't match"

I'm trying to use s3distcp for an EMR job and got this exception: Exception in thread "main" java.lang.RuntimeException: Argument --arg doesn't match. at emr.hbase.options.Options.parseArguments(Options.java:75) at…

hadoop mapreduce elastic-map-reduce emr mrjob

asked Nov 03 '13 at 01:15

Thi Duong Nguyen

1 answer

Modifying log4j.properties file on AWS Elastic MapReduce

I'm using AWS Elastic MapReduce and I would like to be able to set the logging level. For example, I would like for log.isDebugEnabled() to return true. A bit of googling led me to find this blog…

logging amazon-web-services elastic-map-reduce

asked Oct 09 '13 at 21:54

Alexander

3 answers

Parse Freebase RDF dump with MapReduce

I downloaded the RDF data dump from Freebase, and what I need to extract is the English name of every entity in Freebase. Do I have to use Hadoop and MapReduce to do this, and if so, how? Or is there another way to extract the entity names? It would be…

hadoop mapreduce bigdata freebase elastic-map-reduce

asked Sep 16 '13 at 04:55

Django Johnson

2 answers

How does one make a hadoop task attempt to fail after too many data fetch failures?

I have a Hadoop reduce task attempt that will never fail or complete unless I fail/kill it manually. The problem surfaces when the task tracker node (due to network issues that I am still investigating) loses connectivity with other task…

hadoop mapreduce elastic-map-reduce amazon-emr

asked Sep 12 '13 at 15:46

scetoaux

1 answer

AWS Elastic mapreduce doesn't seem to be correctly converting the streaming to jar

I have a mapper and reducer that work fine when I run them in the piped version: cat data.csv | ./mapper.py | sort -k1,1 | ./reducer.py I used the Elastic MapReduce wizard, loaded inputs, outputs, bootstrap, etc. The bootstrap is successful, but…

python hadoop amazon-web-services hadoop-streaming elastic-map-reduce

asked Sep 01 '13 at 07:34

Mittenchops

1 answer

How do I specify an S3 bucket as my input to EMR?

Instead of copying over to HDFS, is it possible to just get an array of objects in a bucket in S3 to be processed in EMR? I've tried this and I keep on either getting security warnings for not having credentials (even after I add them to the…

hadoop amazon-s3 elastic-map-reduce

asked Aug 13 '13 at 17:49

Julian

0 answers

NoSuchMethodError with Netty on Amazon EMR

I am attempting to run a MapReduce job on spot instances using Amazon's EMR service. The intent is to read files off S3, process them in an MR job, and emit rows to a Cassandra DB in the reducer. My custom jar runs fine on a single-node Hadoop…

hadoop netty elastic-map-reduce nosuchmethoderror

asked Jul 09 '13 at 20:04

AlterForm

1 answer

What happens if I add the same path twice to a Hadoop?

I am using Elastic MapReduce. I wonder what will happen if I use the exact same line twice in my main method: FileInputFormat.addInputPath(job, new Path("s3n://mybucket/data/lolcat/*")); Will Hadoop process the same…

hadoop elastic-map-reduce

asked May 22 '13 at 20:22

Eastern Monk

4 answers

Mapreduce Table Diff

I have two versions (old/new) of a database table with about 100,000,000 records. They are in files: trx-old, trx-new. The structure is:

id  date  amount  memo
1   5/1   100     slacks
2   5/1   50      wine

id is the simple primary key, other fields are…

sql hadoop mapreduce elastic-map-reduce cascading

asked May 04 '13 at 18:53

Bill Burcham

1 answer

Pattern match input files for Amazon Elastic MapReduce

I am trying to run a MapReduce streaming job that takes input files from directories in an s3 bucket that match a given pattern. The pattern is something like bucket-name/[date]/product/logs/[hour]/[logfilename]. An example log would be in a while…

hadoop amazon-web-services elastic-map-reduce emr

asked May 02 '13 at 17:27

Evan
