Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tag: emr

452 questions

2 answers

Amazon Elastic MapReduce Bootstrap Actions not working

I have tried the following combinations of bootstrap actions to increase the heap size of my job but none of them seem to work: --mapred-key-value mapred.child.java.opts=-Xmx1024m --mapred-key-value mapred.child.ulimit=unlimited --mapred-key-value…

hadoop amazon-web-services mapreduce elastic-map-reduce amazon-emr

asked Apr 05 '12 at 07:38

Shrish Bajpai

2 answers

How to parse the Freebase quad dump using Amazon MapReduce

I'm trying to extract movie information from Freebase. I just need the name of the movie and the name and ID of the director and of the actors. I found it hard to do so using Freebase's topic dumps, because there is no reference to the director ID, just…

mapreduce freebase elastic-map-reduce

asked Mar 07 '12 at 14:11

Jaroušek Puchlivec
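
One common way to recover the director ID that the topic dumps lack is a reduce-side join over the quad dump: the mapper keys each relevant quad by its subject ID, so the reducer sees all of a film's properties together. The Python streaming sketch below assumes each line holds four tab-separated fields (source, property, destination, value), with object references in `destination` and literals such as names in `value`; that layout and the property names in `WANTED` are assumptions to check against the actual dump.

```python
import json
import sys
from itertools import groupby

# Hypothetical property filter: verify the real Freebase property paths.
WANTED = {"/type/object/name", "/film/film/directed_by"}


def mapper(lines):
    """Emit 'source<TAB>property<TAB>target' for the quads we care about.

    Assumes four tab-separated fields per line: source, property,
    destination, value (for literals, the value column holds the text).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip malformed lines
        source, prop, destination, value = fields
        if prop in WANTED:
            # Prefer the literal value (e.g. a name); fall back to the
            # referenced object ID (e.g. a director's /m/ ID).
            yield f"{source}\t{prop}\t{value or destination}"


def reducer(sorted_lines):
    """Collect every wanted property of one subject into a single record.

    Hadoop streaming sorts mapper output by key (the text before the first
    tab), so all lines for one subject arrive next to each other.
    """
    rows = (line.rstrip("\n").split("\t", 2) for line in sorted_lines)
    for source, group in groupby(rows, key=lambda row: row[0]):
        record = {}
        for _, prop, target in group:
            record.setdefault(prop, []).append(target)
        yield source, record


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: pass "map" or "reduce" as the first argument.
    if sys.argv[1] == "map":
        for out in mapper(sys.stdin):
            print(out)
    else:
        for source, record in reducer(sys.stdin):
            print(f"{source}\t{json.dumps(record)}")
```

This yields each film with its name and director IDs; resolving the directors' names takes a second job that joins on those IDs in the same way.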

0 answers

OutOfMemory error when running full-scale Hadoop job

I'm running a Hadoop job on Amazon Elastic MapReduce and I keep getting an OutOfMemory error. The values are admittedly a little larger than most MapReduce values, but the error still happens even when I decrease their size dramatically. Here's…

hadoop out-of-memory elastic-map-reduce

asked Jan 23 '12 at 18:23

dspyz

5,280 reputation; badges: 2 gold, 25 silver, 63 bronze

1 answer

Getting data out of Hive and into MySQL on AWS?

I'd love to use Sqoop, but I don't think it is worth running the Cloudera stack on AWS instead of Elastic MapReduce (which I really like) just for this. My current thought is to write the data I need moved to an external table housed on S3 and then write…

mysql hadoop amazon-web-services hive elastic-map-reduce

asked Nov 23 '11 at 14:58

Tom Emmons

1 answer

EC2 Job Flow Failure

I have a MapReduce JAR file that I'd like to run on S3. It takes two args: an input dir and an output file. So I tried the following command using the elastic-mapreduce Ruby command-line tool: elastic-mapreduce -j j-JOBFLOW --jar…

amazon-ec2 elastic-map-reduce

asked Oct 18 '11 at 16:50

user592419

5,103 reputation; badges: 9 gold, 42 silver, 67 bronze

1 answer

Has anybody created a job with multiple inputs using the Ruby client for Amazon's Elastic MapReduce?

Through the UI, Amazon's framework allows me to create jobs with multiple inputs by specifying multiple -input lines, e.g.: -input s3n://something -input s3n://something-else. Similarly, the Ruby EMR client has been very helpful to me so…

amazon-web-services amazon-emr elastic-map-reduce

asked Sep 02 '11 at 01:22

henry

1,716 reputation; badges: 3 gold, 15 silver, 27 bronze

1 answer

"Streaming Command Failed!" error when using Elastic MapReduce/S3 and R

I'm following this example, hoping to successfully run something using EC2/S3/EMR/R: https://gist.github.com/406824. The job fails on the streaming step. Here are the error logs: controller: 2011-07-21T19:14:27.711Z INFO Fetching jar…

r amazon-s3 amazon-ec2 hadoop elastic-map-reduce

asked Jul 21 '11 at 20:20

tcash21

4,880 reputation; badges: 4 gold, 32 silver, 39 bronze

1 answer

Amazon Elastic MapReduce - Format or examples for Python map and reduce code

Maybe it is the same as Hadoop, but I just couldn't find the format or an example for writing the map and reduce Python code, besides the map example here: http://docs.amazonwebservices.com/ElasticMapReduce/latest/GettingStartedGuide/ but I couldn't…

python hadoop mapreduce amazon-emr elastic-map-reduce

asked Jun 29 '11 at 20:01

Alon Gutman
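
Hadoop streaming, which EMR uses for Python steps, runs the mapper and reducer as ordinary programs: each reads lines from stdin and writes `key<TAB>value` lines to stdout, and the reducer receives the mapper output sorted by key, so all lines for one key are adjacent. A minimal word-count sketch of that contract follows; the `map`/`reduce` command-line switch is a hypothetical convenience, and the two stages are just as often kept in separate files.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; relies on input being sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

The pipeline can be checked locally before uploading, with `sort` standing in for Hadoop's shuffle: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` (the file names are placeholders).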

1 answer

How is input data distributed across nodes for EMR [using MRJob]?

I'm looking into using Yelp's MRJob to run computations on Amazon's Elastic MapReduce. I will need to read and write a large amount of data during the computationally intensive job. Each node should only get a part of the data, and I'm confused about…

mongodb amazon-web-services partitioning elastic-map-reduce mrjob

asked Feb 21 '11 at 17:36

trope
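
mrjob runs its jobs on EMR as Hadoop streaming steps, so how the input is divided across nodes is decided by Hadoop's input format rather than by mrjob itself: each input file is cut into splits, and each split becomes one map task. The sketch below models the split rule of Hadoop's `FileInputFormat` (split size `max(minSize, min(maxSize, blockSize))`, with the last split allowed to run up to 10% over). The 128 MB block size in the example is an assumption; the effective value depends on the filesystem and cluster configuration.

```python
SPLIT_SLOP = 1.1  # the final split may be up to 10% larger than split_size
MAX_LONG = 2**63 - 1


def compute_split_size(block_size, min_size=1, max_size=MAX_LONG):
    """Clamp the block size between the configured min and max split sizes."""
    return max(min_size, min(max_size, block_size))


def input_splits(length, block_size, splittable=True, min_size=1, max_size=MAX_LONG):
    """Return (offset, size) pairs for one file; each pair is one map task."""
    if length == 0 or not splittable:
        # Empty or non-splittable files (e.g. gzip) go to a single mapper whole.
        return [(0, length)]
    split_size = compute_split_size(block_size, min_size, max_size)
    splits, remaining = [], length
    while remaining / split_size > SPLIT_SLOP:
        splits.append((length - remaining, split_size))
        remaining -= split_size
    if remaining:
        splits.append((length - remaining, remaining))
    return splits


MB = 1024 * 1024
print(input_splits(300 * MB, 128 * MB))  # three splits: 128 MB, 128 MB, 44 MB
print(input_splits(135 * MB, 128 * MB))  # one split: 135 MB is within the 10% slop
```

Lowering the maximum split size is one way to get more, smaller map tasks from large splittable files.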

2 answers

Spark: possible race condition in driver

I have a Spark job that processes several folders on S3 per run and stores its state in DynamoDB. In other words, we run the job once per day; it looks for new folders added by another job, transforms them one by one and writes state to…

java multithreading amazon-web-services apache-spark elastic-map-reduce

asked Oct 31 '17 at 20:32

chuwy

6,310 reputation; badges: 4 gold, 20 silver, 29 bronze

1 answer

IllegalAccessError when running Spark job in EMR

I am attempting to run a Spark job that accesses DynamoDB. The old way of instantiating a DynamoDB client has been deprecated, and it is now recommended to use the client builder. This works fine locally, but when I deploy to EMR I'm…

scala amazon-web-services apache-spark amazon-emr elastic-map-reduce

asked Apr 05 '17 at 17:24

Leyth G

1,103 reputation; badges: 2 gold, 15 silver, 38 bronze

1 answer

Unable to read sequence file from distributed cache in EMR

I am trying to read a sequence file from the distributed cache in EMR, but the job is unable to read it. My code works fine locally, but it fails on EMR. Here is my code snippet: Putting sequence file to distributed…

amazon-web-services mapreduce amazon-emr elastic-map-reduce distributed-cache

asked Apr 02 '17 at 07:14

Y0gesh Gupta

2,184 reputation; badges: 5 gold, 40 silver, 56 bronze

1 answer

Hadoop: processing WARC files

I have a general question about Hadoop file splitting and multiple mappers. I am new to Hadoop and am trying to get a handle on how to set things up for optimal performance. My project is currently processing WARC files, which are gzipped. Using the current…

java hadoop mapreduce elastic-map-reduce common-crawl

asked Oct 30 '16 at 05:22

user1738628
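
With Hadoop's standard input formats a gzip file cannot be split, so each `.warc.gz` file is read end to end by one mapper, and parallelism is bounded by the number of files rather than their size. Within that mapper the file can be consumed record by record. A standard-library sketch follows, assuming the usual WARC/1.0 framing (a version line, header lines, a blank line, `Content-Length` bytes of content, then a blank separator); a dedicated WARC library would be the more robust choice in production.

```python
import gzip
from collections import Counter


def iter_warc_headers(stream):
    """Yield the header dict of each WARC record in a binary stream."""
    while True:
        version = stream.readline()
        if not version:
            return  # end of input
        if not version.strip():
            continue  # skip the blank lines that separate records
        if not version.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {version[:40]!r}")
        headers = {}
        for raw in iter(stream.readline, b""):
            line = raw.decode("utf-8").rstrip("\r\n")
            if not line:
                break  # a blank line ends the header block
            name, _, value = line.partition(":")
            headers[name.strip()] = value.strip()
        # Skip the content block without parsing it.
        stream.read(int(headers.get("Content-Length", 0)))
        yield headers


def count_record_types(source):
    """Count records per WARC-Type; source is a path or a binary file object.

    gzip.open reads multi-member files transparently, which matters because
    .warc.gz files are commonly written as one gzip member per record.
    """
    with gzip.open(source, "rb") as stream:
        return Counter(h.get("WARC-Type", "?") for h in iter_warc_headers(stream))
```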

1 answer

How to create an EMR cluster using AWS SDK for Go

I want to create EMR clusters using the AWS SDK for Go, but I can't find a way in the official documentation (Package: emr, AWS SDK for Go). Could you please help me with detailed code?

amazon-web-services go elastic-map-reduce

asked Mar 24 '16 at 09:30

NSR

0 answers

What is causing "org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: null"?

I have an Elastic MapReduce job which uses elasticsearch-hadoop via scalding-taps to transfer data from Amazon S3 to Amazon Elasticsearch Service. For a long time this job ran successfully. However, it has recently started failing with the following…

hadoop elasticsearch elastic-map-reduce cascading scalding

asked Mar 02 '16 at 10:29

fblundun
