Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

452 questions
4
votes
1 answer

How to configure an Amazon EMR streaming job to use EC2 spot instances (Ruby CLI)?

When I create a streaming job with Amazon Elastic MapReduce (Amazon EMR) using the Ruby command line interface, how can I specify that only EC2 spot instances should be used (except for the master)? The command below is working, but it "forces" me to use at least 1…
Renaud
  • 16,073
  • 6
  • 81
  • 79
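
One possible approach, sketched here with boto (the Python SDK) rather than the Ruby CLI the question asks about: keep the master instance group on demand and mark the core/task groups as SPOT with a bid price. The bucket names, instance types and prices below are placeholders, not values from the question.

```python
# Minimal sketch: streaming job flow whose core/task nodes are spot
# instances while the master stays on-demand. All names, paths and
# prices are hypothetical.
import boto
from boto.emr.instance_group import InstanceGroup
from boto.emr.step import StreamingStep

conn = boto.connect_emr()  # credentials come from the environment/boto config

instance_groups = [
    InstanceGroup(1, 'MASTER', 'm1.small', 'ON_DEMAND', 'master'),
    InstanceGroup(4, 'CORE', 'm1.small', 'SPOT', 'core', bidprice='0.05'),
    InstanceGroup(4, 'TASK', 'm1.small', 'SPOT', 'task', bidprice='0.05'),
]

step = StreamingStep(
    name='example streaming step',
    mapper='s3n://example-bucket/mapper.rb',
    reducer='s3n://example-bucket/reducer.rb',
    input='s3n://example-bucket/input/',
    output='s3n://example-bucket/output/',
)

jobflow_id = conn.run_jobflow(
    name='streaming job on spot instances',
    log_uri='s3n://example-bucket/logs/',
    instance_groups=instance_groups,
    steps=[step],
)
print(jobflow_id)
```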
4
votes
1 answer

Hadoop converting \r\n to \n and breaking ARC format

I am trying to parse data from commoncrawl.org using Hadoop Streaming. I set up a local Hadoop installation to test my code, and have a simple Ruby mapper which uses a streaming ARCfile reader. When I invoke my code myself like cat 1262876244253_18.arc.gz |…
Ben Nagy
  • 163
  • 1
  • 6
4
votes
1 answer

How can correct data types be enforced in Apache Pig?

I am having trouble SUMming a bag of values due to a data type error. When I load a CSV file whose lines look like this: 6 574 false 10.1.72.23 2010-05-16 13:56:19 +0930 fbcdn.net static.ak.fbcdn.net 304 text/css 1 …
mindonaut
  • 43
  • 1
  • 5
4
votes
4 answers

Using Amazon MapReduce/Hadoop for Image Processing

I have a project that requires me to process a lot (1000-10000) of big (100MB to 500MB) images. The processing I am doing can be done via ImageMagick, but I was hoping to actually do this processing on Amazon's Elastic MapReduce platform (which I…
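
Not an answer from the listing, but a rough sketch of one way this is commonly done: a Hadoop Streaming mapper that receives S3 keys (one per input line), downloads each image, shells out to ImageMagick's convert, and uploads the result. It assumes ImageMagick is installed on the nodes (e.g. via a bootstrap action) and uses hypothetical bucket names.

```python
#!/usr/bin/env python
# Sketch of a streaming mapper for image processing on EMR.
# Input lines are S3 keys; bucket names and convert arguments are examples.
import os
import subprocess
import sys

import boto

conn = boto.connect_s3()
src = conn.get_bucket('example-input-bucket')
dst = conn.get_bucket('example-output-bucket')

for line in sys.stdin:
    key_name = line.strip()
    if not key_name:
        continue

    local_in = os.path.join('/tmp', os.path.basename(key_name))
    local_out = local_in + '.resized.jpg'

    # Download, convert with ImageMagick, upload the result.
    src.get_key(key_name).get_contents_to_filename(local_in)
    subprocess.check_call(['convert', local_in, '-resize', '50%', local_out])
    dst.new_key(key_name + '.resized.jpg').set_contents_from_filename(local_out)

    # Emit one record per image so Hadoop tracks progress.
    print('%s\tdone' % key_name)
```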
4
votes
1 answer

boto ElasticMapReduce throttling and rate limiting

I've run into rate limiting from Amazon EMR a few times via the boto API with the following: boto.exception.EmrResponseError: EmrResponseError: 400 Bad Request
poiuy
  • 500
  • 5
  • 12
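
One common workaround, sketched here under assumptions: wrap boto's EMR calls in an exponential-backoff retry so throttled requests are retried instead of raising. The retry count, delays and the job flow id are arbitrary placeholders.

```python
# Sketch: retry boto EMR calls with exponential backoff on throttling errors.
import time

import boto
from boto.exception import EmrResponseError


def with_backoff(call, retries=5, base_delay=1.0):
    """Run call(), retrying with exponential backoff on EMR API errors."""
    for attempt in range(retries):
        try:
            return call()
        except EmrResponseError as err:
            # Re-raise anything that is clearly not a rate-limit problem.
            if err.error_code not in ('Throttling', 'RequestLimitExceeded'):
                raise
            time.sleep(base_delay * (2 ** attempt))
    return call()  # final attempt; let any exception propagate


conn = boto.connect_emr()
# 'j-XXXXXXXXXXXXX' is a placeholder job flow id.
flow = with_backoff(lambda: conn.describe_jobflow('j-XXXXXXXXXXXXX'))
print(flow.state)
```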
4
votes
1 answer

Specifying additional jars in AWS EMR custom jar application

I am trying to run a Hadoop job on an EMR cluster. It is being run as a Java command for which I use a jar-with-dependencies. The job pulls data from Teradata, and I am assuming the Teradata-related jars are also packed within the jar-with-dependencies.…
Nik
  • 5,515
  • 14
  • 49
  • 75
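
Not from the question itself, but one commonly suggested route: pass the extra jars through Hadoop's generic -libjars option, which only takes effect if the job's main class parses generic options (e.g. via ToolRunner). A rough boto sketch; the S3 paths, jar names and job flow id are hypothetical.

```python
# Sketch: add a custom-jar step that passes extra jars via -libjars.
import boto
from boto.emr.step import JarStep

conn = boto.connect_emr()

step = JarStep(
    name='custom jar with extra jars',
    jar='s3n://example-bucket/jars/my-job-with-dependencies.jar',
    step_args=[
        '-libjars',
        's3n://example-bucket/jars/extra-dep-1.jar,s3n://example-bucket/jars/extra-dep-2.jar',
        's3n://example-bucket/input/',
        's3n://example-bucket/output/',
    ],
)

# 'j-XXXXXXXXXXXXX' is a placeholder for the running cluster's id.
conn.add_jobflow_steps('j-XXXXXXXXXXXXX', [step])
```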
4
votes
1 answer

How can I remove files from /usr/lib/hadoop/lib before running an EMR job on AMI 4.x?

I have a Hadoop job which uses version 1.5 of the commons-codec library. In order to make this job run on EMR AMI 3.x, I had to create a bootstrap action which deleted all earlier versions of the jar from the cluster to prevent them from being…
fblundun
  • 987
  • 7
  • 19
4
votes
1 answer

Error: undefined method "each" for String when running elastic-mapreduce specifying distributed cache file

I've got the following error: Error: undefined method `each' for "s3n://dico-count-words/Cache/dicoClazz.p#dicoClazzCache.p":String When I run the following command line to launch a mapreduce algorithm on Amazon EMR cluster via elastic-mapreduce,…
Garnieje
  • 286
  • 3
  • 7
4
votes
5 answers

How to run/install Oozie on an EMR cluster

I want to orchestrate my EMR jobs, so I thought Oozie would be a good fit. I have done some POCs on Oozie workflows, but only in local mode; it's fairly simple and great. But I don't understand how to use Oozie on an EMR cluster. Based on some searching I got to know…
sunil
  • 1,259
  • 1
  • 14
  • 27
4
votes
3 answers

Writing to a file in S3 from jar on EMR on AWS

Is there any way in which I can write to a file from my Java jar to an S3 folder where my reduce files would be written? I have tried something like: FileSystem fs = FileSystem.get(conf); FSDataOutputStream FS = fs.create(new Path("S3…
4
votes
1 answer

Elastic MapReduce timing out: java.io.IOException: Unexpected end of stream

I am running a MapReduce job on the Elastic MapReduce (EMR) service. The job works fine for a small data set but gives the following exception for a large data set (file size 400MB). Running another job with the same big input file works fine, though. Why so? Error:…
user93796
  • 18,749
  • 31
  • 94
  • 150
4
votes
1 answer

s3distcp srcPattern not working?

I have files like this in S3: 1-2013-08-22-22-something 2-2013-08-22-22-something etc. Without srcPattern I can get all of the files from the bucket easily, but I want to get a specific prefix, for example all of the 1's. I've tried using srcPattern…
Julian
  • 483
  • 1
  • 6
  • 17
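
A frequent gotcha here: --srcPattern is matched against the full path, not just the file name, so a pattern like 1-.* matches nothing while .*/1-.* does. A rough boto sketch of such an S3DistCp step; the bucket names, the s3distcp jar location and the job flow id are assumptions that may differ per region/AMI.

```python
# Sketch: S3DistCp step copying only keys whose name starts with "1-".
import boto
from boto.emr.step import JarStep

conn = boto.connect_emr()

step = JarStep(
    name='s3distcp only the 1- files',
    jar='s3://elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar',
    step_args=[
        '--src', 's3://example-bucket/logs/',
        '--dest', 'hdfs:///input/',
        # srcPattern is applied to the whole path, hence the leading ".*/".
        '--srcPattern', '.*/1-.*',
    ],
)

conn.add_jobflow_steps('j-XXXXXXXXXXXXX', [step])
```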
4
votes
1 answer

ElasticMapReduce: Specified Availability Zone is not supported

I tried to use EMR in the Oregon region, so I used "us-west-2" as the availability zone in run_job_flow, and I got the following error: Error response for action RunJobFlow: Sender/ValidationError; Specified Availability Zone is not supported
kee
  • 10,969
  • 24
  • 107
  • 168
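
The likely cause: "us-west-2" names the Oregon region, while the availability zone parameter expects a zone such as "us-west-2a". A minimal boto sketch, assuming the Oregon endpoint and example instance settings; the exact zone letters available vary by account.

```python
# Sketch: connect to the Oregon (us-west-2) endpoint and pass an
# availability zone, not the region name.
import boto.emr

conn = boto.emr.connect_to_region('us-west-2')

jobflow_id = conn.run_jobflow(
    name='oregon job flow',
    log_uri='s3n://example-bucket/logs/',
    availability_zone='us-west-2a',  # a zone like "us-west-2a", not "us-west-2"
    master_instance_type='m1.small',
    slave_instance_type='m1.small',
    num_instances=3,
    keep_alive=True,
)
print(jobflow_id)
```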
4
votes
1 answer

Specifying other user owned S3 buckets in EMR job flows

I am trying to use an S3 bucket as input data for my Elastic MapReduce job flow. The S3 bucket does not belong to the same account as the EMR job flow. How and where should I specify the credentials to access that S3 bucket? I…
4
votes
4 answers

Amazon Elastic MapReduce: issue listing job flows with the command line tools?

I'm new to Amazon Web Services. I'm trying to run job flows on Amazon Elastic MapReduce using the command line interface tools. I followed the steps in the AWS developer guide, but things are not getting clear to…
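
For comparison, the same listing can be done from Python with boto instead of the Ruby CLI; the set of states filtered on below is just an example.

```python
# Sketch: list job flows and their states via boto rather than the CLI.
import boto

conn = boto.connect_emr()

for flow in conn.describe_jobflows(states=['STARTING', 'RUNNING', 'WAITING']):
    print(flow.jobflowid, flow.state, flow.name)
```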