Questions tagged [emr]

Questions relating to Amazon's Elastic MapReduce (EMR) product.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tags: elastic-map-reduce, amazon-emr

1166 questions

1 answer

Error in executing Customised WordCount jar in AWS EMR

Hi, I am trying to execute a customised WordCount jar on AWS EMR. My word count jar works properly: I added it as a step without job arguments and it ran successfully. My problem is when I run it with job arguments. In my S3 I…

amazon-web-services elastic-map-reduce amazon-emr emr

asked Nov 24 '13 at 22:30

sa_nyc

2 answers

How should data files be included to mrjob on EMR?

I am trying to run a mrjob on Amazon's EMR. I've tested the job locally using the inline runner, but it fails when running on Amazon. I've narrowed the failure down to my dependence on an external data file zip_codes.txt. If I run without that…

python mapreduce amazon-emr emr mrjob

asked Sep 24 '13 at 00:40

fixedpoint

1 answer

Script to unpack python 2.7 at bootstrap on amazon EMR node

I've got python scripts that require version 2.7. Installing python 2.7 at bootstrap time on EMR using a bash script is easy enough but is taking too long. AWS support suggested I compile Python 2.7 locally, tar the installation and unpack it at…

python python-2.7 emr

asked Sep 23 '13 at 18:37

user2808321

1 answer

How to change an emr job configuration using c# awssdk api

I want the output of my reducer to be compressed (preferably gzip). I am able to launch an EMR job using the C# AWS SDK, but I do not know how to change the job configuration to get the desired result. I understand I need to set the following…

emr

asked Sep 12 '13 at 21:35

user2330278

1 answer

Bootstrap action for EMR

While bootstrapping on AWS EMR, I am getting the following error. Any clues how to resolve it? /mnt/var/lib/bootstrap-actions/1/STAR: /lib/libc.so.6: version 'GLIBC_2.14' not found (required by /mnt/var/lib/bootstrap-actions/1/STAR)

emr

asked Sep 06 '13 at 18:11

Huzefa Mehta

2 answers

EMR custom logging from mapper and reducer

Is it possible to have custom logs from mappers and reducers in EMR? Let's say I have a mapper which goes through data and filters based on certain conditions. Mapper code (streaming): look at input line; if useragent is bad, LOG into a custom…

logging emr

asked Aug 21 '13 at 17:48

user2330278

1 answer

elastic map reduce "keep alive" specification in the java api

How do I set the jobflow to "keep alive" in the Java API, as I do on the command line like this: elastic-mapreduce --create --alive ... I have tried to add withKeepJobFlowAliveWhenNoSteps(true) but this still makes the jobflow shut down when a step…

elastic-map-reduce emr

asked Aug 16 '13 at 22:31

Julian

0 answers

MultiThreadedMapper refuses to find Jar

For some reason, every time I run this program (both in Eclipse and on EMR) I get the message 13/07/18 13:22:23 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). A few print…

java multithreading hadoop emr

asked Jul 18 '13 at 17:26

Chenab

0 answers

Am I spawning more threads than I think I am in my mapper?

I'm attempting to make a web parser using and since by nature there is downtime while the program retrieves the document from I made it multithreaded. The idea being that my threads retrieve the URLs from a URL pile. This tripled the speed of the…

java hadoop emr

asked Jul 17 '13 at 18:50

Chenab

1 answer

How can I read and write binary files in Cascading?

I want to load some files in binary format (for example JPEGs, but it could be any binary format), manipulate them somehow and write them back. I want to do that on Hadoop, and I would like to write it on top of the Cascading framework. Are there binary sinks /…

hadoop elastic-map-reduce emr cascading

asked Jul 17 '13 at 12:52

polo

1 answer

pig aws emr jython serialization error

I am trying to run a trivial Python UDF in Pig on Amazon EMR and it throws a java serialization error: java.io.IOException: Deserialization error: could not instantiate 'org.apache.pig.scripting.jython.JythonFunction' with arguments…

jython apache-pig emr

asked Jun 03 '13 at 00:04

n2ygk

1 answer

Slow Hive Query Performance under AWS Elastic MapReduce

There's a strange problem I'm experiencing, and I assure you I've googled a lot. I'm running a set of AWS Elastic MapReduce Clusters, and I have a Hive Table with about 16 partitions. They're created from emr-s3distcp (since there are about 216K…

hadoop hive hdfs elastic-map-reduce emr

asked May 12 '13 at 10:10

aldrinleal

1 answer

Data set join using EMR

I have 2 tab-delimited datasets stored in AWS S3. I am trying to write an EMR job that will join these 2 datasets based on a common key (a set of field values). My current version populates 2 lists and compares them line by line, outputting the rows…

join hadoop amazon-web-services emr

asked May 06 '13 at 18:44

Zihs

1 answer

Splitting a file using Map Reduce

I would like to split the content of a text file into 2 different files using EMR. The input file, as well as the mapper and reducer scripts are all stored in AWS' S3. Currently, my mapper reformats the inputs of stdin by tab-delimiting each field…

python amazon-web-services amazon-s3 boto emr

asked Apr 23 '13 at 21:28

Zihs

1 answer

How to merge the small files on S3 generated by EMR with thousands of reducers

My Cascalog EMR job generated thousands of small files in S3 buckets. It generates the same number of files as the number of reducers I used. Dumping all these tiny files takes minutes. I wonder if there is a way to concatenate them on S3 so that I can…

hadoop amazon-web-services amazon-s3 emr cascalog

asked Apr 06 '13 at 15:10

rninja
