Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tag: emr

452 questions

0 answers

How to send JARs to EMR hosts using the AWS Java SDK?

Is there any other way to send my JAR to EMR via the Java SDK? Or are the only options to scp it or to upload the JAR to S3 and call it from there?

amazon-web-services elastic-map-reduce

asked Aug 16 '13 at 03:32

Julian

1 answer

Running Map Reduce on a data set of around 10 GB on AWS

I want to store around 10 GB of data on AWS services and use MapReduce to process the data. Is using EC2 the best option? I want to use the free tier, but it says a maximum of 613 MB for free services on EC2, and that does not satisfy my…

amazon-web-services amazon-ec2 elastic-map-reduce

asked Aug 12 '13 at 08:26

user1524625

1 answer

How to set heap size for EMR Master

I have a job which I trigger in EMR. The master triggers the mapper. Once it is done, it loads a heavyweight operation in memory and then eventually dumps out. Right now, the job which runs on the cluster fails after a few minutes because…

elastic-map-reduce emr

asked Aug 06 '13 at 06:30

user2655578

0 answers

How to execute shell commands in a Pig script on Amazon Elastic MapReduce?

Using a bootstrap action, I moved some source files to the master node. While creating the job flow through elastic-mapreduce-client, I pass a Pig script that launches embedded Python from the source files present on the master node. Following…

hadoop amazon-web-services amazon-s3 apache-pig elastic-map-reduce

asked Aug 02 '13 at 18:54

sasikkumar

1 answer

Can't list the current job flow in the Elastic MapReduce Command Line Tools?

I have installed the Amazon Elastic MapReduce Command Line Tools successfully. While listing the current job flow with the command $ ./elastic-mapreduce --list, it throws the following error. Error: Request has expired. Timestamp date:…

eclipse hadoop amazon-web-services mapreduce elastic-map-reduce

asked Jul 19 '13 at 09:54

Prabhu

2 answers

Elastic Map Reduce Error

I am getting an error when using Elastic Map Reduce and I am not sure what it means because it is not very descriptive. I want to know specifically what kind of JSONDecodeError I am getting. "12" is not descriptive. This is the output. I am using…

python json elastic-map-reduce mrjob

asked Jul 16 '13 at 17:20

user1011332

2 answers

jar containing org.apache.hadoop.hive.dynamodb

I was trying to programmatically load a DynamoDB table into HDFS (via Java, and not Hive). I couldn't find examples online on how to do it, so I thought I'd download the jar containing org.apache.hadoop.hive.dynamodb and reverse engineer the…

mapreduce amazon-dynamodb elastic-map-reduce emr

asked Jun 13 '13 at 01:05

n915

1 answer

Best way to split log files

Need help, and this seems like such a common task: we have huge hourly log files containing many different events. We have been using Hive to split these events into different files in a hard-coded way: from events insert overwrite table…

mapreduce hive bigdata elastic-map-reduce

asked May 14 '13 at 11:01

harelg

1 answer

Run a custom MapReduce Jar in Amazon Elastic Map Reduce against data from Amazon DynamoDB

I have data in DynamoDB which I want to run MapReduce jobs against. I've found a lot of tutorials which involve using Hive to run SQL against the DynamoDB data in EMR, but for the task I'm trying to perform it will be very difficult to efficiently…

amazon-dynamodb elastic-map-reduce amazon-emr

asked May 07 '13 at 19:23

David Chanin

2 answers

Sharing data between master and reduce

I need to perform aggregation using the results from all the reduce tasks. Basically, each reduce task finds the sum and count of a value. I need to add all the sums and counts and find the final average. I tried using conf.setInt in reduce. But when…

mapreduce elastic-map-reduce

asked Feb 24 '13 at 02:20

user2103630
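For the aggregation question above, one common approach is to have each reduce task write out its own (sum, count) pair and then combine those pairs in a single place after the job finishes, since values set on the configuration inside a task are generally not sent back to the driver. The following is a minimal Python sketch of just the combining arithmetic; the function name and the sample pairs are hypothetical and not taken from the question.

```python
from typing import Iterable, Tuple


def combine_partials(partials: Iterable[Tuple[float, int]]) -> float:
    """Merge (sum, count) pairs from every reduce task into one overall average."""
    total_sum = 0.0
    total_count = 0
    for part_sum, part_count in partials:
        total_sum += part_sum
        total_count += part_count
    if total_count == 0:
        raise ValueError("no records to average")
    return total_sum / total_count


# Hypothetical outputs of three reduce tasks, each a (sum, count) pair.
reducer_outputs = [(10.0, 4), (25.0, 5), (7.0, 1)]
overall_average = combine_partials(reducer_outputs)
```

Summing the sums and the counts separately, and dividing only once at the end, avoids the error of averaging per-reducer averages when the reducers see different numbers of records.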

2 answers

Output Folders for Amazon EMR

I want to run a custom jar whose main class runs a chain of MapReduce jobs, with the output of the first job going as the input of the second job, and so on. What do I set in FileOutputFormat.setOutputPath("what path should be here?"); If I specify…

hadoop amazon-web-services amazon-s3 elastic-map-reduce amazon-emr

asked Feb 09 '13 at 04:40

Mahalakshmi Lakshminarayanan

1 answer

Can't pipe two hadoop commands?

I want to run the following command: hadoop fs -ls hdfs:///logs/ | grep -oh "/[^/]*.gz" | grep -oh "[^/]*.gz" | hadoop fs -put - hdfs:///unzip_input/input It works when I call it from the shell after I ssh onto the master node. But it will not…

hadoop ssh elastic-map-reduce

asked Feb 07 '13 at 11:43

Shane
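To make the intent of the two grep stages in the question above easier to see, here is a rough Python sketch that pulls the base names of .gz files out of listing lines. It does not address why the pipeline fails when run remotely. The sample listing lines and the function name are invented for illustration, and the dot before gz is escaped here, unlike in the original pattern.

```python
import re
from typing import Iterable, List

# Capture the last path component when it ends in ".gz".
GZ_NAME = re.compile(r"/([^/]*\.gz)")


def gz_basenames(listing_lines: Iterable[str]) -> List[str]:
    """Return the .gz file names found in directory-listing lines."""
    names: List[str] = []
    for line in listing_lines:
        names.extend(GZ_NAME.findall(line))
    return names


# Made-up lines in the general shape of a file listing.
sample_listing = [
    "-rw-r--r--   1 hadoop supergroup  1024 2013-02-07 11:00 /logs/access-01.gz",
    "-rw-r--r--   1 hadoop supergroup  2048 2013-02-07 11:05 /logs/access-02.gz",
    "drwxr-xr-x   - hadoop supergroup     0 2013-02-07 11:06 /logs/archive",
]
names = gz_basenames(sample_listing)
```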

1 answer

Hadoop UniqValueCount Map and Aggregate Reducer for Large Dataset (1 billion records)

I have a data set that has approximately 1 billion data points. There are about 46 million unique data points I want to extract from this. I want to use Hadoop to extract the unique values, but keep getting "Out of Memory" and Java heap size errors…

hadoop mapreduce hadoop-streaming elastic-map-reduce

asked Jan 18 '13 at 17:23

Suman
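For the unique-value question above, one memory-friendly pattern is to rely on the sort that happens before the reduce phase: because equal keys arrive next to each other, a reducer only has to compare each key with the previous one instead of holding every value in memory. The sketch below assumes a Hadoop Streaming style of tab-separated, key-sorted input lines; the function name and sample data are hypothetical.

```python
import io
from itertools import groupby
from typing import Iterable, Iterator


def unique_keys(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct key once, assuming equal keys are adjacent (sorted input)."""
    keys = (line.rstrip("\n").split("\t", 1)[0] for line in sorted_lines)
    # groupby collapses runs of equal adjacent keys, so memory use stays constant.
    for key, _run in groupby(keys):
        yield key


# Stand-in for the reducer's standard input: key<TAB>value lines, sorted by key.
sample = io.StringIO("a\t1\na\t1\nb\t1\nc\t1\nc\t1\n")
result = list(unique_keys(sample))
```

In an actual streaming reducer the same function could be fed sys.stdin and its output printed line by line.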

1 answer

Python: Increasing timeout value in EMR using Yelp's mrjob

I am using Yelp's mrjob to write some MapReduce programs. I am running it on EMR. My program has reducer code which takes a long time to execute. I am noticing that because of the default timeout period in EMR I am getting this error…

python hadoop mapreduce elastic-map-reduce mrjob

asked Jan 17 '13 at 15:25

Read Q

1 answer

Run a bootstrap action on an existing job flow

I have a job flow with keep-alive set, on which I want to run several bootstrap actions. One such action is a script that builds and installs Python 3.3. However the elastic-mapreduce CLI only allows for bootstrap actions to be run during job flow…

python hadoop bootstrapping elastic-map-reduce

asked Jan 17 '13 at 01:12

Matt Joiner
