Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tag: emr

452 questions

1 answer

Hive performance on Amazon DynamoDB

I'm using Amazon DynamoDB to collect statistics and ElasticMapReduce with Hive to process statistics and upload results to S3. On DynamoDB I have table prod_product_views: - id (Hash key) - product_id (Range key) - company_id - creted - price …

amazon-web-services hive amazon-dynamodb elastic-map-reduce

asked May 23 '13 at 08:52

trkich

2 answers

A join operation using Hadoop MapReduce

How do I join two record sets using MapReduce? Most of the solutions, including those posted on SO, suggest that I emit the records based on a common key, add them in the reducer to, say, a HashMap, and then take a cross product. (e.g. Join of…

hadoop mapreduce elastic-map-reduce

asked May 19 '13 at 09:36

Eastern Monk
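
The approach described in the excerpt above (emit records under a common key, collect them per key in the reducer, then take a cross product) can be sketched in plain Python. This is a minimal in-memory simulation, not Hadoop code: the function names, the "L"/"R" source tags, and the sample records are all hypothetical and invented for illustration.

```python
from collections import defaultdict
from itertools import product


def map_phase(records, source, key_field):
    """Emit (join_key, (source_tag, record)) pairs, as a mapper would."""
    for record in records:
        yield record[key_field], (source, record)


def reduce_phase(tagged_records):
    """Split one key's records by source tag and return their cross product."""
    left, right = [], []
    for source, record in tagged_records:
        (left if source == "L" else right).append(record)
    # Keys present on only one side produce an empty product (inner join).
    return [{**l, **r} for l, r in product(left, right)]


def reduce_side_join(left_records, right_records, key_field):
    # Stand-in for the shuffle: group every emitted pair by its join key.
    groups = defaultdict(list)
    for source, records in (("L", left_records), ("R", right_records)):
        for key, tagged in map_phase(records, source, key_field):
            groups[key].append(tagged)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_phase(groups[key]))
    return joined


# Hypothetical sample data, invented for illustration.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [
    {"id": 1, "item": "book"},
    {"id": 1, "item": "pen"},
    {"id": 3, "item": "cup"},
]
result = reduce_side_join(users, orders, "id")
```

Because each reducer call buffers every record for its key before taking the product, memory use grows with the number of records that share a key.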

1 answer

Slow Hive Query Performance under AWS Elastic MapReduce

There's a strange problem I'm experiencing, and I assure you I've googled a lot. I'm running a set of AWS Elastic MapReduce Clusters, and I have a Hive Table with about 16 partitions. They're created from emr-s3distcp (since there are about 216K…

hadoop hive hdfs elastic-map-reduce emr

asked May 12 '13 at 10:10

aldrinleal

0 answers

Pig Join is returning no results

I have been stuck on this problem for over twelve hours now. I have a Pig script that is running on Amazon Web Services. Currently, I am just running my script in interactive mode. I am trying to get averages on a large data set of climate readings…

hadoop amazon-web-services nosql apache-pig elastic-map-reduce

asked May 03 '13 at 01:02

user2345171

1 answer

Hadoop taking forever on EMR and profiling EMR

I am running a sample Hadoop job over ~500 documents on S3, and when run locally it takes <15min to complete. However, when I tried running the same job on EMR, it took over 2 hours and still hadn't completed the reduction step, so I terminated it.…

java hadoop amazon-web-services mapreduce elastic-map-reduce

asked May 01 '13 at 22:15

Jin

1 answer

Splitting responsibilities of mappers on Elastic MapReduce (MySQL + MongoDB input)

I want to make sure I understand EMR correctly. I'm wondering - does what I'm talking about make any sense with EMR / Hadoop? I currently have a recommendation engine on my app that examines data stored in both MySQL and MongoDB (both on separate…

hadoop mapreduce hadoop-streaming elastic-map-reduce

asked Apr 29 '13 at 16:58

nlyn

1 answer

How to configure a custom Amazon EMR bootstrap action in code

I am trying to configure a bootstrap action in code. I can successfully run my job with the bootstrap action using the Amazon UI, so I know my bootstrap action works. Also, without the bootstrap action I am able to successfully invoke my…

elastic-map-reduce

asked Apr 28 '13 at 23:09

user2330278

1 answer

Adding extra arguments to HadoopJarStepConfig fails

I am trying to get this command via the AWS SDK: hadoop jar /home/hadoop/contrib/streaming/hadoop-streaming.jar -input hdfs:///logs/ -output hdfs:///no_dups -mapper dedup_mapper.py -reducer dedup_reducer.py -file deduplication.py dedup_mapper.py…

amazon-web-services hadoop-streaming elastic-map-reduce

asked Apr 26 '13 at 16:55

Shane
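
A step definition generally takes its arguments as a list of separate strings rather than as one command line. As a rough sketch of that idea only, the Python below splits the visible part of the command from the excerpt into tokens with the standard-library `shlex` module. It does not call the AWS SDK; the variable names are hypothetical, the portion of the command elided in the excerpt is left out, and nothing here is claimed to be the fix for the asker's failure.

```python
import shlex

# Only the portion of the command visible in the excerpt; the rest is elided there.
command = (
    "hadoop jar /home/hadoop/contrib/streaming/hadoop-streaming.jar "
    "-input hdfs:///logs/ -output hdfs:///no_dups "
    "-mapper dedup_mapper.py -reducer dedup_reducer.py"
)

# Split on shell-style whitespace so each flag and each value is its own token.
tokens = shlex.split(command)

# tokens[0:2] is "hadoop jar"; the jar path follows, then the step arguments.
jar_path = tokens[2]
step_args = tokens[3:]
```

Each flag and its value end up as adjacent list elements (for example `"-input"` followed by `"hdfs:///logs/"`), which is the shape an argument list would normally have.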

1 answer

Using other files along with EMR streaming step?

I currently have a hadoop command that I would like to copy using the AWS SDK. The command I'm currently using hadoop jar /home/hadoop/contrib/streaming/hadoop-streaming.jar -input /no_dups -output /sorted -mapper mapper.py -reducer reducer.py -file…

hadoop amazon-web-services elastic-map-reduce

asked Apr 25 '13 at 16:32

Shane

0 answers

How do I save a file as .pig using windows?

I just tried running an Elastic MapReduce job using a Pig script. I created the Pig script in Notepad, saved it originally as a .txt file, then manually changed the extension to .pig and uploaded. Here's the error I got: Run Pig Script FAILED …

amazon-ec2 apache-pig elastic-map-reduce

asked Apr 24 '13 at 14:59

user1956609

1 answer

Hadoop Custom Input Format that doesn't use files

I'm just getting started on Hadoop and I'm struggling to figure out how to use other input sources that aren't files, e.g. reading all the rows from AWS SimpleDB, or all records from a REST API on another system. Everything online only shows how to…

java hadoop amazon-simpledb elastic-map-reduce

asked Apr 23 '13 at 15:26

dgildeh

2 answers

Pig group by and average function

I have data that looks like this STN--- WBAN YEARMODA TEMP DEWP SLP STP VISIB WDSP MXSPD GUST MAX MIN PRCP SNDP FRSHTT 030050 99999 19291029 46.7 4 42.0 4 990.9 4 9999.9 0 10.9 4 …

hadoop amazon-web-services apache-pig elastic-map-reduce

asked Apr 20 '13 at 01:10

Casey Hancock

1 answer

HIVE: How can I pass a hiveconf that contains a single quote?

I would like to pass a hive arg that contains a single quote in a string. This causes the EMR Job to fail with the following error: sh: -c: line 0: unexpected EOF while looking for matching `'' sh: -c: line 1: syntax error: unexpected end of…

string hadoop amazon-web-services hive elastic-map-reduce

asked Apr 17 '13 at 20:19

user922295
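
The `sh: -c` error in the excerpt above is what a POSIX shell reports when a single quote in a command string is left unmatched. As a hedged illustration of shell quoting in general, the sketch below uses Python's standard-library `shlex.quote` to make a value containing a single quote safe for `sh`. The `greeting` variable name and its value are hypothetical, and whether EMR passes Hive arguments through a shell in exactly this way is an assumption, not something the excerpt confirms.

```python
import shlex

# Hypothetical key=value pair containing a single quote.
raw_arg = "greeting=it's a test"

# shlex.quote wraps the value so a POSIX shell reads it back as one literal word.
quoted_arg = shlex.quote(raw_arg)

# Build a command string as it might be handed to `sh -c`.
command = "hive -hiveconf " + quoted_arg

# Parsing the string with shell rules recovers the original argument intact.
parsed = shlex.split(command)
```

Without the quoting step, the lone `'` in the value would open a quoted string that never closes, which matches the "unexpected EOF while looking for matching" message.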

1 answer

Is using map-reduce necessary

When doing a cloud computing project, is it necessary to use Amazon S3 as shown in figure 1 of http://www.ibm.com/developerworks/aix/library/au-cloud_apache/#figure2, or can I just use MapReduce and a database? Thanks in advance.

mapreduce cloud elastic-map-reduce

asked Apr 17 '13 at 20:00

Zahra Namini Mianji

1 answer

Where should I write a MapReduce program

Where should I write map-reduce programs: in a text file or something else? In what file format should a file containing a map-reduce program be saved? E.g. in Java, a text file containing Java code is saved as filename.java, but what would that be for map-reduce…

mapreduce elastic-map-reduce

asked Mar 22 '13 at 17:45

user2200278
