Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

452 questions
0 votes, 1 answer

Hive performance on Amazon DynamoDB

I'm using Amazon DynamoDB to collect statistics and Elastic MapReduce with Hive to process them and upload the results to S3. In DynamoDB I have a table prod_product_views: - id (Hash key) - product_id (Range key) - company_id - creted - price …
0 votes, 2 answers

A join operation using Hadoop MapReduce

How do I join two record sets using MapReduce? Most of the solutions, including those posted on SO, suggest that I emit the records keyed on the common key, add them to, say, a HashMap in the reducer, and then take a cross product. (e.g. Join of…
Eastern Monk
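
The approach described in this question (emit each record under its join key, collect both sides in the reducer, then take the cross product) can be sketched locally in plain Python. This is only an illustration of the technique, not Hadoop code: the function names, the "left"/"right" source tags, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import product


def map_record(source, record, key_field):
    """Emit (join key, (source tag, record)) so both inputs meet in one reducer."""
    return record[key_field], (source, record)


def reduce_join(tagged_records):
    """Bucket records by source tag, then yield the cross product of the buckets."""
    buckets = defaultdict(list)
    for source, record in tagged_records:
        buckets[source].append(record)
    # A key present on only one side yields nothing (inner join semantics).
    for left, right in product(buckets["left"], buckets["right"]):
        yield {**left, **right}


def run_join(left_rows, right_rows, key_field):
    """Simulate the shuffle phase by grouping mapper output by key."""
    shuffled = defaultdict(list)
    for source, rows in (("left", left_rows), ("right", right_rows)):
        for row in rows:
            key, value = map_record(source, row, key_field)
            shuffled[key].append(value)
    joined = []
    for key in sorted(shuffled):
        joined.extend(reduce_join(shuffled[key]))
    return joined


# Hypothetical sample data, for illustration only.
users = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
orders = [{"id": 1, "price": 10}, {"id": 1, "price": 20}, {"id": 3, "price": 5}]
joined_rows = run_join(users, orders, "id")
```

The reducer holds every record for one key in memory, which is the cost the question is worried about; that trade-off is inherent to this style of reduce-side join.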
0 votes, 1 answer

Slow Hive Query Performance under AWS Elastic MapReduce

There's a strange problem I'm experiencing, and I assure you I've googled a lot. I'm running a set of AWS Elastic MapReduce Clusters, and I have a Hive Table with about 16 partitions. They're created from emr-s3distcp (since there are about 216K…
aldrinleal
0 votes, 0 answers

Pig Join is returning no results

I have been stuck on this problem for over twelve hours now. I have a Pig script that is running on Amazon Web Services. Currently, I am just running my script in interactive mode. I am trying to get averages on a large data set of climate readings…
0 votes, 1 answer

Hadoop taking forever on EMR and profiling EMR

I am running a sample Hadoop job over ~500 documents on S3, and when run locally it takes <15 min to complete. However, when I tried running the same job on EMR, it took over 2 hours and still hadn't completed the reduce step, so I terminated it.…
Jin
0 votes, 1 answer

Splitting responsibilities of mappers on Elastic MapReduce (MySQL + MongoDB input)

I want to make sure I understand EMR correctly. I'm wondering - does what I'm talking about make any sense with EMR / Hadoop? I currently have a recommendation engine on my app that examines data stored in both MySQL and MongoDB (both on separate…
nlyn
0 votes, 1 answer

How to configure a custom Amazon EMR bootstrap action in code

I am trying to configure a bootstrap action in code. I can run my job successfully with the bootstrap action using the Amazon UI, so I know my bootstrap action is working. Also, without the bootstrap action I am able to successfully invoke my…
user2330278
0 votes, 1 answer

Adding extra arguments to HadoopJarStepConfig fails

I am trying to run this command via the AWS SDK: hadoop jar /home/hadoop/contrib/streaming/hadoop-streaming.jar -input hdfs:///logs/ -output hdfs:///no_dups -mapper dedup_mapper.py -reducer dedup_reducer.py -file deduplication.py dedup_mapper.py…
Shane
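
A common stumbling block with steps like this is that the SDK wants the JAR path and each argument as separate items rather than one command string. The sketch below splits only the visible, untruncated part of the command above into that shape. The helper name `streaming_step` is hypothetical, and the dictionary layout (`Name`, `ActionOnFailure`, `HadoopJarStep` with `Jar` and `Args`) follows the step structure used by the Python SDK's EMR client as I understand it; the question itself concerns `HadoopJarStepConfig`, so treat the field names as an assumption to check against your SDK.

```python
import shlex


def streaming_step(name, command):
    """Split a 'hadoop jar <jar> <args...>' command into a step definition."""
    tokens = shlex.split(command)
    if len(tokens) < 3 or tokens[:2] != ["hadoop", "jar"]:
        raise ValueError("expected a command of the form 'hadoop jar <jar> ...'")
    jar, args = tokens[2], tokens[3:]
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        # Every flag and every value is its own list element.
        "HadoopJarStep": {"Jar": jar, "Args": args},
    }


# Only the portion of the command that is visible in the excerpt is used here.
step = streaming_step(
    "dedup",
    "hadoop jar /home/hadoop/contrib/streaming/hadoop-streaming.jar "
    "-input hdfs:///logs/ -output hdfs:///no_dups "
    "-mapper dedup_mapper.py -reducer dedup_reducer.py",
)
```

Passing `"-input hdfs:///logs/"` as a single element is the kind of mistake this avoids, since the streaming JAR would then see one unknown argument instead of a flag and its value.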
0 votes, 1 answer

Using other files along with EMR streaming step?

I currently have a Hadoop command that I would like to reproduce using the AWS SDK. The command I'm currently using is: hadoop jar /home/hadoop/contrib/streaming/hadoop-streaming.jar -input /no_dups -output /sorted -mapper mapper.py -reducer reducer.py -file…
Shane
0 votes, 0 answers

How do I save a file as .pig using Windows?

I just tried running an Elastic MapReduce job using a Pig script. I created the Pig script in Notepad, saved it originally as a .txt file, then manually changed the extension to .pig and uploaded it. Here's the error I got: Run Pig Script FAILED …
user1956609
0 votes, 1 answer

Hadoop Custom Input Format that doesn't use files

I'm just getting started with Hadoop and I'm struggling to figure out how to use input sources that aren't files, e.g. reading all the rows from AWS SimpleDB, or all records from a REST API on another system. Everything online only shows how to…
dgildeh
0 votes, 2 answers

Pig group by and average function

I have data that looks like this: STN--- WBAN YEARMODA TEMP DEWP SLP STP VISIB WDSP MXSPD GUST MAX MIN PRCP SNDP FRSHTT 030050 99999 19291029 46.7 4 42.0 4 990.9 4 9999.9 0 10.9 4 …
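
What a group-by followed by an average computes on data like this can be sketched in Python. This is not Pig, and it makes several assumptions: rows are whitespace-separated, the first four fields are STN, WBAN, YEARMODA and TEMP as in the header above, and the grouping key is the station. Apart from the first row, which reuses the values visible in the excerpt, the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def average_temp_by_station(lines):
    """Group rows by station ID (field 0) and average TEMP (field 3)."""
    temps = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        station, temp = fields[0], float(fields[3])
        temps[station].append(temp)
    return {station: fmean(values) for station, values in temps.items()}


# First row uses the visible values from the excerpt; the rest are made up.
sample_rows = [
    "030050 99999 19291029 46.7",
    "030050 99999 19291030 50.3",
    "030075 99999 19291029 40.0",
]
averages = average_temp_by_station(sample_rows)
```

If every group comes back empty in the real script, a first thing to check is whether the grouping field was parsed as expected, since a mis-split line puts the wrong value in the key position.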
0 votes, 1 answer

HIVE: How can I pass a hiveconf that contains a single quote?

I would like to pass a hive arg that contains a single quote in a string. This causes the EMR Job to fail with the following error: sh: -c: line 0: unexpected EOF while looking for matching `'' sh: -c: line 1: syntax error: unexpected end of…
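
The `unexpected EOF while looking for matching` error suggests the value reaches a shell unescaped. Assuming that is the case, one way to build a safely quoted argument is Python's `shlex.quote`. The helper name `hiveconf_arg`, the variable name and the sample value are hypothetical; whether EMR passes the string through `sh -c` exactly as assumed here is not confirmed by the excerpt.

```python
import shlex


def hiveconf_arg(name, value):
    """Return a '-hiveconf name=value' fragment that survives shell parsing."""
    return "-hiveconf " + shlex.quote(f"{name}={value}")


# Hypothetical variable name and value containing a single quote.
fragment = hiveconf_arg("customer", "O'Brien")

# Parsing the fragment the way a POSIX shell would recovers the original value.
parsed = shlex.split(fragment)
```

Round-tripping through `shlex.split` is a cheap local check that the quoting is balanced before the argument is submitted with a job.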
0 votes, 1 answer

Is using map-reduce necessary?

When doing a cloud computing project, is it necessary to use Amazon S3 as defined in figure 1 of http://www.ibm.com/developerworks/aix/library/au-cloud_apache/#figure2, or can I just use map-reduce and a database? Thanks in advance.
0 votes, 1 answer

Where should I write a MapReduce program?

Where should I write map-reduce programs: in a text file or something else? What file format should I use to save a file containing a map-reduce program? For example, Java code is saved in a text file as filename.java, but what will that be for map-reduce…
user2200278