Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

452 questions
1
vote
2 answers

Problems using distcp and s3distcp with my EMR job that outputs to HDFS

I've run a job on AWS's EMR, and stored the output in the EMR job's HDFS. I am then trying to copy the result to S3 via distcp or s3distcp, but both are failing as described below. (Note: the reason I'm not just sending my EMR job's output directly…
Dolan Antenucci
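For reference, the usual pattern on older EMR AMIs is to run s3distcp as a Hadoop jar step (or from the master node over SSH) against the job's HDFS output. A minimal sketch of the invocation; the jar path, HDFS path, and bucket name are placeholders, not taken from the question:

```python
# Sketch: copying an EMR job's HDFS output to S3 with s3distcp.
# Jar location and paths are placeholders for illustration.
def build_s3distcp_command(hdfs_src, s3_dest):
    """Build the argument list for an s3distcp invocation."""
    return [
        "hadoop", "jar", "/home/hadoop/lib/emr-s3distcp-1.0.jar",
        "--src", hdfs_src,
        "--dest", s3_dest,
    ]

cmd = build_s3distcp_command("hdfs:///output/", "s3://my-bucket/output/")
# On the master node this could be executed with subprocess.run(cmd, check=True).
```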
1
vote
1 answer

Where is my AWS EMR reducer output for my completed job (should be on S3, but nothing there)?

I'm having an issue where my Hadoop job on AWS's EMR is not being saved to S3. When I run the job on a smaller sample, the job stores the output just fine. When I run the same command but on my full dataset, the job completes again, but there is…
1
vote
3 answers

Write some data (lines) from my mappers to separate directories depending on some logic in my mapper code

I am using mrjob for my EMR needs. How do I write some data (lines) from my mappers to "separate directories" depending on some logic in my mapper code that I can: tar gzip and upload to separate S3 buckets (depending on the directory name) after…
newToFlume
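One way to reason about the routing step, independent of mrjob's API: classify each line, then append it to a file inside a per-category directory, so the directories can later be tarred, gzipped, and uploaded separately. A local Python sketch of that logic; the classifier and directory layout are hypothetical:

```python
import os
import tempfile

def route_lines(lines, classify, base_dir):
    """Write each line into a subdirectory chosen by classify(line).

    Local sketch of the routing idea only; in an actual mrjob job the
    same effect is usually achieved by prefixing the output key with
    the target directory name and splitting the output afterwards.
    """
    handles = {}
    try:
        for line in lines:
            category = classify(line)
            if category not in handles:
                d = os.path.join(base_dir, category)
                os.makedirs(d, exist_ok=True)
                handles[category] = open(os.path.join(d, "part-00000"), "w")
            handles[category].write(line + "\n")
    finally:
        for h in handles.values():
            h.close()

# Demo with a made-up classifier that routes on the line's prefix.
out_dir = tempfile.mkdtemp()
route_lines(["a:1", "b:2", "a:3"], lambda l: l.split(":")[0], out_dir)
routed = sorted(os.listdir(out_dir))
```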
1
vote
3 answers

How do I make sure RegexSerDe is available to my Hadoop nodes?

I'm trying to attack the problem of analyzing web logs with Hive, and I've seen plenty of examples out there, but I can't seem to find anyone with this specific issue. Here's where I'm at: I've set up an AWS ElasticMapReduce cluster, I can log in,…
awshepard
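For context, the usual fix in a Hive session is to `ADD JAR` the SerDe so Hive ships it to the task nodes with the job. A sketch, with a hypothetical jar path, table layout, and regex:

```sql
-- Jar path and schema are placeholders; hive-contrib ships RegexSerDe.
ADD JAR /home/hadoop/hive/lib/hive-contrib.jar;

CREATE EXTERNAL TABLE web_logs (
  host STRING,
  request STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "(\\S+) (.*)"
)
LOCATION 's3://my-bucket/logs/';
```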
1
vote
2 answers

Creating a large covariance matrix

I need to create ~110 covariance matrices of doubles, each 19347 x 19347, and then add them all together. This in itself isn't very difficult, and for smaller matrices the following code works fine. covmat <- matrix(0, ncol=19347,…
TrueWheel
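As a rough illustration of the memory constraint: a single 19347 x 19347 matrix of doubles is about 3 GB, so holding all ~110 at once is infeasible, and accumulating a running sum one matrix at a time is the workable shape. A sketch in Python/NumPy with toy dimensions (the question itself uses R):

```python
import numpy as np

def summed_covariance(blocks):
    """Accumulate covariance matrices one at a time into a running sum.

    Each block is an (observations x variables) array; np.cov with
    rowvar=False treats columns as variables. Only one covariance
    matrix plus the running total is ever held in memory.
    """
    total = None
    for block in blocks:
        cov = np.cov(block, rowvar=False)
        total = cov if total is None else total + cov
    return total

# Toy data: three 100 x 5 samples stand in for the full-size inputs.
rng = np.random.default_rng(0)
result = summed_covariance(rng.standard_normal((100, 5)) for _ in range(3))
```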
1
vote
3 answers

How to use external data with Elastic MapReduce

From Amazon's EMR FAQ: Q: Can I load my data from the internet or somewhere other than Amazon S3? Yes. Your Hadoop application can load the data from anywhere on the internet or from other AWS services. Note that if you load data from the internet,…
1
vote
1 answer

Need suggestion on using Map/Reduce to create solr index

I'm pretty new to Map/Reduce world and trying to evaluate the best option to figure if I can leverage it to create index in Solr. Currently, I'm using a regular crawl to fetch data and index it in Solr directly. This is working without any issues.…
Shamik
1
vote
2 answers

Ganglia and Amazon Elastic Map Reduce - install issues

Following the instructions for "Initializing Ganglia on a Job Flow" I get my cluster up but don't see any Ganglia process running (on port 8157). …
Tom Emmons
0
votes
1 answer

Setting jobconf parameters with Karmasphere Analyst & Amazon Elastic MapReduce

The Karmasphere Analyst profiler has suggested that I set some jobconf parameters, for example, mapred.map.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec. But I don't know where to set these and I can't find it in the Karmasphere…
Vinay
0
votes
2 answers

Can't get --supported-products option to work with Amazon's elastic-mapreduce Ruby client for Karmasphere Analytics

I am trying to use Karmasphere Analytics with AWS. This page says to use --supported-products with the ruby client. However, when I run the command (exactly as entered on that page), I get an error "Error: invalid option: --supported-products" I am…
Vinay
0
votes
1 answer

Force one reducer in AWS EMR

How do I ensure that there's only one reducer for my EMR Streaming job? Is there any way to do this from the web frontend when I'm creating a new Jobflow?
jetru
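For Hadoop Streaming, a single reducer is typically forced with `-D mapred.reduce.tasks=1` among the step arguments. A sketch of such an argument list; the bucket names and script names are placeholders:

```python
# Sketch of a Hadoop Streaming step argument list forcing one reducer,
# so the job produces a single output file. Paths are placeholders.
step_args = [
    "-D", "mapred.reduce.tasks=1",   # exactly one reduce task
    "-input", "s3://my-bucket/input/",
    "-output", "s3://my-bucket/output/",
    "-mapper", "mapper.py",
    "-reducer", "reducer.py",
]
```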
0
votes
1 answer

Starting jobs with direct calls to Hadoop from within SSH

I've been able to kick off job flows using the elastic-mapreduce ruby library just fine. Now I have an instance which is still 'alive' after its jobs have finished. I've logged in to it using SSH and would like to start another job, but each of my…
Trindaz
0
votes
1 answer

What's the best way to do set-membership tests in hadoop?

I'm using hadoop to process a sequence of analytics records for my application. I want to categorise users based on which events I see in their stream and then use that information in a later stage when iterating over the stream again. For…
Fasaxc
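A common answer to this kind of question is a Bloom filter: a compact set representation with no false negatives and a tunable false-positive rate, small enough to ship to every mapper (e.g. via the distributed cache) instead of the full member set. A minimal pure-Python sketch:

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter for approximate set-membership tests.

    Membership queries never return a false negative; false positives
    occur with a probability controlled by num_bits and num_hashes.
    """

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive num_hashes independent bit positions from salted MD5.
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```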
0
votes
1 answer

java.lang.RuntimeException: java.lang.ClassNotFoundException when trying to run Jar job on Elastic MapReduce

What should I change to fix the following error: I'm trying to start a job on Elastic MapReduce, and it crashes every time with the message: java.lang.RuntimeException: java.lang.ClassNotFoundException: iataho.mapreduce.NewMaxTemperatureMapper at…
Arsen Zahray
0
votes
1 answer

Spark: Reporting Total, and Available Memory of the Cluster

I'm running a Spark job on Amazon EMR; I would like to keep reporting the total and free memory of the cluster from within the program itself. Is there any method in the Spark API which provides information about the cluster's memory?
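Spark's Scala API does expose `SparkContext.getExecutorMemoryStatus`, a map from executor address to a pair of (maximum memory, remaining memory) in bytes; PySpark does not surface it directly, so it is only reachable through the JVM gateway. The sketch below therefore just shows the aggregation over an already-extracted dict; the executor names and sizes are made up:

```python
def cluster_memory(executor_status):
    """Aggregate per-executor (max_bytes, free_bytes) pairs into totals.

    executor_status mirrors the shape of the map returned by Scala's
    sc.getExecutorMemoryStatus; the sample data here is fabricated.
    """
    total = sum(max_b for max_b, _ in executor_status.values())
    free = sum(free_b for _, free_b in executor_status.values())
    return total, free

status = {
    "executor-1": (4 * 1024**3, 1 * 1024**3),
    "executor-2": (4 * 1024**3, 2 * 1024**3),
}
total_bytes, free_bytes = cluster_memory(status)
```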