Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tag: emr

452 questions

1 answer

Hadoop DistributedCache object changed during job

I'm trying to run KMeans on AWS, and I ran into the following exception when trying to read updated cluster centroids from the DistributedCache: java.io.IOException: The distributed cache object s3://mybucket/centroids_6/part-r-00009 changed during…

java hadoop amazon-web-services mapreduce elastic-map-reduce

asked Apr 08 '13 at 17:56

Magsol

0 answers

Scalability issues with TemplateTap

I wrote a Cascading 1.2 program that processes data from a sensor network as follows: read CSV files with 3 columns: millisecond timestamp, event type (one of sensor data, battery level, sensor power state), and event body; round up the…

templates hadoop elastic-map-reduce cascading

asked Jan 18 '13 at 19:59

newToFlume

1 answer

Write 100 million files to S3

My main aim is to split records out into files according to the ID of each record. There are over 15 billion records right now, and that number may well increase. I need a scalable solution using Amazon EMR. I have already got this done for a smaller…

hadoop amazon-s3 elastic-map-reduce amazon-emr emr

asked Dec 29 '12 at 11:16

Amar

1 answer

EMR - Leverage using spot instances

I know that we can bid on spot instances and get them at lower prices than regular instances, but with spot instances there is the risk of your instances being taken back. I want to know whether there is any way we can ensure that they are…

amazon-ec2 mapreduce elastic-map-reduce emr

asked Dec 23 '12 at 21:13

user1804287

4 answers

Best way to have a fast access key-value storage for huge dataset (5 GB)

There is a dataset of ~5 GB in size with one key-value pair per line. Values need to be looked up by key some billion times. I have already tried MapDB's disk-based approach, but it throws ConcurrentModification…

java hadoop mapreduce elastic-map-reduce emr

asked Dec 04 '12 at 15:34

Amar

1 answer

Error while connecting Elastic Map Reduce ruby client

I am following the steps AWS describes for using an interactive Hive session over SSH. I used the following resources…

hadoop amazon-s3 amazon-web-services elastic-map-reduce

asked Dec 01 '12 at 13:42

asquare

1 answer

How to decide on the number of parallel mappers/reducers along with heap memory?

Say I have an EMR job running on an 11-node cluster: one m1.small master node and 10 m1.xlarge slave nodes. Each m1.xlarge node has 15 GB of RAM. How do I then decide on the number of parallel mappers and reducers that can be set? My jobs are memory…

hadoop mapreduce elastic-map-reduce emr

asked Nov 06 '12 at 23:23

Amar

1 answer

How Can I Automate Running Pig Batch Jobs on Elastic MapReduce without Amazon GUI?

I have some Pig batch jobs in .pig files I'd love to automatically run on EMR once every hour or so. I found a tutorial for doing that here, but that requires using Amazon's GUI for every job I set up, which I'd really rather avoid. Is there a good…

apache-pig elastic-map-reduce

asked Oct 20 '12 at 01:05

Eli

1 answer

Why does the Amazon .Net SDK not see any job flows?

My company has grown weary of constantly using the AWS console to set up our MapReduce clusters and needs more configurability than the console provides. I'm using the .Net AWS SDK to write a simple application that allows us to create and control…

c# .net amazon-web-services elastic-map-reduce

asked Sep 26 '12 at 00:28

Chris Phillips

1 answer

Importing compressed (LZO) data from S3 to Hive

I export my DynamoDB tables to S3 as a means of backup (via EMR). When I export, I store the data as LZO-compressed files. My Hive query is below, but essentially I followed the "To export an Amazon DynamoDB table to an Amazon S3 bucket using data…

amazon-web-services hive elastic-map-reduce emr lzo

asked Aug 10 '12 at 17:01

rynop

3 answers

Amazon Elastic MapReduce: Output directory

I'm running through Amazon's example of running Elastic MapReduce and keep getting hit with the following error: Error launching job , Output path already exists. Here is the command to run the job that I am…

hadoop amazon-ec2 amazon-web-services elastic-map-reduce

asked Jul 29 '12 at 23:51

Mark Peters

1 answer

Join performance on AWS elastic map reduce running hive

I am running a simple join query: select count(*) from t1 join t2 on t1.sno=t2.sno. Tables t1 and t2 have 20 million records each, and column sno is of string data type. The table data is imported into HDFS from Amazon S3 in RCFile format. The…

amazon-ec2 hive hdfs elastic-map-reduce

asked Jun 27 '12 at 12:27

Ahmad Osama

1 answer

AWS Elastic Map Reduce: output to SimpleDB

What is the most efficient way to get Elastic Map Reduce output into SimpleDB? I'm aware that I could just output the results to S3, download them, and have a script parse the results and insert into SimpleDB. But is there an easier/faster way…

hadoop amazon-simpledb elastic-map-reduce

asked May 30 '12 at 16:06

Suman

2 answers

File not caching on AWS Elastic Map Reduce

I'm running the following MapReduce on AWS Elastic MapReduce: ./elastic-mapreduce --create --stream --name CLI_FLOW_LARGE --mapper s3://classify.mysite.com/mapper.py --reducer s3://classify.mysite.com/reducer.py --input …

python hadoop amazon-web-services elastic-map-reduce

asked Apr 30 '12 at 22:49

Ben G

1 answer

Why did my Elastic MapReduce job flow fail in AWS MapReduce?

I created a Contextual Advertising (Hive Script) job flow in AWS MapReduce: I chose 'Start interactive Hive Session', selected m1.small instances, proceeded without a VPC subnet ID, and Configure Hadoop in Configure Bootstrap…

amazon-s3 amazon-ec2 elastic-map-reduce amazon-iam

asked Apr 30 '12 at 07:08

Advait
