Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tag: emr

452 questions

1 answer

Anyone using DynamoDB and Hive without using EMR?

I was reading the guide below on using Hive to query data in DynamoDB: http://aws.typepad.com/aws/2012/01/aws-howto-using-amazon-elastic-mapreduce-with-dynamodb.html According to that link, Hive needs to be set up on top of EMR. But I wanted…

hadoop amazon-dynamodb elastic-map-reduce

asked Apr 18 '12 at 20:28

Arvind

2 answers

How can we pass arguments for Hadoop Streaming from AWS SDK for PHP?

I'm trying to add a job via the AWS SDK for PHP. I can successfully start a cluster and start a new job flow via the API, but I get an error when trying to create a Hadoop Streaming step. Here is my code: // add some jobflow steps $response =…

php amazon-web-services elastic-map-reduce hadoop-streaming amazon-emr

asked Apr 02 '12 at 13:02

webdevbyjoss

3 answers

How to run a MapReduce job on Amazon's Elastic MapReduce (EMR) cluster from Windows?

I'm trying to learn how to run a Java Map/Reduce (M/R) job on Amazon's EMR. The documentation I am following is here: http://aws.amazon.com/articles/3938. I am on a Windows 7 computer. When I try to run this command, I am shown the help…

hadoop mapreduce elastic-map-reduce amazon-emr

asked Mar 08 '12 at 16:55

Jane Wayne

1 answer

API calls inside mapreduce job

I would like to ask about the drawbacks of calling an external API while running a MapReduce job. Some examples: if inside the mapper we need to geocode an address and we call a Google Maps API, or calling an…

java api hadoop mapreduce elastic-map-reduce

asked Mar 05 '12 at 10:38

Fgblanch

1 answer

Calling a compiled binary on Amazon MapReduce

I'm trying to do some data analysis on Amazon Elastic MapReduce. The mapper step is a python script which includes a call to a compiled C++ binary called "./formatData". For example: # myMapper.py from subprocess import * inputData =…

hadoop amazon-ec2 mapreduce elastic-map-reduce amazon-emr

asked Feb 07 '12 at 00:53

tba

3 answers

Does Amazon Elastic MapReduce run one or several mapper processes per instance?

My question is: should I handle multiprocessing in my mapper myself (read tasks from stdin, distribute them over worker processes, combine the results in a master process, and output to stdout), or will Hadoop take care of it automatically? I…

hadoop amazon-web-services mapreduce elastic-map-reduce hadoop-streaming

asked Feb 03 '12 at 03:50

lithuak

1 answer

How to set the number of MapReduce tasks to 1 in Hive

I tried the following in Hive: set hive.exec.reducers.max = 1; set mapred.reduce.tasks = 1; from flat_json insert overwrite table aggr_pgm_measure PARTITION(dt='${START_TIME}') reduce log_time, req_id, ac_id, client_key, rulename, categoryname, bsid,…

hadoop mapreduce hive elastic-map-reduce

asked Dec 27 '11 at 10:42

Anurag Saxena

1 answer

Amazon MapReduce input splitting and downloading

I'm new to EMR and have a few questions I have been struggling with for the past few days. The first is that the logs I want to process are already compressed as .gz, and I was wondering whether these types of files can be split by EMR so…

amazon-s3 amazon-web-services elastic-map-reduce

asked Dec 07 '11 at 18:17

Brian

2 answers

Exploring Hadoop code

I want to understand Hadoop as more than a black box, so I would like to explore the Hadoop code itself. How can I download the bundle (not from the trunk), and where should I start? Any help would be appreciated. Thanks, Shujaat

apache hadoop mapreduce elastic-map-reduce hadoop-plugins

asked Nov 23 '11 at 06:27

shujaat

0 answers

What are some good measurement comparisons to make using Ganglia metrics for Amazon Elastic MapReduce programs?

I have seen Ganglia monitoring implemented and analyzed on grid computing projects, but haven't read about any procedure for Amazon Elastic MapReduce programs. Ganglia has a lot of metrics, but what are the important ones to focus on if we…

hadoop amazon-web-services mapreduce metrics elastic-map-reduce

asked Oct 16 '11 at 09:50

saud

3 answers

Error SSHing to Elastic MapReduce JobFlow on AWS

When following the tutorial instructions for connecting to my JobFlow in EMR, I type the following: ./elastic-mapreduce --jobflow j-3FLVMX9CYE5L6 --ssh and get this error: Permission denied (publickey). I'm already able to run other elastic-mapreduce…

amazon-web-services elastic-map-reduce

asked Oct 04 '11 at 00:35

Trindaz

3 answers

POST Hadoop Pig output to a URL as JSON data?

I have a Pig job which analyzes log files and writes summary output to S3. Instead of writing the output to S3, I want to convert it to a JSON payload and POST it to a URL. Some notes: This job is running on Amazon Elastic MapReduce. I can use a…

hadoop apache-pig elastic-map-reduce

asked Jun 28 '11 at 11:50

emk

0 answers

EMR cluster running slow

I was running a MapReduce Hadoop job on Amazon EMR 5.5.2 which uses Hadoop 2.7.3. I recently upgraded EMR to 5.12.1 which uses Hadoop 2.8.0. For the same input load, my new cluster runs much slower. I am not able to find out the…

amazon-web-services hadoop mapreduce emr elastic-map-reduce

asked Jun 06 '18 at 22:58

Gaurav Gandhi

1 answer

How to configure AWS EMR to use S3 as HDFS storage

I am trying to create an EMR cluster with the configuration below, but it fails in the bootstrap stage. The EMR release I am using is EMR 5.13.0 [ { "Classification": "core-site", "Properties": { "fs.defaultFS": "s3://my-s3-bucket", …

hdfs emr amazon-emr elastic-map-reduce

asked May 10 '18 at 11:35

Utk787

1 answer

AWS EMR: Is it possible to re-use a terminated cluster?

I created a cluster, finished my job, and then terminated the cluster. Is it possible to re-use this terminated cluster in the future? If not, is there any way to delete terminated clusters?

amazon-web-services amazon-emr elastic-map-reduce

asked Apr 26 '18 at 11:17

Saeid SOHEILY KHAH
