Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tag: emr

452 questions

1 answer

Elasticsearch-Hadoop get Non-indexed data

I have an Elasticsearch cluster that holds a large amount of data. I want to extract all of the data from Elasticsearch into Hadoop (Hive). I used the Elasticsearch-Hadoop driver to extract data from Elasticsearch through a Hive external table, but it is too…

hadoop elasticsearch hadoop-streaming elastic-map-reduce elasticsearch-hadoop

asked Mar 13 '15 at 15:45

Yusuf Can Gürkan

1 answer

Run an action in a bootstrap script after ResourceManager has started

I am starting an AWS EMR cluster using the Amazon AWS CLI tools. I have a bootstrap action that runs on the master and runs the following command: "hdfs dfs -put /home/hadoop/X.tar.gz /". However, I get the following error: put: Call From…

hadoop amazon-web-services elastic-map-reduce

asked Feb 25 '15 at 20:10

Sapsi

1 answer

If the first reduce attempt fails (network connection issues), subsequent reduce attempts (retries) fail because the output file already exists

I have MapReduce jobs failing badly on Amazon EMR because if the first attempt fails to copy results to S3, the file (probably partial) is created, and subsequent reduce attempts refuse to write to a file that already exists. The first attempt…

hadoop mapreduce elastic-map-reduce emr

asked Dec 01 '14 at 10:01

SQL.injection

1 answer

Where to access EMR counters for a terminated or running cluster

I'm running a job flow on Elastic MapReduce that terminates after completing all steps. How can I access the custom counters of each mapper or reducer after the cluster is killed (maybe somewhere on S3 with the logs, if at all)? How can I access…

mapreduce elastic-map-reduce amazon-emr

asked Nov 29 '14 at 09:29

eran

2 answers

Problems while creating a hadoop client in my local machine

I have a NameNode and DataNodes running on AWS. I configured FoxyProxy and checked that the following are working: Ganglia Metrics Reports (master-public-dns/ganglia/), Hadoop ResourceManager (master-public-dns-name:9026), Hadoop NameNode …

java hadoop amazon-web-services hbase elastic-map-reduce

asked Oct 23 '14 at 14:24

nirvanastack

1 answer

Amazon Web Service EMR FileSystem

I am trying to run a job on an AWS EMR cluster. The problem I'm getting is the following: aws java.io.IOException: No FileSystem for scheme: hdfs. I don't know where exactly my problem resides (in my Java JAR job or in the configuration of the job). In…

java hadoop amazon-web-services amazon-s3 elastic-map-reduce

asked Oct 20 '14 at 07:10

Andrea Schembri

1 answer

AWS - How can I add an EMR step from within the current step

I have an EMR cluster that runs a single step (a custom JAR). I need to create a second step from within the first step at runtime. How can I do that? I know I can do it using the CLI, but how can I accomplish it using Java? Thanks

java elastic-map-reduce amazon-emr

asked Aug 27 '14 at 12:38

Eitan Illuz

1 answer

Number of concurrently running mappers per node drops precipitously on Elastic MapReduce w/ AMI 3.1.0 and Hadoop 2.4.0 as cluster size increases

In a related question (How to set the precise max number of concurrently running tasks per node in Hadoop 2.4.0 on Elastic MapReduce), I ask for formulas relating the number of concurrently running mappers/reducers to YARN and MR2 memory parameters.…

hadoop amazon-web-services amazon-ec2 elastic-map-reduce hadoop-yarn

asked Aug 10 '14 at 13:31

verve
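The question above asks for formulas relating concurrently running tasks to YARN and MR2 memory settings. As a rough sketch, assuming a memory-only resource calculator (vcores ignored) and a scheduler that rounds each container request up to a multiple of yarn.scheduler.minimum-allocation-mb, the per-node ceiling is the NodeManager's yarn.nodemanager.resource.memory-mb divided by the rounded mapreduce.map.memory.mb. The numbers below are hypothetical, and the sketch does not model the ApplicationMaster container or any cluster-wide scheduling effects.

```python
def containers_per_node(nm_memory_mb, task_memory_mb, min_allocation_mb):
    """Memory-only upper bound on concurrent task containers on one NodeManager.

    Assumes each request is rounded up to a multiple of
    yarn.scheduler.minimum-allocation-mb and that vcores are not a constraint.
    """
    # Round the container request up to the scheduler's allocation step
    # (integer ceiling division, no floats involved).
    allocated_mb = -(-task_memory_mb // min_allocation_mb) * min_allocation_mb
    # Count how many whole containers fit into the NodeManager's memory.
    return nm_memory_mb // allocated_mb


if __name__ == "__main__":
    # Hypothetical settings, not EMR defaults.
    nm_memory = 12288   # yarn.nodemanager.resource.memory-mb
    map_memory = 1536   # mapreduce.map.memory.mb
    for step in (512, 1024):  # yarn.scheduler.minimum-allocation-mb
        print(step, containers_per_node(nm_memory, map_memory, step))
```

With a 512 MB step each 1536 MB map container fits exactly and 8 run per node; raising the step to 1024 MB rounds each container up to 2048 MB and the bound drops to 6.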

2 answers

Running Simple Hadoop Command using Java code

I would like to list files using the Hadoop command "hadoop fs -ls filepath", and I want to write Java code that achieves the same thing. Can I write a small piece of Java code, package it as a JAR, and supply it to a MapReduce job (Amazon EMR) to achieve this? Can you…

hadoop mapreduce elastic-map-reduce amazon-emr

asked Aug 04 '14 at 15:22

user1879956

2 answers

How to import local Python package in Amazon Elastic MapReduce (EMR)?

I have two Python scripts that are intended to run on Amazon Elastic MapReduce, one as a mapper and one as a reducer. I've just recently expanded the mapper script to require a couple more local modules I've created, which both live in a package…

python amazon-web-services hadoop-streaming elastic-map-reduce

asked Jul 25 '14 at 02:04

Sean Azlin
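One common way to ship a local package with a Hadoop Streaming job is to zip it, distribute the archive to each task's working directory (for example with the generic -files option), and put the zip on sys.path, since Python can import modules directly from a .zip file. The sketch below is a minimal illustration under those assumptions; the package name mypkg and its tokenize helper are hypothetical, and on EMR the mapper would call add_zip_to_path("mypkg.zip") instead of building the zip itself.

```python
import os
import sys
import tempfile
import zipfile


def add_zip_to_path(zip_path):
    """Make modules inside zip_path importable (Python's zipimport handles .zip entries)."""
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)


def build_demo_zip(directory):
    """Create a hypothetical package 'mypkg' as a zip, standing in for the local package."""
    zip_path = os.path.join(directory, "mypkg.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("mypkg/__init__.py", "")
        zf.writestr("mypkg/helpers.py",
                    "def tokenize(line):\n    return line.split()\n")
    return zip_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        add_zip_to_path(build_demo_zip(d))
        from mypkg.helpers import tokenize  # resolved from inside the zip
        print(tokenize("a b c"))
```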

1 answer

Region error when launching EMR cluster

I'm following this tutorial https://aws.amazon.com/articles/4926593393724923 to create and launch a simple Spark cluster. I'm interested in using Spark Streaming and Kinesis, so I created a role with the following policy { "Version": "2012-10-17", …

amazon-web-services apache-spark elastic-map-reduce emr

asked Jul 09 '14 at 19:57

franklynd

2 answers

Hadoop on EMR - Map Tasks Not Parallel

I've set up an EMR job through Data Pipeline in AWS. The job transfers CSV data from S3 to DynamoDB. My data size is 400 MB. I set mapred.max.split.size = 134217728 (i.e. 128 MB). With that, I can see in the monitoring graph that there are 3…

hadoop elastic-map-reduce

asked Jun 10 '14 at 16:40

Mouli

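For the 400 MB input above, one way to estimate the number of map tasks is to reproduce the split arithmetic of Hadoop's new-API FileInputFormat: split size = max(minSize, min(maxSize, blockSize)), and full splits are cut while the remaining bytes exceed 1.1 times the split size. This sketch assumes a single uncompressed (splittable) file read through a FileInputFormat subclass. The block sizes used are assumptions, since the block size reported for S3 input depends on the filesystem configuration, and the DynamoDB import job may use a different input format altogether.

```python
# Sketch of the split-size arithmetic in Hadoop's new-API FileInputFormat
# (org.apache.hadoop.mapreduce.lib.input.FileInputFormat) for ONE splittable file.
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size

MB = 1024 * 1024


def split_sizes(file_bytes, block_size, min_size=1, max_size=2**63 - 1):
    """Return the byte length of each input split for a single file."""
    # computeSplitSize: the block size, clamped between min and max split size.
    split_size = max(min_size, min(max_size, block_size))
    sizes = []
    remaining = file_bytes
    # Cut full-size splits while more than 1.1 split sizes remain.
    while remaining / split_size > SPLIT_SLOP:
        sizes.append(split_size)
        remaining -= split_size
    # Whatever is left (if anything) becomes the final, smaller split.
    if remaining != 0:
        sizes.append(remaining)
    return sizes


if __name__ == "__main__":
    # Max split size 128 MB (134217728 bytes); the block sizes are hypothetical.
    for block in (256 * MB, 64 * MB):
        splits = split_sizes(400 * MB, block, max_size=134217728)
        print(block // MB, "MB blocks ->", len(splits), "splits:",
              [s // MB for s in splits])
```

With 256 MB blocks this gives four splits (128, 128, 128 and 16 MB); with 64 MB blocks the block size, not mapred.max.split.size, becomes the limit and the same file yields seven splits.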

1 answer

How is data distributed among datanodes in MapReduce?

I'm new to MapReduce and have a task that processes a large amount of data (lines of records). My mapper needs the line number of each record, and the reducer then processes that line-number information from the mapper output. For instance,…

hadoop mapreduce elastic-map-reduce

asked May 22 '14 at 16:20

i3wangyi

1 answer

Copying a large file (~6 GB) from S3 to every node of an Elastic MapReduce cluster

It turns out that copying a large file (~6 GB) from S3 to every node in an Elastic MapReduce cluster in a bootstrap action doesn't scale well; the pipe is only so big, and downloads to the nodes get throttled as the number of nodes grows. I'm running a job…

caching hadoop amazon-web-services amazon-s3 elastic-map-reduce

asked May 21 '14 at 18:19

verve

1 answer

"Access Denied" error using segue package in R

I suspect this is a very basic fix, but I don't know what it is.
setCredentials(awsAccessKeyText = 'myaccesskey', awsSecretKeyText = 'mysecretkey')
myCluster <- createCluster(numInstances = 2)
Error in .jcall("RJavaTools",…

r amazon-web-services amazon-ec2 segue elastic-map-reduce

asked Apr 23 '14 at 13:32

Nan
