Questions tagged [emr]

Questions relating to Amazon's Elastic MapReduce (EMR) product.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tags: elastic-map-reduce, amazon-emr

1166 questions

2 answers

Why doesn't increasing the number of instances increase Hive query speed?

I created a table using Hive in Amazon's Elastic MapReduce, imported data into it, and partitioned it. Now I run a query that counts the most frequent words in one of the table's fields. I ran that query when I had 1 master and 2 core instances and it took…

hive elastic-map-reduce amazon-emr emr

asked Aug 25 '12 at 19:46

keepkimi

1 answer

Import module in MRJob on EMR

Simple question: I have a module headers.py which defines a couple of variables I need in my main MRJob script. I should be able to run the job with python MRMyJob -r emr --file=headers.py s3://input/data/path and then in my MRJob script (MRMyJob),…

python hadoop emr mrjob

asked Jul 31 '12 at 14:20

Vyassa Baratham

2 answers

Error running Python mrjob word count example

I'm trying to run the example word count MapReduce task using mrjob. I get the following error: Traceback (most recent call last): File "mr.py", line 3, in <module> from mrjob.job import MRJob File…

python mapreduce emr mrjob

asked Jul 10 '12 at 11:49

nickponline

1 answer

How to keep an EMR cluster running

Possible Duplicate: Re-use Amazon Elastic MapReduce instance. Can I keep a launched EMR cluster running and keep submitting new jobs to it until I am done (say, after a couple of days) and then shut down the cluster, or do I have to launch my own…

amazon-web-services amazon-emr emr

asked Jun 13 '12 at 00:14

iCode

2 answers

Amazon Hadoop EMR & custom input file format

I am having a bit of trouble getting Amazon EMR to accept a custom InputFileFormat: public class Main extends Configured implements Tool { public static void main(String[] args) throws Exception { int res = ToolRunner.run(new JobConf(),…

hadoop amazon-web-services emr

asked Jun 04 '12 at 23:48

jldupont

1 answer

Custom RecordReader in an EMR job

How do I specify a custom RecordReader to use in a job flow on Amazon EMR? Note: Hadoop newbie here.

hadoop amazon-web-services emr

asked May 24 '12 at 02:24

jldupont

-1

votes

1 answer

How to copy a file from HDFS to the local file system of the cluster nodes in an EMR cluster using the Java API

In an EMR cluster, using the Java API, how do I copy a file from HDFS to the local file system of the cluster nodes?

java apache-spark hdfs emr

asked Jun 08 '18 at 02:31

Rajesh Goel

-1

votes

1 answer

spark-sql: How to get the progress bar (with stages and tasks)?

How can I get a progress bar in spark-sql? spark-shell shows a nice progress bar like this: [Stage7:===========> (14174 + 5) / 62500] This progress bar shows the total number of executors allocated, how many are…

amazon-web-services apache-spark apache-spark-sql hadoop-yarn emr

asked Feb 06 '18 at 19:30

user 923227

-1

votes

1 answer

Looking for examples of how to launch an AWS EMR cluster with Python to run a PySpark step

I'm looking for an end-to-end example of launching an AWS EMR cluster with a PySpark step and having it terminate automatically when the step finishes or fails. I've seen pieces of this explained but not one complete example.

python amazon-web-services pyspark emr

asked Jan 27 '18 at 13:13

Fred R.

-1

votes

1 answer

Convert JSON keys into columns in Spark

I have written code that reads the data and picks the second element from the tuple. The second element happens to be JSON. Code to get the JSON: import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import…

hadoop apache-spark mapreduce emr

asked Nov 14 '17 at 10:30

Ajay

-1

votes

1 answer

Number of executors and cores

I am new to Spark and would like to know how many cores and executors should be used in a Spark job on AWS if we have 2 slave c4.8xlarge nodes and 1 c4.8xlarge master node. I have tried different combinations but am not able to understand the…

amazon-web-services apache-spark emr

asked Apr 17 '17 at 18:22

Bharath

-1

votes

1 answer

Configuring Spark on EMR

When you pick a more performant node, say an r3.xlarge vs. an m3.xlarge, will Spark automatically utilize the additional resources? Or is this something you need to manually configure and tune? As far as configurations go, which are the most…

amazon-web-services apache-spark amazon-ec2 emr

asked Nov 05 '16 at 04:32

flybonzai

-1

votes

2 answers

Connecting to an SFTP server from within AWS

I am trying to create a job that connects to an SFTP server from AWS services and brings files into S3 storage in AWS. It will be an automated job that runs every night and brings data into S3. I have seen documentation about how to connect AWS and import data…

amazon-web-services amazon-s3 sftp amazon-redshift emr

asked Oct 17 '16 at 20:08

ac_sql

-1

votes

1 answer

What's the right way to manage code deployment and management for AWS

We are very new to AWS EMR and are looking for the right code repositories and automated code deployment tools. Is there a right tool for this that lets us manage code deployments end to end? Primarily we are…

hadoop amazon-web-services deployment bigdata emr

asked Apr 14 '15 at 18:09

sundeep veeramachaneni

-1

votes

1 answer

Pig script not working using Amazon EMR

I cannot get this script to work: raw = LOAD 's3://xxxxxxxxx/*' AS (name:chararray, year:float, occurrences:float, books:float); B = GROUP raw BY name; C = FOREACH B GENERATE B.name, (SUM(B.occurrences) / SUM(B.books)) AS average; D = ORDER C BY…

amazon-web-services apache-pig emr amazon-emr

asked Mar 27 '15 at 18:29

plain vanilla
