Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tag: emr

452 questions

votes

1 answer

Running Hive queries directly from S3 input files

I am using an interactive Hive session in Elastic MapReduce to run Hive. Previously, I was loading data from S3 into Hive tables. Now I want to run some scripts on S3 input files without loading the data into Hive tables. Is this possible? If yes, then how…

amazon-s3 amazon-web-services hive elastic-map-reduce

asked Dec 04 '12 at 06:28

asquare

votes

1 answer

For a large mapreduce job, with a few lingering reducers, can this job be safely downsized?

Chris Smith answered this question and said I could post it. If you have a 200-node mapreduce job, with just 3 running reduce jobs left lingering, is it safe to switch off all nodes except the master and the 3 with the running jobs? Plus maybe a…

hadoop elastic-map-reduce

asked Nov 01 '12 at 19:13

tphyahoo

votes

2 answers

Can I access ZooKeeper from an AWS Elastic MapReduce job?

I'm new to Hadoop, and running under AWS Elastic MapReduce. I need cluster-wide atomic counters in Hadoop, and it was suggested that I use ZooKeeper for this. I believe ZooKeeper is part of the Hadoop stack (right?), so how would I access it from an Elastic…

hadoop amazon-web-services apache-zookeeper elastic-map-reduce emr

asked Oct 27 '12 at 03:46

David Parks

votes

1 answer

Sessionized web logs, get previous and next domain

We have a large pile of web log data. We need to sessionize it, and also generate the previous domain, and next domain for each session. I am testing via an interactive job flow on AWS EMR. Right now I'm able to get the data sessionized using this…

session hadoop amazon-web-services apache-pig elastic-map-reduce

asked Oct 26 '12 at 20:58

Dan
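
The asker works in Pig and the excerpt is truncated, so the following is only a Python sketch of one possible reading of the requirement: within each session, attach to every hit the domain visited just before it and just after it. The record layout `(session_id, timestamp, domain)` and the function name `with_neighbors` are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def with_neighbors(records):
    """Yield (session_id, timestamp, domain, prev_domain, next_domain).

    `records` is an iterable of (session_id, timestamp, domain) tuples,
    a hypothetical layout chosen for this sketch.
    """
    # Sort by session, then by time, so each session's hits are contiguous
    # and in chronological order.
    ordered = sorted(records, key=itemgetter(0, 1))
    for session_id, group in groupby(ordered, key=itemgetter(0)):
        hits = list(group)
        for i, (_, ts, domain) in enumerate(hits):
            # The first hit has no predecessor and the last has no successor.
            prev_domain = hits[i - 1][2] if i > 0 else None
            next_domain = hits[i + 1][2] if i + 1 < len(hits) else None
            yield (session_id, ts, domain, prev_domain, next_domain)
```

On a cluster the same idea would usually run in the reduce step, with the session ID as the key, so that each reducer sees one session's hits together.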

votes

1 answer

Load balancing Cascading JDBCTap for MySQL

I am considering writing a Cascading application that issues SELECT statements to MySQL databases where each query can return millions of rows. Each database exists on N slaves and one master, as shown here:…

mysql hadoop connection-pooling elastic-map-reduce cascading

asked Oct 19 '12 at 19:51

newToFlume

votes

2 answers

Why doesn't increasing the number of instances increase Hive query speed?

I created a table using Hive in Amazon's Elastic MapReduce, imported data into it, and partitioned it. Now I run a query that counts the most frequent words in one of the table's fields. I ran that query when I had 1 master and 2 core instances and it took…

hive elastic-map-reduce amazon-emr emr

asked Aug 25 '12 at 19:46

keepkimi

votes

2 answers

Can you programmatically control Elastic MapReduce jobs easily?

There is a command-line client written in Ruby that is used as the standard. However, it doesn't run on Ruby 1.9. There is also a very good aws-sdk for Ruby, but it doesn't support EMR. Is there a good alternative?

ruby hadoop elastic-map-reduce amazon-emr

asked Jun 08 '12 at 02:36

nkadwa

votes

1 answer

How do I pass the Hadoop Streaming -file flag to Amazon Elastic MapReduce?

The -file flag allows you to pack your executable files as part of the job submission, and thus lets you run a MapReduce job without first manually copying the executable to S3. Is there a way to use the -file flag with Amazon's elastic-mapreduce…

elastic-map-reduce hadoop-streaming

asked Jun 02 '12 at 01:25

tibbe

votes

1 answer

Elastic MapReduce fails with: 1: Syntax error: "(" unexpected

I'm trying to run a native binary, compiled on my x86 Debian Squeeze box (to match the Amazon AMI), and I'm consistently getting this weird…

elastic-map-reduce

asked Jun 01 '12 at 10:41

tibbe

votes

1 answer

Performance impact on Elastic MapReduce of scale-up vs. scale-out scenarios

I just ran the Elastic MapReduce sample application "Apache Log Processing". Default: when I ran it with the default configuration (2 small core instances), it took 19 minutes. Scale out: then I ran it with 8 small core instances -…

amazon-web-services mapreduce elastic-map-reduce

asked Apr 16 '12 at 02:20

paras_doshi

-1

votes

3 answers

Comparing two large datasets using a MapReduce programming model

Let's say I have two fairly large data sets: the first is called "Base" and contains 200 million tab-delimited rows, and the second is called "MatchSet" and has 10 million tab-delimited rows of similar data. Let's say I then also have an…

hadoop mapreduce elastic-map-reduce

asked Nov 28 '11 at 15:14

j03m

-1

votes

2 answers

Hive with Tez out of memory error

I have a script which runs fine on Hive 13 (YARN). I am experimenting with Tez. When I run a query on a large dataset, I run into the following error. 0 FATAL [Socket Reader #1 for port 55739] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread…

hive hadoop-yarn hadoop2 elastic-map-reduce apache-tez

asked Dec 07 '15 at 23:01

user2942227

-1

votes

1 answer

Error while running a MapReduce program in Python

I am running a MapReduce program in Python on my local system and getting the error below: Password:Traceback (most recent call last): File "./wordcount_mapper.py", line 7, in filename = os.environ["mapreduce_map_input_file"] File…

python python-2.7 mapreduce elastic-map-reduce

asked Sep 20 '15 at 13:55

Aquarius24
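
The traceback in the excerpt above stops at the `os.environ["mapreduce_map_input_file"]` lookup, which is consistent with a `KeyError`: Hadoop Streaming exports that variable into the task environment, so it is typically absent when the mapper runs in a plain local shell. A minimal sketch of a defensive lookup, assuming a word-count style mapper; the helper names `input_filename` and `map_lines`, the fallback to the older `map_input_file` name, and the `"unknown"` default are illustrative choices, not the asker's code.

```python
import os
import sys


def input_filename(environ=None, default="unknown"):
    """Return the current input file name, or `default` outside Hadoop."""
    env = os.environ if environ is None else environ
    # Newer Hadoop releases export mapreduce_map_input_file; older ones
    # exported map_input_file. Neither exists in a plain local shell.
    for key in ("mapreduce_map_input_file", "map_input_file"):
        if key in env:
            return env[key]
    return default


def map_lines(lines, filename):
    """Yield tab-separated "word<TAB>filename<TAB>1" records."""
    for line in lines:
        for word in line.split():
            yield "%s\t%s\t1" % (word, filename)


if __name__ == "__main__":
    for record in map_lines(sys.stdin, input_filename()):
        print(record)
```

For a local test, the variable can also be set by hand before piping a sample file into the mapper, which keeps the original lookup unchanged.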

-1

votes

2 answers

Python csv skipping fields with quoted

I am trying to practice working with large data on AWS using MapReduce and Python. I have the code: import sys import re import csv import glob import string #class MyDialect(csv.Dialect): #strict = True …

python csv elastic-map-reduce

asked Apr 27 '15 at 16:43

Sean Sullivan
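
On the quoted-field question, Python's standard `csv` module treats a delimiter inside double quotes as part of the field, whereas a plain `str.split(",")` does not. A minimal sketch, assuming comma-delimited input with double-quoted fields; the sample rows and the function name `parse_rows` are made up for illustration, since the asker's data and code are truncated above.

```python
import csv
import io


def parse_rows(stream):
    """Parse CSV text from a file-like object into lists of fields."""
    # quotechar='"' keeps commas inside quotes within a single field;
    # a doubled quote ("") inside a quoted field decodes to one quote.
    reader = csv.reader(stream, delimiter=",", quotechar='"')
    return [row for row in reader]


# Hypothetical sample: the second field contains a comma, the third
# contains escaped quotes.
sample = io.StringIO('1,"Smith, John","said ""hi"""\n2,plain,text\n')
rows = parse_rows(sample)
for row in rows:
    print(row)
```

In a streaming mapper, `sys.stdin` could be passed to `parse_rows` in place of the `io.StringIO` sample.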

-1

votes

1 answer

How to write a MapReduce program with Amazon EC2 and S3

I want to analyse data stored in Amazon S3. How can I write a Java program on Amazon EMR that accesses this data? The data URL is http://s3.amazonaws.com/aws-publicdatasets/trec/kba/FAKBA1/index.html

amazon-web-services amazon-s3 elastic-map-reduce

asked Feb 21 '15 at 16:02

BrickMover
