Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tag: emr

452 questions

0 answers

Indexing a file contents in ElasticSearch

I have a text file which contains some names like below: Tom, Harry Robert Harry Matt Tremp. I want to index those names in ElasticSearch using the Java APIs, which should index all the names automatically. Can anybody suggest a solution, as I am new…

elasticsearch elastic-map-reduce

asked Apr 22 '14 at 11:09

Amaresh

2 answers

How to implement the combiner in Hadoop MapReduce?

I understand that to include a combiner in Hadoop MapReduce, the following line is added (which I have done already): conf.setCombinerClass(MyReducer.class); What I don't understand is where I actually implement the functionality of…

java hadoop mapreduce elastic-map-reduce

asked Mar 13 '14 at 12:58

ali
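
On the combiner question above, a minimal sketch in plain Python (not the Hadoop API) of what a combiner does: it applies reducer-style aggregation to a single mapper's output before the shuffle. This is why a reducer class can double as the combiner when the operation is associative and commutative, as a sum is. The names `map_words`, `combine`, and `reduce_counts` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

def map_words(line):
    """Mapper: emit (word, 1) for every word in a line."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Combiner: sum the counts per key within one mapper's output."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

def reduce_counts(pairs):
    """Reducer: the same summation, applied to the combined output of all mappers."""
    return dict(combine(pairs))

# Two simulated input splits, one per mapper.
splits = [["a b a", "b a"], ["b c"]]
shuffled = []
for split in splits:
    mapper_output = [pair for line in split for pair in map_words(line)]
    shuffled.extend(combine(mapper_output))  # local aggregation before the shuffle

result = reduce_counts(shuffled)
```

In this simulation the mappers emit seven pairs in total, but only four pairs cross the simulated shuffle, and the final counts are unchanged.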

1 answer

Writing to DSE from an external Pig Job (Pig -> DSE connector)

I'm trying to write an EMR job running Pig that writes to DSE, which we'll be using for serving. Unfortunately, I can't get Pig to write to DSE, so I've broken the problem down to just connecting to the DSE node and trying to write to it. Here's what I'm…

cassandra apache-pig elastic-map-reduce datastax-enterprise datastax

asked Feb 26 '14 at 02:52

ankit

1 answer

How to make AWS Elastic MapReduce Hive commands run in parallel

I reviewed "How to make hive run mapreduce jobs concurrently?" here. My question is how to set this "hive.exec.parallel.thread.number" option in an Amazon EMR cluster on startup? Also, is setting this option equivalent to doing something like the…

amazon-web-services hive elastic-map-reduce

asked Jan 27 '14 at 23:13

Patrick McCann

2 answers

Hive query: number of mappers is always 1

I'm trying to run a simple query on a table with one partition which has around 200-300k records, all of them small files of 120 bytes. I'm using a custom INPUTFORMAT which reads the file contents and then queries another S3 file to fetch the actual…

mapreduce hive elastic-map-reduce

asked Jan 02 '14 at 10:03

Ravi

4 answers

Combine output files of MapReduce job

I have written a Mapper and Reducer in Python and have executed them successfully on Amazon's Elastic MapReduce (EMR) using Hadoop Streaming. The final result folder contains the output in three different files: part-00000, part-00001 and part-00002.…

python hadoop mapreduce hadoop-streaming elastic-map-reduce

asked Dec 14 '13 at 08:21

Arun Kumar
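
For the part-file question above, a minimal sketch, assuming the job's result folder has already been copied to a local directory (output that still lives on S3 or HDFS would need to be downloaded first). The helper name `merge_parts` is hypothetical; it concatenates the `part-*` files in name order into a single file.

```python
import glob
import os
import shutil

def merge_parts(output_dir, merged_path):
    """Concatenate the part-* files in output_dir into merged_path, in name order."""
    part_files = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    with open(merged_path, "wb") as merged:
        for part in part_files:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, merged)  # stream each part without loading it whole
    return part_files
```

A call such as `merge_parts("results", "merged.txt")` would write `merged.txt` next to the `results` folder; the merged file should live outside the folder being scanned. Each reducer's part file is sorted only within itself, so the concatenation is not globally sorted unless the job partitioned keys in order.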

2 answers

Launching a MapReduce job in Amazon Elastic MapReduce

I am trying to launch a MapReduce job in an Amazon MapReduce cluster. My MapReduce job does some pre-processing before generating map/reduce tasks. This pre-processing requires third-party libs such as javacv and opencv. Following Amazon's…

elastic-map-reduce

asked Dec 05 '13 at 06:37

Bala

1 answer

How to get data from S3 and use it for Elastic MapReduce, and where to write the code?

I have two big files and have uploaded them into an Amazon S3 bucket named "ccssdd" and created a folder named data: data/friendships.xml data/users.xml. The structure of users is 1 24 4 7 …

hadoop amazon-s3 mapreduce elastic-map-reduce amazon-emr

asked Nov 17 '13 at 01:15

Shane

2 answers

Which node sorts/shuffles the keys in Hadoop?

In a Hadoop job, which node does the sorting/shuffling phase? Does increasing the memory of that node improve the performance of sorting/shuffling?

hadoop mapreduce elastic-map-reduce

asked Oct 30 '13 at 05:54

HHH

1 answer

MapReduce Amazon Python: get the line number of the input file

I have several texts and I want to know the line number and the file where a word appears. I get the file correctly but not the line number. This is the map: #!/usr/bin/env python import sys import os find = 'but' #word to find linesCont = 0 file =…

python hadoop mapreduce elastic-map-reduce

asked Oct 12 '13 at 12:14

Carlos S
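
For the line-number question above, a rough sketch of a streaming-style mapper that counts lines as it reads them. It is not the asker's code, which is truncated in the excerpt. The function names `find_word` and `run` are hypothetical, and the environment variable names used for the input path are an assumption about how Hadoop Streaming exposes it. Line numbers produced this way are relative to the input split a mapper receives, so they match whole-file line numbers only when each file is handled as a single split.

```python
def find_word(lines, word, filename):
    """Yield (filename, line_number) for each line containing word as a whitespace-separated token."""
    for line_number, line in enumerate(lines, start=1):
        if word in line.split():
            yield filename, line_number

def run(stream, environ):
    """Return tab-separated output records for one mapper's input stream."""
    # Assumption: Hadoop Streaming exports the input path under one of these names.
    source = environ.get("mapreduce_map_input_file",
                         environ.get("map_input_file", "unknown"))
    return ["%s\t%d" % hit for hit in find_word(stream, "but", source)]
```

In an actual mapper script, with `sys` and `os` imported, the records could be emitted with `print("\n".join(run(sys.stdin, os.environ)))`.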

1 answer

Allow more than one Hadoop/EMR task to fail before shutting down

I'm trying to use Hadoop on Amazon Elastic MapReduce, where I have thousands of map tasks to perform. I'm OK if a small percentage of the tasks fail; however, Amazon shuts down the job and I lose all of the results when the first mapper fails. Is…

hadoop amazon-web-services hadoop-streaming elastic-map-reduce

asked Oct 07 '13 at 17:29

user1910316

3 answers

EMR - create user log from log

EMR Newbie Alert: We have large logs containing the usage data of our web site. Customers are authenticated and identified by their customer id. Whenever we try to troubleshoot a customer issue we grep through all the logs (using the customer_id as…

hadoop-streaming elastic-map-reduce

asked Oct 04 '13 at 03:16

Benno Waldmann

1 answer

Number of region servers on Amazon AWS

Say I start a cluster on Amazon Elastic MapReduce and have one master node instance, 2 core node instances and 15 task node instances. I think I uploaded around 1 TB of data into HBase using MapReduce jobs and incremental uploads. Now, how do I…

hadoop hbase elastic-map-reduce

asked Oct 03 '13 at 10:01

Run2

1 answer

Can I write the mapper and reducer programs in different languages?

I felt like writing my Mapper as a Perl script, but then I realized it would be easier to write the Reducer in Python. Can the Mapper and Reducer work in different programming languages?

perl python-3.x mapreduce elastic-map-reduce

asked Sep 15 '13 at 00:16

CtrlV

1 answer

Copying/using Python files from S3 to Amazon Elastic MapReduce at bootstrap time

I've figured out how to install Python packages (numpy and such) at the bootstrapping step using boto, as well as how to copy files from S3 to my EC2 instances, still with boto. What I haven't figured out is how to distribute Python scripts (or any…

amazon-web-services amazon-s3 amazon-ec2 boto elastic-map-reduce

asked Aug 18 '13 at 19:14

user2105469
