Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tag: emr

452 questions

0 answers

Reduce Elastic Map Reduce runtime

I use Elastic MapReduce to analyze a large amount of data (stored on S3). Other than increasing the instance size, what is the most cost-efficient way to reduce the job's runtime? If I create more, smaller files on S3, will it reduce the…

emr elastic-map-reduce bigdata

asked Jan 26 '16 at 12:23

ljaerj

2 answers

Use gzip input codec on files without .gz extension in hadoop

I'm running a Hadoop job on a bunch of gzipped input files. Hadoop should handle this easily (see "mapreduce in java - gzip input files"). Unfortunately, in my case, the input files don't have a .gz extension. I'm using CombineTextInputFormatClass, which…

java hadoop mapreduce emr elastic-map-reduce

asked Oct 27 '15 at 18:30

John Chrysostom

1 answer

How to force Hadoop to unzip inputs regardless of their extension?

I'm running map-reduce and my inputs are gzipped, but do not have a .gz (file name) extension. Normally, when they do have the .gz extension, Hadoop takes care of unzipping them on the fly before passing them to the mapper. However, without the…

hadoop mapreduce emr elastic-map-reduce amazon-emr

asked Aug 12 '15 at 15:02

GilLevi

1 answer

Reading many small files from S3 very slow

Loading many small files (more than 200,000 files of 4 KB each) from an S3 bucket into HDFS via Hive or Pig on AWS EMR is extremely slow. It seems that only one mapper is used to fetch the data, though I cannot figure out exactly where the bottleneck is. Pig Code…

amazon-web-services amazon-s3 hive apache-pig elastic-map-reduce

asked Jun 04 '15 at 12:53

FtoTheZ

0 answers

Elastic MapReduce with boto - InstanceProfile is required for creating cluster

I'm trying to run an Elastic MapReduce job with the code below, but when I do, I get an error: InstanceProfile is required for creating cluster. Does anyone know why I'm getting this error? def createmrjob(dict): emr =…

python amazon-web-services boto elastic-map-reduce

asked Jun 01 '15 at 12:35

techman

2 answers

AWS EMR validation error

I have a problem running a MapReduce Java application. I simplified my problem using the tutorial code from AWS, which runs a predefined step: public class Main { public static void main(String[] args) { AWSCredentials credentials =…

hadoop amazon-web-services amazon-ec2 emr elastic-map-reduce

asked Feb 28 '15 at 12:39

user3537890

2 answers

Mapreduce job to HBase throws IOException: Pass a Delete or a Put

I am trying to output to an HBase table directly from my Mapper while using Hadoop 2.4.0 with HBase 0.94.18 on EMR. I get an IOException: Pass a Delete or a Put when executing the code below. public class TestHBase { static class…

java hadoop mapreduce hbase elastic-map-reduce

asked Feb 16 '15 at 00:07

Marsellus Wallace

1 answer

Elasticsearch _cat/indices is giving an error?

I am currently using the Elasticsearch helpers scan API, but it is not able to fetch data. Command: helpers.scan( client=client, query={"query":{"match_all":{}}}, scroll='10m', index="debug", doc_type = "tool",…

elasticsearch elastic-map-reduce

asked Jan 23 '15 at 11:10

Birendra Kumar

1 answer

Amazon EMR sorting

I am new to Amazon EMR, and I am trying to understand how the sorting phase after the map (before the reduce phase) works and whether I can manipulate it (somehow supplying my own compare function). If you know how the output from the map…

hadoop mapreduce elastic-map-reduce amazon-emr

asked Jan 16 '15 at 14:59

ohad edelstain

1 answer

How to use Python streaming UDFs in Pig on Amazon EMR

Pig 0.12 introduced streaming Python UDFs, but they're experimental, so they need Hadoop 1 (http://pig.apache.org/docs/r0.12.1/udf.html#python-udfs). However, the only Amazon-provided AMI that can use Pig 0.12 is AMI 3.1.0, which uses Hadoop 2.4, not…

python numpy apache-pig elastic-map-reduce amazon-ami

asked Sep 04 '14 at 01:09

warbaker

1 answer

Does Hadoop Streaming's performance decrease if I use -mapper cat rather than -mapper org.apache.hadoop.mapred.lib.IdentityMapper?

I have had problems trying to use org.apache.hadoop.mapred.lib.IdentityMapper as the argument of -mapper in Hadoop Streaming 1.0.3. "cat" works, though. Does using cat affect performance, especially on Elastic MapReduce?

hadoop hadoop-streaming elastic-map-reduce

asked Jul 24 '14 at 17:31

verve

1 answer

How to debug Pig being stuck after job submission

I have a map-reduce job written in Pig that does the following. Given a set of Apache log files representing visits to a certain resource on a website, it cleans robot visits and unwanted lines out of the logs, then produces the tuples (ip,…

apache-pig elastic-map-reduce

asked Jul 07 '14 at 12:57

mottalrd

1 answer

"Unable to verify integrity of data" while running MR job

I'm running a relatively big MR job using Amazon Elastic MapReduce. I have run the job many times on small data sets with no problem, but when I run it on a large dataset I get the following exception: Error:…

hadoop amazon-web-services amazon-s3 mapreduce elastic-map-reduce

asked May 24 '14 at 18:51

itzhaki

0 answers

EMR hadoop tasks agonize for hours when losing task nodes

I've set up an Amazon EMR job flow with 1 on-demand core node and 4 task nodes bought with spot bids. When I run my task on only the core node, each step finishes within 1 hour. When I'm lucky and have 1 core + 4 task nodes, steps usually finish within 10…

hadoop elastic-map-reduce emr

asked May 21 '14 at 14:51

Gavriel

5 answers

How to set up an Elasticsearch cluster

I am trying to set up a multi-node Elasticsearch cluster. Is there a useful link I can follow to set up the cluster? I am trying to run a MapReduce program on the cluster to find exact matches.

elasticsearch elastic-map-reduce amazon-elasticache

asked Apr 27 '14 at 07:04

Amaresh
