Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tag: emr

452 questions

1 answer

Python program in AWS Elastic MapReduce fails in step execution

I'm trying to run a Python program as an Elastic MapReduce step. It is a Spark application with the following parameters: Deploy mode: Cluster; Spark-submit options: --executor-memory 1g; Application location:…

python amazon-web-services amazon-s3 apache-spark elastic-map-reduce

asked Mar 14 '16 at 09:05

Steffen Schmitz

0 answers

ElasticSearch Performance optimization

I have a single node dedicated to the ES server. We have indexed around 25 GB of data in it. I am using a bool query to fetch data, and it takes around 6-7 minutes to return the results. The RAM on the node is just 1 GB. I understand that the RAM and other…

elasticsearch elastic-map-reduce elasticsearch-plugin

asked Feb 29 '16 at 11:39

Vipul Kumar

1 answer

How to know how many keys a map-reduce job processed?

How can a map-reduce job generate metrics about how many keys it has processed and give data like the following? % of keys that belonged to this particular value.

mapreduce elastic-map-reduce

asked Feb 01 '16 at 04:50

adarshhsingh

1 answer

Visitor / User profiling based on clickstream data?

We built a Rails 4 site and use ES for our travel/accommodation search engine. We created a separate ES index for clickstream data, and we store data for non-logged-in users (session_id) and logged-in users (user_id). We use the stored data now to show viewed and…

hadoop elasticsearch mahout elastic-map-reduce mahout-recommender

asked Jan 22 '16 at 20:54

Remco

1 answer

MapReduce with filename as Key, contents as Values, many small files

I've looked at FileInputFormat where filename is KEY and text contents are VALUE, How to get Filename/File Contents as key/value input for MAP when running a Hadoop MapReduce Job?, and Getting Filename/FileData as key/value input for Map when…

java hadoop elastic-map-reduce

asked Dec 07 '15 at 08:18

kcmgrew

1 answer

Take a sample of a file from AWS S3 and put it in another location in S3

It is always possible to use s3distcp to copy a file (or set of files) to another location in S3, but is it possible, using mapred or any other functionality of Hadoop/EMR, to take a random sample (or every nth line) of the file(s) and write it to a new location…

hadoop amazon-web-services awk amazon-emr elastic-map-reduce

asked Nov 30 '15 at 15:35

Kuber

2 answers

Hadoop MapReduce Out of Memory on Small Files

I'm running a MapReduce job against about 3 million small files on Hadoop (I know, I know, but there's nothing we can do about it - it's the nature of our source system). Our code is nothing special - it uses CombineFileInputFormat to wrap a bunch…

java hadoop amazon-web-services mapreduce elastic-map-reduce

asked Nov 20 '15 at 19:26

John Chrysostom

1 answer

What is the minimal set of outbound rules required of the master/slave security groups for an EMR cluster?

I'm trying to secure a pipeline for analyzing controlled-access genomic data with Amazon Elastic MapReduce (EMR), and it would help to know the minimal set of outbound rules required of the master and slave security groups of an EMR cluster. I'm…

amazon-web-services elastic-map-reduce

asked Nov 11 '15 at 17:49

verve

1 answer

PDI jobs not seen as Mapreduce jobs in Resource Manager or Job History server

I am using Pentaho 5.4 and EMR 3.4. When I execute a transformation in Pentaho to copy data from Oracle DB to HDFS on EMR, I don't see any MR jobs in the Resource Manager of the Hadoop (EMR) cluster. Am I supposed to see them as MR jobs, or Pentaho just…

hadoop mapreduce pentaho elastic-map-reduce data-integration

asked Nov 06 '15 at 15:30

hadooper

1 answer

Can an EMR cluster be launched into a private VPC subnet with no public IPs that accesses the internet through a NAT instance in a public subnet?

Is it possible to launch an EMR cluster into the private subnet of a scenario-2 VPC (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Scenario2.html) where a NAT instance is in the public subnet, and where each instance in the private…

amazon-web-services amazon-s3 elastic-map-reduce

asked Nov 05 '15 at 20:05

verve

1 answer

DynamoDB schema for referral data

I want to try out DynamoDB and use it for access.logs generated by nginx, which will later feed a reporting dashboard that will include IP, referral URL, referral domain, browser, etc. The initial setup will be EC2 instances running nginx…

amazon-dynamodb elastic-map-reduce

asked Nov 01 '15 at 17:43

dzm

0 answers

HADOOP HIVE mr.MapredLocalTask (MapredLocalTask.java:execute(276)) - Execution failed with exit status: 137

I'm trying to run a Hive job on an AWS EMR cluster (1 master, 4 core nodes [11.25 GB each]). I'm joining (map join) two tables, one with 0.3 million entries (~11 MB) and another with almost 7 million entries (took care that big table should be…

hadoop mapreduce hive emr elastic-map-reduce

asked Oct 07 '15 at 11:46

jeevan sirela

0 answers

Load hive tables from multiple mappers

I am working on a problem where I have a large number of small compressed text files. Each file is approximately 10-20 KB, and there are TBs of data in total. I need to load these files into Hive. Later, Tableau will use the Hive tables for its report generation. I am…

hadoop mapreduce hive elastic-map-reduce

asked Sep 22 '15 at 04:49

Ajay

1 answer

Amazon Elastic MapReduce (EMR): controlling split size / number of mappers

How can I change this configuration? For my application, a split size of 64/128 MB is too large, and I would like a split size of 16 MB, for example. How can I do it?

hadoop amazon-web-services elastic-map-reduce

asked Aug 20 '15 at 21:16

member555

0 answers

Hadoop - Directory Structure and Distributed Cache

Imagine that I have multiple jobs executing concurrently in a Hadoop cluster. These jobs use the Distributed Cache. Each of them uses different files, but with the same name. (I am using the ToolInterface to distribute these…

hadoop mapreduce elastic-map-reduce distributed-cache

asked Aug 12 '15 at 18:54

p.magalhaes
