Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

452 questions
1
vote
1 answer

EMR & Spark: adding dependencies after cluster creation

Is it possible to install additional libs/dependencies after the cluster is already up and running? Things I've done that relate to this: I've already used the pre-creation bootstrapping process (this is a different solution…
Kristian
  • 21,204
  • 19
  • 101
  • 176
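One hedged answer to the question above is to submit the install command as a step to the already-running cluster. The sketch below assumes boto3, a placeholder cluster ID, and placeholder package names; note that a command-runner step like this runs only on the master node, so core/task nodes would still need the same install (for example via SSH or a custom AMI).

```python
# Hypothetical sketch: pip-install extra Python libraries on the master node of
# a cluster that is already running, by submitting a shell command as an EMR step.
# Cluster ID and package names are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",            # the running cluster's ID
    Steps=[
        {
            "Name": "install-extra-python-deps",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",  # available on EMR 4.x+ release labels
                "Args": ["sudo", "pip", "install", "requests", "numpy"],
            },
        }
    ],
)
print(response["StepIds"])
```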
1
vote
1 answer

Elastic MapReduce and Amazon S3: error regarding access keys

I am new to Amazon EMR and Hadoop in general. I am currently trying to set up a Pig job on an EMR cluster and to import and export data from S3. I have set up a bucket in S3 named "datastackexchange" containing my data. In an attempt to begin to copy the…
Maeve90
  • 345
  • 1
  • 6
  • 14
1
vote
0 answers

AWS EMR - Python path, git repo and scripts

I am running MapReduce jobs on Hive and most of the code already resides in a git repo. I know I can include instructions in the bootstrap script when spinning up clusters, but is it possible to do all of these things: adjust the Python path in…
intl
  • 2,753
  • 9
  • 45
  • 71
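For the question above, the usual place to clone a repo and adjust the Python path on every node is a bootstrap action supplied at cluster launch. The sketch below is a minimal boto3 example under that assumption; the S3 script path, bucket, key name, and release label are placeholders, and the referenced setup_repo.sh would contain the actual git clone and PYTHONPATH export.

```python
# Hypothetical sketch: launch an EMR cluster whose bootstrap action (a shell
# script staged in S3) clones a git repo and adjusts PYTHONPATH on every node.
# All names/paths below are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

emr.run_job_flow(
    Name="hive-jobs-with-repo",
    ReleaseLabel="emr-4.7.0",
    Instances={
        "MasterInstanceType": "m3.xlarge",
        "SlaveInstanceType": "m3.xlarge",
        "InstanceCount": 3,
        "Ec2KeyName": "my-key",
    },
    BootstrapActions=[
        {
            "Name": "clone-repo-and-set-pythonpath",
            "ScriptBootstrapAction": {
                "Path": "s3://my-bucket/bootstrap/setup_repo.sh",
            },
        }
    ],
    Applications=[{"Name": "Hive"}],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```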
1
vote
0 answers

Elasticsearch match_phrase not giving deterministic results

I have defined mapping in following way. PUT _template/name "mappings": { "_default_": { "name": { "type": "string", "analyzer" : "synonyms_expand", "index" : "analyzed", …
1
vote
1 answer

How to prevent a Hadoop job from failing due to a failed reduce task

I am running an s3distcp job on AWS EMR (Hadoop 2.2.0), and the job keeps failing with a failed reducer task after 3 attempts. I also tried setting both mapred.max.reduce.failures.percent and mapreduce.reduce.failures.maxpercent to 50 in the oozie…
user3285517
  • 11
  • 1
  • 4
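One hedged way to apply the two properties from the question above is to pass them as generic -D options on the s3distcp step itself, assuming s3distcp honors Hadoop's generic options. The jar path, cluster ID, and source/destination locations below are placeholders for an AMI 3.x / Hadoop 2.2.0 cluster.

```python
# Hypothetical sketch: resubmit the s3distcp copy as an EMR step with the
# reduce-failure tolerance properties passed as -D options. All paths and the
# cluster ID are placeholders; this assumes s3distcp accepts generic -D flags.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",
    Steps=[
        {
            "Name": "s3distcp-with-failure-tolerance",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "/home/hadoop/lib/emr-s3distcp-1.0.jar",
                "Args": [
                    "-D", "mapreduce.reduce.failures.maxpercent=50",
                    "-D", "mapred.max.reduce.failures.percent=50",
                    "--src", "hdfs:///output/",
                    "--dest", "s3://my-bucket/output/",
                ],
            },
        }
    ],
)
```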
1
vote
0 answers

Running the temperature example on EMR hadoop cluster using Hadoop Development tools on eclipse

I'm still a newbie in Hadoop. I'm trying to run the common temperature example on an Amazon EMR Hadoop 2.6.0 cluster using Hadoop Development Tools in Eclipse. I'm connecting through an SSH tunnel and don't have connection problems so far since I…
Learner
  • 60
  • 12
1
vote
2 answers

R: replacing double-escaped text

I'm gluing together a number of system calls using the Amazon Elastic Map Reduce command line tools. These commands return JSON text which has already been (partially?) escaped. Then when the system call turns it into an R text object (intern=T) it…
JD Long
  • 59,675
  • 58
  • 202
  • 294
1
vote
1 answer

fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey are not set for EMR default IAM roles

One of my EMR jobs relies on getting the AWS access key ID and secret access key from the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties, respectively. However, when I run an EMR cluster using the default EC2 and EMR roles, those…
Kiet Tran
  • 1,458
  • 2
  • 13
  • 22
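For the question above, a likely explanation is that with the default EC2 instance profile the cluster uses temporary role credentials rather than populating those two properties. The sketch below is not the questioner's code; it only shows, under that assumption, how code on a node can let boto3 resolve the same instance-profile credentials, and why copying them into the fs.s3.* properties is fragile.

```python
# Hypothetical sketch: on an EMR node launched with the default EC2 instance
# profile, let boto3 resolve the temporary role credentials from instance
# metadata instead of expecting fs.s3.awsAccessKeyId / fs.s3.awsSecretAccessKey
# to be set.
import boto3

session = boto3.Session()                    # picks up instance-profile credentials
creds = session.get_credentials().get_frozen_credentials()

print("access key:", creds.access_key)
print("has session token:", bool(creds.token))  # temporary role creds include a token

# Note: these credentials rotate, so hard-coding them into the fs.s3.* properties
# is brittle; letting EMRFS use the role directly (the default) is usually safer.
```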
1
vote
0 answers

Move data from HDFS to RDS directly

Background: I am working on a web project to expose analytical data stored in a local MSSQL database. The database is updated regularly. An EMR cluster is responsible for using custom Hive scripts to process raw data from S3 and save the analytical…
Tzu
  • 235
  • 1
  • 9
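One common route for the question above is a Sqoop export submitted as an EMR step. The sketch below is only illustrative: the JDBC URL, credentials, table name, and HDFS export directory are placeholders, and it assumes Sqoop plus the matching JDBC driver are already installed on the cluster.

```python
# Hypothetical sketch: export a Hive result directory from HDFS into an RDS table
# via a Sqoop export step. Connection string, credentials, table and paths are
# placeholders; Sqoop must already be available on the cluster.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",
    Steps=[
        {
            "Name": "sqoop-export-to-rds",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "sqoop", "export",
                    "--connect", "jdbc:mysql://my-rds-endpoint:3306/analytics",
                    "--username", "dbuser", "--password", "dbpass",
                    "--table", "daily_metrics",
                    "--export-dir", "/user/hive/warehouse/daily_metrics",
                ],
            },
        }
    ],
)
```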
1
vote
2 answers

Processing a HUGE number of small files independently

The task is to process a HUGE number (around 10,000,000) of small files (each around 1 MB) independently (i.e. the result of processing file F1 is independent of the result of processing F2). Someone suggested Map-Reduce (on Amazon EMR Hadoop) for my…
Daniel
  • 5,839
  • 9
  • 46
  • 85
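A frequent answer to the small-files problem above is to batch the tiny inputs into larger objects before running MapReduce, so each mapper gets a full-sized split rather than one 1 MB file. The sketch below is one hedged way to do that with boto3; the bucket, prefixes, and 128 MB target size are placeholders.

```python
# Hypothetical sketch: concatenate ~1 MB S3 objects into larger batch files
# before the MapReduce job. Bucket/prefix names and the target size are placeholders.
import boto3

s3 = boto3.client("s3")
BUCKET, SRC_PREFIX, DST_PREFIX = "my-bucket", "small-files/", "batched/"
TARGET_BYTES = 128 * 1024 * 1024

batch, batch_size, batch_idx = [], 0, 0
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=SRC_PREFIX):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        batch.append(body)
        batch_size += len(body)
        if batch_size >= TARGET_BYTES:
            s3.put_object(Bucket=BUCKET,
                          Key=f"{DST_PREFIX}batch-{batch_idx:05d}",
                          Body=b"\n".join(batch))
            batch, batch_size, batch_idx = [], 0, batch_idx + 1

if batch:  # flush the final partial batch
    s3.put_object(Bucket=BUCKET, Key=f"{DST_PREFIX}batch-{batch_idx:05d}",
                  Body=b"\n".join(batch))
```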
1
vote
1 answer

Small files with Map Reduce or multi threading/multi processing

I have a batch of 500 files, each around 45 KB. Each file requires around 87,840 calculations (ARIMA regression problems). Each calculation is independent in itself. Given this, what is the best approach to develop a solution for such a…
NightOwl85
  • 161
  • 1
  • 1
  • 7
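Given the modest data volume in the question above (500 files of ~45 KB), a single machine with a process pool is often a simpler fit than a cluster. The sketch below is a minimal multiprocessing example under that assumption; process_file() and its ARIMA step are placeholders for the real per-file work.

```python
# Hypothetical sketch: process ~500 small files on one machine with a process
# pool instead of MapReduce. The per-file work (the independent ARIMA
# regressions) is a placeholder.
import glob
from multiprocessing import Pool

def process_file(path):
    with open(path) as fh:
        series = [float(line.strip()) for line in fh if line.strip()]
    # ... run the independent ARIMA regressions on `series` here ...
    return path, len(series)

if __name__ == "__main__":
    files = glob.glob("data/*.csv")      # the batch of ~500 x 45 KB files
    with Pool() as pool:                 # one worker per CPU core by default
        results = pool.map(process_file, files)
    print(f"processed {len(results)} files")
```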
1
vote
1 answer

MRUnit Example for MultipleOutputs

I have written a map-only Hadoop job that uses the MultipleOutputs concept. The problem is that I want to test this code with MRUnit, and I don't see any working example of MultipleOutputs testing. My mapper code looks like: public void…
1
vote
1 answer

How to pass arguments to streaming job on Amazon EMR

I want to produce the output of my map function, filtering the data by dates. In local tests, I simply call the application passing the dates as parameters: cat access_log | ./mapper.py 20/12/2014 31/12/2014 | ./reducer.py Then the parameters…
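One hedged approach to the question above is to pass the dates inside the -mapper string of the streaming step (e.g. -mapper "mapper.py 20/12/2014 31/12/2014" together with -files mapper.py), so the mapper sees the same argv as in the local pipe. The sketch below assumes that setup; the log-line parsing is a placeholder, not the questioner's actual format.

```python
#!/usr/bin/env python
# Hypothetical mapper.py sketch: read the start/end dates from argv, exactly as
# in the local test pipeline, and emit only the stdin lines within that range.
# The log-parsing details are placeholder assumptions.
import sys
from datetime import datetime

def parse(d):
    return datetime.strptime(d, "%d/%m/%Y")

start, end = parse(sys.argv[1]), parse(sys.argv[2])

for line in sys.stdin:
    fields = line.split()
    try:
        # placeholder: assume the 4th field looks like [20/12/2014:... ]
        ts = parse(fields[3].lstrip("[").split(":")[0])
    except (IndexError, ValueError):
        continue
    if start <= ts <= end:
        print(line.rstrip("\n"))
```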
1
vote
1 answer

Elasticsearch query strategy for nested array elements

I am trying to find results by color. In the database, it is recorded in RGB format: an array of three numbers representing red, green, and blue values respectively. Here is how it is stored in the DB and Elasticsearch record (storing 4 RGB colors…
diego
  • 123
  • 2
  • 14
1
vote
0 answers

How to compare values from 2 indices of an Elasticsearch store in Kibana

I have 2 indices in my Elasticsearch store. Now in a Kibana visualisation, I have to get a count of docs in index B where "col_A" of index A is equal to "col_H" of index B. Is this possible in Kibana? If so, please help me with the queries. TIA…
h4it
  • 33
  • 1
  • 5