Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tag: emr

452 questions

0 answers

Apache Pig not using appropriate RecordWriter or OutputCommitter

The Problem: I'm using a custom StoreFunc, OutputFormat, and OutputCommitter with Pig. The problem is that Pig isn't calling some of the methods I've defined in the OutputFormat that return the appropriate RecordWriter…

java hadoop apache-pig emr elastic-map-reduce

asked Jul 02 '15 at 18:05

llovett

1 answer

Run Spark in standalone mode with Java

I have a problem: I cannot run a MapReduce Spark job written in Java in the AWS EMR environment. I have a master node and 5 slaves. What is the correct way to get Spark to run the Java class? I tried this guide, but it doesn't work for me. Thanks!

java amazon-web-services apache-spark emr elastic-map-reduce

asked Jul 01 '15 at 12:55

Giuseppe Matrella

1 answer

Can I rerun failed mappers in EMR

I just woke up to a 16-hour EMR MapReduce job that failed because a 'few' mappers timed out. Is there a way to rerun only those failed mappers (yes, it makes sense in my specific use case)? How?

mapreduce emr elastic-map-reduce amazon-emr

asked Jun 14 '15 at 18:54

Marsellus Wallace

1 answer

No module named simplejson in python UDF on EMR

I'm running an Amazon Elastic MapReduce (EMR) job using Pig. I'm having trouble importing the json or simplejson modules into my Python user defined function (UDF). Here is my code: #!/usr/bin/env python import simplejson as…

amazon-web-services apache-pig elastic-map-reduce

asked May 31 '15 at 17:25

mostlyjason

1 answer

AWS EMR - install HUE using Java SDK

I am trying to set up a StepConfig to install and run HUE on my cluster. I am creating steps in the following way: private StepConfig newInstallHueStep() { return new StepConfig() .withName("Install Hue") …

java amazon-web-services emr elastic-map-reduce hue

asked May 28 '15 at 13:57

Maciej Donajski

2 answers

Export DynamoDB table to S3 with client side encryption

I'm trying to use Data Pipeline to export data from DynamoDB to S3. However, I can't figure out how to apply client-side encryption before the file is written to S3. Is there a way to do this with Data Pipeline? I am able to set up everything except…

encryption amazon-s3 hive amazon-dynamodb elastic-map-reduce

asked May 10 '15 at 00:16

SqlDevInANoSqlWorld

1 answer

Different file process in hadoop

I have installed Hadoop and Hive. I can process and query XLS and TSV files using Hive. I want to process other files such as DOCX, PDF, and PPT. How can I do this? Is there a separate procedure for processing these files in AWS? Please help me.

hadoop amazon-web-services hive bigdata elastic-map-reduce

asked Mar 29 '15 at 03:30

Mahmudul Hasan

1 answer

AWS EMR: how to get the first element out of describe_jobflows() API call result

I cannot figure out how to get the first element of the result from calling one of the boto EMR APIs: describe_jobflows(). I know it returns a list of jobflows, but when I try to access it using jobflows[0], I get this: ERROR: 'JobFlow'…

python amazon-web-services boto emr elastic-map-reduce

asked Mar 12 '15 at 20:59

Fisher Coder

1 answer

How can I access a file's content from mappers in Amazon elastic map reduce?

If I am running an EMR job (in Java) on Amazon Web Services to process large amounts of data, is it possible to have every single mapper access a small file stored on S3? Note that the small file I am talking about is NOT the input to the mappers.…

java hadoop amazon-web-services amazon-s3 elastic-map-reduce

asked Mar 11 '15 at 00:30

user3758133

1 answer

Elastic search rows to column

I have documents in Elasticsearch like this. I want to run a search and get a result like a SQL Server pivot, but I don't know how to do this. Name | Year | Gear C30 2012 A C30 2011 M C30 2014 M C30 2015 A C30 2013 A V40 …

elasticsearch elastic-map-reduce elasticsearch-plugin

asked Feb 10 '15 at 02:50

user1924375

1 answer

About renting and using a cluster on Amazon EC2

I am currently researching how to improve the MapReduce scheduler, but unfortunately my university does not provide a cluster for research purposes. I was thinking about renting a cluster and I heard about Amazon EC2, but I have no experience…

hadoop amazon-ec2 mapreduce elastic-map-reduce

asked Dec 10 '14 at 10:41

Flowra

0 answers

Not a valid ProtoBuf in JobTrackerWatcher.findJobTrackerAddr(): Retrying to connect ZooKeeper Attempt# 0 Current ZooKeeper Server:

I'm getting the errors below while trying to run a MapReduce program from Eclipse using the MapR Windows client. Can you please help me figure out what's wrong? Note: I'm able to access the MapR FS from the Windows command prompt. The error log is below: INFO…

hadoop elastic-map-reduce mapr

asked Dec 05 '14 at 07:30

MapReddy Usthili

1 answer

Map function fails in mapreduce run in EMR

I am running my own MapReduce tasks on Amazon EMR. I see that the map tasks are failing, but I am not able to find out why. import fileinput import csv myDict = {} csvreader = csv.reader(fileinput.input(mode='rb'),…

hadoop dictionary mapreduce elastic-map-reduce

asked Dec 02 '14 at 02:35

user3543477

1 answer

compare 2 tables in hbase and write summary to the third table using TableMapReduceUtil

I need to use MR on HBase to compare 2 tables (table1, table2) in HBase and write a summary to a third table (table3). I am using the TableMapReduceUtil pseudocode below. Mapper: Table1 Reducer: Table3. In the mapper, I need to compare Table1 value…

java hadoop mapreduce hbase elastic-map-reduce

asked Oct 23 '14 at 00:28

user3570620

2 answers

Hadoop - how to improve performance of my case?

Currently I use AWS EMR as the cluster. For the library, I use Cascading. The input data is stored in AWS S3, in a directory. The directory contains many files, each about 100 MB (uncompressed plain text), and the files can easily reach 100…

hadoop elastic-map-reduce amazon-emr cascading

asked Oct 16 '14 at 08:17

dieend

