Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

452 questions
0 votes, 0 answers

Apache Pig not using appropriate RecordWriter or OutputCommitter

The problem: I'm using a custom StoreFunc, OutputFormat, and OutputCommitter with Pig, but Pig isn't calling some of the methods I've defined in the OutputFormat that return the appropriate RecordWriter…
llovett
0 votes, 1 answer

Run Spark in standalone mode with Java

I can't run a Spark job written in Java on Amazon EMR. I have a master node and 5 slave nodes. What is the correct way to get Spark to run the Java class? I tried following this guide, but it doesn't work for me.
0 votes, 1 answer

Can I rerun failed mappers in EMR

I just woke up to a failed 16-hour EMR MapReduce job that failed because a 'few' mappers timed out. Is there a way to rerun only those failed mappers (yes, it makes sense in my specific use case)? How?
Marsellus Wallace
0 votes, 1 answer

No module named simplejson in Python UDF on EMR

I'm running an Amazon Elastic MapReduce (EMR) job using Pig. I'm having trouble importing the json or simplejson modules into my Python user defined function (UDF). Here is my code: #!/usr/bin/env python import simplejson as…
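A common workaround is to fall back to the standard-library `json` module when `simplejson` cannot be imported. A minimal sketch, assuming the UDF runs under an interpreter that ships at least one of the two modules (older Jython releases used for Pig UDFs may ship neither, in which case a module has to be put on the interpreter's path); `parse_record` is a hypothetical helper name:

```python
# Prefer simplejson when it is installed; otherwise fall back to the
# standard-library json module (part of CPython since 2.6).
try:
    import simplejson as json
except ImportError:
    import json


def parse_record(line):
    """Hypothetical UDF helper: parse one JSON string into Python objects."""
    return json.loads(line)
```

Both modules expose the same `loads`/`dumps` functions, so the rest of the UDF does not need to know which one was imported.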
0 votes, 1 answer

AWS EMR - install HUE using Java SDK

I am trying to set up a StepConfig to install and run Hue on my cluster. I am creating steps in the following way: private StepConfig newInstallHueStep() { return new StepConfig() .withName("Install Hue") …
0 votes, 2 answers

Export DynamoDB table to S3 with client side encryption

I'm trying to use Data Pipeline to export data from DynamoDB to S3. However, I can't figure out how to apply client-side encryption before the file is written to S3. Is there a way to do this with Data Pipeline? I am able to set up everything except…
0 votes, 1 answer

Processing different file types in Hadoop

I have installed Hadoop and Hive. I can process and query XLS and TSV files using Hive. I want to process other file types such as DOCX, PDF, and PPT. How can I do this? Is there a separate procedure for processing these files on AWS?
0 votes, 1 answer

AWS EMR: how to get the first element out of describe_jobflows() API call result

I cannot figure out how to get the first element of the result of calling one of the boto EMR APIs, describe_jobflows(). I know it returns a list of job flows, but when I try to access it with jobflows[0], I get this: ERROR: 'JobFlow'…
Fisher Coder
0 votes, 1 answer

How can I access a file's content from mappers in Amazon Elastic MapReduce?

If I am running an EMR job (in Java) on Amazon Web Services to process large amounts of data, is it possible to have every single mapper access a small file stored on S3? Note that the small file I am talking about is NOT the input to the mappers.…
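In Hadoop the usual mechanism for this is the distributed cache: a small side file is copied to the worker nodes once per job, and each task reads a local copy (the Java API exposes this as `Job.addCacheFile`). A Hadoop Streaming sketch in Python, assuming the file was shipped with Hadoop's `-files` option so that a copy appears in each task's working directory; the file name `lookup.txt`, its tab-separated format, and all helper names are hypothetical:

```python
import sys

# Hypothetical side-file name. Assumes the file was shipped with the job
# (e.g. via -files) so a local copy sits in the task's working directory.
LOOKUP_PATH = "lookup.txt"


def parse_lookup(lines):
    """Build a dict from tab-separated 'key<TAB>value' lines, skipping blanks."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if line:
            key, value = line.split("\t", 1)
            table[key] = value
    return table


def load_lookup(path=LOOKUP_PATH):
    """Read the side file once per task, not once per input record."""
    with open(path) as f:
        return parse_lookup(f)


def map_lines(lines, lookup):
    """Tag each non-empty input key with its side-file value (or 'UNKNOWN')."""
    for line in lines:
        key = line.strip()
        if key:
            yield "%s\t%s" % (key, lookup.get(key, "UNKNOWN"))


def run_mapper(stdin=sys.stdin, stdout=sys.stdout):
    """Streaming entry point: load the side file, then stream stdin to stdout."""
    lookup = load_lookup()
    for out in map_lines(stdin, lookup):
        stdout.write(out + "\n")
```

A streaming mapper script would end with a call to `run_mapper()`. Loading the dictionary once per task keeps the file read out of the per-record loop.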
0 votes, 1 answer

Elasticsearch: rows to columns

I have documents in Elasticsearch like the ones shown below. I want to run a search and get the results pivoted, like a SQL Server PIVOT, but I don't know how to do this.

Name | Year | Gear
C30 | 2012 | A
C30 | 2011 | M
C30 | 2014 | M
C30 | 2015 | A
C30 | 2013 | A
V40 | …
user1924375
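If the reshaping can happen on the client, the pivot itself is a small group-by. A sketch in Python that uses no Elasticsearch API, assuming the hits have already been fetched and flattened into dicts with hypothetical `name`, `year`, and `gear` fields; the sample data is the C30 rows from the question:

```python
from collections import defaultdict


def pivot(docs, row_key, col_key, value_key):
    """Turn one-record-per-(row, column) data into {row: {column: value}}."""
    table = defaultdict(dict)
    for doc in docs:
        table[doc[row_key]][doc[col_key]] = doc[value_key]
    return dict(table)


def render(table):
    """Format the pivot as pipe-separated text with sorted column headers."""
    columns = sorted({col for row in table.values() for col in row})
    lines = [" | ".join(["Name"] + [str(c) for c in columns])]
    for name in sorted(table):
        lines.append(" | ".join([name] + [table[name].get(c, "") for c in columns]))
    return "\n".join(lines)


docs = [
    {"name": "C30", "year": 2012, "gear": "A"},
    {"name": "C30", "year": 2011, "gear": "M"},
    {"name": "C30", "year": 2014, "gear": "M"},
    {"name": "C30", "year": 2015, "gear": "A"},
    {"name": "C30", "year": 2013, "gear": "A"},
]
print(render(pivot(docs, "name", "year", "gear")))
# Name | 2011 | 2012 | 2013 | 2014 | 2015
# C30 | M | A | A | M | A
```

Missing (name, year) combinations render as empty cells, which is how a SQL PIVOT would show NULLs.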
0 votes, 1 answer

About renting and using a cluster on Amazon EC2

I am currently researching improvements to the MapReduce scheduler, but unfortunately my university does not provide a cluster for research purposes. I was thinking about renting a cluster and I heard about Amazon EC2, but I have no experience…
Flowra
0 votes, 0 answers

Not a valid ProtoBuf in JobTrackerWatcher.findJobTrackerAddr(): Retrying to connect ZooKeeper Attempt# 0 Current ZooKeeper Server:

I'm getting the errors below while trying to run a MapReduce program from Eclipse using the MapR Windows client. Can you please help me find what's wrong? Note: I'm able to access MapR-FS from the Windows command prompt. The error log is below: INFO…
MapReddy Usthili
0 votes, 1 answer

Map function fails in a MapReduce run on EMR

I am running my own MapReduce tasks on Amazon EMR. The map tasks are failing, and I can't find the reason for the failures. import fileinput import csv myDict = {} csvreader = csv.reader(fileinput.input(mode='rb'),…
user3543477
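A streaming map task fails as soon as the script raises, for example on one malformed row, and the Python traceback ends up in that task attempt's stderr log. A defensive mapper sketch, assuming Hadoop Streaming feeds CSV lines on standard input and expects tab-separated key/value lines on standard output; the key column (`key_col`) and the counter names are hypothetical:

```python
import csv
import sys


def map_rows(lines, key_col=0, err=sys.stderr):
    """Parse CSV text lines and yield 'key<TAB>1' records for a reducer to sum.

    Rows too short to contain the key column are counted and skipped instead
    of raising, so a single bad record does not fail the whole map task.
    """
    for row in csv.reader(lines):
        if not row:
            continue  # blank input line
        if len(row) <= key_col:
            # Hadoop Streaming turns stderr lines of this form into job counters.
            err.write("reporter:counter:Mapper,MalformedRows,1\n")
            continue
        yield "%s\t1" % row[key_col].strip()


def run():
    """Streaming entry point: under Python 3, sys.stdin already yields text."""
    for record in map_rows(sys.stdin):
        sys.stdout.write(record + "\n")
```

A mapper script would end with a call to `run()`. Reading `sys.stdin` directly instead of `fileinput.input(mode='rb')` matters under Python 3, where `csv.reader` rejects bytes input.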
0 votes, 1 answer

Compare two tables in HBase and write a summary to a third table using TableMapReduceUtil

I need to use MapReduce on HBase to compare two tables (table1, table2) and write a summary to a third table (table3). I am using the TableMapReduceUtil pseudocode below. Mapper: Table1. Reducer: Table3. In the mapper, I need to compare the Table1 value…
user3570620
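The comparison itself can be prototyped outside HBase before it is wired into TableMapReduceUtil. A sketch in Python that uses no HBase API, assuming each table has been reduced to a hypothetical `{row_key: value}` mapping; in the MapReduce version the mapper would scan table1 and look up table2, and the reducer would write counts like these to table3:

```python
def compare_tables(table1, table2):
    """Classify the row keys of two {row_key: value} mappings and count each class."""
    summary = {"match": 0, "mismatch": 0, "only_in_table1": 0, "only_in_table2": 0}
    for key, value in table1.items():
        if key not in table2:
            summary["only_in_table1"] += 1
        elif table2[key] == value:
            summary["match"] += 1
        else:
            summary["mismatch"] += 1
    # Keys that exist only in table2 are never visited by the loop above.
    summary["only_in_table2"] = sum(1 for key in table2 if key not in table1)
    return summary


t1 = {"r1": "x", "r2": "y", "r3": "z"}
t2 = {"r1": "x", "r2": "changed", "r4": "w"}
print(compare_tables(t1, t2))
# {'match': 1, 'mismatch': 1, 'only_in_table1': 1, 'only_in_table2': 1}
```

A mapper that scans only table1 never sees keys that exist only in table2, which is why the sketch makes a second pass over table2; a full MapReduce job would need either a second scan or input from both tables.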
0 votes, 2 answers

Hadoop - how to improve performance of my case?

I currently use AWS EMR as the cluster and Cascading as the library. The input data is stored in a directory on AWS S3. The directory contains many files, each about 100 MB (uncompressed plain text), and the files can easily reach 100…
dieend