Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

452 questions
3 votes, 1 answer

AWS EMR S3DistCp: The auxService:mapreduce_shuffle does not exist

I am connected to an AWS EMR v5.4.0 instance over SSH and I want to call s3distcp. This link demonstrates how to set up an EMR step to call it, but when I run it I get the following error: Container launch failed for…
Mark J Miller • 4,751 • 5 • 44 • 74
3 votes, 1 answer

How does parquet encryption work in AWS EMR?

I'm looking at the AWS documentation for enabling encryption on EMR, but I can't find any information on how this impacts the performance of Parquet files. Can EMR still take advantage of Parquet when optimizing queries? Examples: select count(1)…
Upio • 1,364 • 1 • 12 • 27
3 votes, 0 answers

Elasticsearch Ingest Attachment in PHP is not working

Searching is not working with the Ingest Processor in Elasticsearch. Here is the mapping code: public function ingest_processor_mapping() { $client = \Elasticsearch\ClientBuilder::create()->build(); $params = [ 'id' => 'attachment', …
3 votes, 2 answers

Error when creating aws emr default-roles

I'm trying to create a cluster using the aws emr CLI command. However, I can't seem to create the default roles needed before calling aws emr create-cluster: $ aws emr create-default-roles A client error (NoSuchEntity) occurred when calling the…
Shoaib Burq • 384 • 2 • 13
3 votes, 1 answer

How to add mapreduce.reduce.memory.mb property to EMR Cluster in Cloud Formation template?

I've been looking at how to modify the default values that EMR assigns to the cluster depending on the machine type. In my case, it's a pretty basic setup with an m4.large as master and a c3.2xlarge as core, and the same for the task. The…
3 votes, 1 answer

AWS EMR PySpark connect to mysql

I'm trying to connect to MySQL from PySpark using JDBC. I was able to do it outside EMR, but when I try it on EMR, pyspark doesn't start correctly. The command that I used on my machine: pyspark --conf…
3 votes, 1 answer

Spark standalone mode on AWS EMR

I'm able to run Spark on AWS EMR without much trouble following the documentation but from what I see it always uses YARN instead of the standalone manager. Is there any way to use the standalone mode instead of YARN easily? I don't really feel like…
3 votes, 1 answer

"Unable to execute HTTP Request: Broken Pipe" with Hadoop / s3 on Amazon EMR

I've developed a custom JAR that I'm using to process data in Elastic MapReduce. The data is several hundred thousand files coming from Amazon S3. The JAR doesn't do anything terribly funky to read data - it's just using…
John Chrysostom • 3,973 • 1 • 34 • 50
3 votes, 1 answer

Elastic Search Nested Query with Nested Object

This is the type of data I have stored in my Elasticsearch index. I have to find recipes whose main ingredient is beef (with weight less than 1000) and whose ingredients include chilli powder (weight less than 250), olive oil (weight less than 300), and…
3 votes, 1 answer

How to set instance role for EMR clusters launched via data pipeline?

I'm trying to attach an instance role to a cluster I'm running through Data Pipeline. I'd like to run my own mapper script that needs write permissions to DynamoDB (the "regular" Hive upload won't do the trick for me). I've gone through the API docs…
Zach Moshe • 2,782 • 4 • 24 • 40
3 votes, 0 answers

How to specify EMR cluster create CLI commands using AWS Java SDK?

I arrived at this question after trying out a few things. I'll first give a brief intro to what I wanted to do and how I got here. I'm writing a script to start an EMR cluster using the AWS SDK for Java. The EMR cluster is to be started inside a VPC…
gaurav • 360 • 3 • 8
3 votes, 2 answers

Elasticsearch plugin

I understand what Elasticsearch is, but I have no clue how to write a plugin for it. Can anyone tell me the guidelines for writing Elasticsearch plugins?
Vineel • 1,630 • 5 • 26 • 47
3 votes, 1 answer

Running MapReduce jobs on AWS-EMR from Eclipse

I have the WordCount MapReduce example in Eclipse. I exported it to a JAR and copied it to S3. I then ran it on AWS EMR successfully. Then I read this article -…
Quest Monger • 8,252 • 11 • 37 • 43
3 votes, 2 answers

Oozie on EMR - tasks hang forever in PREP state

I am running Oozie 4.0.1 on Elastic MapReduce using the 3.0.4 AMI (Hadoop 2.2.0). I've built Oozie from source, and everything installs and seems to work correctly, up to the point of scheduling a Hive job. That is, I can connect to the Web…
mindcrime • 657 • 8 • 23
3 votes, 1 answer

Is there an open source version of s3distcp?

I would love to use s3distcp for copying data between S3 buckets, but I need to use an external proprietary encryption mechanism to ensure the data is encrypted at rest (keeping the keys to myself so Amazon cannot decrypt it). I…
kellyfj • 6,586 • 12 • 45 • 66