Questions tagged [emr]

Questions relating to Amazon's Elastic MapReduce (EMR) product.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

1166 questions
0
votes
1 answer

Automatic Hive or Cascading for ETL in AWS-EMR

I have a large dataset residing in AWS S3. This data is typically transactional data (like calling records). I run a sequence of Hive queries to continuously run aggregate and filtering conditions to produce a couple of final compact files (csvs…
prog_guy
  • 796
  • 3
  • 7
  • 24
0
votes
1 answer

Adding TAG to EMR Cluster

I am launching an EMR cluster using the Java API but am not able to associate a tag with it. Please can you help me with this? Using the EMR CLI, it is very easy, as below, but I have to do this using my Java code: ./elastic-mapreduce --create --alive --tag…
0
votes
2 answers

Invalid ssh key running mrjob script on emr

I'm going through this guide on how to get mrjob working on EMR. I follow all the steps, but when I run the example script I get this error: matthew@WinterMute:~/work/projects/mrjob_examples$ python word_count.py -r emr moby.txt using configs in…
mdornfe1
  • 1,982
  • 1
  • 24
  • 42
0
votes
2 answers

In Amazon EMR, what is the relationship between a core instance, a mapper, and a map slot?

I am confused about the relationship between core instances and the mappers each instance can have. How are these mappers created? If I set the core instance count to 0, so that only the master node is running, why can MapReduce jobs run without any task…
user2764080
  • 1
  • 1
  • 4
0
votes
1 answer

What happens when a mapper dies in EMR streaming job?

In an Elastic MapReduce streaming job, what happens if a mapper suddenly dies? Will the data that was already processed be replayed? If so, is there any option to disable that? I am asking because I am using EMR to insert some data to…
Vame
  • 2,033
  • 2
  • 18
  • 29
0
votes
1 answer

Running Custom JAR on Amazon EMR giving error ( Filesystem Error ) using Amazon S3 Bucket input and output

I am trying to run a Custom JAR on an Amazon EMR cluster using S3 buckets as the input and output parameters of the Custom JAR (-input s3n://s3_bucket_name/ldas/in -output s3n://s3_bucket_name/ldas/out). When the cluster runs this Custom JAR, the…
ilam
  • 31
  • 4
0
votes
1 answer

How can I get and process a new S3 file for every iteration of an mrjob mapper?

I have a log file of status_changes, each one of which has a driver_id, timestamp, and duration. Using driver_id and timestamp, I want to fetch the appropriate GPS log from S3. These GPS logs are stored in an S3 bucket in the form…
numbers are fun
  • 423
  • 1
  • 7
  • 12
0
votes
2 answers

Tomcat not accessible on Amazon EMR

I created an Amazon EMR cluster with one master and one slave. I installed Tomcat on my master instance. I replaced all "8080" with "8686" and "localhost" with "0.0.0.0" in /conf/server.xml. I started the Tomcat instance and can see the below output of the command…
ajit
  • 25
  • 1
  • 6
0
votes
2 answers

WebHCat on Amazon's EMR?

Is it possible or advisable to run WebHCat on an Amazon Elastic MapReduce cluster? I'm new to this technology and I was wondering if it is possible to use WebHCat as a REST interface to run Hive queries. The cluster in question is running Hive.
James McMahon
  • 48,506
  • 64
  • 207
  • 283
0
votes
2 answers

Error while copying external jars to /home/hadoop/lib folder EMR Amazon

I am copying my external jars to the /home/hadoop/lib directory in EMR as a bootstrap process, but it shows the following error during the bootstrap process: Exception in thread "main" java.lang.NoSuchMethodError:…
neel
  • 8,399
  • 7
  • 36
  • 50
0
votes
1 answer

RedshiftStorage for Pig jobs on EMR?

I would like to be able to store results from a Pig workflow (running from EMR) directly into Amazon Redshift. Has anyone done this yet?
Evan Zamir
  • 8,059
  • 14
  • 56
  • 83
0
votes
2 answers

Send mail from EC2 or EMR on AWS

Is there any way to send mail with reports attached from EMR? I am using Amazon Web Services. I don't want to write a script inside EC2 to fetch data from EMR, add it on cron, then send the mails daily. Any luck, there is already any Jobs Scheduler…
Mayukh Roy
  • 1,815
  • 3
  • 19
  • 31
0
votes
1 answer

Is it possible to load data from DynamoDB directly from EMR Pig?

I know you can load data directly from DynamoDB in EMR Hive, but what about EMR Pig? Is there a way to load data from DynamoDB directly in Pig, without first saving it to HDFS? Thanks.
Yves Dorfsman
  • 2,684
  • 3
  • 20
  • 28
0
votes
1 answer

EMR, EC2, OpenStack, Please clarify

I am quite new to Amazon services, and started reading about EMR. I am more or less familiar with OpenStack. I just want someone to tell me in short what plays the role of Compute, Controller and Cinder of storage in Amazon cloud. For example…
user3237842
  • 93
  • 1
  • 1
  • 5
0
votes
0 answers

ClassNotFoundException for Hadoop custom input format in Amazon EMR

I am getting a ClassNotFoundException when I use a custom input format for MapReduce in Hadoop on Amazon EMR. The json.org dependency is present in the Maven pom.xml and it still throws the error: java.lang.ClassNotFoundException: org.json.JSONException at…
CodeRocker
  • 67
  • 2
  • 3
  • 9