Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

452 questions
0 votes, 2 answers

Exception in thread "main" org.elasticsearch.client.transport.NoNodeAvailableException: No node available

I am trying to index using the Java code below in Elasticsearch. I gave my machine's IP in the code, but it is unable to connect to the node. It gives the error below: Exception in thread "main" org.elasticsearch.client.transport.NoNodeAvailableException:…
Amaresh
0 votes, 1 answer

Running Mappers and Reducers on different Groups of machines

We have a nice, big, complicated elastic-mapreduce job that has wildly different constraints on hardware for the Mapper vs Collector vs Reducer. The issue is: for the Mappers, we need tonnes of lightweight machines to run several mappers in…
0 votes, 2 answers

BZip2 Native Splitting on Amazon/EMR

We have a question in specific regard to compressed input on an Amazon EMR Hadoop job. According to AWS: "Hadoop checks the file extension to detect compressed files. The compression types supported by Hadoop are: gzip, bzip2, and LZO. You do not…
David Beveridge
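
The AWS statement quoted in this question says Hadoop chooses a codec from the file extension. A minimal Python sketch of that idea follows. The mapping table and the `detect_codec` name are hypothetical illustrations, not Hadoop's implementation (in Hadoop itself this lookup is handled by the Java class `CompressionCodecFactory`).

```python
import os

# Hypothetical mapping from file extension to codec name, covering the
# three compression types named in the AWS quote (gzip, bzip2, LZO).
EXTENSION_TO_CODEC = {
    ".gz": "gzip",
    ".bz2": "bzip2",
    ".lzo": "lzo",
}


def detect_codec(path):
    """Return the codec name implied by the file extension, or None."""
    _, ext = os.path.splitext(path)
    return EXTENSION_TO_CODEC.get(ext)
```

For example, `detect_codec("input.bz2")` selects the bzip2 entry, while a path with no recognised extension yields `None` and would be read as uncompressed text.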
0 votes, 0 answers

Custom Grouping and Partitioning in Job Conf

The AWS job is not accepting the configuration parameters for custom grouping and custom sorting: conf3.setOutputValueGroupingComparator(StockKeyGroupingComparator.class); conf3.setOutputKeyComparatorClass(StockKeySortComparator.class); I run the jar from…
0 votes, 1 answer

setting ssh permission in hadoop installation

I'm trying to install Hadoop for the first time and I'm following this tutorial: http://www.youtube.com/watch?v=xrxQXfE7t9A & https://sites.google.com/site/howtohadoop/how-to-install-hdp#bmec2 What I'm trying to do is set up the master node to…
Dhoha
0 votes, 0 answers

How do I convert my Java Hadoop code to run on EC2?

I wrote a Driver, Mapper, and Reducer class in Java that runs the k-nearest neighbor algorithm on test data, and pulls in the training set using Distributed Cache. I used a Cloudera virtual machine to test the code, and it works in…
user1956609
0 votes, 1 answer

Class not found exception in eclipse wordcount program

I am running a word count program from Eclipse, and it says class not found. I exported the same program as a jar file and executed it from the command line, where it works fine. Here is the error stack trace: 14/02/14 23:46:16 WARN mapred.JobClient: Use…
Venu
0 votes, 1 answer

Outputting a custom CSV header in the reducer of a MapReduce job

I am creating my own reducer as follows: public class MyReducer implements Reducer { @Override public void configure(JobConf conf) { } @Override public void close(JobConf conf) { } public void reduce(params) { } } How can…
user93796
0 votes, 2 answers

Unable to read Hadoop Sequence files through stdin with a streaming python map-reduce on AWS

I am trying to run a simple word-counting map-reduce job on Amazon's Elastic MapReduce, but the output is gibberish. The input file is part of the Common Crawl files, which are Hadoop sequence files. The file is supposed to be the extracted text…
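
For context, a Hadoop Streaming mapper reads lines on stdin and writes tab-separated key/value pairs on stdout. Below is a minimal word-count mapper sketch in Python, assuming plain-text input. It does not decode binary sequence files, which is the situation this question describes, and the function names `map_lines` and `main` are hypothetical.

```python
import sys


def map_lines(lines):
    """Yield one 'word<TAB>1' record for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word + "\t1"


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Stream records out as they are produced, one per line.
    for record in map_lines(stdin):
        stdout.write(record + "\n")
```

A streaming script built on this sketch would call `main()` at the bottom of the file. If the bytes arriving on stdin are not text, the words emitted here will be meaningless.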
0 votes, 1 answer

Tomcat 7 error about LeaseException querying an EMR cluster

I am getting the error below while connecting to the EMR cluster using a Tomcat server with HBase as the database. I have made the changes suggested at http://www.nosql.se/2012/05/hbase-scanner-leaseexception/ and I have also rebooted the clusters…
CtrlV
0 votes, 1 answer

Error in executing Customised WordCount jar in AWS EMR

Hi, I am trying to execute a customised WordCount jar on AWS EMR. My word count jar works properly: I added it as a step without job arguments and it ran successfully. My problem is when I run it with job arguments. In my S3 I…
sa_nyc
0 votes, 2 answers

Mapper and Reducer in Hadoop

I am confused about the implementation of Hadoop. I notice that when I run my Hadoop MapReduce job with multiple mappers and reducers, I get many part-xxxxx files. Meanwhile, it is true that a key only appears in one of them. Thus, I am…
Zz'Rot
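
The observation that a key appears in only one part-xxxxx file follows from partitioning: each key is assigned to exactly one reducer, and each reducer typically writes one output file. The Python sketch below is modelled on the formula used by Hadoop's default `HashPartitioner`, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`. The hash function imitates Java's `String.hashCode` purely as an assumption for illustration; real Hadoop key types such as `Text` compute their own hash, so actual partition numbers may differ.

```python
def java_string_hash(s):
    """Imitate Java's String.hashCode for BMP characters (illustrative assumption)."""
    h = 0
    for ch in s:
        # Keep only 32 bits to mimic Java int overflow.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def partition(key, num_reduce_tasks):
    """Map a key to a reducer index in the range [0, num_reduce_tasks)."""
    # Mask to a non-negative 31-bit value, then take the remainder.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks
```

Because `partition` is a pure function of the key, every record with the same key reaches the same reducer, no matter which mapper emitted it.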
0 votes, 1 answer

hadoop 2.2.0 libraries are missing?

I've downloaded hadoop2.2.0.jar and added it to my Eclipse project as an external jar library. I get an error for: import org.apache.hadoop.fs.Path; import org.apache.hadoop.io; Error: org.apache.hadoop.fs.Path cannot be resolved. Could you please let…
Shane
0 votes, 2 answers

Skipping bad input files in hadoop

I'm using Amazon Elastic MapReduce to process some log files uploaded to S3. The log files are uploaded daily from servers using S3, but it seems that some get corrupted during the transfer. This results in a java.io.IOException: IO error in map…
Adrian Mester
0 votes, 1 answer

How to save a file to ./ssh (mac osx)?

How do I save a file to the ./ssh directory (I am using Mac OS X)? What should I enter at the command line, or how else can I save a downloaded file to ./ssh? (For more context, I am using Amazon MapReduce and wish to save the EMR.pem file to ssh.)
user2896468