Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tag: emr

452 questions

1 answer

Amazon Elastic Map Reduce Hadoop Jobs

I'm new to Amazon Web Services and Map Reduce stuff. My basic problem is that I am working on an academic project where I process a large batch of images and need to detect a particular object in them. Afterwards I need a Map filled by…

hadoop amazon-s3 elastic-map-reduce amazon-emr

asked Oct 15 '14 at 17:32

Andrea Schembri

1 answer

Run Pig with Lipstick on AWS EMR

I'm running an AWS EMR Pig job using script-runner.jar as described here: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-script.html Now I want to hook up Netflix's Lipstick to monitor my scripts. I set up the server,…

hadoop amazon-web-services apache-pig elastic-map-reduce netflix

asked Oct 07 '14 at 07:37

Tim

1 answer

Is there a way to launch EMR jobs on AWS Virtual Private Cloud?

Is there a way to launch EMR jobs on AWS Virtual Private Cloud? I am planning to launch my AWS Simple Workflow, which will boot a cluster and add jobs to the cluster, using AWS VPC for security reasons.

hadoop amazon-web-services elastic-map-reduce amazon-vpc

asked Sep 16 '14 at 20:16

user3335406

1 answer

Mapreduce output showing all records in same line

I have implemented a MapReduce operation for a log file using Amazon and Hadoop with a custom JAR. My output shows the correct keys and values, but all the records are displayed on a single line. For example, given the following pairs: <1387,…

elastic-map-reduce

asked Sep 11 '14 at 16:24

Deepa Sadagopan

1 answer

LeaseExpiredException with custom UDF in Hive

I have a Hive UDF which is supposed to extract the device from a UA string. It uses the ua-parser library: https://github.com/tobie/ua-parser The UDF is rather simple: public class DeviceTypeExtractTest extends UDF{ private Text result = new…

hadoop hive elastic-map-reduce emr

asked Sep 11 '14 at 11:26

Ana Todor

1 answer

Force Hive to throw an error on an empty Table

I am using AWS EMR clusters to run Hive. I want to enforce that certain tables, such as reference tables, are never empty after initial creation, and if they are found to be empty, to throw an error (or log a message) and stop…

amazon-web-services hive hiveql elastic-map-reduce

asked Sep 05 '14 at 17:13

cbradsh1

1 answer

Process entire files using Hadoop streaming on Amazon EMR

I have a directory full of gzipped text files on Amazon S3, and I'm trying to use Hadoop streaming on Amazon Elastic MapReduce to apply a function to each file individually (specifically, parse a multi-line header). The default Hadoop streaming…

hadoop amazon-web-services amazon-s3 hadoop-streaming elastic-map-reduce

asked Aug 08 '14 at 20:39

user3923714

1 answer

Unable to load Hive-JDBC driver when accessed through MapReduce program on Amazon's Elastic MapReduce

I have written a MapReduce program in which I store part of the output data in a Hive table. I have used the Hive-JDBC driver to access the Hive table from the MapReduce code. This program compiled successfully on my local machine. After this, I created…

hadoop jdbc mapreduce hive elastic-map-reduce

asked Jul 16 '14 at 11:33

user3523860

1 answer

Issue with using files in distributed cache in Elastic MapReduce

I'm trying to make use of an external library in my Python mapper script in an AWS Elastic MapReduce job. However, my script doesn't seem to be able to find the modules in the cache. I archived the files into a tarball called helper_classes.tar and…

python hadoop amazon-web-services elastic-map-reduce

asked Jul 10 '14 at 05:09

user296554

1 answer

R Reducer is not working properly in Amazon EMR

I have written MapReduce code in R to run on Amazon EMR. My input file format: URL1 word1 word2 word3 URL2 word4 word2 word3 URL3 word1 word7 word2 I'm expecting the following output (URLs are concatenated with spaces): word1 URL1 URL3 word2 URL1 URL2…

r hadoop mapreduce elastic-map-reduce emr

asked Jun 26 '14 at 03:26

Nadaraj

1 answer

Map Error - attempt_xxxx timed out after 600 seconds

I'm using Hadoop 2.2.0, and when I run my map tasks I get the following error: attempt_xxx Timed out after 1800000 seconds (it's 1800000 because I have changed the config for mapreduce.task.timeout). Below is my map code: public class MapTask { …

hadoop dictionary mapreduce timeout elastic-map-reduce

asked Jun 17 '14 at 08:03

user3690321

1 answer

"Invalid option" error when passing arguments to EMR Bootstrap Action

I'm programmatically provisioning an EMR cluster using the Java SDK, and am trying to pass arguments to the setup-impala script. The code I have looks like this: ... List bootstrapActions = new…

java ruby amazon-web-services elastic-map-reduce

asked Jun 16 '14 at 20:26

mindcrime

1 answer

mmh3 not installed on Elastic MapReduce in AWS

I need to use mmh3 for hashing. However, when I run "python MultiwayJoin.py R.csv S.csv T.csv -r emr > output.txt" in the terminal, it returns an error saying: File "MultiwayJoin.py", line 5, in import mmh3 ImportError: No module named mmh3

python amazon-web-services elastic-map-reduce

asked Jun 02 '14 at 18:55

user3390265

3 answers

How is data partitioned and distributed among datanodes in MapReduce?

I'm new to MapReduce, and I have the task of processing large data (lines of records). One thing I need to use in my mapper is the line number of a specific record, and then the reducer processes the line number information from the mapper. For instance,…

python hadoop mapreduce elastic-map-reduce

asked May 22 '14 at 02:24

i3wangyi

1 answer

bulk indexing in elasticsearch issue

I am trying to index a file using the code below, but the indexing is not happening. Could anybody explain why? public static void main(String[] args) throws IOException { String line; List l=new…

elasticsearch elastic-map-reduce

asked Apr 24 '14 at 10:40

Amaresh
