Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

452 questions
6
votes
2 answers

Amazon Elastic MapReduce - SIGTERM

I have an EMR streaming job (Python) which normally works fine (e.g. 10 machines processing 200 inputs). However, when I run it against large data sets (12 machines processing a total of 6000 inputs, at about 20 seconds per input), after 2.5 hours…
slavi
  • 401
  • 3
  • 10
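One frequent cause of mid-run SIGTERMs on long EMR streaming jobs is the task timeout: a mapper that emits no output or status for longer than `mapred.task.timeout` gets killed. Hadoop Streaming lets a Python job send heartbeats by writing `reporter:` lines to stderr. A minimal, hedged sketch (the per-record work here is just a placeholder):

```python
import sys

def process(records, report=sys.stderr):
    """Yield processed records while emitting Hadoop Streaming heartbeats.

    Long-running tasks that stay silent past mapred.task.timeout can be
    killed (SIGTERM); reporter lines written to stderr reset that timer.
    """
    for i, rec in enumerate(records):
        # Real work would go here; this sketch just uppercases the record.
        yield rec.strip().upper()
        if i % 100 == 0:
            report.write("reporter:status:processed %d records\n" % (i + 1))
```

In the actual streaming step the function would be driven by `sys.stdin` and the results printed to stdout; the reporter protocol itself (`reporter:status:...` and `reporter:counter:group,name,amount` on stderr) is standard Hadoop Streaming.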
6
votes
3 answers

Amazon Elastic Map Reduce for analyzing s3 logs

I am using EMR to analyze nginx web logs, but I need to process the logs so that they fall into rows and columns, making them easy to query. Thus I made two tables, rawlog and processedlog, in the following manner: create table rawlog(line…
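The raw-to-processed split usually hinges on parsing each log line into named fields first. A hedged Python sketch for nginx's combined log format (field names are illustrative; adjust the regex to your actual `log_format`):

```python
import re

# Combined log format (a common nginx default); group names are illustrative.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
)

def parse_line(line):
    """Split one raw nginx log line into named columns, or None if malformed."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None
```

The same column boundaries would then drive the processedlog table's schema.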
5
votes
1 answer

Using Distributed Cache with Pig on Elastic Map Reduce

I am trying to run my Pig script (which uses UDFs) on Amazon's Elastic Map Reduce. I need to use some static files from within my UDFs. I do something like this in my UDF: public class MyUDF extends EvalFunc { public DataBag exec(Tuple…
Vivek Pandey
  • 3,455
  • 1
  • 19
  • 25
5
votes
1 answer

Hadoop seems to modify my key object during an iteration over values of a given reduce call

Hadoop version: 0.20.2 (on Amazon EMR). Problem: I have a custom key that I write during the map phase, which I added below. During the reduce call, I do some simple aggregation on values for a given key. The issue I am facing is that during the iteration of…
Bhargava
  • 189
  • 3
  • 12
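The behavior described above is by design: Hadoop reuses a single key/value object and refills it in place on each step of the values iterator, so any stored references all end up pointing at the last value. A minimal Python sketch of the pitfall and the copy-based fix (class and helper names are illustrative):

```python
import copy

class ReusingIterator:
    """Mimics Hadoop refilling one key/value object per iteration step."""
    def __init__(self, values):
        self._values = values
        self._holder = {}  # the single reused object

    def __iter__(self):
        for v in self._values:
            self._holder["value"] = v  # refilled in place, like Hadoop does
            yield self._holder

def collect_wrong(it):
    # Keeps references to the one reused object: every entry sees the last value.
    return [item for item in it]

def collect_right(it):
    # Copy before storing, which is the standard fix in Hadoop reducers too.
    return [copy.deepcopy(item) for item in it]
```

In Java the equivalent fix is cloning the key/value (e.g. via `WritableUtils.clone`) before adding it to a collection.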
5
votes
0 answers

Spark 2.2 write partitionBy out of memory exception

I think anyone who has used Spark has run across OOM errors, and usually the source of the problem can be found easily. However, I am a bit perplexed by this one. Currently, I am trying to save by two different partitions, using the partitionBy…
Derek_M
  • 1,018
  • 10
  • 22
5
votes
1 answer

AWS EMR Cluster Streaming Step: Bad Request

I am trying to set up a trivial EMR job to perform word counting of massive text files, stored in s3://__mybucket__/input/. I am unable to correctly add the first of the two required streaming steps (the first is map input to wordSplitter.py, reduce…
Skyler
  • 2,834
  • 5
  • 22
  • 34
5
votes
1 answer

How to map fields in Hive for DynamoDb Amazon Console export?

I am trying to load into Hive a DynamoDB export file that was taken from the Amazon DynamoDB Web Console with the "Import/Export" tool. But I couldn't map the fields properly, because the DynamoDB Web Console "Export" tool uses "ETX" and "STX" characters. Below is an…
Barbaros Alp
  • 6,405
  • 8
  • 47
  • 61
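The control characters are plain byte values, so the export can also be split programmatically before (or instead of) mapping it in Hive. A hedged Python sketch, assuming ETX (0x03) separates fields and STX (0x02) separates an attribute name from its value; verify both against your actual export file:

```python
ETX = "\x03"  # field separator (assumption; check your export)
STX = "\x02"  # name/value separator within a field (assumption)

def parse_export_line(line):
    """Split one DynamoDB console-export line into {name: value} pairs."""
    fields = {}
    for field in line.rstrip("\n").split(ETX):
        if STX in field:
            name, _, value = field.partition(STX)
            fields[name] = value
    return fields
```

In Hive, the corresponding approach is declaring the same bytes as delimiters, e.g. `FIELDS TERMINATED BY '\003'` in the row format.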
5
votes
1 answer

What is the best practice to monitor AWS EMR job running progress?

I have the following code to run an EMR job, and it runs successfully. I also want to monitor the running status. I use the DescribeJobFlows API, but it says this API is deprecated according to…
coderz
  • 4,847
  • 11
  • 47
  • 70
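The replacements for the deprecated DescribeJobFlows are the cluster-scoped calls: DescribeCluster for overall cluster state and ListSteps for per-step progress. A minimal polling sketch, written with the client injected so it works with any object exposing `list_steps` (the boto3 EMR client does); the function name and delay are illustrative:

```python
import time

def wait_for_completion(emr, cluster_id, delay=30, sleep=time.sleep):
    """Poll step states via ListSteps until every step reaches a terminal
    state, then return the final list of states."""
    terminal = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}
    while True:
        steps = emr.list_steps(ClusterId=cluster_id)["Steps"]
        states = [s["Status"]["State"] for s in steps]
        if all(state in terminal for state in states):
            return states
        sleep(delay)
```

With boto3 this would be called as `wait_for_completion(boto3.client("emr"), "j-XXXX")`; boto3 also ships built-in waiters (e.g. `step_complete`) that cover the simple cases.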
5
votes
1 answer

Possibility of taking snapshot of AWS EMR cluster or namenode

I am new to AWS services and trying some use cases. I want to create EMR clusters on demand with some predefined configurations and applications/scripts installed. I was planning to create a snapshot of an existing EMR cluster, or at least the namenode…
shahsank3t
  • 252
  • 1
  • 13
5
votes
3 answers

Spark/Hadoop throws exception for large LZO files

I'm running an EMR Spark job on some LZO-compressed log-files stored in S3. There are several logfiles stored in the same folder, e.g.: ... s3://mylogfiles/2014-08-11-00111.lzo s3://mylogfiles/2014-08-11-00112.lzo ... In the spark-shell I'm running…
5
votes
1 answer

Can BigQuery's browser interface be white-labeled?

Like most people, we're pretty impressed with BigQuery. We're willing to put up with it being based on proprietary "Dremel" in exchange for not having to configure a ton of servers in our LAN, on EC2, or anywhere else. The REST API is excellent,…
pmueller
  • 313
  • 2
  • 7
5
votes
1 answer

Trouble using hbase from java on Amazon EMR

So I'm trying to query my HBase cluster on Amazon EC2 using a custom jar I launch as a MapReduce step. In my jar (inside the map function) I call HBase like so: public void map( Text key, BytesWritable value, Context contex ) throws IOException,…
5
votes
1 answer

create hive table from tab separated file in s3 using interactive mode

I've loaded tab-separated files into S3 with this folder structure under the bucket: bucket --> se --> y=2013 --> m=07 --> d=14 --> h=00. Each subfolder has one file that represents one hour of my traffic. I then created an EMR workflow to run in…
Gluz
  • 3,154
  • 5
  • 24
  • 35
5
votes
3 answers

Is it possible to run hadoop fs -getmerge in S3?

I have an Elastic Map Reduce job which is writing some files to S3, and I want to concatenate all the files to produce a single text file. Currently I'm manually copying the folder with all the files to our HDFS (hadoop fs copyFromLocal), then I'm…
yeforriak
  • 1,705
  • 2
  • 18
  • 26
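`getmerge` needs a filesystem path the client can list and read directly, so a common workaround is to pull the part files down and concatenate them locally. A minimal sketch of that local merge step (function name illustrative):

```python
import glob
import shutil

def getmerge(part_glob, dest_path):
    """Concatenate all part files matching a glob into a single file, in
    sorted order, mimicking `hadoop fs -getmerge` for local copies."""
    with open(dest_path, "wb") as out:
        for part in sorted(glob.glob(part_glob)):
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return dest_path
```

Usage would look like `getmerge("/tmp/job-output/part-*", "/tmp/merged.txt")` after downloading the S3 folder; sorting keeps the parts in their `part-0000N` order.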
4
votes
4 answers

How do you use Python UDFs with Pig in Elastic MapReduce?

I really want to take advantage of Python UDFs in Pig on our AWS Elastic MapReduce cluster, but I can't quite get things to work properly. No matter what I try, my Pig job fails with the following exception logged: ERROR 2998: Unhandled…
Chris Phillips
  • 11,607
  • 3
  • 34
  • 45
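For reference, a Pig Python UDF is a plain function decorated with `outputSchema`; the sketch below adds a local stand-in for the decorator so the function can be unit-tested outside Pig (the UDF itself is illustrative):

```python
try:
    # When Pig runs the script under Jython, pig_util provides the decorator.
    from pig_util import outputSchema
except ImportError:
    # Local stand-in so the UDF can be imported and tested outside Pig.
    def outputSchema(schema):
        def decorate(fn):
            fn.output_schema = schema
            return fn
        return decorate

@outputSchema("word_count:int")
def count_words(line):
    """Count whitespace-separated tokens in a chararray (None-safe)."""
    return 0 if line is None else len(line.split())
```

In the Pig script this would be registered with something like `REGISTER 'myudfs.py' USING jython AS myudfs;` (file and alias names illustrative); an ERROR 2998 at that point often traces back to a failing import inside the UDF file, so keeping the script importable locally makes the failure reproducible.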