Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tag: emr

452 questions

1 answer

Multiple outputs suddenly not writing any output?

Suddenly, multiple outputs is not writing any output to the destination. I use a custom implementation of multiple outputs, where I just changed: if ((ch == '/') || (ch == ':') || (ch == '-') || (ch == '.')) { continue; } in the…

java hadoop mapreduce elastic-map-reduce

asked Mar 17 '13 at 14:51

Mahalakshmi Lakshminarayanan

1 answer

Reducer node takes a long time to receive its records

When I checked the Hadoop GUI, I found that some of the reduce tasks have reached 66.66% and stay there for a long time. When I checked the counters, I found that the number of input records is shown as zero. After a long time, they get their…

java hadoop mapreduce distributed-computing elastic-map-reduce

asked Mar 16 '13 at 15:59

Mahalakshmi Lakshminarayanan

2 answers

Best technology stack for aggregation across various properties

We are developing a platform that models the flow of entities across a graph. The system has to answer questions such as: how many entities with these properties are sitting at a given node on the graph, what is the inflow on a node,…

hadoop amazon-web-services amazon-s3 amazon-redshift elastic-map-reduce

asked Mar 13 '13 at 13:24

Swapnil

1 answer

Amazon EMR: "no output" found in S3

I am not getting any output in S3 when I run a job in Amazon EMR. I specified the arguments: -inputfile s3n://exdsyslab/data/file.txt -outputdir s3n://exdsyslab/output When I checked the job log, I see that the job has completed successfully. But…

hadoop amazon-web-services amazon-s3 mapreduce elastic-map-reduce

asked Feb 09 '13 at 16:12

Mahalakshmi Lakshminarayanan

1 answer

AWS EMR CLI - Pass Arguments to HIVE

I am using the AWS EMR Ruby CLI to generate Hadoop clusters, and I am trying to include arguments to use within a Hive script hosted elsewhere, like so: ./elastic-mapreduce --create ... --args -d,DT=2013-01-26 'DT' shows up satisfactorily in my…

hadoop amazon-web-services arguments hive elastic-map-reduce

asked Jan 30 '13 at 21:53

user1101431

1 answer

Make files available locally on Elastic MapReduce

The Hadoop documentation states it's possible to make files available locally by use of the -file option. How can I do this using the Elastic MapReduce Ruby CLI?

ruby hadoop elastic-map-reduce

asked Jan 17 '13 at 01:21

Matt Joiner

1 answer

AWS Elastic MapReduce Streaming. Use data from nested folders as input

My data is laid out as s3n://bucket/{date}/{file}.gz, with more than 100 folders. How do I set up a streaming job that uses all of them as input? Specifying s3n://bucket/ didn't help, since the nodes are folders.

hadoop-streaming elastic-map-reduce

asked Jan 16 '13 at 15:51

varela

2 answers

Downloading files from FTP to local using Java makes the file unreadable - encoding issues

I have developed code that reads very large files from FTP and writes them to the local machine using Java. The code that does it is as follows. This is part of the next(Text key, Text value) method inside the RecordReader of the CustomInputFormat …

java hadoop ftp elastic-map-reduce amazon-emr

asked Jan 02 '13 at 06:23

RadAl

1 answer

Hive - map not sending parameters to custom map script?

I'm trying to use the map clause with Hive but I'm tripping over syntax and not finding many examples of my use case around. I used the map clause before when I had to process one of the columns of a table using an external script. I had a python…

hadoop hive elastic-map-reduce

asked Dec 29 '12 at 17:13

Rafael S. Calsaverini

1 answer

Hadoop Pig save each line of a file to S3

Currently, I have a Pig script running on top of Amazon EMR that loads a bunch of files from S3, filters the data, and groups it by phone number, so the data looks like (phonenumber:chararray, bag:{mydata:chararray}). Next…

hadoop amazon-s3 apache-pig elastic-map-reduce amazon-emr

asked Dec 28 '12 at 06:32

Simon Guo

1 answer

Running Elastic Mapreduce Hive Queries from an Application

I've run Hive on elastic mapreduce in interactive mode: ./elastic-mapreduce --create --hive-interactive and in script mode: ./elastic-mapreduce --create --hive-script --arg s3://mybucket/myfile.q I'd like to have an application (preferably in PHP,…

api hive elastic-map-reduce

asked Dec 21 '12 at 04:31

dubois

1 answer

"The location specified by MRJOB_CONF" in mrjob documentation

Which path is "The location specified by MRJOB_CONF" in mrjob documentation? Link to mrjob doc: http://mrjob.readthedocs.org/en/latest/guides/configs-basics.html

hadoop mapreduce hadoop-streaming elastic-map-reduce mrjob

asked Dec 15 '12 at 09:07

user1403483

3 answers

Reading in a parameter file in Amazon Elastic MapReduce and S3

I am trying to run my Hadoop program on the Amazon Elastic MapReduce system. My program takes an input file from the local filesystem which contains parameters needed for the program to run. However, since the file is normally read from the local…

hadoop amazon-web-services amazon-s3 mapreduce elastic-map-reduce

asked Dec 14 '12 at 08:42

Ahmadov

1 answer

Some elementary doubts about running Mapreduce programs using mrjob on Amazon EMR

I am new to mrjob and I am having problems getting the job running on Amazon EMR. I will describe them in sequential order. I can run an mrjob job on my local machine. However, when I have mrjob.conf in /home/ankit/.mrjob.conf and in /etc/mrjob.conf, the job…

python hadoop mapreduce elastic-map-reduce mrjob

asked Dec 12 '12 at 08:03

user1403483

2 answers

What are the different ways in which we can make a MapReduce program read the data?

I want to perform a computation on a CSV file and, for each row of that CSV file, I also want to use the previous four rows for the computation. How can I do that? In almost all the MapReduce examples I have read, the only way data was read…

hadoop mapreduce elastic-map-reduce

asked Dec 12 '12 at 06:32

user1403483
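
For the question above about using the previous four rows of a CSV file, the core idea can be sketched in a single process as a sliding window. This is only an illustration under assumptions, not a MapReduce job: the function name `rows_with_history` and the sample data are hypothetical, and in a distributed job the rows near input-split boundaries would need extra handling that this sketch does not cover.

```python
import csv
import io
from collections import deque


def rows_with_history(lines, history=4):
    """Yield (current_row, previous_rows) for each CSV row.

    previous_rows holds up to `history` rows that came immediately
    before the current one, oldest first.
    """
    window = deque(maxlen=history)  # the oldest row is dropped automatically
    for row in csv.reader(lines):
        yield row, list(window)
        window.append(row)


# Hypothetical sample data, used only for illustration.
sample = io.StringIO("1\n2\n3\n4\n5\n6\n")
results = [(row[0], [p[0] for p in prev]) for row, prev in rows_with_history(sample)]
for current, previous in results:
    print(current, previous)
```

The first four rows see fewer than four predecessors, so a real computation would have to decide whether to skip them or work with a shorter history.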
