Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

452 questions
0 votes, 1 answer

Multiple outputs suddenly not writing any output?

Suddenly, multiple outputs stopped writing any output to the destination. I use a custom implementation of multiple outputs, in which I just changed: if((ch == '/') || (ch == ':') || (ch == '-') || (ch == '.')) { continue; } in the…
0 votes, 1 answer

Reducer node takes a long time to receive its records

When I checked the Hadoop GUI, I found that some of the reduce tasks had reached 66.66% and stayed there for a long time. When I checked the counters, I found that the number of input records was shown as zero. After a long time, they get their…
0 votes, 2 answers

Best technology stack for aggregation across various properties

We are developing a platform that models the flow of entities across a graph. The system has to answer questions such as: how many entities with these properties are sitting at a given node on the graph, what is the inflow on a node,…
0 votes, 1 answer

Amazon EMR: "no output" found in S3

I am not getting any output in S3 when I run a job in Amazon EMR. I specified the arguments: -inputfile s3n://exdsyslab/data/file.txt -outputdir s3n://exdsyslab/output When I checked the job log, I saw that the job had completed successfully. But…
0 votes, 1 answer

AWS EMR CLI - Pass Arguments to HIVE

I am using AWS's EMR Ruby CLI to create Hadoop clusters, and I am trying to pass arguments for use within a Hive script hosted elsewhere, like so: ./elastic-mapreduce --create ... --args -d,DT=2013-01-26 'DT' shows up satisfactorily in my…
0 votes, 1 answer

Make files available locally on Elastic MapReduce

The Hadoop documentation states it's possible to make files available locally by use of the -file option. How can I do this using the Elastic MapReduce Ruby CLI?
Matt Joiner
0 votes, 1 answer

AWS Elastic MapReduce Streaming. Use data from nested folders as input

I have data laid out as s3n://bucket/{date}/{file}.gz, with more than 100 folders. How do I set up a streaming job that uses all of them as input? Specifying s3n://bucket/ didn't help, since the nodes are folders.
varela
0 votes, 2 answers

Downloading files from FTP to local using Java makes the file unreadable - encoding issues

I have developed code that reads very large files from FTP and writes them to the local machine using Java. The code that does it is as follows. This is part of the next(Text key, Text value) method inside the RecordReader of the CustomInputFormat …
RadAl
0 votes, 1 answer

Hive - map not sending parameters to custom map script?

I'm trying to use the map clause with Hive, but I'm tripping over the syntax and not finding many examples of my use case. I used the map clause before when I had to process one of the columns of a table using an external script. I had a Python…
Rafael S. Calsaverini
0 votes, 1 answer

Hadoop Pig save each line of a file to S3

Currently, I have a Pig script running on top of Amazon EMR that loads a bunch of files from S3, filters them, and groups the data by phone number, so the data looks like (phonenumber:chararray, bag:{mydata:chararray}). Next…
Simon Guo
0 votes, 1 answer

Running Elastic Mapreduce Hive Queries from an Application

I've run Hive on Elastic MapReduce in interactive mode: ./elastic-mapreduce --create --hive-interactive and in script mode: ./elastic-mapreduce --create --hive-script --arg s3://mybucket/myfile.q I'd like to have an application (preferably in PHP,…
dubois
0 votes, 1 answer

"The location specified by MRJOB_CONF" in mrjob documentation

Which path is "The location specified by MRJOB_CONF" in mrjob documentation? Link to mrjob doc: http://mrjob.readthedocs.org/en/latest/guides/configs-basics.html
user1403483
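
On the MRJOB_CONF question above: as far as the mrjob documentation describes it, the config file is searched for in a fixed order: the path held in the MRJOB_CONF environment variable, then ~/.mrjob.conf, then /etc/mrjob.conf. The sketch below only mimics that order for illustration. It is not mrjob's own code, and `find_conf_path` and its parameters are hypothetical names.

```python
import os


def find_conf_path(environ=None, home=None, exists=os.path.exists):
    """Return the first config path that exists, or None.

    Hypothetical helper: mimics the lookup order described in the mrjob
    docs (MRJOB_CONF, then ~/.mrjob.conf, then /etc/mrjob.conf).
    """
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home

    candidates = []
    # 1. "The location specified by MRJOB_CONF" is an environment variable.
    if environ.get("MRJOB_CONF"):
        candidates.append(environ["MRJOB_CONF"])
    # 2. A per-user config file in the home directory.
    candidates.append(os.path.join(home, ".mrjob.conf"))
    # 3. A system-wide config file.
    candidates.append("/etc/mrjob.conf")

    for path in candidates:
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(find_conf_path())
```

Under this reading, running `export MRJOB_CONF=/path/to/my.conf` before launching a job would make that file take precedence over the other two locations.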
0 votes, 3 answers

Reading in a parameter file in Amazon Elastic MapReduce and S3

I am trying to run my Hadoop program on Amazon Elastic MapReduce. My program takes an input file from the local filesystem which contains parameters needed for the program to run. However, since the file is normally read from the local…
Ahmadov
0 votes, 1 answer

Some elementary doubts about running Mapreduce programs using mrjob on Amazon EMR

I am new to mrjob and I am having problems getting a job to run on Amazon EMR. I will list them in order. I can run an mrjob job on my local machine. However, when I have mrjob.conf in /home/ankit/.mrjob.conf and in /etc/mrjob.conf, the job…
user1403483
0 votes, 2 answers

What are the different ways in which we can make a MapReduce program read the data?

I want to perform a computation on a CSV file, and for each row of that file I also want to use the previous four rows in the computation. How can I do that? In almost all the MapReduce examples I have read, the only way data was read…
user1403483
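
For the previous-four-rows question above, one possible approach is to keep a bounded window of recent rows while reading. This is a minimal sketch that assumes all rows reach a single process in file order. `rows_with_history` is a hypothetical helper, not part of Hadoop or mrjob.

```python
import csv
import io
from collections import deque


def rows_with_history(lines, history=4):
    """Yield (row, previous_rows) pairs for CSV input.

    previous_rows holds up to `history` rows that came immediately
    before `row`, oldest first. Assumes rows arrive in file order.
    """
    window = deque(maxlen=history)  # old rows fall off automatically
    for row in csv.reader(lines):
        yield row, list(window)  # copy, so later appends don't change it
        window.append(row)


if __name__ == "__main__":
    sample = io.StringIO("1,a\n2,b\n3,c\n4,d\n5,e\n6,f\n")
    for row, previous in rows_with_history(sample):
        print(row, "<-", previous)
```

In an actual MapReduce job the input is divided into splits, so a mapper sees only part of the file. The first rows of each split would then lack their history unless the job handles split boundaries separately or the file is processed as a single split.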