What are the different ways in which we can make a MapReduce program read the data?

Question

I want to perform a computation on a CSV file in which the result for each row also depends on the previous four rows. How can I do that? In almost all the MapReduce examples I have read, data is read one row at a time and the computations on different lines are independent of each other. Any resources or good pointers would be appreciated.

This might also help: http://stackoverflow.com/questions/2711118/multiple-lines-of-text-to-a-single-map — Amar, Dec 13 '12 at 20:42

score 0 · Answer 1 · answered Dec 12 '12 at 16:39

How records are split depends on the RecordReader in use. The default RecordReader for text input is LineRecordReader, so each record is a single line. If you want your data to arrive in chunks of 4 lines, implement your own RecordReader that groups the input into 4-line records.

http://developer.yahoo.com/hadoop/tutorial/module4.html
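To make the grouping idea concrete, here is a minimal sketch of the logic such a RecordReader could run each time it is asked for the next record. It is plain Java with no Hadoop dependency, and `LineGrouper` and `nextRecord` are hypothetical names chosen for illustration, not Hadoop classes or methods.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): groups consecutive lines into
// records of up to linesPerRecord lines each.
final class LineGrouper {
    private final BufferedReader in;
    private final int linesPerRecord;

    LineGrouper(BufferedReader in, int linesPerRecord) {
        if (linesPerRecord < 1) {
            throw new IllegalArgumentException("linesPerRecord must be >= 1");
        }
        this.in = in;
        this.linesPerRecord = linesPerRecord;
    }

    // Returns the next group of lines, or null once the input is exhausted.
    // The last group may hold fewer than linesPerRecord lines.
    List<String> nextRecord() {
        List<String> group = new ArrayList<>(linesPerRecord);
        try {
            String line;
            while (group.size() < linesPerRecord && (line = in.readLine()) != null) {
                group.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return group.isEmpty() ? null : group;
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(
                new StringReader("r1\nr2\nr3\nr4\nr5\nr6\n"));
        LineGrouper grouper = new LineGrouper(in, 4);
        List<String> record;
        while ((record = grouper.nextRecord()) != null) {
            System.out.println(record);
        }
    }
}
```

Note that fixed groups like this do not overlap, so a row only sees the other rows in its own group. If every row needs the four rows before it, the reader has to keep a sliding window instead.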

score 0 · Answer 2 · answered Dec 14 '12 at 10:19

The way to do it is to write your own InputFormat and RecordReader by subclassing the existing ones.

You can search the web for MultipleLineTextRecordReader.java, MultipleLineTextInputFormat.java, and WholeFileTextInputFormat.java to get started.
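For the "previous four rows" case in the question, the core of a custom reader might look like the sketch below. It is plain Java rather than a real Hadoop subclass: `SlidingWindowReader` is a hypothetical stand-in whose method names simply mirror `nextKeyValue()`, `getCurrentKey()` and `getCurrentValue()` from Hadoop's RecordReader, and it takes an iterator of lines instead of an input split.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical, Hadoop-free stand-in for a custom RecordReader.
// Each record's value is the current row preceded by up to `history`
// earlier rows (oldest first); the key is the zero-based row index.
final class SlidingWindowReader {
    private final Iterator<String> lines;
    private final int history;
    private final Deque<String> previous = new ArrayDeque<>();
    private long rowIndex = -1;
    private List<String> currentValue;

    SlidingWindowReader(Iterator<String> lines, int history) {
        if (history < 0) {
            throw new IllegalArgumentException("history must be >= 0");
        }
        this.lines = lines;
        this.history = history;
    }

    // Advances to the next row; returns false when the input is exhausted.
    boolean nextKeyValue() {
        if (!lines.hasNext()) {
            return false;
        }
        String row = lines.next();
        rowIndex++;
        List<String> window = new ArrayList<>(previous); // earlier rows, oldest first
        window.add(row);                                 // current row goes last
        currentValue = window;
        previous.addLast(row);
        if (previous.size() > history) {
            previous.removeFirst();                      // keep only the last `history` rows
        }
        return true;
    }

    long getCurrentKey() {
        return rowIndex;
    }

    List<String> getCurrentValue() {
        return currentValue;
    }

    public static void main(String[] args) {
        Iterator<String> rows =
                Arrays.asList("r1", "r2", "r3", "r4", "r5", "r6").iterator();
        SlidingWindowReader reader = new SlidingWindowReader(rows, 4);
        while (reader.nextKeyValue()) {
            System.out.println(reader.getCurrentKey() + " -> " + reader.getCurrentValue());
        }
    }
}
```

In a real job the same buffering would sit inside a RecordReader subclass that wraps a line reader. One caveat that this sketch ignores: when a file is divided into several input splits, the first rows of each split have no predecessors available, so you would either need to keep the file unsplit (for example by having your InputFormat's `isSplitable` return false) or handle the boundary rows separately.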

Ahmadov
