0

I want to perform a computation on a CSV file in which, for each row, I also need the previous four rows. How can I do that? In almost all the MapReduce examples I have read, the data is read one row at a time, and the computations on different lines are independent of each other. Any resources or pointers would be appreciated.

darcyy
  • This might also help: http://stackoverflow.com/questions/2711118/multiple-lines-of-text-to-a-single-map – Amar Dec 13 '12 at 20:42

2 Answers

0

How records are split depends on the RecordReader being used. The default RecordReader is LineRecordReader, so each record is a single line. If you want your data to come in chunks of 4 lines, implement your own RecordReader that groups the data into 4-line records.

http://developer.yahoo.com/hadoop/tutorial/module4.html
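
For the case in the question (each row together with the previous four), the grouping logic that a custom RecordReader would need can be sketched in plain Java, with no Hadoop dependency. This is only an illustration under stated assumptions: the class name SlidingWindow and its push method are hypothetical, not part of Hadoop, and the sketch ignores input-split boundaries, so the first rows of a split would not see rows that belong to the previous split.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper (not a Hadoop class): remembers the most recent rows. */
final class SlidingWindow {
    private final int capacity;                        // current row + previous rows
    private final Deque<String> rows = new ArrayDeque<>();

    SlidingWindow(int previousRows) {
        if (previousRows < 0) {
            throw new IllegalArgumentException("previousRows must be >= 0");
        }
        this.capacity = previousRows + 1;
    }

    /** Adds the newest row and returns a snapshot, oldest row first. */
    List<String> push(String row) {
        if (rows.size() == capacity) {
            rows.removeFirst();                        // drop the oldest row
        }
        rows.addLast(row);
        return new ArrayList<>(rows);
    }

    public static void main(String[] args) {
        SlidingWindow window = new SlidingWindow(4);   // previous four rows
        String[] csvRows = {"r1", "r2", "r3", "r4", "r5", "r6"};
        for (String row : csvRows) {
            // In a custom RecordReader, this snapshot would become one record.
            System.out.println(window.push(row));
        }
    }
}
```

A custom RecordReader could wrap a LineRecordReader and feed each line it reads through a helper like this, emitting the returned snapshot as the value of one record. The first few rows produce shorter windows because fewer than four previous rows exist yet.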

Diego Pino
0

The way to do it is to implement your own InputFormat and RecordReader.

To get started, search the web for MultipleLineTextRecordReader.java, MultipleLineTextInputFormat.java, and WholeFileTextInputFormat.java.

Ahmadov