I have several files containing data. For example, file01.csv has x lines and file02.csv has y lines.
I would like to process and merge them with MapReduce to get a single file containing the x lines of the first file, each prefixed with file01 followed by the line content, and the y lines of the second file, each prefixed with file02 followed by the line content.
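To make the expected result concrete, here is a minimal plain-Java sketch of the transformation I am after. It does not use Hadoop at all; the class name `FileNamePrefixer`, the comma separator, and the sample rows are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: plain Java, no Hadoop. All names here are hypothetical.
final class FileNamePrefixer {

    // Assumed separator between the file name and the original line content.
    static final String SEPARATOR = ",";

    // Prefixes every line with the name of the file it came from,
    // keeping files and lines in their original order.
    static List<String> merge(Map<String, List<String>> linesByFile) {
        List<String> merged = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : linesByFile.entrySet()) {
            for (String line : entry.getValue()) {
                merged.add(entry.getKey() + SEPARATOR + line);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up sample content standing in for file01.csv and file02.csv.
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("file01", Arrays.asList("a,1", "b,2"));
        input.put("file02", Arrays.asList("c,3"));
        merge(input).forEach(System.out::println);
    }
}
```

What I am looking for is the MapReduce equivalent of `merge`, where the mapper receives the lines and somehow knows the file name.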
I have two issues here:
- I know how to read the lines of a single file with MapReduce by setting
FileInputFormat.setInputPaths(job, new Path(inputFile));
but I don't understand how to read the lines of every file in a folder.
- Once I have those lines in my mapper, how can I access the name of the file each line came from, so that I can build the output I want?
Thank you for your consideration.
Ambre