I have a 50MB file (plain text with no spaces). I want to partition this data so that each mapper gets 5MB of it. Each mapper should receive its data as a (K,V) pair where the key is the partition number (1, 2, ...) and the value is the corresponding 5MB of plain text.
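The partitioning described above is fixed-size byte-range arithmetic. Here is a minimal plain-Java sketch that plans those ranges. It assumes 1MB = 1024 × 1024 bytes and has no Hadoop dependency. `FixedSizeSplits`, `Split` and `computeSplits` are hypothetical names, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of fixed-size split planning (hypothetical, not Hadoop's API). */
public class FixedSizeSplits {

    /** One split: a 1-based partition number plus the byte range it covers. */
    public static final class Split {
        public final int partition;
        public final long offset;
        public final long length;

        Split(int partition, long offset, long length) {
            this.partition = partition;
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "Split(" + partition + ", offset=" + offset + ", length=" + length + ")";
        }
    }

    /** Cuts [0, fileSize) into consecutive chunks of at most splitSize bytes. */
    public static List<Split> computeSplits(long fileSize, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        int partition = 1;
        for (long offset = 0; offset < fileSize; offset += splitSize) {
            // The last chunk is shorter if fileSize is not a multiple of splitSize.
            long length = Math.min(splitSize, fileSize - offset);
            splits.add(new Split(partition++, offset, length));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        for (Split s : computeSplits(50 * mb, 5 * mb)) {
            System.out.println(s);
        }
    }
}
```

For a 50MB file and 5MB splits this gives 10 splits, numbered 1 to 10, starting at offsets 0, 5MB, ..., 45MB. In a real job, this planning would presumably go in an overridden getSplits, with each byte range wrapped in a FileSplit.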
I read about InputFormat (its getSplits method), FileInputFormat (and FileSplit) and RecordReader, but I couldn't understand how to generate and use splits to create the custom (K,V) pairs my mappers need. I am new to Hadoop MapReduce programming, so please suggest how to proceed in this case.
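In Hadoop terms, the missing piece is a RecordReader that emits one key-value pair per split instead of one record per line. Its nextKeyValue() would return true once, with the partition number as the key and the split's bytes as the value. The sketch below simulates that behaviour in plain Java without Hadoop. It assumes the text is single-byte (ASCII), so cutting at byte offsets never splits a character. `ChunkRecordReader`, `readChunk` and `readAll` are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java sketch of a "whole split as one record" reader (hypothetical, not Hadoop's RecordReader).
 * Assumes single-byte (ASCII) text, so cutting at byte offsets never splits a character.
 */
public class ChunkRecordReader {

    /** Reads bytes [offset, offset + length) of the file and returns (partition, text). */
    public static Map.Entry<Integer, String> readChunk(Path file, int partition, long offset, int length)
            throws IOException {
        byte[] buf = new byte[length];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);   // jump to the start of this split
            raf.readFully(buf); // read exactly `length` bytes or throw EOFException
        }
        return new AbstractMap.SimpleImmutableEntry<>(partition, new String(buf, StandardCharsets.US_ASCII));
    }

    /** Emits one (partition, text) record per fixed-size chunk, numbering partitions from 1. */
    public static List<Map.Entry<Integer, String>> readAll(Path file, int chunkSize) throws IOException {
        long size = Files.size(file);
        List<Map.Entry<Integer, String>> records = new ArrayList<>();
        int partition = 1;
        for (long offset = 0; offset < size; offset += chunkSize) {
            int length = (int) Math.min(chunkSize, size - offset);
            records.add(readChunk(file, partition++, offset, length));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunks", ".txt");
        Files.write(tmp, "abcdefghijkl".getBytes(StandardCharsets.US_ASCII));
        for (Map.Entry<Integer, String> record : readAll(tmp, 5)) {
            System.out.println(record.getKey() + " -> " + record.getValue());
        }
        Files.delete(tmp);
    }
}
```

With a 12-byte file and a chunk size of 5, main prints `1 -> abcde`, `2 -> fghij` and `3 -> kl`. A 5MB chunk fits comfortably in an int length, but every value is held fully in memory. This is the trade-off of making each split a single record.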