I have a 50 MB file (pure text data with no spaces). I want to partition this data so that each mapper gets 5 MB of it. Each mapper should receive its data in (K,V) format, where the key is the partition number (1, 2, ...) and the value is the plain text (5 MB).

I read about InputFormat (the getSplits method), FileInputFormat (FileSplit method) and RecordReader, but couldn't understand how to generate and use splits to create the required custom (K,V) pairs for my mappers. I am new to Hadoop MapReduce programming, so please suggest how I should proceed in this case.

Sumit
  • What do you mean by complete text data without spaces? Could you provide a small example of it? You need some logic to create the pairs for the mappers. For example, the logic in the wordcount example is to split the text data on spaces. – Mobin Ranjbar Feb 18 '16 at 09:44
  • My data is a large file containing a character sequence like sdaccraggrralwghdsgfndsnvfcvnd....., several MB in size. I want to partition this data so I can apply my processing on each mapper, and I want to identify which partition a given mapper has. – Sumit Feb 18 '16 at 11:06
  • So, you want to split this string, but by what? What kind of logic? Maybe you can use SubString, but what length are you interested in? – Mobin Ranjbar Feb 18 '16 at 11:20
  • Nothing special, I just want to split this data into equal-size partitions, i.e. a character sequence of equal length for each mapper (say 5 MB per mapper if my input size is 50 MB and I want to use 10 mappers). – Sumit Feb 18 '16 at 12:36

1 Answer

You can set mapreduce.input.fileinputformat.split.maxsize (a value in bytes) in your job configuration to cap the size of each input split, so that each mapper receives at most 5 MB (5242880 bytes) of data.
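
To make the effect of that setting concrete, here is a rough plain-Java sketch of the split arithmetic as I understand FileInputFormat applies it: the split size is max(minSize, min(maxSize, blockSize)), and the final split may be up to 10% larger than the others. It has no Hadoop dependency. The class SplitSketch, the helper partitionNumber and the 128 MB block size are assumptions made for illustration, not part of the Hadoop API or of your cluster setup. In a real job the property would typically be passed as -D mapreduce.input.fileinputformat.split.maxsize=5242880 or set through FileInputFormat.setMaxInputSplitSize(job, 5242880L).

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of FileInputFormat-style split arithmetic (hypothetical class, no Hadoop needed). */
final class SplitSketch {
    // The last split is allowed to be up to 10% larger than the configured split size.
    static final double SPLIT_SLOP = 1.1;

    /** Split size formula: max(minSize, min(maxSize, blockSize)). */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Returns {start, length} pairs covering a file of fileLength bytes. */
    static List<long[]> splits(long fileLength, long splitSize) {
        List<long[]> result = new ArrayList<>();
        long remaining = fileLength;
        // Emit full-size splits while more than 1.1 split sizes remain.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            result.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        // Whatever is left becomes the final split.
        if (remaining != 0) {
            result.add(new long[] {fileLength - remaining, remaining});
        }
        return result;
    }

    /** Hypothetical helper: 1-based partition number derived from a split's start offset. */
    static long partitionNumber(long startOffset, long splitSize) {
        return startOffset / splitSize + 1;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed values: 128 MB block size, 1 byte minimum, 5 MB maximum split size.
        long splitSize = computeSplitSize(128 * mb, 1, 5 * mb);
        for (long[] s : splits(50 * mb, splitSize)) {
            System.out.println("partition " + partitionNumber(s[0], splitSize)
                    + ": start=" + s[0] + " length=" + s[1]);
        }
    }
}
```

With these assumed values a 50 MB file yields ten 5 MB splits. Note that the setting only controls how much data each mapper reads; the partition number as key is not provided for you, which is why the sketch derives it from the split's start offset. Inside a custom RecordReader the same division could be applied to the split's start position.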

Mobin Ranjbar