I am trying to write a MapReduce job in Python. The first mapper will split a file into multiple subfiles, and the reducer will do some manipulation on those subfiles and combine them. How do I split the file randomly in Python in the first map step? I was thinking of using the os module and the split command to do the splitting, but my confusion is this: if I split the file into, say, 30 parts, how do I ensure that all 30 parts are processed in the same way? Or is it the case that Hadoop ensures the concurrency?
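For reference, this is roughly the kind of splitting I had in mind (I call split via subprocess rather than os.system, but the idea is the same; the file name input.txt and the part count 30 are just placeholders, and -n requires GNU coreutils split):

```python
import subprocess

# Split input.txt into 30 roughly equal chunks named part_aa, part_ab, ...
# -n 30 is GNU coreutils syntax for "split into 30 pieces".
subprocess.run(["split", "-n", "30", "input.txt", "part_"], check=True)
```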
For a better understanding of my confusion: suppose I split the file into k parts in the map job. What information do I need to pass to the reduce job so that it operates on each split file?
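To make the confusion concrete, here is a rough sketch of the Hadoop Streaming mapper/reducer pair I was imagining. The split id in the key is my guess at the information that needs to be passed between the two phases, and K = 30 is just a placeholder:

```python
#!/usr/bin/env python
# mapper.py -- assign each input line to one of K splits at random by
# emitting "split_id<TAB>line"; Hadoop Streaming then groups the lines
# by split_id, so each split arrives at a reducer as one group.
import random
import sys

K = 30  # number of splits I want (placeholder value)

for line in sys.stdin:
    split_id = random.randrange(K)  # random assignment, as in my question
    sys.stdout.write("%d\t%s" % (split_id, line))
```

```python
#!/usr/bin/env python
# reducer.py -- Hadoop Streaming delivers lines sorted by key, so all
# lines with the same split_id arrive consecutively; collect each group
# and process it as one split.
import sys

def process(split_id, lines):
    # placeholder for the manipulation step on one split
    for line in lines:
        sys.stdout.write(line)

current_id = None
buffer = []

for line in sys.stdin:
    split_id, _, payload = line.partition("\t")
    if split_id != current_id:
        if current_id is not None:
            process(current_id, buffer)
        current_id, buffer = split_id, []
    buffer.append(payload)

if current_id is not None:
    process(current_id, buffer)
```

Is this key-based routing the right way to tell the reduce job which split it is working on, or does Hadoop provide something for this out of the box?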