I have just started learning MapReduce and have a few questions I want answers to. Here goes:
1) Case 1: FileInputFormat is the input format, and the input path is a directory containing multiple files to be processed. If I have n files, each smaller than the block size of the Hadoop cluster, how many splits are calculated for the MapReduce job?
2) I extend FileInputFormat in a class called MyFileInputFormat and override isSplitable to always return false (see the sketch after these questions for roughly what I mean). The input configuration is the same as above. Will I get n splits in this case?
3) If, say, one of the n files is slightly larger than the cluster's block size, will I get n+1 splits in the second case?
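For reference, here is a minimal sketch of the input format I described in question 2. I've extended TextInputFormat (itself a FileInputFormat subclass) rather than FileInputFormat directly, just so I don't have to supply my own RecordReader; the class name is the one from my question:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Sketch: an input format that never splits a file, so each
// input file is handed to exactly one mapper as a whole.
public class MyFileInputFormat extends TextInputFormat {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Always treat each file as a single, unsplittable unit.
        return false;
    }
}
```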
Thanks in advance for the help!