I have a very specific problem in Hadoop.
I have two files, *userlist* and *raw_data*. *raw_data* is a pretty big file, while *userlist* is comparatively small.
I first have to identify the number of mappers, and *userlist* has to be broken into pieces equal to that number. Those pieces then have to be loaded into the distributed cache, so that each mapper can compare the *raw_data* records it receives against *userlist*, perform some analytics, and write the results to the reducer (a rough sketch of what I have in mind follows).
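This is a minimal sketch of the distributed-cache part, not a working solution. I'm assuming the new MapReduce API (Hadoop 2.x), that *userlist* has one user id per line, and that *raw_data* is tab-delimited with the user id in the first field; note that in this sketch each mapper reads the whole cached file rather than a piece of it:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper that loads userlist from the distributed cache in setup(),
// then checks each raw_data record against it.
public class UserFilterMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Set<String> users = new HashSet<>();

    @Override
    protected void setup(Context context) throws IOException {
        // A file added via job.addCacheFile() with a "#userlist" fragment
        // is symlinked into the task's working directory under that name.
        try (BufferedReader reader = new BufferedReader(new FileReader("userlist"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                users.add(line.trim());
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumption: raw_data is tab-delimited with the user id first.
        String[] fields = value.toString().split("\t");
        if (fields.length > 0 && users.contains(fields[0])) {
            context.write(new Text(fields[0]), value);
        }
    }
}
```

In the driver I would register the file with something like `job.addCacheFile(new URI("/user/me/userlist#userlist"))` (path is hypothetical), which makes it available to every mapper under the symlink name `userlist`.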
Please suggest how to approach this. Thank you.