I have a function that needs to be called on a large number of files (thousands). Each file is independent of the others, so the calls can run in parallel. The output for each file does not (currently) need to be combined with the output for any other file. I have many servers I can scale this across, but I'm not sure which approach to take:
1) Run it as a MapReduce job.
2) Create thousands of separate jobs, each working on a different file.
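
To make option 2 concrete, here is a rough single-machine sketch in Python of what I have in mind. `process_file` is a hypothetical placeholder for my actual function, `run_independent_jobs` is just an illustrative name, and the thread pool is only a stand-in for whatever scheduler would distribute the jobs across servers:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_file(path):
    # Hypothetical placeholder for the real per-file function.
    return len(path)


def run_independent_jobs(paths, worker=process_file, max_workers=8):
    """Run `worker` once per path; each result is kept separate, never combined."""
    results = {}
    failures = {}
    # The thread pool stands in for a cluster scheduler (an assumption for illustration).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:
                # One failed file does not affect the others.
                failures[path] = exc
    return results, failures
```

Each file is one job, a failure is recorded against its own file, and nothing is merged at the end.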
Would one solution be preferable to the other?
Thanks!