All,
I am working on an interface for handling some massive data and generating ARFF files to use for machine learning. I can currently collect the features, but I have no way of associating them with the files they were derived from. I am currently using Dumbo:
def mapper(key, value):
    # do stuff to generate features
Is there any convenient method for determining the filename that was opened and had its contents passed to the mapper function?
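To make the question concrete, here is a rough sketch of the kind of thing I am hoping for. It rests on an assumption I have not verified under Dumbo: that Hadoop Streaming exports the job configuration to the task environment with dots replaced by underscores, so that map.input.file (or mapreduce.map.input.file on newer releases) shows up as an environment variable. The helper name current_input_file is made up for illustration, and the feature step is only a placeholder.

```python
import os


def current_input_file(environ=os.environ):
    """Return the path of the file being processed, or None if unknown.

    Assumption: Hadoop Streaming exposes job configuration keys as
    environment variables with '.' replaced by '_'. Both the newer and
    the older spelling of the input-file key are checked.
    """
    for name in ("mapreduce_map_input_file", "map_input_file"):
        path = environ.get(name)
        if path:
            return path
    return None


def mapper(key, value):
    # Placeholder for the real feature generation: the raw value stands in.
    features = value
    # Emit the source filename as the key so features stay tied to their file.
    yield current_input_file(), features
```

If the variable is not set in the task environment, the emitted key is None, which would at least show whether the assumption holds on a given cluster.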
Thanks again. -Sam