My database has multiple tables, each holding a different entity type. In Hadoop I use a single Avro schema that is the union of all the fields of these entity types, plus an entity type field.
What I would like to do is set up one DBInputFormat per entity type, each with a DBWritable that maps that entity type to the combined Avro type. I would then hand each DBInputFormat to something like MultipleInputs to build a composite input format. That composite input format could be given to my MapReduce job so that the data from all the tables is processed in one job by the same mapper class.
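To make the composite idea concrete, here is a minimal sketch in plain Java with no Hadoop dependencies. It is not a Hadoop API: `CompositeTableInput`, `TaggedSplit`, `addTable`, `toCombined`, and the `entityType` field name are all hypothetical names chosen for illustration. It only models the behavior described above, namely collecting the splits of several per-table inputs into one list while remembering which entity type each split belongs to, so that a single mapper can handle all of them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, Hadoop-free model of a composite input over several tables. */
final class CompositeTableInput {

    /** One unit of work, tagged with the entity type (table) it belongs to. */
    static final class TaggedSplit {
        final String entityType;
        final String whereClause;

        TaggedSplit(String entityType, String whereClause) {
            this.entityType = entityType;
            this.whereClause = whereClause;
        }
    }

    // Insertion-ordered, so splits come back in the order tables were added.
    private final Map<String, List<String>> clausesByEntity = new LinkedHashMap<>();

    /** Registers one table together with the WHERE clauses of its own splits. */
    void addTable(String entityType, List<String> whereClauses) {
        clausesByEntity.put(entityType, new ArrayList<>(whereClauses));
    }

    /** Flattens every table's splits into one list, keeping the entity tag. */
    List<TaggedSplit> getSplits() {
        List<TaggedSplit> all = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : clausesByEntity.entrySet()) {
            for (String clause : entry.getValue()) {
                all.add(new TaggedSplit(entry.getKey(), clause));
            }
        }
        return all;
    }

    /** Maps a table-specific row onto the combined record by adding the tag. */
    static Map<String, Object> toCombined(String entityType, Map<String, Object> row) {
        Map<String, Object> combined = new LinkedHashMap<>(row);
        combined.put("entityType", entityType); // assumed name of the type field
        return combined;
    }
}
```

In a real job, each tagged split would presumably have to carry enough information (table name, query bounds, which DBWritable to use) for a record reader to open the right query, and the entity type tag would tell the reader how to fill the combined Avro record.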
Data is constantly added to these database tables, so I need to be able to configure the DBInputFormat for each entity type (that is, each table) to read only the new rows and to compute its splits correctly.
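As an illustration of "only the new data, split properly", here is a small sketch in plain Java. It assumes each table has a monotonically increasing numeric key (for example an auto-increment `id`) and that the highest id processed by the previous run is stored somewhere between runs. Both are assumptions, not something the setup above guarantees. `IncrementalSplits` and `whereClauses` are hypothetical names. The idea resembles, as far as I understand it, how DataDrivenDBInputFormat divides the range between the minimum and maximum of a split column, except that the lower bound here is the last id already seen.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns the unread id range of one table into WHERE clauses. */
final class IncrementalSplits {

    /**
     * Divides the range (lastSeenId, maxId] into at most numSplits contiguous
     * chunks and returns one WHERE clause per chunk. Returns an empty list
     * when there are no new ids to read.
     */
    static List<String> whereClauses(String column, long lastSeenId, long maxId, int numSplits) {
        List<String> clauses = new ArrayList<>();
        long total = maxId - lastSeenId; // number of candidate ids not yet read
        if (total <= 0 || numSplits <= 0) {
            return clauses;
        }
        long chunk = (total + numSplits - 1) / numSplits; // ceiling division
        long lo = lastSeenId;
        while (lo < maxId) {
            // The last chunk may be shorter than the others.
            long hi = (maxId - lo > chunk) ? lo + chunk : maxId;
            clauses.add(column + " > " + lo + " AND " + column + " <= " + hi);
            lo = hi;
        }
        return clauses;
    }
}
```

For example, `IncrementalSplits.whereClauses("id", 100, 110, 3)` yields three clauses covering ids 101 to 104, 105 to 108, and 109 to 110. After a successful run, `maxId` would become the stored `lastSeenId` for the next run. Note that this only picks up inserted rows; updates to existing rows would need a different marker, such as a modification timestamp.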
In short, I need the functionality of DBInputFormat or DataDrivenDBInputFormat, plus the ability to combine several of them into a composite, similar to what MultipleInputs lets you do with paths.