I am wondering if anyone has ideas about the right approach and suitable algorithms for the scenario below:
There are thousands of distinct documents, each with its own categorical code. These documents arrive in the system and need to be manually filed by the user into the correct folder. For example:
Document Code | Folder |
---|---|
ABC123 | Folder 1 |
DEF456 | Folder 2 |
GHI789 | Folder 1 |
While we could maintain a mapping of document codes to folders, this would be cumbersome with so many codes, and the set of codes may keep growing. Furthermore, each customer may want to file the same type of document in a different folder.
Is there a good approach to building a supervised model that learns which folder a given document code tends to be filed under, weighting by historical manual filings, and then files such documents automatically for the user in future?
I understand this weighting may be difficult for a new document type, which would need to be manually filed the first time, so the model would be heavily biased by that first filing. Still, this may be easier than building a classifier on the contents of the document, which would ignore the code itself.
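To make the idea concrete, here is a rough sketch of the kind of history-based weighting I have in mind. It is only a frequency count per customer and document code, not a trained model, and everything in it is an assumption for illustration: the `FolderSuggester` name, the `min_count` and `min_share` thresholds, and the customer identifiers are all hypothetical.

```python
from collections import Counter, defaultdict


class FolderSuggester:
    """Counts past manual filings per (customer, document code)."""

    def __init__(self, min_count=3, min_share=0.8):
        # Hypothetical thresholds: how much history, and how much agreement
        # within that history, is required before filing automatically.
        self.min_count = min_count
        self.min_share = min_share
        self.history = defaultdict(Counter)

    def record(self, customer, code, folder):
        """Store one manual filing decision."""
        self.history[(customer, code)][folder] += 1

    def suggest(self, customer, code):
        """Return (folder, share); folder is None when evidence is too weak."""
        counts = self.history.get((customer, code))
        if not counts:
            # Unseen code for this customer: must be filed manually.
            return None, 0.0
        folder, n = counts.most_common(1)[0]
        total = sum(counts.values())
        share = n / total
        if total >= self.min_count and share >= self.min_share:
            return folder, share
        return None, share


suggester = FolderSuggester(min_count=2, min_share=0.6)
suggester.record("customer_a", "ABC123", "Folder 1")
suggester.record("customer_a", "ABC123", "Folder 1")
suggester.record("customer_a", "ABC123", "Folder 2")
print(suggester.suggest("customer_a", "ABC123"))  # enough agreement to auto-file
print(suggester.suggest("customer_a", "DEF456"))  # no history, stays manual
```

Requiring a minimum number of filings before acting is one way to avoid trusting the single first filing of a new code, and keying on the customer keeps each customer's preferences separate.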
If anyone can point me to some suitable algorithms, it would be much appreciated!