I want to know if there is a well-known algorithm for extrapolating a filename pattern, given a collection of sample filenames as input. Take the following example filenames:
ABC_348093423.csv
i.ABC_348097340.csv
ABC_348099322.csv
i.GHI_348099324.csv
p.ABC_348101632.csv
DEF_348101736.csv
p.ABC_348101633.csv
ABC_348102548.csv
Ideally, the result set would contain patterns something like:
*.ABC_*.csv
*.DEF_*.csv
*.GHI_*.csv
Even result values like the following would still be a good starting point:
i.ABC_348*.csv
p.ABC_348*.csv
...
Why do I need this?
I have an existing application where users can enter a "file mask" to define a bucket that incoming input files are grouped into. Each incoming file is evaluated against the file masks in order; when the file matches a mask, it goes into the bucket for that mask.
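To make the bucketing behaviour concrete, here is a minimal sketch of how such matching could work. It is not the application's actual code: the class and method names (`MaskBuckets`, `toRegex`, `firstMatch`, `group`) are hypothetical, and it assumes that `*` is the only wildcard and that a file lands in the bucket of the first mask it matches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration of mask-based bucketing; not the real application code.
final class MaskBuckets {

    // Translate a mask into a regex. Assumption: '*' is the only wildcard
    // and means "any run of characters"; everything else is literal.
    static Pattern toRegex(String mask) {
        String[] literals = mask.split("\\*", -1); // -1 keeps trailing empty parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Return the first mask (in list order) that matches the filename, or null.
    static String firstMatch(List<String> masks, String filename) {
        for (String mask : masks) {
            if (toRegex(mask).matcher(filename).matches()) {
                return mask;
            }
        }
        return null;
    }

    // Group filenames into one bucket per mask; unmatched files are skipped.
    static Map<String, List<String>> group(List<String> masks, List<String> filenames) {
        Map<String, List<String>> buckets = new LinkedHashMap<>();
        for (String mask : masks) {
            buckets.put(mask, new ArrayList<>());
        }
        for (String filename : filenames) {
            String mask = firstMatch(masks, filename);
            if (mask != null) {
                buckets.get(mask).add(filename);
            }
        }
        return buckets;
    }
}
```

Under these assumptions, with the masks `*.ABC_*.csv` and `*.csv` (in that order), `i.ABC_348097340.csv` goes into the first bucket, while `ABC_348093423.csv` falls through to `*.csv` because it has no `.` before `ABC_`.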
What I'd like to implement is a feature that, given the last X filenames that were processed, presents the user with suggestions for new file masks. It does not have to be perfect; it is only meant to assist the user.
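As a rough illustration of the kind of output meant here, the following is a naive baseline and not a known algorithm: it collapses every run of digits in a filename into `*` and ranks the resulting masks by how many recent filenames produced them. The names `MaskSuggester`, `generalize` and `suggest` are hypothetical, and the "digits are the variable part" rule is purely an assumption based on the sample filenames.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical naive baseline for suggesting file masks; a sketch only.
final class MaskSuggester {

    // Assumption: runs of digits are the variable part of a filename.
    static String generalize(String filename) {
        return filename.replaceAll("[0-9]+", "*");
    }

    // Return candidate masks, most frequently seen first.
    static List<String> suggest(List<String> recentFilenames) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : recentFilenames) {
            counts.merge(generalize(name), 1, Integer::sum);
        }
        List<String> masks = new ArrayList<>(counts.keySet());
        // List.sort is stable, so ties keep first-seen order.
        masks.sort((a, b) -> counts.get(b) - counts.get(a));
        return masks;
    }
}
```

For the eight sample filenames this yields masks such as `ABC_*.csv`, `p.ABC_*.csv` and `i.ABC_*.csv`, which is closer to the "good starting point" results than to the ideal ones: it does not merge the `i.` and `p.` prefixes into a single `*.ABC_*.csv`.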
What language am I using?
My application is written in Java, so any third-party Java library that can perform this kind of function would be an ideal solution. Otherwise, if there is a well-known algorithm for this problem, then I could implement it myself.