We have a system that generates literally 5k-10k XML files each day. It's a long story, but that system will not change for a while.
Anyway, the system dumps these XML files (3 KB to 20 KB each) into ONE folder, so imagine how quickly that folder starts getting swamped.
I wrote a program that takes the files and organizes them into a year/month/day folder hierarchy. A second program then goes in and deletes any file older than 90 days.
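For context, the organizer logic is roughly along these lines (a minimal Python sketch, not the actual program; the C:\xml_dump and D:\xml_archive paths are placeholders):

```python
import shutil
import time
from datetime import datetime
from pathlib import Path

SOURCE = Path(r"C:\xml_dump")       # placeholder: folder the system dumps into
ARCHIVE = Path(r"D:\xml_archive")   # placeholder: root of the year/month/day tree
RETENTION_DAYS = 90

def organize():
    """Move each XML file into ARCHIVE/YYYY/MM/DD based on its modified time."""
    for f in SOURCE.glob("*.xml"):
        mtime = datetime.fromtimestamp(f.stat().st_mtime)
        dest = ARCHIVE / f"{mtime:%Y}" / f"{mtime:%m}" / f"{mtime:%d}"
        dest.mkdir(parents=True, exist_ok=True)
        shutil.move(str(f), str(dest / f.name))

def purge_old():
    """Delete any archived file older than RETENTION_DAYS."""
    cutoff = time.time() - RETENTION_DAYS * 24 * 3600
    for f in ARCHIVE.rglob("*.xml"):
        if f.stat().st_mtime < cutoff:
            f.unlink()

if __name__ == "__main__":
    organize()
    purge_old()
```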
Here's the problem: our backup system (again, something that cannot be changed) takes hours to back up those archived folders because there are close to 1 million files. It runs a full backup every time (again, we cannot change this), and it has to open and inspect each XML file. The backup is so slow that it actually does not finish before the next night's backup starts!
So what I've been doing now is taking each monthly folder and creating a 7z archive of it. This works well: 200k files down to one file. However, I have to do this manually.
Also, there is one other catch: we cannot archive the current month, so the most recent 30 days (x 5k-10k files per day) always need to be instantly "searchable".
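Scripting the 7z step would at least remove the manual part. A minimal sketch, assuming the year/month/day tree lives under D:\xml_archive (placeholder path) and the 7-Zip command-line tool (7z) is on the PATH; it compresses every closed month and skips the current one:

```python
import shutil
import subprocess
from datetime import date
from pathlib import Path

ARCHIVE = Path(r"D:\xml_archive")   # placeholder: root of the year/month/day tree
SEVENZIP = "7z"                     # assumes 7-Zip's command-line tool is on PATH

def archive_closed_months():
    """Compress every year/month folder except the current month, then remove the originals."""
    current = date.today().strftime("%Y/%m")
    for month_dir in sorted(ARCHIVE.glob("[0-9][0-9][0-9][0-9]/[0-9][0-9]")):
        rel = month_dir.relative_to(ARCHIVE).as_posix()      # e.g. "2010/07"
        if not month_dir.is_dir() or rel == current:
            continue                                          # keep the current month searchable
        out = ARCHIVE / (rel.replace("/", "-") + ".7z")       # e.g. 2010-07.7z
        if out.exists():
            continue                                          # already archived
        subprocess.run([SEVENZIP, "a", str(out), str(month_dir)], check=True)
        shutil.rmtree(month_dir)                              # only reached if 7z exited cleanly

if __name__ == "__main__":
    archive_closed_months()
```

Run from a nightly scheduled task, that would keep the current month loose and everything older as one file per month, so the backup only ever sees a handful of large files plus the last 30 days.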
Any suggestions on how to better handle this?
The following ran through my head:
1) Write a program that takes the previous day's files, dumps their contents to SQL, and then adds the files to an archive (see the sketch after this list).
2) Some type of "live-archiving" file system that the XML files could be moved to.
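For idea 1, what I had in mind is roughly this (a rough sketch only; SQLite here is just a stand-in for whatever SQL server we'd actually use, and the paths and table layout are made up):

```python
import sqlite3
import subprocess
from datetime import date, timedelta
from pathlib import Path

ARCHIVE = Path(r"D:\xml_archive")          # placeholder: year/month/day tree
DB = r"D:\xml_archive\xml_store.db"        # stand-in for the real SQL server
SEVENZIP = "7z"                            # assumes 7-Zip CLI is on PATH

def load_and_archive_yesterday():
    yesterday = date.today() - timedelta(days=1)
    day_dir = ARCHIVE / f"{yesterday:%Y}" / f"{yesterday:%m}" / f"{yesterday:%d}"
    if not day_dir.is_dir():
        return

    # Dump each file's raw XML into a table keyed by file name and day,
    # so the content stays searchable after the files are archived.
    conn = sqlite3.connect(DB)
    conn.execute("CREATE TABLE IF NOT EXISTS xml_files (name TEXT, day TEXT, body TEXT)")
    for f in day_dir.glob("*.xml"):
        conn.execute(
            "INSERT INTO xml_files VALUES (?, ?, ?)",
            (f.name, yesterday.isoformat(), f.read_text(errors="replace")),
        )
    conn.commit()
    conn.close()

    # Then fold the day's folder into that month's 7z archive.
    out = ARCHIVE / f"{yesterday:%Y}-{yesterday:%m}.7z"
    subprocess.run([SEVENZIP, "a", str(out), str(day_dir)], check=True)

if __name__ == "__main__":
    load_and_archive_yesterday()
```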
Thanks.