We have a Java application that polls files from client FTPs every 30 minutes. On each run it does a linear scan of all the files, checks which ones match the patterns configured inside the application, and processes the matching files accordingly. The problem is that this full linear scan every 30 minutes is taking too much time. Since we do not want to process duplicate files, we maintain a hashcode for each file on our end and check whether a new file's hashcode matches any of the existing ones. Deleting processed files is not possible because of permissions. I need help on how to optimize this.
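To make the duplicate check concrete, here is a minimal sketch of the bookkeeping described above, assuming the "hashcode" is a content digest (SHA-256 here, purely as an illustrative choice) kept in an in-memory set; class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashSet;
import java.util.HexFormat;
import java.util.Set;

public class DedupeSketch {
    // Hashes of files we have already processed (in production this
    // would be persisted between polling runs).
    private final Set<String> seenHashes = new HashSet<>();

    // Returns true if the file content has not been seen before.
    // SHA-256 is an assumption; the question only says "hashcode".
    public boolean markIfNew(byte[] fileBytes) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        String hex = HexFormat.of().formatHex(md.digest(fileBytes));
        // Set.add returns false when the hash was already recorded,
        // which is exactly the duplicate case we want to skip.
        return seenHashes.add(hex);
    }

    public static void main(String[] args) throws Exception {
        DedupeSketch d = new DedupeSketch();
        byte[] a = "invoice-001".getBytes(StandardCharsets.UTF_8);
        System.out.println(d.markIfNew(a)); // first time: true
        System.out.println(d.markIfNew(a)); // duplicate: false
    }
}
```

Note that hashing by content requires downloading every file first; that is part of why the per-run scan is expensive.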
We are using the SSHJ library for SFTP communication.