I have data across several computers stored in folders. Many of the folders contain 40-100 G of files of size from 500 K to 125 MB. There are some 4 TB of files which I need to archive, and build a unfied meta data system depending on meta data stored in each computer.
All systems run Linux, and we want to use Python. What is the best way to copy the files, and archive it.
We already have programs to analyze files, and fill the meta data tables and they are all running in Python. What we need to figure out is a way to successfully copy files wuthout data loss,and ensure that the files have been copied successfully.
We have considered using rsync and unison use subprocess.POPEn to run them off, but they are essentially sync utilities. These are essentially copy once, but copy properly. Once files are copied the users would move to new storage system.
My worries are 1) When the files are copied there should not be any corruption 2) the file copying must be efficient though no speed expectations are there. The LAN is 10/100 with ports being Gigabit.
Is there any scripts which can be incorporated, or any suggestions. All computers will have ssh-keygen enabled so we can do passwordless connection.
The directory structures would be maintained on the new server, which is very similar to that of old computers.