I have about 8 TB worth of 'sample' data with the following characteristics:
each sample: 5-15 GB in one folder, containing ~20k files spread across ~10k subfolders (~2,000 top-level folders with ~5 subfolders each; each subfolder holds a ~0.5-2 MB data file plus small settings files).
I am setting up a Dell T710 server running Windows Server 2008 R2 with 19 TB effective space (RAID5) in order to consolidate the data. I have previously seen significant slowdowns when opening, browsing, and copying on a computer with about 1.5 TB of this type of data on a dedicated internal NTFS drive.
Each sample will be copied to this server for storage, but analysis will occur elsewhere (data copied off the server). So existing data will not change day to day; only new samples will be added.
What is the best drive configuration to handle this type of data? The drive is GPT and currently has EFI and MSR partitions, a 70 GB system partition, and an empty 19 TB data partition.
- one large 19 TB volume
- several smaller volumes (less fragmentation?)
Would it be advisable to create a per-sample ZIP archive and store that instead? I hesitate because users understand folders intuitively, and corruption hits archives harder: we could tolerate a few corrupted subfolders (sample 'pixels', more or less) in the extreme case, but losing an entire sample archive would be bad.
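For reference, if I did go the archive route, this is roughly what I had in mind: an uncompressed per-sample ZIP plus a sidecar SHA-256 manifest, so corruption can at least be detected per file rather than only per archive. This is just a minimal Python sketch with placeholder paths, not something I have tested against the real data:

    import hashlib
    import zipfile
    from pathlib import Path

    def pack_sample(sample_dir: Path, archive_path: Path) -> None:
        """Pack one sample folder into an uncompressed ZIP and write a
        sidecar SHA-256 manifest so later corruption can be detected
        per file rather than per archive."""
        manifest_lines = []
        # ZIP_STORED skips compression: the data files are small already,
        # and storing them makes partial recovery of a damaged archive easier.
        # (zipfile switches to Zip64 automatically for archives over 4 GB.)
        with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as zf:
            for path in sorted(sample_dir.rglob("*")):
                if path.is_file():
                    rel = path.relative_to(sample_dir)
                    zf.write(path, arcname=str(rel))
                    digest = hashlib.sha256(path.read_bytes()).hexdigest()
                    manifest_lines.append(f"{digest}  {rel}")
        archive_path.with_suffix(".sha256").write_text("\n".join(manifest_lines))

    def verify_sample(archive_path: Path) -> list[str]:
        """Return the names of archive members whose stored CRC32 no longer
        matches their data (empty list means the archive reads back cleanly)."""
        with zipfile.ZipFile(archive_path) as zf:
            bad = zf.testzip()  # checks every member against its CRC32
            return [bad] if bad else []

    # Placeholder paths for illustration only.
    # pack_sample(Path(r"D:\incoming\sample_0001"), Path(r"E:\archive\sample_0001.zip"))
    # print(verify_sample(Path(r"E:\archive\sample_0001.zip")))

My thinking is that because ZIP stores a CRC32 per member, a single damaged member would not necessarily take the whole sample with it, but I would still like advice on whether this is worth the loss of browsable folders.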