On a daily basis we generate about 3.4 million small JPEG files, and we also delete about 3.4 million 90-day-old images. To date, we've dealt with this content by storing the images in a hierarchical manner. The hierarchy is something like this:

/Year/Month/Day/Source/

This hierarchy allows us to efficiently delete a day's worth of content across all sources.
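For illustration, a minimal sketch of that layout in Python (the root path and helper names are assumptions for the example, not our actual code):

    # Sketch: build the dated directory for an incoming image and
    # prune the day that just aged past the 90-day retention window.
    import shutil
    from datetime import date, timedelta
    from pathlib import Path

    ROOT = Path(r"D:\images")   # hypothetical volume root
    RETENTION_DAYS = 90

    def day_dir(source: str, when: date) -> Path:
        """Return the /Year/Month/Day/Source/ directory for a file."""
        return ROOT / f"{when:%Y}" / f"{when:%m}" / f"{when:%d}" / source

    def prune_expired(today: date) -> None:
        """Delete the expired day's directory, covering all sources at once."""
        expired = today - timedelta(days=RETENTION_DAYS)
        target = ROOT / f"{expired:%Y}" / f"{expired:%m}" / f"{expired:%d}"
        if target.exists():
            shutil.rmtree(target)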

The files are stored on a Windows Server 2003 machine connected to a 14-disk SATA RAID 6 array.

We've started having significant performance issues when writing to and reading from the disks.

This may be due to the performance of the hardware, but I suspect that disk fragmentation may be a culprit as well.

Some people have recommended storing the data in a database, but I've been hesitant to do this. Another thought was to use some sort of container file, like a VHD.

Does anyone have any advice for mitigating this kind of fragmentation?

Additional Info:

The average file size is 8-14 KB.

Format information from fsutil:

NTFS Volume Serial Number :       0x2ae2ea00e2e9d05d
Version :                         3.1
Number Sectors :                  0x00000001e847ffff
Total Clusters :                  0x000000003d08ffff
Free Clusters  :                  0x000000001c1a4df0
Total Reserved :                  0x0000000000000000
Bytes Per Sector  :               512
Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x000000208f020000
Mft Start Lcn  :                  0x00000000000c0000
Mft2 Start Lcn :                  0x000000001e847fff
Mft Zone Start :                  0x0000000002163b20
Mft Zone End   :                  0x0000000007ad2000
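Worth noting against the numbers above: with 4,096-byte clusters, an 8-14 KB file occupies only 2-4 clusters, so there is little room for fragmentation within any single file. A quick back-of-the-envelope check:

    # Clusters per file at the reported sizes.
    import math

    BYTES_PER_CLUSTER = 4096           # from the fsutil output above
    for size_kb in (8, 14):            # stated average file size range
        clusters = math.ceil(size_kb * 1024 / BYTES_PER_CLUSTER)
        print(f"{size_kb} KB -> {clusters} clusters")
    # 8 KB -> 2 clusters, 14 KB -> 4 clusters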
Zorlack

2 Answers


Diskeeper 2009 (now 2010) works well for defragmenting in real time with minimal impact on performance. However, there is a cost, as it is a commercial package. We tried several free apps and ran into significant performance issues.

Diskeeper Home page

Dave M
  • Are you referring to their IntelliWrite feature? Does it work correctly? – Zorlack Apr 19 '10 at 17:09
  • I was thinking more of the InvisiTasking technology, which does not have a big impact on the server while running. IntelliWrite seems to keep the volume defragmented once we did an original defrag. The servers respond faster and we do not have the performance issues we encountered with some free products. – Dave M Apr 19 '10 at 18:44

I assume from your post that you're retaining 90 days' worth of images. Doing some quick math, it appears you need about 4.28 TB of storage. What are the I/O patterns like (i.e., is any of the data accessed more frequently)? How many volumes is this data spread across? How quickly does performance degrade to an unacceptable level after a defragmentation?
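For the record, that estimate follows directly from the numbers in the question (taking the top of the stated size range):

    # Reproducing the ~4.28 TB retained-footprint estimate.
    files_per_day = 3.4e6
    retention_days = 90
    bytes_per_file = 14e3                  # top of the 8-14 KB range
    total_bytes = files_per_day * retention_days * bytes_per_file
    print(f"{total_bytes / 1e12:.2f} TB")  # 4.28 TB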

If you're unwilling to make changes to the system (introducing a database), perhaps you should focus on how you can defragment in a manageable fashion with the tools that are bundled with the OS. Rotate and split the data across multiple, smaller LUNs so that you can defragment them individually. After you've finished writing X days' worth of data, move to the next LUN and defragment the volume holding the previous X days. If you're no longer writing to it, you won't introduce any more fragmentation.
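Purely as an illustration, a sketch of that rotation (the volume letters and window length are assumptions; defrag is the command-line tool bundled with Windows Server 2003):

    # Hypothetical rotation: write each X-day window to its own volume,
    # then defragment the previous, now read-only, volume.
    import subprocess
    from datetime import date

    VOLUMES = ["E:", "F:", "G:", "H:"]   # assumed small LUNs
    WINDOW_DAYS = 30                     # assumed rotation period

    def volume_for(day: date) -> str:
        """Pick the active volume from the day's position in the cycle."""
        window = day.toordinal() // WINDOW_DAYS
        return VOLUMES[window % len(VOLUMES)]

    def defrag_previous(day: date) -> None:
        """Run the bundled defragmenter on the volume we just rotated off."""
        window = day.toordinal() // WINDOW_DAYS
        prev = VOLUMES[(window - 1) % len(VOLUMES)]
        subprocess.run(["defrag", prev, "-v"], check=True)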

If you've been provided with a sizable budget, you might look at a storage medium that's impervious to fragmentation (such as an SSD).

Zack Angelo
  • I've been hesitant to use a database because generally this seems to cause its own problems. I've never had good luck storing large amounts of BLOB data in a database server. Indexing and re-indexing seem to be problematic, and I used to run into table corruption problems all the time. Do you have contrary experience? We're investigating the possibility of having 4 partitions and storing the data in 30-day blocks across these partitions; then, rather than deleting the files, we'd just format the oldest partition. I think 5 TB worth of SSD might be out of my budget range hehe – Zorlack Apr 21 '10 at 14:41