
I'm trying to understand all the relevant implications of the --max-obj-size value used when creating an S3QL file system. I have yet to find a complete description of what this option affects, but I have been able to piece together a few bits from the docs and discussion groups.

Mainly, I have found reasons to use larger --max-obj-size values, which leaves me wondering: why not use an arbitrarily large value (10 MB? 100 MB? 1 GB?):

  • Smaller values mean more blocks (and thus more database entries) are needed, and worse performance from the SQLite metadata database, since storing the same set of files requires more entries (a rough estimate is sketched just after this list)
  • Smaller values can hurt throughput (especially for sequential reads).
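
To put a rough number on the metadata point, here is a back-of-envelope sketch. The one-entry-per-block model and the ceil(size / max_obj_size) split are my own assumptions, not something I have verified against the S3QL source:

    import math

    def block_count(file_sizes, max_obj_size):
        """Rough estimate of block entries in the metadata database.

        Assumes (my assumption) one entry per block and
        ceil(size / max_obj_size) blocks per non-empty file.
        """
        return sum(max(1, math.ceil(s / max_obj_size)) for s in file_sizes)

    # Example data set: a million 100 KiB files plus a thousand 1 GiB files
    sizes = [100 * 1024] * 1_000_000 + [1024 ** 3] * 1000
    for max_obj in (128 * 1024, 10 * 1024 ** 2, 100 * 1024 ** 2):
        print(f"max-obj-size {max_obj // 1024:>6} KiB -> ~{block_count(sizes, max_obj):,} entries")

With those made-up numbers the small files contribute one entry each no matter what, while the large files go from roughly 8 million entries at 128 KiB blocks to about 11,000 at 100 MiB.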

From the version 1.8 changelog:

As a matter of fact, a small S3QL block size does not have any advantage over a large block size when storing lots of small files. A small block size, however, seriously degrades performance when storing larger files. This is because S3QL is effectively using a dynamic block size, and the --blocksize value merely specifies an upper limit.
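
If I read that correctly, a file's blocks are only ever as large as the data actually requires, so the effective block size is roughly min(file size, --max-obj-size). A minimal sketch of that reading (my interpretation, not code from S3QL):

    def split_into_blocks(file_size, max_obj_size):
        """Split a file into blocks no larger than max_obj_size.

        My reading of "dynamic block size": the last (or only) block
        shrinks to fit, so a small file stays a single small block
        no matter how large --max-obj-size is.
        """
        blocks, offset = [], 0
        while offset < file_size:
            length = min(max_obj_size, file_size - offset)
            blocks.append(length)
            offset += length
        return blocks

    print(split_into_blocks(4 * 1024, 100 * 1024 ** 2))   # 4 KiB file -> [4096]
    print(len(split_into_blocks(1024 ** 3, 128 * 1024)))  # 1 GiB file, 128 KiB blocks -> 8192 pieces

Which, if accurate, explains the changelog wording: the small-file case is identical either way, but a small upper limit chops large files into far more pieces.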

So far the only advantages I have found or imagined for smaller block sizes are:

  • Less bandwidth used to re-write a portion of a file (a rough sketch follows this list)
  • Possibly better deduplication
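
The bandwidth point is the one I can quantify most easily. If every block that overlaps a modified byte range has to be re-uploaded in full (that is my assumption about how the write-back works), then the block size directly bounds the cost of a small in-place write:

    def reupload_bytes(offset, length, max_obj_size):
        """Bytes re-uploaded for an in-place write of `length` bytes at
        `offset`, assuming (my assumption) that each overlapping
        full-sized block is re-uploaded whole."""
        first = offset // max_obj_size
        last = (offset + length - 1) // max_obj_size
        return (last - first + 1) * max_obj_size

    # Rewriting 4 KiB in the middle of a large file:
    for max_obj in (128 * 1024, 10 * 1024 ** 2, 100 * 1024 ** 2):
        cost = reupload_bytes(50 * 1024 ** 2, 4096, max_obj)
        print(f"{max_obj // 1024:>6} KiB blocks -> re-upload {cost:,} bytes")

Under that assumption, a 4 KiB change costs 128 KiB of upload with small blocks but a full 100 MiB with very large ones.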

The --min-obj-size option does not affect deduplication. Deduplication happens before blocks are grouped.

The --max-obj-size affects deduplication, since it implicitly determines the maximum size of a block.
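
So, as I understand those two statements, deduplication operates on the blocks themselves, which makes --max-obj-size the granularity at which duplicates can be detected. A minimal sketch of block-hash deduplication under that assumption (fixed-offset blocks and SHA-256 are my simplifications, not necessarily what S3QL actually does):

    import hashlib

    def dedup_stats(files, max_obj_size):
        """Count unique vs. total blocks when each file is cut at fixed
        offsets and blocks are deduplicated by content hash
        (a simplified model, not S3QL's actual implementation)."""
        seen, total = set(), 0
        for data in files:
            for off in range(0, len(data), max_obj_size):
                seen.add(hashlib.sha256(data[off:off + max_obj_size]).hexdigest())
                total += 1
        return len(seen), total

    # Two files sharing a 1 MiB prefix but with different endings:
    a = b"x" * 2**20 + b"tail-a"
    b = b"x" * 2**20 + b"tail-b"
    print(dedup_stats([a, b], 128 * 1024))   # (3, 18): the shared prefix dedups
    print(dedup_stats([a, b], 10 * 2**20))   # (2, 2): one big block each, nothing shared

If that model is right, a smaller maximum block size raises the chance that identical stretches of different files line up into identical blocks, at the cost of more entries per file.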

Found here:

Can anyone offer a summary of the trade-offs one makes when selecting a larger or smaller --max-obj-size when creating an S3QL file system?
