I have a requirement to archive all the data used to build a report every day. I compress most of the data with gzip, since some of the datasets can be quite large (10 MB+). I write each individual protobuf graph to its own file. I also whitelist a fixed set of known small object types, and I added some code to detect whether a file is gzipped when I read it back, because a small file can actually end up bigger compressed than uncompressed.
Unfortunately, just due to the nature of the data, I may only have a few elements of a larger object type, and the whitelist approach can be problematic.
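For context, the read-side detection just peeks at the gzip magic bytes (0x1F 0x8B) before deciding how to open the stream; roughly like this (simplified sketch, the method name is illustrative):

// Requires System.IO and System.IO.Compression.
// A gzipped file always starts with the magic bytes 0x1F 0x8B.
private static Stream OpenPossiblyGzipped(Stream fs)
{
    var header = new byte[2];
    var read = fs.Read(header, 0, 2);
    fs.Seek(0, SeekOrigin.Begin);

    if (read == 2 && header[0] == 0x1F && header[1] == 0x8B)
        return new GZipStream(fs, CompressionMode.Decompress);

    return fs;
}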
Is there any way to write an object to a stream and, only if it reaches a threshold (say 8 KB), compress it? I don't know the size of the object beforehand, and sometimes the object graph contains an IEnumerable<T> that can be considerable in size.
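The obvious approach would be to serialize into a MemoryStream first and only then decide, but that buffers the entire graph in memory, which is exactly what I'd like to avoid for the large datasets. Just to illustrate the idea (untested sketch):

// Illustrative only: buffer everything, then compress if it crossed the threshold.
// This defeats the purpose for the 10 MB+ graphs, hence the question.
const int Threshold = 8 * 1024;

using (var ms = new MemoryStream())
{
    _serializerModel.Serialize(ms, item);
    ms.Seek(0, SeekOrigin.Begin);

    if (ms.Length >= Threshold)
    {
        using (var gz = new GZipStream(fs, CompressionLevel.Optimal))
            ms.CopyTo(gz);
    }
    else
    {
        ms.CopyTo(fs);
    }
}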
Edit:
The code is fairly basic. I did skim over the fact that I store this in a FILESTREAM-backed db table; that shouldn't really matter for the implementation. I removed some of the extraneous code.
public async Task SerializeModel<T>(TransactionalDbContext dbConn, T item, DateTime archiveDate, string name)
{
    // Look up the FILESTREAM path and transaction context for this archive entry.
    var continuation = (await dbConn
        .QueryAsync<PathAndContext>(_getPathAndContext, new { archiveDate, model = name })
        .ConfigureAwait(false))
        .First();

    // Only compress types that are not on the small-object whitelist.
    var useGzip = !_whitelist.Contains(typeof(T));

    using (var fs = new SqlFileStream(continuation.Path, continuation.Context, FileAccess.Write,
        FileOptions.SequentialScan | FileOptions.Asynchronous, 64 * 1024))
    using (var buffer = useGzip ? new GZipStream(fs, CompressionLevel.Optimal) : default(Stream))
    {
        _serializerModel.Serialize(buffer ?? fs, item);
    }

    dbConn.Commit();
}