If you want to read and write to disk, there's no reason to use a MemoryStream. That's just a Stream wrapper over a byte[] buffer. Serializers like XmlSerializer and Json.NET can write directly to a Stream or a TextWriter-derived object. System.Text.Json can serialize to a Stream or a Utf8JsonWriter, a high-speed specialized writer used by ASP.NET Core to serialize JSON objects directly to an HTTP response with minimal allocations and reusable buffers.
The only thing you need to change compared to the examples in the JSON serialization documentation is to use the JsonSerializer.Serialize or SerializeAsync overloads that write to a stream:
await using var stream = File.Create(somePath);
await JsonSerializer.SerializeAsync(stream, myObject);
You don't need to write your own class and methods just to abstract that single JsonSerializer.SerializeAsync call. It would make sense if you wanted to create a repository-like object that abstracts file storage, not just JSON serialization, i.e. a class that determines storage locations and paths based on configuration and some kind of identifier, e.g.:
interface IMyStorage
{
    Task<T> Get<T>(string someId);
    Task Store<T>(T value, string someId);
}

class JsonStorage : IMyStorage
{
    readonly string _root;

    public JsonStorage(string root)
    {
        _root = root;
    }

    public async Task<T> Get<T>(string someId)
    {
        // Paths are built from the configured root plus the caller's identifier
        var path = Path.Combine(_root, someId);
        await using var stream = File.OpenRead(path);
        return await JsonSerializer.DeserializeAsync<T>(stream);
    }

    public async Task Store<T>(T value, string someId)
    {
        var path = Path.Combine(_root, someId);
        await using var stream = File.Create(path);
        await JsonSerializer.SerializeAsync(stream, value);
    }
}
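Usage would then look something like this; the root path, the id and the MyObject type are just placeholders:

IMyStorage storage = new JsonStorage(@"C:\data");      // hypothetical root folder
await storage.Store(myObject, "my-object.json");
var loaded = await storage.Get<MyObject>("my-object.json");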
Problems with JSON
That said, JSON is still a text format, not suitable for binary serialization. It takes more space, it's slower to write, and the lack of a schema means there's no way to know in advance what a JSON string contains.
Another problem is that JSON can only have one root, either an object or an array; a document can't contain multiple top-level values. This means you can't simply append objects to a file, or read individual objects back. You have to read and write the entire file at once.
One way to serialize multiple objects to a JSON file is to serialize each object on a separate line, a convention often called JSON Lines or NDJSON:
await using var writer = new StreamWriter(path, append: true);
foreach (var myObject in myList)
{
    var line = JsonSerializer.Serialize(myObject); // serialize to a string, one document per line
    await writer.WriteLineAsync(line);
}
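Reading the objects back is the reverse: read one line at a time and deserialize each line on its own. A minimal sketch, assuming the same path and a MyObject type from the snippet above:

using var reader = new StreamReader(path);
var objects = new List<MyObject>();
string? line;
while ((line = await reader.ReadLineAsync()) != null)
{
    // Each line is a complete JSON document
    objects.Add(JsonSerializer.Deserialize<MyObject>(line)!);
}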
Alternatives
There are other widely used formats better suited to serialization: they use less space, carry some form of schema, and can handle multiple schema versions. Protocol Buffers, ORC, Parquet, Avro and others fall into this category. Simply by using columnar storage, some of those formats produce smaller files without needing a compression algorithm like GZip or Brotli.
Using one of these common formats means that other applications will be able to read your files, and you'll be able to inspect or edit them with existing tools instead of going through your own application.
One very common binary format is Google's Protocol Buffers, which is used by gRPC and a lot of other tools. You can use it in .NET Core through either .NET's own gRPC tooling or the Protobuf-net library.
In Protocol Buffers you normally specify the schema of your file in advance, in a .proto schema file. With Protobuf-net though, all you really need is to add the proper attributes to your classes:
[ProtoContract]
class Person
{
    [ProtoMember(1)]
    public int Id { get; set; }
    [ProtoMember(2)]
    public string Name { get; set; }
    [ProtoMember(3)]
    public Address Address { get; set; }
}

[ProtoContract]
class Address
{
    [ProtoMember(1)]
    public string Line1 { get; set; }
    [ProtoMember(2)]
    public string Line2 { get; set; }
}
Serializing objects is nearly the same as using JsonSerializer:
await using var file = File.Create("person.bin");
Serializer.Serialize(file, person);
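Reading it back is just as short, using Serializer.Deserialize:

using var file = File.OpenRead("person.bin");
var loaded = Serializer.Deserialize<Person>(file);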
Multiple messages
Protocol Buffers allow storing multiple objects/messages in a stream, but offer no way to detect where one message ends and the next begins. The easiest way to solve this is to write the size of each message before the message itself. This is explained in Streaming Multiple Messages:
If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read the bytes into a separate buffer, then parse from that buffer.
To do this with Protobuf-net, use the SerializeWithLengthPrefix method:
Serializer.SerializeWithLengthPrefix(stream, myObject, PrefixStyle.Base128);
To read the next message from a stream:
var myObject = Serializer.DeserializeWithLengthPrefix<MyObject>(stream, PrefixStyle.Base128);
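Putting it together, a minimal sketch of writing a whole list of messages and reading every one back; the file name, myList and the MyObject type are placeholders:

using (var output = File.Create("objects.bin"))
{
    foreach (var item in myList)
        Serializer.SerializeWithLengthPrefix(output, item, PrefixStyle.Base128);
}

using var input = File.OpenRead("objects.bin");
var items = new List<MyObject>();
while (input.Position < input.Length)
{
    // Each read consumes one length prefix plus one message
    items.Add(Serializer.DeserializeWithLengthPrefix<MyObject>(input, PrefixStyle.Base128));
}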