
I am new to System.Text.Json. I was using BinaryFormatter and now need to migrate to System.Text.Json because of the security vulnerabilities BinaryFormatter poses. I need to serialize an object into a stream and store it on disk. Then, when the get method is called, it should fetch the stream and deserialize the data back into an object. I did not find the documentation useful (https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-how-to?pivots=dotnet-6-0). Below is the pseudocode; I need to write the JsonSerializer class. Can someone help me?

public class Foo
{
    public object get()
    {
        using (MemoryStream stream = new MemoryStream())
        {
            FetchStreamFromDisk(); // Fetches the stream from disk
            return JsonSerializer.deserialize(stream);
        }
    }

    public void Put(object data)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            JsonSerializer.serialize(data, stream);
            // Store the data on disk or in the DB, so the stream
            // must not get closed inside the JsonSerializer class
            StoreStreamIntoDisk();
        }
    }
}

public static class JsonSerializer
{
    public static void serialize(object data, MemoryStream stream)
    {
        // Serialize data
    }

    public static object deserialize(MemoryStream stream)
    {
        // Deserialize data
    }
}
Al Pa chino
  • 2
    The documentation is fine. The problem is that `BinaryFormatter` has nothing to do with JSON or serialization to any kind of text format so whatever patterns or idioms you used can't be used now. You're actually looking at this the wrong way. The equivalent to System.Text.Json is XmlSerializer, DataContractSerializer, Json.NET. Unlike BinaryFormatter, all serializers work on entire object graphs, not individual fields – Panagiotis Kanavos Nov 19 '21 at 07:26
  • 1
    What is the *actual* problem you want to solve? Replace BinaryFormatter for *binary* serialization? How do you want to *use* your serialization code? Serializers are object based. Neither XML nor JSON is a good option for binary serialization. A better choice would be Protocol Buffers, the binary format of gRPC. You can use [.NET's gRPC tooling](https://learn.microsoft.com/en-us/aspnet/core/grpc/?view=aspnetcore-6.0) or [protobuf-net](https://github.com/protobuf-net/protobuf-net) to specify the schema of your objects and serialize them. – Panagiotis Kanavos Nov 19 '21 at 07:32
  • XmlSerializers are slow compared to JSON serializers according to multiple blogs (e.g. https://inspiration.nlogic.ca/en/a-comparison-of-newtonsoft.json-and-system.text.json). So I wanted to use a JSON serializer. There are two of them, System.Text.Json and Newtonsoft.Json, and I wanted to use System.Text.Json because it is faster and more secure than the other. – Al Pa chino Nov 19 '21 at 07:33
  • 1
    That doesn't explain what your question and problems are. And JSON is still a text format, bigger and slower than binary formats – Panagiotis Kanavos Nov 19 '21 at 07:33
  • I wanted to migrate from BinaryFormatter usage to System.Text.Json's JsonSerializer. – Al Pa chino Nov 19 '21 at 07:35
  • There are other very common file formats built to handle lots of data and volatile schemas, like Parquet, Orc, Avro and more. What you choose depends on your requirements, which you haven't explained. – Panagiotis Kanavos Nov 19 '21 at 07:35
  • My question: I only see a Serialize function that formats to text, but I want it to write to a MemoryStream object. – Al Pa chino Nov 19 '21 at 07:36
  • I know BinaryFormatter has nothing to do with Json, but my requirement is to remove the usage of BinaryFormatter's serialization method and use System.text.Json Serialization. – Al Pa chino Nov 19 '21 at 07:40
  • You misunderstand what `MemoryStream` is and what Json is then. JSON is text. End of story. When you serialize to JSON you serialize to text. A `MemoryStream` is just a `Stream` API over a `byte[]` buffer. That buffer is no different than a `string` - both are bytes in memory. You don't write text to even a FileStream directly though, you typically use a `StreamWriter` or its parent, `TextWriter`. All serializers work with either `Stream` or `TextWriter` – Panagiotis Kanavos Nov 19 '21 at 07:41
  • As for System.Text.Json, writing to a stream [is already available](https://learn.microsoft.com/en-us/dotnet/api/system.text.json.jsonserializer.serialize?view=net-6.0#System_Text_Json_JsonSerializer_Serialize_System_IO_Stream_System_Object_System_Type_System_Text_Json_JsonSerializerOptions_) as one of the `JsonSerializer.Serialize` overloads. Don't use that to serialize to a MemoryStream though - that's pointless. That's still an in-memory buffer, just like the `string` produced by other overloads. Use it to serialize to a file or a response stream without buffering the results – Panagiotis Kanavos Nov 19 '21 at 07:42
  • Thank you, Will see the information provided. – Al Pa chino Nov 19 '21 at 07:43

1 Answer


If you want to read and write to disk, there's no reason to use a MemoryStream. That's just a Stream wrapper over a byte[] buffer. Serializers like XmlSerializer and Json.NET can write directly to a Stream or a TextWriter-derived object. System.Text.Json can serialize to a Stream or a Utf8JsonWriter, a high-speed specialized writer used by ASP.NET Core to serialize JSON objects directly to an HTTP response with minimal allocations and reusable buffers.

The only thing you need to change in the JSON serialization documentation is to use the JsonSerializer.Serialize or SerializeAsync overloads that write to a stream:

await using var stream=File.Create(somePath);
await JsonSerializer.SerializeAsync(stream,myObject);
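
The read side mirrors the write side. As a minimal sketch, assuming a hypothetical `Person` type and a file in the temp folder (neither is part of the question), a round trip through a file could look like this:

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical path and type, used only for illustration.
var path = Path.Combine(Path.GetTempPath(), "person.json");
var original = new Person { Id = 1, Name = "Ada" };

// Write: the JSON goes straight to the file stream, with no MemoryStream in between.
await using (var output = File.Create(path))
{
    await JsonSerializer.SerializeAsync(output, original);
}

// Read: DeserializeAsync<T> parses the file stream and returns the object
// (or null if the file contains the JSON literal null).
await using (var input = File.OpenRead(path))
{
    var restored = await JsonSerializer.DeserializeAsync<Person>(input);
    Console.WriteLine($"{restored?.Id} {restored?.Name}");
}

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}
```

Both streams are disposed by the `await using` blocks, so nothing has to be closed by hand.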

You don't need to write your own class and methods just to abstract that single JsonSerializer.SerializeAsync call. It would make sense if you wanted to create a repository-like object that abstracts file storage, not just JSON serialization, i.e. a class that determines storage locations and paths based on configuration and some kind of identifier, e.g.:

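using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

interface IMyStorage
{
    Task<T> Get<T>(string someId);
    Task Store<T>(T value, string someId);
}

class JsonStorage : IMyStorage
{
    readonly string _root;

    public JsonStorage(string root)
    {
        _root = root;
    }

    // Reads the file stored under someId and deserializes it
    public async Task<T> Get<T>(string someId)
    {
        var path = Path.Combine(_root, someId);

        await using var stream = File.OpenRead(path);

        return await JsonSerializer.DeserializeAsync<T>(stream);
    }

    // Serializes the value to a file named after someId
    public async Task Store<T>(T value, string someId)
    {
        var path = Path.Combine(_root, someId);

        await using var stream = File.Create(path);

        await JsonSerializer.SerializeAsync(stream, value);
    }
}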

Problems with JSON

That said, JSON is still a text format, not suitable for binary serialization. It takes more space, it's slower to write, and the lack of a schema means a reader has no way to know in advance what a JSON string contains.

Another problem is that JSON can only have one root, either an object or an array. It can't have multiple elements. This means you can't simply append objects to a file, or read individual objects. You have to read and write the entire file at once.

One way to serialize multiple objects to a JSON file is to serialize each object on a separate line:

await using var writer=new StreamWriter(path,true); //append text
foreach(var myObject in myList)
{
    // Serialize returns the JSON string; the default options emit no line breaks
    var line=JsonSerializer.Serialize(myObject);
    await writer.WriteLineAsync(line);
}
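
Reading such a file back is the same idea in reverse: read one line at a time and deserialize each line on its own. The sketch below is only an illustration; the `Person` type and the file name are assumptions, not part of the question:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical file and type, used only for illustration.
var path = Path.Combine(Path.GetTempPath(), "people.jsonl");
File.Delete(path); // start from an empty file so repeated runs give the same result

// Append one JSON document per line.
await using (var writer = new StreamWriter(path, append: true))
{
    foreach (var p in new[] { new Person { Id = 1, Name = "Ada" }, new Person { Id = 2, Name = "Linus" } })
    {
        await writer.WriteLineAsync(JsonSerializer.Serialize(p));
    }
}

// Read the objects back, one line at a time.
var people = new List<Person>();
using (var reader = new StreamReader(path))
{
    string? line;
    while ((line = await reader.ReadLineAsync()) != null)
    {
        if (line.Length == 0) continue; // skip blank lines
        var person = JsonSerializer.Deserialize<Person>(line);
        if (person != null) people.Add(person);
    }
}

Console.WriteLine(people.Count);

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}
```

This relies on the serializer's default options: output is not indented, and line breaks inside string values are escaped, so each object stays on a single line.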

Alternatives

There are other widely used formats more suitable for serialization, such as Protocol Buffers, Orc, Parquet, Avro and more. They use less space, have some form of schema and can handle multiple schema versions. Simply by using columnar storage, some of those formats offer compression without using a compression algorithm like GZip or Brotli.

Using one of those common formats means that other applications will be able to read your files. You'll be able to use the available tools to read/edit your files without opening your own application too.

One very common binary format is Google's Protocol Buffers, which is used in gRPC and a lot of tools. You can use it in .NET Core using either .NET's own gRPC tooling or the Protobuf-net library.

In Protocol Buffers you specify the schema of your file in advance in a schema file. With Protobuf-net though all you really need is to add the proper attributes to your classes:

[ProtoContract]
class Person {
    [ProtoMember(1)]
    public int Id {get;set;}
    [ProtoMember(2)]
    public string Name {get;set;}
    [ProtoMember(3)]
    public Address Address {get;set;}
}
[ProtoContract]
class Address {
    [ProtoMember(1)]
    public string Line1 {get;set;}
    [ProtoMember(2)]
    public string Line2 {get;set;}
}

Serializing objects is nearly the same as using `JsonSerializer`:

await using var file = File.Create("person.bin");
Serializer.Serialize(file, person);

Multiple messages

Protocol Buffers allow storing multiple objects/messages in a stream, but offer no way to detect where one starts and the other ends. The easiest way to solve this is to write the size of the message before the message itself. This is explained in Streaming Multiple Messages:

If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read the bytes into a separate buffer, then parse from that buffer.

To do this with Protobuf-net, use the SerializeWithLengthPrefix method:

Serializer.SerializeWithLengthPrefix(stream, myObject, PrefixStyle.Base128);

To read the next message from a stream:

var myObject = Serializer.DeserializeWithLengthPrefix<MyObject>(stream, PrefixStyle.Base128);
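
Putting the two calls together, a rough sketch of writing several messages to one file and reading them all back might look like the following. It assumes the protobuf-net NuGet package is referenced. The `Note` type and the file name are made up for the example, and the loop relies on the file stream being seekable so that `Position` can be compared with `Length`:

```csharp
using System;
using System.IO;
using ProtoBuf;

// Hypothetical file name, used only for illustration.
var path = Path.Combine(Path.GetTempPath(), "notes.bin");

// Write two messages, each preceded by its length.
using (var output = File.Create(path))
{
    Serializer.SerializeWithLengthPrefix(output, new Note { Id = 1, Text = "first" }, PrefixStyle.Base128);
    Serializer.SerializeWithLengthPrefix(output, new Note { Id = 2, Text = "second" }, PrefixStyle.Base128);
}

// Read messages until the end of the file is reached.
using (var input = File.OpenRead(path))
{
    while (input.Position < input.Length)
    {
        var note = Serializer.DeserializeWithLengthPrefix<Note>(input, PrefixStyle.Base128);
        Console.WriteLine($"{note.Id}: {note.Text}");
    }
}

// Hypothetical message type for the example.
[ProtoContract]
public class Note
{
    [ProtoMember(1)]
    public int Id { get; set; }
    [ProtoMember(2)]
    public string Text { get; set; } = "";
}
```

The same prefix style has to be used for writing and reading, otherwise the reader cannot find the message boundaries.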
Panagiotis Kanavos