
I have a problem with a memory leak in a .NET Core 3.1 API. The application is hosted in Azure App Service.

It is clearly visible on the graph that under constant load the memory grows very slowly, and it only goes down after an app restart.

[Screenshot: memory usage graph under constant load]

I created two memory dumps, one with high memory and one after a restart, and it's clearly visible that the cause is the app trying to load XmlSerialization.dll multiple times.

[Screenshots: memory dump comparison before and after restart]

Now, we have multiple other APIs that use almost identical serialization code, and I'm not exactly sure why the problem occurs only in this one. Possibly because this one receives much higher traffic.

I've read some articles about the XmlSerializer class having memory issues, but those were reported for constructors we are not using. The only place XmlSerializer is used directly in code is with the XmlSerializer(Type) constructor.

private static async Task<T> ParseResponseContentAsync<T>(HttpResponseMessage response, Accept accept)
{
    try
    {
        using (Stream contentStream = await response.Content.ReadAsStreamAsync())
        {
            using (StreamReader reader = new StreamReader(contentStream, Encoding.UTF8))
            {
                switch (accept)
                {
                    case Accept.Xml:
                        XmlSerializer serializer = new XmlSerializer(typeof(T));
                        return (T)serializer.Deserialize(reader);

                    case Accept.Json:
                        string stringContent = await reader.ReadToEndAsync();
                        return JsonConvert.DeserializeObject<T>(stringContent);

                    default:
                        throw new CustomHttpResponseException(HttpStatusCode.NotImplemented, $"Unsupported Accept type '{accept}'");
                }
            }
        }
    }
    catch (Exception ex)
    {
        throw new InvalidOperationException($"Response content could not be deserialized as {accept} to {typeof(T)}", ex);
    }
}

But I'm pretty sure this method is not used in this API anyway.

So another potentially problematic place could be the serialization of controller responses.

Startup.cs registration:

services
    .AddControllers(options =>
    {
        options.OutputFormatters.Add(new XmlSerializerOutputFormatter(
            new XmlWriterSettings
            {
                OmitXmlDeclaration = false
            }));
        options.Filters.Add<CustomHttpResponseExceptionFilter>();
    })
    .AddNewtonsoftJson(options => options.SerializerSettings.Converters.Add(
        new StringEnumConverter(typeof(CamelCaseNamingStrategy)))) 
    .AddXmlSerializerFormatters();

Example of an endpoint:

[Produces(MimeType.ApplicationXml, MimeType.TextXml, MimeType.ApplicationJson, MimeType.TextJson)]
[ProducesResponseType(StatusCodes.Status200OK)]
[ProducesResponseType(StatusCodes.Status404NotFound)]
[ProducesResponseType(StatusCodes.Status401Unauthorized)]
[HttpGet("EndpointName")]
[Authorize]
public async Task<ActionResult<ResponseDto>> Get([FromModel] InputModel inputModel)
{
    //some code

   return responseDto;
}

Dto returned from the API:

[XmlRoot(ElementName = "SomeName")]
public class ResponseDto
{
    [XmlElement(ElementName = "Result")]
    public Result Result { get; set; }
    [XmlAttribute(AttributeName = "Status")]
    public string Status { get; set; }
    [XmlAttribute(AttributeName = "DoneSoFar")]
    public int DoneSoFar { get; set; }
    [XmlAttribute(AttributeName = "OfTotal")]
    public int OfTotal { get; set; }
}

Now, I haven't been able to find any documented cases of .AddXmlSerializerFormatters causing these kinds of issues, and I'm not sure what the solution or a workaround should be. Any help would be greatly appreciated.

EDIT: I've run some additional tests as @dbc suggested.

It now seems that we are not even hitting the new XmlSerializer(typeof(T)) line in our scenarios, since nothing was logged after the logging code was added. We do, however, use the default XML serialization for some of our API endpoints.
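The logging added for this test was along the lines of the sketch below (the class and member names here are illustrative, not the exact code from our API): it records each unique T the first time an XmlSerializer is requested for it, as @dbc suggested.

using System;
using System.Collections.Concurrent;
using Microsoft.Extensions.Logging;

internal static class SerializerTypeLogger
{
    // Tracks every unique type for which an XmlSerializer has been requested,
    // so each type is logged only once.
    private static readonly ConcurrentDictionary<string, byte> SeenTypes =
        new ConcurrentDictionary<string, byte>();

    public static void LogFirstUse(Type type, ILogger logger)
    {
        string name = type.AssemblyQualifiedName ?? type.FullName;
        if (SeenTypes.TryAdd(name, 0))
        {
            logger.LogInformation("XmlSerializer requested for new type: {TypeName}", name);
        }
    }
}

// Called right before the serializer is constructed:
// SerializerTypeLogger.LogFirstUse(typeof(T), logger);
// XmlSerializer serializer = new XmlSerializer(typeof(T));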

One thing I noticed that might be causing this behavior is that the paths in the memory dump logs don't match the files that actually exist in the root folder. The paths visible in the memory dumps are *.Progress.Lib.XmlSerializers.dll or *.Domain.Lib.XmlSerializers.dll.

Now I wonder if this isn't the issue documented here - link - since I can't see those files in the wwwroot directory.

If it is, I'm not sure whether the solution would be to somehow reference the .dlls directly?

[Screenshot: DLLs failing to load, as seen in the memory dump]

Edit2: Adding a screenshot of how the memory looks after deploying the cached serializer suggested by @dbc. There is no constant growth, but it seems that after a few hours the memory rises and doesn't go down. It is possible that the main problem is resolved, but since it takes a lot of time to notice big differences we will keep monitoring this. There is nothing showing in the large object heap, and no large amount of memory is allocated in managed memory. This API, however, ran at around 250 MB when first deployed and is now at 850 MB after one day. When we turned off the load test tool the memory didn't really go down much.

[Screenshot: memory usage after deploying the cached serializer]

Edit3: We looked closer at some historical data and it seems that the last screenshot shows normal behavior. The memory never grows beyond a certain point. Not sure why that happens, but this is acceptable.

  • The first time `XmlSerializer` serializes a root type `T`, it uses code generation to create, build and load a DLL that can read and write that type and all its descendants. Thus as you serialize more and more root types, you will use more memory for dynamically loaded assemblies -- but as long as you use the `XmlSerializer(typeof(T))` constructor, the run-time DLL will be cached and reused. So the assumption is that the memory used will eventually stabilize once you serialize all the known types of your app for the first time. ... – dbc Oct 02 '22 at 20:11
  • ... It may take a while in practice as certain code branches may take a while to get executed under your usual usage load. But the run-time assembly memory will eventually stabilize. – dbc Oct 02 '22 at 20:14
  • But if you are also using code generation techniques to create run-time types (e.g. via [`TypeBuilder`](https://learn.microsoft.com/en-us/dotnet/api/system.reflection.emit.typebuilder)) then the assumption that your app is only serializing a finite number of statically defined types will be wrong, and run-time DLL memory may keep growing. Do you know whether you are doing something like that? – dbc Oct 02 '22 at 20:15
  • Also, as an aside: loading your JSON as a string and then deserializing the string using Newtonsoft may result in poor memory performance. If you are having problems with excessive string memory use you may want to deserialize directly from the stream as shown in [the docs](https://www.newtonsoft.com/json/help/html/Performance.htm#MemoryUsage). – dbc Oct 02 '22 at 20:18
  • @dbc Unfortunately the memory never went down even after multiple hours. I haven’t seen TypeBuilder being used anywhere. – cah1r Oct 03 '22 at 03:59
  • The memory of run-time loaded DLLs won't go down because, once loaded, a DLL can't be unloaded for the lifetime of the appdomain. (DLLs are loaded into unmanaged rather than managed memory so are not garbage collected.) But it should stabilize. If it isn't, you may be serializing more unique types `T` than you realize. I suggest, for debugging purposes, logging all **unique** full type names of every type `T` passed into `ParseResponseContentAsync()`. E.g. save `typeof(T).AssemblyQualifiedName` in some `HashSet<string>` and log each name the first time it is encountered. – dbc Oct 03 '22 at 04:09
  • @dbc I will try that. But is it somehow possible that this issue occurs when serializing responses from controllers automatically (this would be possible thanks to the AddXmlSerialization line added in Startup.cs)? The requests are sent with an accept/xml header, so I'm guessing in such a case the API will automatically serialize the responses to XML, not JSON. XmlSerializer(typeof(T)) is used in a completely different code path, so these could be two separate things. – cah1r Oct 03 '22 at 04:24
  • The `Dictionary<...>` that is taking all the memory seems to be here: [AssemblyLoadContext.cs](https://github.com/dotnet/runtime/blob/b8d49801fe03b96d2fead3d97a11dce1e723dd17/src/libraries/System.Private.CoreLib/src/System/Runtime/Loader/AssemblyLoadContext.cs#L32). It gets added to in the [AssemblyLoadContext constructor](https://github.com/dotnet/runtime/blob/b8d49801fe03b96d2fead3d97a11dce1e723dd17/src/libraries/System.Private.CoreLib/src/System/Runtime/Loader/AssemblyLoadContext.cs#L108). You may need to debug to see why + where this is happening. – dbc Oct 03 '22 at 04:45
  • @dbc I edited my question above with some additional information. I tried adding logging at that one line, but it seems that we are not even hitting it in our current flows. I found one bug that could maybe explain this behavior: the code keeps retrying failed assembly loads forever and causes a memory leak. Since I don't see those .dlls in the places that are visible in the memory dump, this makes me suspicious. However, I'm not sure what the best solution would be for this, or why it occurs in the first place. – cah1r Oct 06 '22 at 08:27
  • In your [screen shot of DLLs that fail to load](https://i.stack.imgur.com/DEjfX.png), you have blanked out the full names of types that are being loaded, for example `XXX.Domain.Serializers.Xml`. That name `XXX.Domain` corresponds to some type in your application. What happens if you attempt to construct an `XmlSerializer` for `Domain` by doing `new XmlSerializer(typeof(Domain))`? Does it succeed, or throw an exception? – dbc Oct 06 '22 at 19:56
  • @dbc Domain is the name of the project. The main .NET Core 3.1 API references multiple .NET Standard 2.0 .dlls. Some of them contain classes that will be serialized. Unfortunately, the analysis doesn't say which specific class is the problem. – cah1r Oct 11 '22 at 09:54

2 Answers

4

The assemblies that the new XmlSerializer(typeof(T)) constructor is trying to load are Microsoft XML Serializer Generator assemblies, a.k.a. Sgen.exe assemblies, which may or may not have been created at the time the app was built.

But what are Sgen assemblies? In brief, XmlSerializer works by generating code to serialize and deserialize the type passed into the constructor, then compiling that generated code into a DLL and loading it into the application domain to do the actual serialization. This run-time DLL generation can be time-consuming, but as long as you use the XmlSerializer(Type) or XmlSerializer(Type, String) constructors it will only be done once per type T, with the resulting assembly being cached internally in a static dictionary by XmlSerializer.
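To make the caching behavior concrete, here is a minimal illustrative sketch (the Order type and class names are hypothetical): only the first two constructors below reuse the generated assembly; overloads that take, for example, XmlAttributeOverrides generate and load a new dynamic assembly on every call unless you cache the serializer instance yourself.

using System.Xml.Serialization;

public class Order { public int Id { get; set; } }

public static class SerializerCacheDemo
{
    public static void Run()
    {
        // Cached by the framework: the generated serialization assembly is
        // created once per type and reused on subsequent constructions.
        var cachedByType = new XmlSerializer(typeof(Order));
        var cachedByTypeAndNamespace = new XmlSerializer(typeof(Order), "urn:example");

        // NOT cached by the framework: each call generates, compiles and loads
        // a new dynamic assembly that is never unloaded, so callers must cache
        // and reuse this instance themselves.
        var notCached = new XmlSerializer(typeof(Order), new XmlAttributeOverrides());
    }
}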

As you might imagine this can cause the first call to new XmlSerializer(typeof(T)) to be slow, so (in .NET 2 I believe, this is all very old code) Microsoft introduced a tool to generate those run-time serialization DLLs at application build time: SGen.exe. This tool doesn't work for all types (e.g. generics) and was, if I recall correctly, finicky to use, but when it did work it did speed up serializer construction. Once loaded successfully the Sgen assembly is cached in the same cache used for generated assemblies.

And it seems like you have stumbled across a bug in .NET Core 3.1, 5, and 6 related to this:

  1. The base class method OutputFormatter.CanWriteResult(OutputFormatterCanWriteContext context) of XmlSerializerOutputFormatter tests whether a type can be serialized by calling XmlSerializerOutputFormatter.CanWriteType(Type type). This in turn tests whether a type is serializable by XmlSerializer by attempting to construct a serializer for the type, returning false if construction fails with an exception. The serializer is cached if construction succeeds, but nothing is cached if construction fails.

  2. The new XmlSerializer(Type) constructor tries to load an Sgen assembly unless an assembly has already been cached for the type by a previous successful call to the constructor.

  3. But if a type is not serializable by XmlSerializer, the constructor will throw an exception and nothing will be cached. Thus successive attempts to construct a serializer for the same non-serializable type will result in multiple calls to load Sgen assemblies.

  4. As you yourself found, .NET Core itself permanently leaks a small amount of IndividualAssemblyLoadContext memory every time an assembly load fails: Failed Assembly.Load and Assembly.LoadFile leaks memory #58093.

Putting all this together, enabling XML serialization when some of your DTOs are not serializable (because e.g. they don't have parameterless constructors) can result in ever-growing IndividualAssemblyLoadContext memory use.
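For illustration, a hypothetical DTO like the one below is enough to trigger the behavior described above: constructing an XmlSerializer for it throws, nothing is cached, and every XML-capable request repeats the failed Sgen assembly load.

using System;
using System.Xml.Serialization;

// Hypothetical DTO: XmlSerializer requires a public parameterless constructor,
// so constructing a serializer for this type throws InvalidOperationException.
public class NonSerializableDto
{
    public NonSerializableDto(string value) => Value = value;
    public string Value { get; }
}

public static class FailedSerializerDemo
{
    public static void Run()
    {
        try
        {
            // On .NET Core 3.1/5/6 each failed construction first attempts to load
            // the "*.XmlSerializers.dll" Sgen assembly, and the failed load leaks
            // a small amount of IndividualAssemblyLoadContext memory (issue #58093).
            var serializer = new XmlSerializer(typeof(NonSerializableDto));
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}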

So, what are your options for a workaround?

Firstly, issue #58093 was apparently fixed in .NET 7 with pull #68502 so if you upgrade to this version the problem may resolve itself.

Secondly, you could subclass XmlSerializerOutputFormatter to cache returned XmlSerializer instances even when null. This will prevent multiple attempts to create serializers for non-serializable types.

First, subclass XmlSerializerOutputFormatter and override XmlSerializerOutputFormatter.CreateSerializer(Type) as follows:

public class CachedXmlSerializerOutputFormatter : XmlSerializerOutputFormatter
{
    // Cache and reuse the serializers returned by base.CreateSerializer(t).  When null is returned for a non-serializable type,
    // a null serializer will be cached and returned.
    static readonly ConcurrentDictionary<Type, XmlSerializer> Serializers = new ConcurrentDictionary<Type, XmlSerializer>();

    public CachedXmlSerializerOutputFormatter() : base() { }
    public CachedXmlSerializerOutputFormatter(ILoggerFactory loggerFactory) : base(loggerFactory) { }
    public CachedXmlSerializerOutputFormatter(XmlWriterSettings writerSettings) : base(writerSettings) { }
    public CachedXmlSerializerOutputFormatter(XmlWriterSettings writerSettings, ILoggerFactory loggerFactory) : base(writerSettings, loggerFactory) { }

    protected override XmlSerializer CreateSerializer(Type type) { return Serializers.GetOrAdd(type, (t) => base.CreateSerializer(t)); }
}

Then replace use of XmlSerializerOutputFormatter with your subclassed version as follows:

services
    .AddControllers(options =>
    {
        options.OutputFormatters.Add(new CachedXmlSerializerOutputFormatter(
            new XmlWriterSettings
            {
                OmitXmlDeclaration = false
            }));
        options.Filters.Add<CustomHttpResponseExceptionFilter>();
    })
    .AddNewtonsoftJson(options => options.SerializerSettings.Converters.Add(
        new StringEnumConverter(typeof(CamelCaseNamingStrategy)))) 
    .AddXmlSerializerFormatters();

This should in theory eliminate the repeated failing calls to load Sgen assemblies.

Notes:

Demo fiddles:

  1. Demo fiddle showing that multiple calls to XmlSerializerOutputFormatter.CanWriteType() for a non-serializable DTO result in multiple assembly load failures here: demo #1.

  2. Demo fiddle showing that CachedXmlSerializerOutputFormatter fixes this problem here: demo #2.

  3. Demo that multiple calls to XmlSerializerOutputFormatter.CanWriteType() for a serializable DTO do not result in multiple assembly load failures, and hence don't cause growing IndividualAssemblyLoadContext memory use, here: demo #3.

dbc
  • It worked perfectly! I tried caching in code; I don't know why I hadn't thought of doing the same in the XML output formatter. – cah1r Oct 10 '22 at 13:12
  • I think I was too quick to be happy :) There is nothing in managed memory, and its size is also quite small. The references to AssemblyLoadContext are gone, but the memory still grows. Now it seems to spike from time to time and not go down; before, it was slowly growing. So my idea is that there are still things being added to unmanaged memory. – cah1r Oct 11 '22 at 08:11
  • @cah1r - We're going to need a more detail to help you. Are you sure at this point your problem is caused by `XmlSerializer`? Maybe you aren't disposing of something disposable? Or (just a shot in the dark here) if you are allocating lots of large objects you may need to enable [large object heap compaction](https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/large-object-heap). See e.g. [Large Object Heap Compaction, when is it good?](https://stackoverflow.com/q/20035550). – dbc Oct 11 '22 at 08:38
  • @cah1r - that being said, if you use `XmlSerializer`, every time you serialize or deserialize a type **for the first time**, code will be generated, compiled and loaded as a DLL, and never unloaded. So you will see spikes for each newly encountered type that never go down. – dbc Oct 11 '22 at 08:39
  • I looked at the large object heap and there are a few objects there, but nothing that would explain the overall memory size. As to the XmlSerializer, I'm not 100% sure it's the problem since there is no solution yet :) I thought it was the main candidate since I could see a big number of references to it in the memory analysis. What is weird is that the spikes now happen in the application with virtually no load. Maybe I will monitor what is going on for a few days and then I will have more information. Maybe the Azure App Service where the app is hosted is just doing something strange. – cah1r Oct 11 '22 at 09:49
  • I added a new screenshot of how the memory looked after deploying your fix. – cah1r Oct 11 '22 at 09:58
  • So we looked closer at some historical data and it seems that the last screenshot shows normal behavior. It never grows beyond a certain point. Not sure why that happens, but this is acceptable. So I guess the memory leak is fixed! – cah1r Oct 12 '22 at 08:42
1

This might not be feasible, but could you offload the XML generation onto Azure API Management?
https://learn.microsoft.com/en-us/azure/api-management/api-management-transformation-policies#ConvertJSONtoXML

ColinM
  • Hmm, we could probably try that, but we are not using that Azure component at the moment. We have plans for it, so it's good to know that it is possible. – cah1r Oct 10 '22 at 06:32