I have a very simple XML file and am calculating its size in bytes and its SHA1 hash. Why do the three approaches below give different results? (The size difference grows as the file gets larger.)

Simple.xml file

<note>
<to>Test</to>
<from>TestTest</from>
<heading>TestTestTest</heading>
<body>TestTestTestTestTest</body>
</note>

Three ways to get the size and SHA1 hash:

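// Namespaces used: System, System.IO, System.Security.Cryptography,
// System.Text, System.Xml, System.Xml.Linq
void Main()
{
    var algorithm = new SHA1Managed();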
    foreach (var file in new[] { @"C:\temp\simple.xml" })
    {
        Console.WriteLine($"--------- For Input file {file} ---------");
        Console.WriteLine("Case 1 - Using ReadAllBytes");
        var bytes = File.ReadAllBytes(file);
        Console.WriteLine($"Size: {bytes.Length}");
        Console.WriteLine($"SHA1 Hash: {BitConverter.ToString(algorithm.ComputeHash(bytes)).Replace("-", string.Empty)}");

        Console.WriteLine("\n\nCase 2 - Using XmlDocument.OuterXml");
        XmlDocument doc = new XmlDocument();
        doc.Load(file);
        bytes = Encoding.UTF8.GetBytes(doc.OuterXml);
        Console.WriteLine($"Size: {bytes.Length}");
        Console.WriteLine($"SHA1 Hash: {BitConverter.ToString(algorithm.ComputeHash(bytes)).Replace("-", string.Empty)}");

        Console.WriteLine("\n\nCase 3 - XDocument.ToString()");
        var xdoc = XDocument.Load(file);
        bytes = Encoding.UTF8.GetBytes(xdoc.ToString());
        Console.WriteLine($"Size: {bytes.Length}");
        Console.WriteLine($"SHA1 Hash: {BitConverter.ToString(algorithm.ComputeHash(bytes)).Replace("-", string.Empty)}");
    }
}

Results:

Case 1 - Using ReadAllBytes
Size: 121
SHA1 Hash: 94AD4DCFD700EB139796F6B0EEB11658B57AD57A

Case 2 - Using XmlDocument.OuterXml
Size: 111
SHA1 Hash: EC2979C571F07B2FDC186C4229A2C6CD677BBF8A

Case 3 - XDocument.ToString()
Size: 129
SHA1 Hash: 7236C0AD4279D9FCB0E3DFBA11B833B129032354
Sundar
  • Because you are parsing and recreating strings. Write out those strings and diff them. – Jeremy Lakeman Jan 20 '21 at 03:15
  • XML normalization for signing is hard. If you don't understand XML, encoding and file formats, please don't try it at home; use existing methods (as shown in the duplicate). If your question is unrelated to XML and is rather along the lines of "why do those *(@#$ use different encodings for text", consider asking a new question that does not involve XML. – Alexei Levenkov Jan 20 '21 at 03:33
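
The first comment's suggestion can be sketched as follows. This is a minimal illustration, not a definitive diagnosis: it assumes simple.xml uses CRLF line endings with no BOM and no trailing newline (an assumption that happens to give 121 bytes, matching Case 1), and it works on an in-memory string instead of the file. The names WhitespaceDemo, Source, ViaXmlDocument, ViaXDocument, Visible and Report are made up for this example. The element text alone is 111 bytes, so the reported numbers are consistent with 111 + 5 line breaks × 2 bytes = 121 for the file, 111 for OuterXml (whitespace-only nodes dropped), and 111 + 10 + 4 × 2 indentation spaces = 129 for XDocument.ToString().

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class WhitespaceDemo
{
    // Assumed stand-in for simple.xml: CRLF line endings, no BOM, no trailing newline.
    public const string Source =
        "<note>\r\n<to>Test</to>\r\n<from>TestTest</from>\r\n" +
        "<heading>TestTestTest</heading>\r\n" +
        "<body>TestTestTestTestTest</body>\r\n</note>";

    // Case 2: XmlDocument drops whitespace-only nodes unless PreserveWhitespace is true.
    public static string ViaXmlDocument(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.OuterXml;
    }

    // Case 3: XDocument.ToString() re-indents unless SaveOptions.DisableFormatting is passed.
    public static string ViaXDocument(string xml, SaveOptions options = SaveOptions.None)
    {
        return XDocument.Parse(xml).ToString(options);
    }

    // Replaces whitespace with visible markers so the strings can be compared by eye.
    public static string Visible(string s)
    {
        return s.Replace("\r", "\\r").Replace("\n", "\\n\n").Replace(" ", "_");
    }

    // Prints the UTF-8 byte count followed by the whitespace-annotated text.
    public static void Report(string label, string text)
    {
        Console.WriteLine($"--- {label}: {Encoding.UTF8.GetByteCount(text)} bytes ---");
        Console.WriteLine(Visible(text));
    }

    public static void Main()
    {
        Report("Original text", Source);
        Report("XmlDocument.OuterXml", ViaXmlDocument(Source));
        Report("XDocument.ToString()", ViaXDocument(Source));
        Report("XDocument.ToString(DisableFormatting)",
               ViaXDocument(Source, SaveOptions.DisableFormatting));
    }
}
```

Under these assumptions, the first two reports show 121 and 111 bytes. The third depends on the platform's newline (129 where it is CRLF), and the last one produces the same string as OuterXml. Since the three cases hash three different byte sequences, the SHA1 values differ as well; the hashes can only agree once all sides hash the same serialization.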

0 Answers