
I'm using BitMiracle's LibTiff.Net to read in a Bitmap image and return a TIFF byte[] that gets embedded in a file as a Base64 string. I've noticed that the Base64 string ends up being quite a bit longer than I expect, and that its tail end is a long run of 'A' characters. While debugging, I can see that the byte[] LibTiff returns has several thousand 0 values at the end that don't seem to be a necessary part of the image itself (so far as I can tell).
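
For reference, a quick snippet shows why trailing zero bytes surface as runs of 'A' (the bytes here are made up purely for illustration):

byte[] padded = { 0x54, 0x49, 0x46, 0x46, 0, 0, 0, 0, 0, 0 }; // "TIFF" plus six zero bytes
Console.WriteLine(Convert.ToBase64String(padded)); // prints "VElGRgAAAAAAAA=="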

I'm using BitMiracle's sample code to do the conversion: https://bitmiracle.github.io/libtiff.net/html/075f57db-d779-48f7-9fd7-4ca075a01599.htm

I can't quite see what would cause "garbage" at the end of the byte[], though. Any thoughts?

Edit to add code - GetTiffImageBytes() is in the link above:

public byte[] GenImage()
{
    Bitmap b;
    using (System.Drawing.Image frontImage = System.Drawing.Image.FromStream(file))
    {
        file.Close();

        // Draw something
        b = new Bitmap(frontImage);
        Graphics graphics = Graphics.FromImage(b);
        graphics.DrawString(data1, (Font)GlobalDict.FontDict["font1"], Brushes.Black, 200, 490);
        graphics.DrawString(data2, (Font)GlobalDict.FontDict["font2"], Brushes.Black, 680, 400);
    }

    // Convert to TIFF - requires BitMiracle.LibTiff.Classic
    byte[] tiffBytes = GetTiffImageBytes(b, false);

    return tiffBytes;
}

The above is called by:

  byte[] aFrontImage = MiscTools.GenImage(somestuff);

  fileXML.WriteLine("    <FrontImage>" + System.Convert.ToBase64String(aFrontImage, 0, aFrontImage.Length) + "</FrontImage>");

All things said and done, it functions fine; the resulting images are readable by our application. I'm just trying to pare down the size, since some of these files may contain tens of thousands of images. I have some older sample files, created by hand via another method, whose Base64 strings are about the same size as mine minus all the trailing bytes that I'm thinking are garbage.

As someone commented, one option may be to just read the byte[] and remove all 0 values from the end prior to converting, but I'm trying to figure out why it's happening to begin with.

Thanks!

Jesse Williams
  • Post your code here, please. – Mihai Caracostea Mar 30 '16 at 21:03
  • If you think they are garbage, remove them and try to load the image. If it doesn't work, then it means they are necessary. – Eser Mar 30 '16 at 21:04
  • I added some of the code - thanks. @Eser - I'd like to, but I'm curious where the extra data is coming from to begin with. I'd rather fix the root cause than work around it, if possible. – Jesse Williams Mar 30 '16 at 21:36
  • SWAG: Could be initial allocation of image based on size vs. what was actually needed after all optimizations...for example, run-length-encoding and such. – Clay Mar 30 '16 at 21:38
  • @Clay - that's sort of what I'm thinking. I guess I'll have to tear up their sample code and get to the bottom of it. Thanks! – Jesse Williams Apr 01 '16 at 22:58
  • Can you try replacing `return ms.GetBuffer();` with `return ms.ToArray();` and see if that helps? – Lasse V. Karlsen Apr 19 '16 at 13:28

2 Answers


The problem is most likely this, found in the linked source example:

return ms.GetBuffer();

For a MemoryStream, this returns the whole underlying array, even if you haven't actually used all of it yet. The buffer is resized when you write enough to fill it, but it isn't grown to just cover the needed size; it doubles its previous size each time. The stream's Length property indicates how much of the array is actually used.

This is akin to the capacity of a List<T>, which will also double in size every time you fill the current capacity. The Count property will indicate how many items you actually have in the list.
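
For example, this prints Count: 5 but Capacity: 8 on the runtimes I've tried (the exact growth policy is an implementation detail):

var list = new List<int>();
for (int i = 0; i < 5; i++)
    list.Add(i); // capacity grows 0 -> 4 -> 8 as items are added
Console.WriteLine("Count: " + list.Count + ", Capacity: " + list.Capacity);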

The fix is easy, replace the above line of code with this:

return ms.ToArray();

This will create a new array just big enough to contain the bytes actually written to the memory stream, and copy the used portion of the buffer into it.
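
In context the fix is a one-line change; here's a rough sketch of the tail of a method like GetTiffImageBytes (the LibTiff.Net calls that actually write the TIFF into the stream are elided):

using (MemoryStream ms = new MemoryStream())
{
    // ... existing sample code writes the TIFF into ms ...
    return ms.ToArray(); // was: return ms.GetBuffer();
}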

To verify that the buffer is bigger than needed you can run this simple code:

var ms = new MemoryStream();
Console.WriteLine("GetBuffer: " + ms.GetBuffer().Length);
Console.WriteLine("ToArray: " + ms.ToArray().Length);
ms.WriteByte(0);
Console.WriteLine("GetBuffer: " + ms.GetBuffer().Length);
Console.WriteLine("ToArray: " + ms.ToArray().Length);

This outputs:

GetBuffer: 0
ToArray: 0
GetBuffer: 256
ToArray: 1

As you can see, writing just 1 byte grew the buffer to 256 bytes. From there it doubles each time you fill the current capacity.
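
If you want to watch the doubling happen, this loop prints the buffer size each time it changes (output noted from a run on my machine):

var ms = new MemoryStream();
int lastCapacity = -1;
for (int i = 0; i < 1000; i++)
{
    ms.WriteByte(0);
    int capacity = ms.GetBuffer().Length;
    if (capacity != lastCapacity)
    {
        Console.WriteLine("Length " + ms.Length + ": buffer " + capacity);
        lastCapacity = capacity;
    }
}
// Length 1: buffer 256
// Length 257: buffer 512
// Length 513: buffer 1024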

.NET Fiddle here.

Lasse V. Karlsen
  • Well, yeah. Thank you! That was perfect actually. What a silly thing to have missed. It's clean and definitely keeps my TIFF tags intact. – Jesse Williams Apr 19 '16 at 13:46
  • Most likely image decoding doesn't care about extra zeroes, but you should now definitely remove your trimming code; if you happen to trim away a zero that is expected (and now, with the above change, correct), it is likely to cause an image decoding problem. – Lasse V. Karlsen Apr 19 '16 at 13:48
  • Yeah, I'm not using the fixImageByteArray() with the ToArray(). Thanks! :) – Jesse Williams Apr 19 '16 at 13:50
  • What about `.ToArray().Where(a => a != default(byte)).ToArray()` to remove byte(0)? It works too and I haven't found any edge cases. – roland Nov 27 '20 at 09:28
  • @roland What if you write a zero-byte to the stream? And if you've already called `.ToArray()` on the stream you have the *actual bytes* that have been written to it, you don't need to remove anything, and you probably shouldn't. In terms of saving an image, the context of this question, you **definitely** shouldn't. – Lasse V. Karlsen Nov 27 '20 at 09:41
  • Thanks @Lasse V Karlsen. You're right. To fix my issue, I changed the padding from PaddingMode.Zeros to PaddingMode.PKCS7 – roland Nov 27 '20 at 13:05

For now, I just went with "fixing" the issue after the fact and created a method that I call on each image:

private static byte[] fixImageByteArray(byte[] inByte)  // Fix issue with garbage suffix data - reduces image byte[] size by roughly half.
{
    int newByteBaseLength = inByte.Length - 1;
    while (inByte[newByteBaseLength] == 0)
    {
        --newByteBaseLength;
    }

    float newByteModifiedLength = ((inByte.Length - newByteBaseLength) * 0.1f) + 0.5f;  // When using newByteBaseLength + 1, some TIFF Tag data was getting lost.  This seems to resolve the issue.

    int newByteModifiedLengthAsInt = (int)newByteModifiedLength;

    byte[] outByte = new byte[newByteBaseLength + newByteModifiedLengthAsInt];
    Array.Copy(inByte, outByte, newByteBaseLength + newByteModifiedLengthAsInt);

    return outByte;
}

EDIT: I modified the variable names to make a bit more sense. I found that the old way of sizing the array (using newByteBaseLength + 1) led to some damage to TIFF tags. By using a slightly less efficient method, the image size is still significantly reduced, but the tags stay intact.

Jesse Williams