
I am trying to read a binary file passed as the sole command-line argument, and I have just switched to C# from C++. I know what the problem is:

The binary file arrives as raw bytes (8 bits each), and I convert those bytes into a variable (called "symbol" in my code below) whose type may be int, long, short, etc., so each value is 2, 4, or 8 bytes depending on the type chosen. Say the type is a generic parameter `T` (as in `T symbol`). I read the binary file and store each value in a `T` variable called "processingValue" like this:

 byte[] bytes = stream.ReadBytes(size);  
 T processingValue = converter(bytes,0);  
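Those two lines assume `ReadBytes(size)` always returns `size` bytes, but `BinaryReader.ReadBytes` returns a shorter array when the end of the stream is reached (and an empty array once nothing is left). A small sketch that reproduces this with an in-memory 6-byte "file"; `ShortReadDemo` and `ChunkLengths` are hypothetical names used only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ShortReadDemo
{
    // Reads `data` in chunks of `size` bytes, the same way as the loop in the
    // question, and records how many bytes each ReadBytes call returned.
    public static List<int> ChunkLengths(byte[] data, int size)
    {
        var lengths = new List<int>();
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            while (reader.BaseStream.Position < reader.BaseStream.Length)
            {
                byte[] chunk = reader.ReadBytes(size);
                lengths.Add(chunk.Length); // can be < size on the last read
            }
        }
        return lengths;
    }
}
```

With six bytes and `size = 4`, the chunk lengths are 4 and then 2; that 2-byte array is what a 4-byte converter would receive.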

I create an object of the class "ConstructorClass" (whose constructor contains the two lines above) like this:

ConstructorClass<uint> ObjSym = new ConstructorClass<uint>(args, BitConverter.ToUInt32);

And the constructor is defined like this:

public ConstructorClass(string[] args, Func<byte[], int, T> converter)
{
    // the two lines of code above
}

The problem occurs when the length of the binary file is not a multiple of the size of the data type I chose when creating the ConstructorClass object. For example, uint works fine because it is 4 bytes and my file length is a multiple of 4 bytes. But if the type's size does not divide the file length evenly, for example short (Int16, 2 bytes) on a file with an odd number of bytes, I get an unhandled exception: on the last read fewer bytes are left than the type needs (short needs 16 bits, but the final read may return only 8), so the conversion fails at run time.
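The exception is thrown by the converter rather than by the reader: `BitConverter.ToInt64(bytes, 0)` needs eight bytes starting at index 0 and throws an `ArgumentException` when the array is shorter. A hedged sketch of a guard that checks the length before calling the converter; `SafeConvert.TryConvert` is a hypothetical helper, not part of the BCL:

```csharp
using System;

public static class SafeConvert
{
    // Calls `converter` only if `bytes` holds at least `size` bytes.
    // For a short final chunk it returns false and default(T) instead of throwing.
    public static bool TryConvert<T>(byte[] bytes, int size,
                                     Func<byte[], int, T> converter, out T value)
    {
        if (bytes.Length < size)
        {
            value = default(T);
            return false;
        }
        value = converter(bytes, 0);
        return true;
    }
}
```

Skipping the short chunk avoids the crash but silently drops the trailing bytes, which is why zero-padding (or storing the tail separately) is still needed for a lossless compressor.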

Here I tried it with "long" (with "int" it works fine):

Huffman<long> ObjSym = new Huffman<long>(args, BitConverter.ToInt64);

Then I got this:

Unhandled Exception: System.ArgumentException: Destination array is not long enough to copy all the items in the collection. Check array index and length.
  at System.BitConverter.PutBytes (System.Byte* dst, System.Byte[] src, Int32 start_index, Int32 count) [0x00000] in <filename unknown>:0 
  at System.BitConverter.ToInt64 (System.Byte[] value, Int32 startIndex) [0x00000] in <filename unknown>:0 
  at shekhar_final_version_Csharp.Huffman`1[System.Int64]..ctor (System.String[] args, System.Func`3 converter) [0x00000] in <filename unknown>:0 
  at shekhar_final_version_Csharp.MyClass.Main (System.String[] args) [0x00000] in <filename unknown>:0 

My code is below: the constructor definition (the class is Huffman), the Main method that creates the object, and the class definition. The type parameter can be uint, long, short, etc. This is the constructor definition:

// Requires: using System; using System.IO; using System.Runtime.InteropServices;
public Huffman(string[] args, Func<byte[], int, T> converter)
{
    front = null;
    int size = Marshal.SizeOf(typeof(T));
    Console.WriteLine("Size: {0}", size);
    using (var stream = new BinaryReader(System.IO.File.OpenRead(args[0])))
    {
        while (stream.BaseStream.Position < stream.BaseStream.Length)
        {
            // On the last iteration ReadBytes may return fewer than `size` bytes.
            byte[] bytes = stream.ReadBytes(size);
            T processingValue = converter(bytes, 0);
            {
                Node pt, temp;
                bool is_there = false;
                pt = front;
                // Look the symbol up in the list and increment its frequency
                // (number of times it occurs in the file); this part works fine.
                while (pt != null)
                {
                    if (pt.symbol.Equals(processingValue))
                    {
                        pt.freq++;
                        is_there = true;
                        break;
                    }
                    temp = pt;
                    pt = pt.next;
                }
                // (Adding a new node when !is_there is omitted for brevity.)
            }
        }
    }
}

This is my Main function, where I create the object:

public static void Main(string[] args)
{
    Huffman<uint> ObjSym = new Huffman<uint>(args, BitConverter.ToUInt32);
}

This is the class definition:

public class Huffman<T> where T : struct, IComparable<T>, IEquatable<T>
{
    public int data_size, length, i, is_there;
    public class Node
    {
        public Node next;
        public T symbol;
        public int freq;
    }
    public Node front, rear;
}

The binary file I read contains data like this:

hp@ubuntu:~/Desktop/Huf_pointer$ xxd -b out.bin 
0000000: 00000000 00000000 00000000 00000000 00000000 00000000  ......
0000006: 00000000 00000000 00000000 00000000 00000000 00000000  ......
000000c: 00000000 00000000 00000000 00000000 00000000 00000000  ......
0000012: 00000000 00000000 00000000 00000000 00000000 00000000  ......
0000018: 00000000 00000000 00000000 00000000 00000000 00000000  ......
000001e: 00000000 00000000 00000000 00000000 00000000 00000000  ......
0000024: 00000000 00000000 00000000 00000000 00000000 00000000  ......
000002a: 00000000 00000000 00000000 00000000 00000000 00000000  ......
0000030: 00000000 00000000 00000000 00000000 00000000 00000000  ......
.........   (similar data continues here)   ................
00008ca: 00010011 00010011 00010011 00010011 00010011 00010011  ......
00008d0: 00010011 00010011 00010011 00010011 00010011 00010011  ......
00008d6: 00010011 00010011 00010011 00010011 00010011 00010011  ......
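Assuming the dump ends at its last row (offset `0x8d6` plus six bytes), the file is `0x8dc` = 2268 bytes long. That fits the observed behaviour: 2268 is a multiple of 2 and 4 but leaves 4 bytes over when split into 8-byte chunks, so `uint` reads line up exactly while the last `long` read gets only 4 bytes. A quick check; `LengthCheck` and its methods are hypothetical helpers:

```csharp
using System;

public static class LengthCheck
{
    // Bytes in the final, incomplete chunk (0 means the file splits evenly).
    public static long Leftover(long fileLength, int typeSize) =>
        fileLength % typeSize;

    // Zero bytes needed to round fileLength up to a multiple of typeSize.
    public static long PaddingNeeded(long fileLength, int typeSize) =>
        (typeSize - fileLength % typeSize) % typeSize;
}
```

For this file, `Leftover(0x8dc, 8)` is 4 and `PaddingNeeded(0x8dc, 8)` is 4, while both are 0 for a type size of 4.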

From this kind of binary file I have to count how many times each symbol occurs (the freq variable in my code). That part works fine, so to keep the question short I have included only the necessary part of the code above.
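The linked-list search in the constructor costs time linear in the number of distinct symbols seen so far, for every value read. If it helps, the same frequency count can be sketched with a `Dictionary<T, int>`; this is a swapped-in technique, not the question's linked list, and `FrequencyCounter` is a hypothetical name. It assumes the symbols have already been converted to `T`:

```csharp
using System;
using System.Collections.Generic;

public static class FrequencyCounter
{
    // Counts how many times each symbol occurs, using a hash lookup
    // instead of walking the Node list for every value.
    public static Dictionary<T, int> Count<T>(IEnumerable<T> symbols)
        where T : struct, IEquatable<T>
    {
        var freq = new Dictionary<T, int>();
        foreach (T symbol in symbols)
        {
            int current;
            freq.TryGetValue(symbol, out current); // current is 0 if absent
            freq[symbol] = current + 1;
        }
        return freq;
    }
}
```

The resulting dictionary could later seed the Huffman tree nodes, keeping `Node` only for the tree itself.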

I think the only solution is to fill the last read with zero bytes (i.e. add padding) so that it becomes a multiple of the size of the chosen data type (uint, long, short, etc.). That seems to be the only way to avoid the unhandled exception. Could anyone help me write code that pads the last read to a multiple of the data type's size? Or is there a more efficient way to avoid this exception? A small piece of code for adding the padding would be really appreciated. Thanks.
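One way to implement the zero-padding idea: when `ReadBytes` returns fewer than `size` bytes, allocate a `size`-byte array (new C# arrays are zero-filled) and copy the partial chunk into its front, so the converter always receives a full-length array. A minimal sketch; `PaddedReader` and `ReadPaddedSymbols` are hypothetical names. Note that `BitConverter` uses the machine's byte order, so on little-endian hardware (x86/x64) the padding becomes the high-order bytes of the last value:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PaddedReader
{
    // Reads the whole stream as T values of `size` bytes each.
    // A final chunk shorter than `size` is padded with zero bytes.
    // Disposing the reader also closes `input`.
    public static List<T> ReadPaddedSymbols<T>(Stream input, int size,
                                               Func<byte[], int, T> converter)
    {
        var symbols = new List<T>();
        using (var reader = new BinaryReader(input))
        {
            while (true)
            {
                byte[] bytes = reader.ReadBytes(size);
                if (bytes.Length == 0)
                    break; // end of stream
                if (bytes.Length < size)
                {
                    // new byte[size] is all zeros; copy the partial chunk in front.
                    byte[] padded = new byte[size];
                    Buffer.BlockCopy(bytes, 0, padded, 0, bytes.Length);
                    bytes = padded;
                }
                symbols.Add(converter(bytes, 0));
            }
        }
        return symbols;
    }
}
```

In the question's constructor, `size` would be `Marshal.SizeOf(typeof(T))` and `input` would be `File.OpenRead(args[0])`. The original file length still has to be stored somewhere so the padding can be stripped after decompression.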

Sss
  • If you don't know what is written into the stream, why do you expect to make a good guess about its contents by reading arbitrary bytes? Should you treat 8 bytes as 4 shorts, byte+short+int+byte, int+int, ...? Usually there is some meta-information that tells you what is there (e.g. a fixed file format or size prefixes before arrays). – Alexei Levenkov Mar 22 '14 at 23:31
  • @AlexeiLevenkov please see the edited question to see what kind of binary file I am reading. – Sss Mar 22 '14 at 23:35
  • Could you please be more specific about the particular kind of "unhandled exception" you are getting and the exact message in the exception object? `FormatException`, `IOException`? What type, exactly? Also, a stack trace would be nice too. – Leandro Mar 22 '14 at 23:45
  • I see - you are trying to write compression code for an arbitrary file - yes, adding `0`-padding at the end sounds fine. Don't forget to add the real file size to your compressed output so you can remove the padding on decompression. – Alexei Levenkov Mar 22 '14 at 23:52
  • @AlexeiLevenkov But I don't understand how to add the "0" padding to the last read of the file. Could you please give me an algorithm I can use as a reference for my code? (You can use my existing code if you want.) Thanks again – Sss Mar 22 '14 at 23:55
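The "add the real file size" suggestion in the comments can be sketched as an 8-byte length header written before the compressed data; after decoding, the output is trimmed back to that length so the padding bytes disappear. `LengthPrefix` and its methods are hypothetical names, and the compressed payload itself is out of scope here:

```csharp
using System;
using System.IO;

public static class LengthPrefix
{
    // Writes the original (unpadded) file length in front of the payload.
    // BinaryWriter always writes multi-byte values little-endian.
    public static void WriteHeader(BinaryWriter writer, long originalLength)
    {
        writer.Write(originalLength); // 8 bytes
    }

    // Reads the header back on decompression.
    public static long ReadHeader(BinaryReader reader)
    {
        return reader.ReadInt64();
    }

    // Drops the zero bytes that padding appended, after decoding.
    public static byte[] TrimPadding(byte[] decoded, long originalLength)
    {
        if (originalLength > decoded.Length)
            throw new ArgumentException("Decoded data is shorter than the recorded length.");
        byte[] result = new byte[originalLength];
        Array.Copy(decoded, result, originalLength);
        return result;
    }
}
```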

1 Answer


Something like ((byteCount + 3) & ~3) gives you the next size aligned to 4 bytes, so you know how many bytes to pad with 0. You may need to copy the array since you can't grow arrays (or create a bigger array to start with). Something like:

// size = number of bytes actually being read
var sizeRoundedToNext4 = (size + 3) & ~3;
var slightlyBiggerArray = new byte[sizeRoundedToNext4]; // already filled with 0
stream.Read(slightlyBiggerArray, 0, size); // BinaryReader.Read(buffer, index, count)
Alexei Levenkov
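The `(size + 3) & ~3` trick in the answer generalizes to any power-of-two alignment `a` as `(n + a - 1) & ~(a - 1)`; for a size that is not a power of two, ceiling division does the same job. A hedged sketch, where `Align.RoundUp` is a hypothetical helper and `a` would be the symbol size (8 for `long`):

```csharp
using System;

public static class Align
{
    // Rounds n up to the next multiple of a (a > 0).
    public static long RoundUp(long n, long a)
    {
        if (a <= 0) throw new ArgumentOutOfRangeException(nameof(a));
        // Power of two: the bit-mask form used in the answer.
        if ((a & (a - 1)) == 0)
            return (n + a - 1) & ~(a - 1);
        // General case: ceiling division, then multiply back.
        return (n + a - 1) / a * a;
    }
}
```

For example, `RoundUp(2268, 8)` gives 2272, so four zero bytes would be appended for `long`.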
  • @Alexei so you mean I need to check on each read whether there are enough bits to fill the "symbol" data type? Won't that badly affect the complexity? Am I right? – Sss Mar 23 '14 at 00:05
  • @user234839 this is the answer to the "how to get padding" part of your question. Your read code is backward from *my point of view* (you are trying to guess how many bytes the type needs instead of letting the reader for that type read the necessary amount from the stream), so I can't really comment on how to fix that. – Alexei Levenkov Mar 23 '14 at 00:51
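The last comment's alternative, letting a type-specific reader consume exactly the bytes it needs, might look like this for `long`: read values with `BinaryReader.ReadInt64` while at least 8 bytes remain, then hand back the leftover tail separately (padding it or storing it verbatim is a design choice). On the complexity worry, comparing the remaining length is a constant-time check per read. `TypedReader.ReadInt64s` is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TypedReader
{
    // Reads as many whole Int64 values as the stream holds and returns
    // any trailing bytes (fewer than 8) through `tail`.
    public static List<long> ReadInt64s(Stream input, out byte[] tail)
    {
        var values = new List<long>();
        using (var reader = new BinaryReader(input))
        {
            Stream s = reader.BaseStream;
            while (s.Length - s.Position >= sizeof(long))
                values.Add(reader.ReadInt64()); // consumes exactly 8 bytes, little-endian
            // Whatever is left is an incomplete symbol.
            tail = reader.ReadBytes((int)(s.Length - s.Position));
        }
        return values;
    }
}
```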