understanding format of file

Question

I have a question about reading a file, and I am getting frustrated with it. I am doing some handwriting recognition development, and the tool I am using doesn't seem to read my training data file.

So I have one file which works perfectly fine. I paste some contents of that file here:

 è      Aڈ2*A   ê“AêA mwA)àXA$NلAئ~A›إA:ozA)"ŒA%IœA&»ّAم3ACA

|®AH÷AD¢A ô-A گ&AJXAsAA mGA قQAٍALs@÷8´A

I know the format of the file: the first 12 bytes are 2 longs and 2 shorts, most probably holding the values 4, 1000, 1024 and 9, but I cannot read the file to get these values.

What I actually want is to write my own first 12 bytes in the same format as above, and I don't see how to do it.

I forgot to mention that the remaining data are floating-point numbers. When I write data to a file I get human-readable text, not these symbols, and when I read these symbols I do not get the actual values. How do I get the actual floats and integers from these symbols?

My code is

struct rec
{
    long a;
    long b ;
    short c;
    short d;
}; // this is the struct 

FILE *pFile;
struct rec my_record;

// then I read using fread

fread(&my_record, 1, sizeof(my_record), pFile);

and the values I get in a, b, c and d are 85991456, -402448352, 8193 and 2336 instead of the actual values.

The [same question](http://stackoverflow.com/q/10671361/509303) was asked by you an hour earlier, wait for the answers instead of spamming questions. — buc, May 20 '12 at 08:52
Yeah, but in that question I wasn't able to put the question properly, and there were way too many edits, so I thought it might be difficult for users to understand. But my bad. — , May 20 '12 at 09:07

score 0 · Answer 1 · answered May 20 '12 at 08:09

The compiler may add padding to your structure members so that they are aligned (typically on 4-byte boundaries). In this case, the variables c and d are padded.

You should use fread to read one predefined data type at a time instead of reading the whole structure in one call.
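A minimal sketch of the field-by-field read suggested here. The struct uses fixed-width types from `<stdint.h>` and the helper name `read_rec_fieldwise` is hypothetical; neither is in the original post. This only sidesteps padding: the bytes are still interpreted in the host's byte order.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical header layout: fixed-width types so the on-disk sizes
   (4 + 4 + 2 + 2 = 12 bytes) do not depend on the platform's `long`. */
struct rec {
    int32_t a;
    int32_t b;
    int16_t c;
    int16_t d;
};

/* Reads each member separately, so any padding inside `struct rec`
   never affects how many bytes are consumed from the file.
   Returns 1 on success, 0 on a short read.
   Note: the bytes are still interpreted in the host's byte order. */
int read_rec_fieldwise(FILE *fp, struct rec *r)
{
    return fread(&r->a, sizeof r->a, 1, fp) == 1
        && fread(&r->b, sizeof r->b, 1, fp) == 1
        && fread(&r->c, sizeof r->c, 1, fp) == 1
        && fread(&r->d, sizeof r->d, 1, fp) == 1;
}
```

The file should be opened in binary mode (`"rb"`), which matters on Windows, where text mode can alter some byte values.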

Jay D

OK, so what you are saying is that I read into a long, then another long, and so on? The problem is that fread does not seem to read anything sensible. Let me try again and give you the result. – May 20 '12 at 08:28

buc · Accepted Answer · 2012-05-20T08:44:05.290

0

First of all, you should open that file in a hex editor, to see exactly what bytes it contains. From the text excerpt you have posted I think it does not contain 4, 1000, 1024 and 9 as you expect, but text form may be very misleading, because different character encodings show different characters for the same sequences of bytes.

If you have confirmed that the file contains the expected data, there may still be other issues. One of these is endianness: some machines and file formats encode a 4-byte long with the least significant byte first, while others read and write the most significant byte first.
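To make the byte-order point concrete, here is a small sketch that probes the host's byte order and interprets the same four bytes both ways. The function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the lowest-addressed byte */
    return first == 1;
}

/* Interpret four bytes as big-endian (most significant byte first). */
uint32_t as_big_endian(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Interpret the same four bytes as little-endian. */
uint32_t as_little_endian(const unsigned char b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}
```

For the bytes 00 00 00 05, `as_big_endian` gives 5 while `as_little_endian` gives 83886080 (0x05000000), which is the kind of mismatch a wrong byte-order assumption produces.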

Another issue concerns the long data type you use. If your computer has a 64-bit architecture and you are using Linux, long is a 64-bit value, and the fields of your structure take up 20 bytes (typically 24 after alignment padding) instead of 12.
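A quick way to check this on a given machine is to compare `sizeof` for the original struct and for a variant built from fixed-width types. This is only a sketch; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* The layout from the question: size depends on the platform's `long`. */
struct rec_long {
    long a;
    long b;
    short c;
    short d;
};

/* Same fields with fixed-width types: 4 + 4 + 2 + 2 bytes. */
struct rec_fixed {
    int32_t a;
    int32_t b;
    int16_t c;
    int16_t d;
};

size_t size_with_long(void)  { return sizeof(struct rec_long); }
size_t size_with_fixed(void) { return sizeof(struct rec_fixed); }
```

Printing both with `printf("%zu %zu\n", size_with_long(), size_with_fixed());` typically shows 12 and 12 on 64-bit Windows (where `long` is 32 bits) and 24 and 12 on 64-bit Linux.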

Edit:

To read big-endian longs on a little-endian machine like yours, you should read the data byte by byte and build the longs from them manually:

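// Read 4 bytes (open the file in binary mode, "rb", so no bytes are translated)
unsigned char buf[4];
if (fread(buf, 1, 4, pFile) != 4) {
    /* handle a short read or I/O error here */
}
// Convert to long, most significant byte first; unsigned shifts avoid signed overflow
my_record.a = (long)((((unsigned long)buf[0]) << 24) | (((unsigned long)buf[1]) << 16) | (((unsigned long)buf[2]) << 8) | ((unsigned long)buf[3]));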
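Extending the same idea to the whole 12-byte header, a sketch might look like the following. The `struct header` type and the `read_be32`, `read_be16` and `read_header` helpers are hypothetical names, and the layout (two 32-bit values followed by two 16-bit values, all big-endian) is taken from the question's description.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical in-memory form of the 12-byte file header. */
struct header {
    uint32_t a;
    uint32_t b;
    uint16_t c;
    uint16_t d;
};

/* Read a 32-bit value stored most significant byte first. */
int read_be32(FILE *fp, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, 4, fp) != 4)
        return 0;
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return 1;
}

/* Read a 16-bit value stored most significant byte first. */
int read_be16(FILE *fp, uint16_t *out)
{
    unsigned char b[2];
    if (fread(b, 1, 2, fp) != 2)
        return 0;
    *out = (uint16_t)((b[0] << 8) | b[1]);
    return 1;
}

/* Returns 1 if all four header fields were read, 0 otherwise. */
int read_header(FILE *fp, struct header *h)
{
    return read_be32(fp, &h->a) && read_be32(fp, &h->b)
        && read_be16(fp, &h->c) && read_be16(fp, &h->d);
}
```

Because each value is assembled with shifts, the result does not depend on the byte order or struct padding of the machine running the code.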

edited May 20 '12 at 08:44

answered May 20 '12 at 08:17

buc

I am using a 64-bit machine with Windows 7. OK, so I opened it in a hex editor, and if one box equals one byte, then I have 1292 bytes in the file, with the first 12 bytes being 00 00 00 05 00 00 03 E8 01 00 00 09. When I posted my data here, the spaces and a few symbols got omitted. Endianness might be another issue; how should I check it? And secondly, how do I write integers to a file so that it shows these symbols instead of the actual values? – May 20 '12 at 08:27
And it should have 1292 bytes, because there are 12 bytes for the header and then a sample of 320 values of 4 bytes each, so in total it comes to 1292. – May 20 '12 at 08:30
Sorry, the data should be 5, 1000, 1280 and 9, I believe. – May 20 '12 at 08:31
From this you can see that the problem is byte endianness. 00 00 00 05 is 0x00000005 in hexadecimal, or 5 in decimal, if you read it in big-endian order, but if you reverse the bytes, you get 0x05000000 in hexadecimal, which is 83886080 in decimal. The same applies to the other values. – buc May 20 '12 at 08:38
By the way, your expectation is wrong for the third value: your data is 0x00000005, 0x000003E8, 0x0100, 0x0009, which equal 5, 1000, 256 and 9, respectively. – buc May 20 '12 at 08:46
Thanks a lot, brother. So the problem is that when I read the file, it is read as little-endian because of my machine, so I should write the file in big-endian order. One more query: I can read the first 12 bytes and they are valid. The next 4 bytes are supposed to be a decimal number (float); what I have in the hex editor is 41 8F 32 2A, and the next 4 bytes are 41 09 EA 93. What would these two be as floating-point numbers, and how do I convert them? Thanks a lot, I am marking your answer as correct. – May 20 '12 at 08:48
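For the writing side, one possible sketch is to split each value into bytes, most significant first, and write those bytes with `fwrite`, which stores raw bytes rather than human-readable text. The helper names `write_be32`, `write_be16` and `write_header` are hypothetical, and the file is assumed to be opened in binary mode (`"wb"`).

```c
#include <stdio.h>
#include <stdint.h>

/* Write a 32-bit value most significant byte first. Returns 1 on success. */
int write_be32(FILE *fp, uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v >> 24), (unsigned char)(v >> 16),
        (unsigned char)(v >> 8),  (unsigned char)v
    };
    return fwrite(b, 1, 4, fp) == 4;
}

/* Write a 16-bit value most significant byte first. Returns 1 on success. */
int write_be16(FILE *fp, uint16_t v)
{
    unsigned char b[2] = { (unsigned char)(v >> 8), (unsigned char)v };
    return fwrite(b, 1, 2, fp) == 2;
}

/* Write the 12-byte header: two 32-bit values, then two 16-bit values. */
int write_header(FILE *fp, uint32_t a, uint32_t b, uint16_t c, uint16_t d)
{
    return write_be32(fp, a) && write_be32(fp, b)
        && write_be16(fp, c) && write_be16(fp, d);
}
```

Calling `write_header(fp, 5, 1000, 256, 9)` produces the bytes 00 00 00 05 00 00 03 E8 01 00 00 09, matching the header seen in the hex editor.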
Floats are a bit trickier; they cannot easily be converted from bytes to a float using simple bit operations. I suggest you read 4 bytes into the byte array, then reverse the order of the bytes in the array. After this you can use some nasty pointer casts to get the float value from the byte array: `float f = *((float*)buf);` – buc May 20 '12 at 08:57
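As an alternative to the pointer cast, a sketch that assembles the 32-bit pattern with shifts and copies it into a float with `memcpy` avoids aliasing and alignment problems. The function name `be_bytes_to_float` is hypothetical, and the code assumes `float` is a 32-bit IEEE 754 value that shares the host's integer byte order, which holds on mainstream desktop platforms.

```c
#include <stdint.h>
#include <string.h>

/* Build a float from four bytes stored most significant byte first.
   Assumption: `float` is a 32-bit IEEE 754 value and uses the same
   byte order as the host's 32-bit integers. */
float be_bytes_to_float(const unsigned char b[4])
{
    uint32_t bits = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
                  | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    float f;
    memcpy(&f, &bits, sizeof f);   /* copy the bit pattern, no pointer cast */
    return f;
}
```

Under those assumptions, the two samples quoted above (41 8F 32 2A and 41 09 EA 93) decode to roughly 17.9 and 8.62.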
Hmm, thanks. By the way, the code you gave for converting to long converts the 4 bytes to 538976261 instead of 5. – May 20 '12 at 09:09
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/11479/discussion-between-faisal-fayyaz-and-buc) – May 20 '12 at 09:12
