1

I have a specific (apparently binary) file format containing data about the configuration used to take a picture with a custom camera. The format is named DAI and contains, for example, offset and gain values.

I am using a black-box Java script to turn this file into a .csv, and I want to do the same thing in MATLAB. I have a config file describing, in ASCII, how the file is built (field name, data type, first_word, last_word, low_bit, high_bit). For example, I know that the first field in the DAI file is: spare1; PCHAR; first_word=0; low_bit=0; high_bit=7

But right now I have no clue how to use this information. My first thought was to fopen() the file and use fread() to read the binary data and convert it to the format I want, but I don't know how to use the values of last_word, high_bit, etc. to do so. I have a limited understanding of binary files.

To sum up:

file.dai contains the data and file.cfg contains the structure:

mband_1_start_line; PCHAR;  first_word=12;  low_bit=6;          high_bit=15
mband_1_length;     PCHAR;  first_word=12;  low_bit=0;          high_bit=5
mband_1_gain;       PCHAR;  first_word=13;  low_bit=0;          high_bit=7
mband_1_offset;     PCHAR;  first_word=13;  low_bit=8;  last_word=14;   high_bit=7

and I want to recover the data corresponding to fields like mband_1_offset.

If someone can help me figure out a good way of doing that, I would be very thankful!

[EDIT: SOLVED] Thanks to your help, I've managed to get the values for each field, even when the header changes. Here's the final code:

...code to retrieve the content of the .cfg file....
%% Open and read the DAI file
fid = fopen(dai_file,'r','l');

% First, skip the header.
% Read the whole file once as bytes
dat=fread(fid,inf,'*uint8');
% Find the end of the header, marked by the bytes NUL NUL ETX
% (decimal 0, 0, 3)
skip = findstr(dat',[000,000,003]);

% Word size: 2 bytes per word (16-bit words)
wordsize = 2;

% We rewind the file to start over to get the values for each field
frewind(fid);

% Initialize the structure camdat that will hold the camera data
camdat=struct;

% We start the loop for each field of the layout config file
for ct = 1:length(layout)
    % Defining the words/bits
    first_word = layout{ct,3};
    last_word = layout{ct,5};
    low_bit = layout{ct,4};
    high_bit = layout{ct,6};
    % Seek to "skip value + position of first_word in bytes"
    fseek(fid,skip+first_word*wordsize,-1);
    % Number of words to read (last - first + 1)
    datasize=last_word-first_word+1;
    % Read the data as uint16 (words are 16 bits)
    data=fread(fid,datasize,'*uint16');
    % Convert it to bits (bitget returns the least significant bit first)
    % Case of 1 word
    bits=bitget(data(1),[1:16]);
    % Case of 2 words
    if length(data) > 1
        bits=[bits,bitget(data(2),1:16)]; 
        high_bit = high_bit+16;
    end
    % We take only the bits that define the field (between low_bit and
    % high_bit)
    bits_used = bits(low_bit+1:high_bit+1);
    % We convert the bits to dec
    data = sum(bits_used.*uint16(2).^uint16([0:length(bits_used)-1]));
    % We store it in the camdat.field struct
    camdat.(layout{ct,1})=data;
end
% We close the DAI file
fclose(fid);
% Displaying for test
camdat
  • The first stage is to decode what the cfg file means. The second stage will then be to read in the file in MATLAB. I expect the contents of the .csv would be very useful (but you don't show them) for comparison with the binary file. At a guess, I'd say the length is stored in bits 0-5 of the 12th word; the start line is in bits 6-15 of the 12th word; the gain in bits 0-7 of the 13th word, etc. You'll need to know the word size; I'd guess 16 bits, but 32 or 64 bits are also possible. This is where comparing the .csv file to the binary file will come in useful. – Justin Aug 03 '17 at 13:42
  • Hi Justin, thanks for your answer! I've made a .zip file with all the files: [link](https://ufile.io/7bcib) so you can see what is inside both the .csv and the .cfg file. I know that all the words are 16 bits (it was written in the header of the .cfg). So I'll have a go at writing something down and come back to you. Thanks a lot – Tastro Labe Aug 07 '17 at 12:19

3 Answers

0

OK, so you have information about the layout of the file.

I would first store this in a more accessible format:

layout{1,1} = 'mband_1_start_line';
layout{1,2} = 'PCHAR';
layout{1,3} = 12;
layout{1,4} = 6;
layout{1,5} = 12;
layout{1,6} = 15;
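
The rows above could also be filled in automatically from the .cfg text. This is only a minimal sketch: it assumes each line looks like the ones quoted in the question (semicolon-separated, with `key=value` pairs) and that `last_word` equals `first_word` when it is omitted. The variable names `lines`, `parts`, `vals` and `kv` are made up for illustration.

```matlab
% Example lines, copied from the .cfg excerpt in the question
lines = { ...
    'mband_1_gain;       PCHAR;  first_word=13;  low_bit=0;          high_bit=7', ...
    'mband_1_offset;     PCHAR;  first_word=13;  low_bit=8;  last_word=14;   high_bit=7'};

layout = cell(numel(lines), 6);
for k = 1:numel(lines)
    parts = strtrim(strsplit(lines{k}, ';'));   % split on ';' and trim spaces
    layout{k,1} = parts{1};                     % field name
    layout{k,2} = parts{2};                     % type, e.g. PCHAR
    vals = struct('first_word', [], 'low_bit', [], 'last_word', [], 'high_bit', []);
    for p = 3:numel(parts)
        kv = strsplit(parts{p}, '=');           % 'key=value' -> {'key','value'}
        vals.(strtrim(kv{1})) = str2double(kv{2});
    end
    if isempty(vals.last_word)                  % assumption: omitted means same word
        vals.last_word = vals.first_word;
    end
    layout{k,3} = vals.first_word;
    layout{k,4} = vals.low_bit;
    layout{k,5} = vals.last_word;
    layout{k,6} = vals.high_bit;
end
```

The column order (name, type, first_word, low_bit, last_word, high_bit) matches the manual assignment shown above.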

Then you loop over the layout

wordsize = 2;                                  % bytes per word
fid = fopen(filename,'r','l');                 % open as little-endian
camdat = struct;
for ct = 1:size(layout,1)
    fseek(fid,layout{ct,3}*wordsize,-1);       % go to byte position of first_word
    datasize = layout{ct,5}-layout{ct,3}+1;    % number of words
    data = fread(fid,datasize,'*uint16');      % get words
    bits = bitget(data(1),1:16);               % convert to bits (LSB first)
    for k = 2:datasize
        bits = [bits,bitget(data(k),1:16)];
    end
    % keep bits low_bit..high_bit (0-based in the layout, 1-based in MATLAB)
    bits = bits(layout{ct,4}+1:(datasize-1)*16+layout{ct,6}+1);
    data = sum(bits.*uint16(2).^uint16(0:(length(bits)-1))); % convert back
    camdat.(layout{ct,1}) = data;              % store
end
fclose(fid);

There will be problems with values that are longer than 16 bits, of course.

If the wordsize is different, you can change it to 4 for 32 bit, or 8 for 64 bit, but then you have to also change that in the loop.
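
For fields longer than 16 bits, one option is to do the arithmetic in double precision instead of uint16. This is a minimal sketch: it assumes the first word holds the least significant bits (as in the bit concatenation above) and that the combined words stay within 53 bits, so that doubles remain exact. The two word values are made up for illustration.

```matlab
% Two hypothetical 16-bit words, as fread(...,'*uint16') would return them
words = uint16([hex2dec('AB00'); hex2dec('00CD')]);
low_bit  = 8;          % 0-based, counted from the LSB of the first word
high_bit = 7 + 16;     % bit 7 of the second word, i.e. overall bit 23

% Combine the words into one number; the first word holds the lowest bits
% (assumption, matching the bit concatenation order used above)
value = 0;
for k = 1:numel(words)
    value = value + double(words(k)) * 2^(16*(k-1));
end

% Shift right by low_bit, then keep (high_bit - low_bit + 1) bits
nbits = high_bit - low_bit + 1;
field = mod(floor(value / 2^low_bit), 2^nbits);
```

With these made-up words the field spans the upper byte of the first word and the lower byte of the second, the same shape as mband_1_offset in the layout.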

Gelliant
  • Hi Gelliant ! Thx for your answer, I'll try to write something with the help you provided. I forgot to link the files last time [here](https://ufile.io/7bcib) in case you want to see directly what I was speaking about. – Tastro Labe Aug 07 '17 at 12:26
0

I've been using your help to work out a way to do what I wanted. The idea is to go to the bytes of first_word, take the bits between the first and last word (and between low_bit and high_bit), and turn them into decimal. Based on your code I've written the following, which gives results, but not the ones I was expecting (those in the .csv, see attached file).

First, I'm not sure I'm handling the case where last_word differs from first_word correctly.

Second, I'm not sure that my fseek() sends me to the correct bytes of the file...

%% Name of the files
%% Open and read the .cfg file      
%% Open and read the DAI file
% ...at this point the .cfg has been opened and stored in layout{i,j}

wordsize = 2; %bytes / word
fid = fopen(dai_file,'r','l');

camdat=struct;

for ct = 1:length(layout)
    first_word = layout{ct,3};
    last_word = layout{ct,5};
    low_bit = layout{ct,4};
    high_bit = layout{ct,6};
    fseek(fid,first_word*wordsize,-1); %go to bytes
    datasize=last_word-first_word+1;    %number of words
    data=fread(fid,datasize,'*uint16');     %get words
    bits=bitget(data(1),[1:16]);          %convert to bits
    if length(data) > 1                     % case of 2 words
        bits=[bits,bitget(data(2),1:16)]; 
        high_bit = high_bit+16;
    end
    bits = bits(low_bit+1:high_bit+1);%get bits
    data = sum(bits.*uint16(2).^uint16([0:length(bits)-1])); %convert back
    camdat.(layout{ct,1})=data;            %store
end
camdat
fclose(fid);

So if you have ideas about where I'm going wrong, I'd be very grateful!

  • The extra information helps. It is clear that dai contains 2-byte words. Most entries make sense. They are just one byte: 0-7 8-15. There are a few weird ones. How about you first try to read a few of the normal ones and see if they are correct? Do you also know what the values should be in the .dai file? – Gelliant Aug 08 '17 at 07:12
  • For example `frame_length` is suddenly a 9 bit value (8-17). – Gelliant Aug 08 '17 at 07:18
  • Hi Gelliant ! For example I'm sure that : bias1=74 bias2=210 bias3=129 bias4=93 – Tastro Labe Aug 08 '17 at 08:46
  • To test things, I do a `fread(fid, 16 bytes (normally 8 words), '*uint16')`. Looking at the corresponding words: bias1 is in the 4th word, bits 1 to 8, and bias2 in bits 9 to 16. The 4th word is 8 bytes from the beginning of the file; with fread I get the value 17473, which in binary gives 1000 0010 0010 0010. So what I understand is that bias1 is 1000 0010 and bias2 is 0010 0010, but in decimal that gives 130 and 34 – Tastro Labe Aug 08 '17 at 08:57
  • But there are two strange things. First, the fields with a weird number of bits (frame_length, for example). Second, the fact that there's a readable header in the .dai file when you open it with Notepad. So that must change where the bytes we need begin, no? – Tastro Labe Aug 08 '17 at 09:00
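
As a check on bit order: `bitget(x,1:16)` returns the least significant bit first, so the bit string has to be read right to left before converting it by eye. A small sketch with the value 17473 from the comment above:

```matlab
x = uint16(17473);
bits = bitget(x, 1:16);                 % bits(1) is the least significant bit
low_byte  = bitand(x, uint16(255));     % bits 0-7 of the word
high_byte = bitshift(x, -8);            % bits 8-15 of the word
msb_first = dec2bin(x, 16);             % conventional MSB-first string
```

Here `low_byte` is 65 and `high_byte` is 68. Neither matches the expected bias1=74, which suggests that the read position itself is off rather than only the bit order.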
0

My approach in this case is to find the part of the file that matches your data.

fid = fopen('dai_file.dai','r','l');
dat=fread(fid,inf,'*uint8');
findstr(dat',[74,210,129,93]);
>> 891        1159        1427        1695  ....

Strangely enough, this happens 100 times.

If byte 891 is right, then bias1 is NOT in the 4th word, bits 0 to 7, but in the 445th word, bits 0 to 7.

Let's try:

fid = fopen(dai_file,'r','l');
fseek(fid,445*2,-1)
data=fread(fid,1,'*uint16');
bits=bitget(data(1),[1:16]);   
bits = bits(1:8);
data = sum(bits.*uint16(2).^uint16([0:7]))
>> data = 74

Yep, there it is. So I would suggest adding 441 to each word entry and seeing if it works.

Gelliant
  • Hey! Thanks for the answer. That was my approach too, so I worked with different .dai files and you were right: there's an offset to skip before decoding the file. In the files I attached it was 441; I have other data where it's 445. Right now I'm figuring out a way to retrieve the size of the header automatically. By the way, the fact that it happens 100 times may be normal: one picture has 100 frames, so maybe the camera parameters are stored 100 times (since each frame has the same configuration). – Tastro Labe Aug 08 '17 at 12:40
  • Thanks a lot for your answers, I've come up with a code working each time by searching for the end of the header in dec. You've been very helpful thanks a lot again !!! – Tastro Labe Aug 08 '17 at 16:40
  • I've edited my main post so you can see the final code ! – Tastro Labe Aug 08 '17 at 16:46
  • Great! glad I could help. Your code looks very clear. – Gelliant Aug 09 '17 at 11:15
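
A small self-contained check of the header search, run on synthetic bytes rather than a real .dai file. It is only a sketch: the byte sequence is made up, and taking the first match assumes the header ends at the first NUL NUL ETX, which matters if the same three bytes recur later in the data.

```matlab
% Synthetic file content (hypothetical): an ASCII header, the NUL NUL ETX
% terminator, then payload bytes that happen to repeat the pattern
dat = uint8(['HEADER', 0, 0, 3, 74, 210, 0, 0, 3, 129])';

% Search as char vectors; char() keeps the byte values 0-255 unchanged
idx = strfind(char(dat'), char([0 0 3]));

% Keep only the first match (assumption: the header ends at the first one)
skip = idx(1);
```

`idx` holds the 1-based position of the first byte of every match, so `skip` stays a scalar that can be passed to fseek even when the pattern occurs more than once.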