Hi, I need to calculate the order-m entropy of a file, where m is the number of bits per symbol (m <= 16).

So:

$$H_m(X) = -\sum_{i=0}^{2^m-1} p_{i,m} \log_2 p_{i,m}$$

So I thought I would create an input stream to read the file and then calculate the probability of each m-bit sequence.

For m = 8 it's easy because each symbol is a byte. Since m <= 16, I thought of using short as the primitive type: store each short of the file in a short[] array and then use bitwise operators to extract all the m-bit sequences in the file. Is this a good idea?

Anyway, I can't manage to create a stream of shorts. This is what I've done:

public static void main(String[] args) {
    readFile(FILE_NAME_INPUT);
}

public static void readFile(String filename) {
    short[] buffer = null;
    File a_file = new File(filename);
    try {
        File file = new File(filename);

        FileInputStream fis = new FileInputStream(filename);
        DataInputStream dis = new DataInputStream(fis);

        int length = (int)file.length() / 2;
        buffer = new short[length];

        int count = 0;
        while(dis.available() > 0 && count < length) {
            buffer[count] = dis.readShort(); 
            count++;
        }
        System.out.println("length=" + length);
        System.out.println("count=" + count);


        for(int i = 0; i < buffer.length; i++) {
            System.out.println("buffer[" + i + "]: " + buffer[i]);
        }

        fis.close();
    }
    catch(EOFException eof) {
        System.out.println("EOFException: " + eof);
    }
    catch(FileNotFoundException fe) {
        System.out.println("FileNotFoundException: " + fe);
    }
    catch(IOException ioe) {
        System.out.println("IOException: " + ioe);
    }
}

But I lose a byte, and I don't think this is the best way to proceed.


This is what I plan to do using bitwise operators:

int[] list = new int[l];
foreach n in buffer {
    for(int i = 16 - m; i > 0; i-m) {
        list.add( (n >> i) & 2^m-1 );
    }
}

I'm assuming shorts in this case. If I use bytes, how can I write a loop like that for m > 8? That loop doesn't work because I would have to concatenate multiple bytes, and the number of bits to join changes each time.

Any ideas? Thanks

  • If you're just calculating a summation, why are you keeping every single value in an array? – VGR Mar 03 '16 at 15:55
  • Thanks for the reply. I need to keep the values in an array because I need to get all the m-bit subsequences and then calculate the probability of each of these sequences. –  Mar 03 '16 at 16:02

1 Answer

I think you just need to have a byte array:

public static void readFile(String filename) {
  ByteArrayOutputStream outputStream=new ByteArrayOutputStream();
  try {
    FileInputStream fis = new FileInputStream(filename);
    // read() returns an int: 0..255 for a byte, or -1 at end of file,
    // so the variable must be an int, not a byte
    int b;
    while((b=fis.read())!=-1) {
        outputStream.write(b);
    }
    byte[] byteData=outputStream.toByteArray();
    fis.close();
  }
  catch(IOException ioe) {
    System.out.println("IOException: " + ioe);
  }
}

Then you can manipulate byteData as per your bitwise operations.
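
For m that is not 8 or 16, one way to do the bit manipulation is to treat byteData as one continuous bit stream and pull m bits at a time out of a small accumulator. The following is only a sketch under my own assumptions: the class name `BitGroups` and the method `countGroups` are made up for illustration, the groups are taken as non-overlapping, bits are read most significant bit first, and any leftover bits at the end (fewer than m) are simply dropped rather than zero-padded.

```java
final class BitGroups {
    /**
     * Counts non-overlapping m-bit groups in data, reading bits from the
     * most significant bit of data[0] onward. Trailing bits that do not
     * fill a whole group are ignored (an assumption of this sketch).
     */
    static int[] countGroups(byte[] data, int m) {
        if (m < 1 || m > 16) {
            throw new IllegalArgumentException("m must be in 1..16");
        }
        int[] counts = new int[1 << m];  // one counter per possible m-bit value
        int mask = (1 << m) - 1;         // m low bits set (note: ^ is XOR in Java, not power)
        int acc = 0;                     // bit accumulator
        int bitsInAcc = 0;               // number of valid low bits held in acc
        for (byte b : data) {
            acc = (acc << 8) | (b & 0xFF);  // append 8 bits; & 0xFF avoids sign extension
            bitsInAcc += 8;
            while (bitsInAcc >= m) {
                // take the m oldest bits still in the accumulator
                int value = (acc >>> (bitsInAcc - m)) & mask;
                counts[value]++;
                bitsInAcc -= m;
            }
            acc &= (1 << bitsInAcc) - 1;    // keep only the unconsumed bits
        }
        return counts;
    }
}
```

Counting directly into an array of size 2^m avoids storing every group, which for m <= 16 is at most 65536 counters. The accumulator never holds more than 23 bits (at most 15 leftover plus 8 new), so an int is enough.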

--

If you want to work with shorts you can combine bytes read this way

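short[] buffer=new short[(byteData.length+1)/2]; // rounds up for an odd number of bytes
int j=0;
for(int i=0; i<byteData.length-1; i+=2) {
  // mask with 0xFF so that negative bytes are not sign-extended into the high bits
  buffer[j]=(short)(((byteData[i]&0xFF)<<8)|(byteData[i+1]&0xFF));
  j++;
}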

To check for odd bytes do this

short last=0;
if((byteData.length%2)==1) last=(short)(byteData[byteData.length-1]&0xFF); // high byte is zero padding

last is a short, so it can be stored in buffer[buffer.length-1]. After the loop, j equals byteData.length/2 (integer division), which for an odd number of bytes is exactly buffer.length-1, so that last slot is still free. If j has any other value after the loop, something is wrong.

Then manipulate buffer.
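
Whichever representation you end up with, the last step is the sum from the question. Here is a minimal sketch, assuming you have already counted how often each m-bit value occurs into an `int[] counts` (the class and method names are hypothetical, and values that never occur are taken to contribute 0 to the sum):

```java
final class EntropyFromCounts {
    /**
     * Returns -sum(p_i * log2(p_i)) in bits, where p_i = counts[i] / total.
     * Entries with a zero count are skipped, since p * log2(p) tends to 0.
     */
    static double entropy(int[] counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        if (total == 0) {
            return 0.0;  // no symbols at all: define the entropy as 0
        }
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * (Math.log(p) / Math.log(2));  // log base 2 via natural log
            }
        }
        return h;
    }
}
```

Two equally likely values give 1 bit, and a file where only one value occurs gives 0.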

The second approach, working directly with bytes, is more involved and is a question of its own, so try the above first.

gpasch
  • We don't understand your question then. For m=9 you get 1 byte + 1 bit – gpasch Mar 03 '16 at 19:22
  • this expression (n >> i) & 2^m-1 is unclear but in words you want every m bits right? – gpasch Mar 03 '16 at 19:49
  • Why it's unclear? Yes this is what I want, sequences of m bit –  Mar 03 '16 at 19:57
  • Thanks for the help, your code give me this error `error: incompatible types: possible lossy conversion from int to short` –  Mar 03 '16 at 21:15
  • Mmh outOfBound exception in line `buffer[j]=(short)((byteData[i]<<8)|byteData[i+1])` –  Mar 03 '16 at 21:24
  • I thank you infinitely. The code works except for the last byte, which is not considered if the number of bytes is odd. If the array of bytes is {119=1110111, 97=1100001, 115=1110011}, the array of shorts is {30561=..., **0**}. It should not be 0 but 115, so we should add zero padding in front of 115. I have no idea how to do that, sorry. –  Mar 03 '16 at 22:01
  • Thanks, it works perfectly. You were very kind and very patient, Thank you again! –  Mar 04 '16 at 16:44