1

I need to get big Boolean arrays or BitSets from Java into Python via a text file. Ideally I want to go via a Base64 representation to stay compact, but still be able to embed the value in a CSV file. (So the boolean array will be one column in a CSV file.)

However I am having issues to get the byte alignment right. Where/how should I specify the correct byte order?

This is one example, working in the sense that it executes but not working in that my bits aren't where I want them.

Java:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.util.Base64;
import java.util.Base64.Encoder;
import java.util.BitSet;

public class basictest {

    public static void main(String[] args) throws Exception {
        // TODO Auto-generated method stub
        Encoder b64 = Base64.getEncoder();
        String name = "name";
        BitSet b = new BitSet();
        b.set(444);
        b.set(777);
        b.set(555);
        byte[] bBytes = b.toByteArray();
        String fp_str = b64.encodeToString(bBytes);
        BufferedWriter w = new BufferedWriter(new FileWriter("out.tsv"));
        w.write(name  + "\t" + fp_str + "\n");
        w.close();
    }

}

Python:

import numpy as np
import base64
from bitstring import BitArray, BitStream ,ConstBitStream
filename = "out.tsv"
with open(filename) as file:
    data = file.readline().split('\t')
b_b64 = data[1]
b_bytes = base64.b64decode(b_b64)
b_bits = BitArray(bytes=b_bytes)

b_bits[444] # False
b_bits[555] # False
b_bits[777] # False
# but
b_bits[556] # True
# it's not shifted:
b_bits[445] # False
meow
  • 925
  • 7
  • 22

3 Answers3

1

I am now reversing the bits in every byte using https://stackoverflow.com/a/5333563/1259675:

numbits = 8
r_bytes = [
    sum(1<<(numbits-1-i) for i in range(numbits) if b>>i&1)
    for b in b_bytes]
b_bits = BitArray(r_bytes)

This works, but is there a method that doesn't involve myself fiddling with the bits?

meow
  • 925
  • 7
  • 22
0

If:

  • the maximum bit to set is "sufficiently small".
  • and the data, you want to encode doesn't vary in size too much.

..then one approach can be:

  • Set max (+ min) significant bit(s in java) .
  • and ignore them in python .

, then it c(sh!)ould work without byte reversal, or further transformation:

// assuming a 1024 bit word
public static final int LEFT_SIGN = 0;
public static final int RIGHT_SIGN = 1025; //choose a size, that fits your needs [0 .. Integer.MAX_VALUE - 1 (<-theoretically)] 

public static void main(String[] args) throws Exception {
    ...
    b.set(LEFT_SIGN);
    b.set(444 + 1);
    b.set(777 + 1);
    b.set(555 + 1);
    b.set(RIGHT_SIGN);
    ...

and then in python:

# as before ..
b_bits[0] # Ignore!
b_bits[445] # True
b_bits[556] # True
b_bits[778] # True
b_bits[1025] # Ignore!;)

Your convenience (= encoding) 'd be the (maximum) "word length" ... with all its benefits and drawbacks.

Community
  • 1
  • 1
xerx593
  • 12,237
  • 5
  • 33
  • 64
0

We can use the bitarray package from python for this particular usecase.

from bitarray import bitarray
import base64

with open(filename) as file:
    data = file.readline().strip().split('\t')

b_b64 = data[1]
b_bytes = base64.b64decode(b_b64)
bs = bitarray(endian='little')
bs.frombytes(b_bytes)
print bs