
I have a file, font_file.bdf, and need to get the characters contained in it as numpy arrays where each element is one pixel.

Here's the snippet of that file which defines the '?' character:

STARTCHAR question
ENCODING 63
SWIDTH 1000 0
DWIDTH 6 0
BBX 5 7 0 0
BITMAP
70
88
08
10
20
00
20
ENDCHAR

I researched .bdf files to understand how they encode data. Basically, each glyph is a bitmap with a bit depth of 1. I found a Pillow module, PIL.BdfFontFile, that can interpret BDF files. After experimenting with it for a while, I was able to get a PIL image for each character in the font and save one to confirm that it works:

from PIL.BdfFontFile import BdfFontFile

fp = open("font_file.bdf", "r")
bdf_file = BdfFontFile(fp)
bdf_file.compile()
char = '?'
_, __, bounding_box, image = bdf_file[ord(char)]
image.save(char + ".png")

The saved image shows the question mark, and its properties report a bit depth of 1, which makes sense. (I'm not sure why it seems inverted, but I could fix that with numpy if needed.)

Once I had that, I tried to convert to a numpy array:

import numpy
print(numpy.array(image, dtype=int))

which gave me an array that no longer seems to represent the character:

[[1 1 1 1 1]
 [0 1 0 1 1]
 [1 1 1 1 1]
 [1 1 1 1 0]
 [1 0 1 0 1]
 [1 0 1 1 1]
 [0 1 1 1 1]]

I was hoping for something that looked more like this:

[[0 1 1 1 0]
 [1 0 0 0 1]
 [0 0 0 0 1]
 [0 0 0 1 0]
 [0 0 1 0 0]
 [0 0 0 0 0]
 [0 0 1 0 0]]

Worst-case scenario, I could write an algorithm myself that converts the data in the PIL image to a numpy array, but I feel there has to be an easier way, given my past experience converting between PIL images and numpy arrays (it's usually quite straightforward).

Any ideas about how to get the PIL image to convert to a numpy array properly or another solution to my problem would be appreciated.

2 Answers

0

It turns out the unexpected behavior I was seeing was due to a bug in PIL, as described in this Stack Overflow question: Error Converting PIL B&W images to Numpy Arrays.

So the key to solving my problem was to convert the image to grayscale before creating the numpy array.

My final solution, including a small numpy step to scale the values into the format described above, was as follows:

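import numpy
from PIL.BdfFontFile import BdfFontFile

# On Python 3, open the file with "rb" instead (see the answer below).
fp = open("font_file.bdf", "r")
bdf_file = BdfFontFile(fp)
bdf_file.compile()
char = '?'
_, __, bounding_box, image = bdf_file[ord(char)]
# Convert the 1-bit image to 8-bit grayscale (0 or 255), then scale to 0/1.
print(numpy.array(image.convert('L')) // 255)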

which gave me this:

[[0 1 1 1 0]
 [1 0 0 0 1]
 [0 0 0 0 1]
 [0 0 0 1 0]
 [0 0 1 0 0]
 [0 0 0 0 0]
 [0 0 1 0 0]]
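
As a cross-check that does not depend on PIL, the same array can be rebuilt straight from the BITMAP hex rows in the question. This is only a sketch for Python 3: it assumes each row is padded to whole bytes with the most significant bit as the leftmost pixel (which matches the snippet above), and the names hex_rows, width, and glyph are just illustrative.

```python
import numpy as np

# Hex rows copied from the BITMAP section of the '?' glyph in the question.
hex_rows = ["70", "88", "08", "10", "20", "00", "20"]
width = 5  # first value of "BBX 5 7 0 0"

# One row of bytes per bitmap line; here each line is a single byte.
row_bytes = np.array([list(bytes.fromhex(row)) for row in hex_rows], dtype=np.uint8)

# Expand every byte into 8 bits (MSB first) and drop the padding columns.
glyph = np.unpackbits(row_bytes, axis=1)[:, :width]
print(glyph)
```

For this glyph the result matches the array printed above, which suggests the grayscale conversion is giving the right pixels.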
Community
0

To get @drake-mossman's answer to work, I had to change the first line to open the file in binary mode:

fp = open("font_file.bdf", "rb")

Unfortunately, this means that the BdfFontFile script currently doesn't support Unicode characters (or any code points above 255).
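
If you need glyphs above 255, one possible workaround is to skip PIL and read the BDF text directly. The following is a rough Python 3 sketch, not a full BDF parser: parse_bdf_glyphs is a hypothetical helper (it is not part of PIL), it only looks at ENCODING, BBX, and BITMAP, it ignores the BBX offsets, and it assumes every bitmap row is padded to whole bytes. The second glyph in the sample (ENCODING 256) is made up purely to show a code point past 255.

```python
import numpy as np

def parse_bdf_glyphs(lines):
    """Map each ENCODING value to a 2-D uint8 array of 0/1 pixels."""
    glyphs = {}
    code = width = height = None
    rows = None  # collects hex strings while inside a BITMAP section
    for raw in lines:
        line = raw.strip()
        if line.startswith("ENCODING"):
            code = int(line.split()[1])
        elif line.startswith("BBX"):
            width, height = (int(v) for v in line.split()[1:3])
        elif line == "BITMAP":
            rows = []
        elif line == "ENDCHAR":
            if rows:
                data = np.array([list(bytes.fromhex(r)) for r in rows], dtype=np.uint8)
                # Expand bytes to bits (MSB first) and crop to the bounding box.
                glyphs[code] = np.unpackbits(data, axis=1)[:height, :width]
            rows = None
        elif rows is not None and line:
            rows.append(line)
    return glyphs

# The '?' glyph from the question plus an invented glyph at code point 256.
sample = """STARTCHAR question
ENCODING 63
BBX 5 7 0 0
BITMAP
70
88
08
10
20
00
20
ENDCHAR
STARTCHAR demo
ENCODING 256
BBX 3 2 0 0
BITMAP
A0
40
ENDCHAR
"""

glyphs = parse_bdf_glyphs(sample.splitlines())
print(glyphs[63])
print(glyphs[256])
```

With a real font, passing the open text file should work the same way, e.g. parse_bdf_glyphs(open("font_file.bdf")). Note that unencoded glyphs (ENCODING -1) would all land on the same key in this sketch, so they would need separate handling.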

StupidWolf