I'm trying to extract data from a dump of a Paradox database. It contains two fields with rich text stored as a binary blob that I'm having trouble decoding. The plain text sits in the middle of the blob, surrounded by two binary blocks of varying length that contain formatting information for the text. So far I have worked out some of the structure, but not enough to reliably decode the whole blob, or at least to determine its length so I can skip to the next one.
What I have so far:
- Integers inside the blob are little-endian
- the blob starts with a 44-byte header:
  - the first 4 bytes always seem to be `07 00 00 00`
  - the following 4 bytes contain the length of the text in bytes
  - the purpose of the remaining 36 bytes isn't clear yet
- then comes the unformatted text, whose length is given in the header
- the remainder of the blob contains formatting information and has a variable length
  - no idea what the first 25 (or 26) bytes are for
  - they are followed by a series of formatting markers that look like this: `A0 03 00 00 03 80`. Their meaning is: starting at character `0x03A0`, apply style number `03`
  - then there are 3 (or 4) bytes specifying the number of styles
  - after that follow the style descriptions. Each is 54 bytes long, and the name of the font is visible in plain text.
  - the block ends with 26 bytes of unknown purpose
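A minimal Python sketch of splitting a blob according to the header layout listed above. It assumes that the 44-byte header size and the little-endian text length at offset 4 hold for every record; `split_blob`, `HEADER_SIZE` and `MAGIC` are hypothetical names used only for illustration.

```python
from __future__ import annotations

import struct

HEADER_SIZE = 44             # observed header length (assumption: constant)
MAGIC = b"\x07\x00\x00\x00"  # first 4 bytes, constant in the samples so far


def split_blob(blob: bytes) -> tuple[bytes, bytes, bytes]:
    """Split a blob into (header, text, formatting tail).

    Assumes bytes 4..8 of the header hold the text length
    as a little-endian unsigned 32-bit integer.
    """
    if len(blob) < HEADER_SIZE:
        raise ValueError("blob is shorter than the 44-byte header")
    if blob[:4] != MAGIC:
        raise ValueError(f"unexpected leading bytes: {blob[:4].hex(' ')}")
    (text_len,) = struct.unpack_from("<I", blob, 4)
    end = HEADER_SIZE + text_len
    if end > len(blob):
        raise ValueError("text length points past the end of the blob")
    return blob[:HEADER_SIZE], blob[HEADER_SIZE:end], blob[end:]
```

The text slice still has to be decoded to a string; something like `text.decode("cp1252")` is only a guess at the code page, since nothing in the known part of the header says which one is used.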
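A possible decoding of the 6-byte formatting markers, as a Python sketch. It reads the first 4 bytes as a little-endian character offset and the fifth byte as the style number, matching the `A0 03 00 00 03 80` example; the sixth byte (`0x80` in that example) is kept as an opaque `flags` value because its meaning is unknown. `StyleRun`, `parse_marker`, `parse_markers` and `style_at` are hypothetical names, and `style_at` assumes the markers are sorted by offset.

```python
from __future__ import annotations

import bisect
import struct
from typing import NamedTuple, Sequence


class StyleRun(NamedTuple):
    start: int  # character offset from which the style applies
    style: int  # style number (assumed index into the style descriptions)
    flags: int  # 6th byte, 0x80 in the sample; meaning unknown


def parse_marker(raw: bytes) -> StyleRun:
    """Decode one 6-byte marker such as A0 03 00 00 03 80.

    Assumed layout: uint32 LE offset, 1-byte style, 1 unknown byte.
    """
    if len(raw) != 6:
        raise ValueError(f"expected 6 bytes, got {len(raw)}")
    start, style, flags = struct.unpack("<IBB", raw)
    return StyleRun(start, style, flags)


def parse_markers(data: bytes) -> list[StyleRun]:
    """Decode a slice assumed to contain only whole 6-byte markers."""
    if len(data) % 6:
        raise ValueError("marker area length is not a multiple of 6")
    return [parse_marker(data[i:i + 6]) for i in range(0, len(data), 6)]


def style_at(runs: Sequence[StyleRun], index: int) -> int | None:
    """Style in effect at a character index; runs must be sorted by start."""
    starts = [r.start for r in runs]
    i = bisect.bisect_right(starts, index) - 1
    return runs[i].style if i >= 0 else None
```

For the sample marker, `parse_marker(bytes.fromhex("A0 03 00 00 03 80"))` gives `StyleRun(start=928, style=3, flags=128)`. Where the marker area starts and ends is still unknown, so the caller has to supply the right slice.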
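Since the font name is visible in plain text inside each 54-byte style description, a crude way to list the fonts is to cut the style table into 54-byte records and take the longest run of printable ASCII in each one, similar to what the `strings` utility does. This Python sketch assumes the caller already knows where the style table starts and how many styles it holds; `split_styles` and `guess_font_name` are hypothetical names.

```python
from __future__ import annotations

import re

STYLE_SIZE = 54  # observed length of one style description
# 3+ consecutive printable ASCII bytes; space is included for names like "Times New Roman"
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{3,}")


def split_styles(table: bytes, count: int) -> list[bytes]:
    """Cut the style table into `count` fixed-size records (layout assumed)."""
    needed = count * STYLE_SIZE
    if len(table) < needed:
        raise ValueError(f"need {needed} bytes for {count} styles, got {len(table)}")
    return [table[i:i + STYLE_SIZE] for i in range(0, needed, STYLE_SIZE)]


def guess_font_name(record: bytes) -> str | None:
    """Heuristic: treat the longest printable ASCII run as the font name."""
    runs = _PRINTABLE_RUN.findall(record)
    if not runs:
        return None
    return max(runs, key=len).decode("ascii")
```

This is only a heuristic: if a byte right next to the name happens to be printable (for example a size or flag value), it will be glued onto the guessed name.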
Someone with experience of the Paradox file format told me that this rich-text blob probably isn't Paradox-specific. Could it be a format that Windows uses to store the contents of RichEdit controls? Does anybody else recognize this format?