2

To work with bytea values in PostgreSQL, I usually am serializing to and deserializing from hex. This seems to be the preferred way. However, what is actually stored on the PostgreSQL server? Is it hex, or the unhexed binary? The reason I care is that hex is obviously going to take up double the space as unhexed binary. When I say unhexed binary, I mean the hex string "00", which is 2 bytes, is just "0", which is 1 byte, as unhexed binary.

The context is I have a Postgres database and a Scylla database that are storing the exact same data in almost the exact same format. However, the total space used by Postgres is almost exactly double the space used by Scylla. For Scylla, I don't encode binary as hex. I just send raw binary over the wire. I don't expect these two databases to use the exact same amount of space. But for PostgreSQL to use double the space is quite a lot of overhead, and the nearly exact doubling really makes me suspect data is being stored as hex and not actual binary on the server (since hex uses exactly double the space as actual binary).

CJ Cobb
  • 105
  • 5

1 Answers1

2

A bytea is stored in binary form, not hex encoded, which would be enormously wasteful. The hex representation is just the default text representation generated by the type output function.

I don't know Scylla, so I cannot explain the difference, but PostgreSQL has substantial overhead per row (23 bytes), and there is also some overhead per 8kB block.

You say in your comments that you measured the database size, which includes all the metadata and indexes. I suggest that you use pg_table_size to measure the table.

Note that PostgreSQL automatically compresses values if a table row would otherwise exceed 2000 bytes.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • I figured out the database size with this command: `SELECT pg_size_pretty( pg_database_size('dbname') );` – CJ Cobb Mar 25 '21 at 20:29
  • do you have a source that says `bytea` is stored in binary form and not hex? – CJ Cobb Mar 25 '21 at 20:31
  • I checked the database size again today (originally posted yesterday), and the size is not _exactly_ double anymore. 150GB in Scylla is about 275GB in PostgreSQL. – CJ Cobb Mar 25 '21 at 21:01
  • I have added some more about the size measurement. Sure I have a source for the information how `bytea` is stored: the PostgreSQL source, which is open. Implementation details are not documented. But it would be crazy to waste space by storing `bytea` in a different fashion. – Laurenz Albe Mar 26 '21 at 07:44