5

I have a PostgrSql 14 installed and I want to compress some of my data to save some of a disk space. The data is audio files (1kb - 5mb), converted to a base64 strings. I created 3 tables:

CREATE TABLE t_uncompressed (
    file_name VARCHAR(50) NOT NULL PRIMARY KEY,
    file_size BIGINT,
    raw_data       TEXT
);


CREATE TABLE t_lz4 (
    file_name VARCHAR(50) NOT NULL PRIMARY KEY,
    file_size BIGINT,
    raw_data       TEXT COMPRESSION lz4
);


CREATE TABLE t_pglz (
    file_name VARCHAR(50) NOT NULL PRIMARY KEY,
    file_size BIGINT,
    raw_data       TEXT COMPRESSION pglz
);

then I inserted my data into these tables. I checked where data was compressed and I got that 403 rows were compressed with lz4 and only one with a pglz.

SELECT count(*) total, 
count(*) FILTER ( WHERE pg_column_compression(raw_data) NOTNULL) compressed, 
'lz4' compr_type
    FROM t_lz4
UNION
SELECT count(*) total, 
count(*) FILTER ( WHERE pg_column_compression(raw_data) NOTNULL) compressed, 
'pglz' compr_type
    FROM t_pglz;

 total | compressed | compr_type 
-------+------------+------------
   738 |          1 | pglz
   738 |        403 | lz4
(2 rows)

What seems to be weird for me is that the size of all three tables is the same! Okay, about uncompressed table and the pglz one, but why _lz4 table has the same size?

I'm getting table sizes this way:

SELECT schemaname || '.' || tablename full_tname
    , pg_size_pretty(pg_total_relation_size('"' || schemaname || '"."' || tablename || '"')) total_usage
    , pg_size_pretty(pg_relation_size('"' || schemaname || '"."' || tablename || '"')) data_size
    , pg_size_pretty((pg_total_relation_size('"' || schemaname || '"."' || tablename || '"') -
                      pg_relation_size('"' || schemaname || '"."' || tablename || '"') -
                      pg_indexes_size('"' || schemaname || '"."' || tablename || '"')))
    AS TOAST
    FROM pg_catalog.pg_tables
    WHERE tablename ~ 't_';

  full_tname         | total_usage | data_size | toast  
---------------------+-------------+-----------+--------
 t_lz4               | 338 MB      | 80 kB     | 338 MB
 t_pglz              | 338 MB      | 80 kB     | 338 MB
 t_uncompressed      | 338 MB      | 80 kB     | 338 MB
 (3 rows)

The default compression used in database is pglz, maybe this info is significant...

postgres=# SHOW default_toast_compression ;
 default_toast_compression 
---------------------------
 pglz
(1 row)
Frankie Drake
  • 1,338
  • 9
  • 24
  • 40
  • Compression is effective when the input data(often text) has repetitions. Only way to compress Audio/Video size is by lossy(audio/video) compression algorithms. You're better off storing that data on plain file-system & their paths in DB-records. Often when you compress random binary data(i.e. audio/video/binary files to text compression algorithms) will bloat file size a bit & waste time (compression/decompression) without adding any meaningful value to the process. – जलजनक Mar 04 '22 at 11:30

1 Answers1

1

You didn't say what kind of audio files. Most audio file formats are already compressed, and won't compress further. base64 encoding them does mean they should compress a little bit, to compress out the expansion caused by encoding them. LZ without Huffman encoding is particularly bad at compressing this type of expansion. Which is what you are seeing by looking at the size of the tables: The compression is futile.

And the compression implementations used by PostgreSQL check for futility and give up compressing things that don't seem very compressible. That is what you are seeing with pg_column_compression().

jjanes
  • 37,812
  • 5
  • 27
  • 34