I made my Django models, and after inserting a test/dummy record into my PostgreSQL database I realized that each record is quite large. The data across all fields sums to around 700 KB per record. I estimate I will have around five million records, so the total will reach roughly 3,350 GB (about 3.3 TB). Most of my data is big JSON dumps (around 70+ KB per field).
I am unsure whether PostgreSQL will automatically compress this data when it is written through the Django ORM, or whether I should compress it myself before inserting it into the database.
Questions:
1. Does PostgreSQL automatically compress string fields with some compression algorithm when I use the Django model field type `TextField`?
2. Should I not rely on PostgreSQL and instead compress my data beforehand, then insert it into the DB? If so, which compression library should I use? I already tried `zlib` in Python and it seems great, but I've read that there is also a `gzip` library, and I am confused about which would be the most effective (in terms of compression and decompression speed as well as compression ratio). See the comparison sketch after this list.
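From what I understand, Python's `gzip` module uses the same DEFLATE algorithm as `zlib` underneath (it just adds a header and a checksum), so the compression ratios should be nearly identical. Here is a minimal sketch I put together to compare them; the payload is just a stand-in for one of my real JSON fields:

```python
# Minimal comparison of zlib vs. gzip on the same payload.
# The generated JSON is a stand-in for one of my ~70 KB fields.
import gzip
import json
import time
import zlib

payload = json.dumps({"key": list(range(10000))}).encode("utf-8")

for name, compress in (("zlib", lambda d: zlib.compress(d, 9)),
                       ("gzip", lambda d: gzip.compress(d, 9))):
    start = time.perf_counter()
    compressed = compress(payload)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(payload)} -> {len(compressed)} bytes "
          f"({elapsed * 1000:.2f} ms)")
```

On my data the two come out within a few bytes of each other (gzip's output is slightly larger because of its header/CRC), which only deepens my confusion about why people pick one over the other.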
EDIT: I was reading up on this Django snippet for a `CompressedTextField`, which is what sparked my confusion about which compression library to use: some people use `zlib`, while others use `gzip`.
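For context, the rough shape of such a field as I understand it (a sketch of the idea, not the snippet itself; `zlib` is an arbitrary choice here, and the output is base64-encoded because a `text` column cannot hold raw compressed bytes):

```python
import base64
import zlib

from django.db import models


class CompressedTextField(models.TextField):
    """Sketch of a TextField that zlib-compresses its contents.

    Compressed bytes are base64-encoded so they fit in a plain
    text column.
    """

    def get_prep_value(self, value):
        # Compress on the way into the database.
        if value is None:
            return value
        compressed = zlib.compress(value.encode("utf-8"), 9)
        return base64.b64encode(compressed).decode("ascii")

    def from_db_value(self, value, expression, connection):
        # Decompress on the way out.
        if value is None:
            return value
        return zlib.decompress(base64.b64decode(value)).decode("utf-8")
```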
EDIT 2: This Stack Overflow question says that PostgreSQL compresses string data automatically.
EDIT 3: PostgreSQL uses pg_lzcompress.c for compression, which belongs to the LZ family of compression algorithms. Is it safe to assume that we don't need to apply another form of compression (`zlib` or `gzip`) to the `TextField` itself, since it will be stored as the `text` data type (variable-length string) in the DB?
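If it helps, I figured one way to verify this is to compare a value's logical length with what PostgreSQL actually stores for it, using `pg_column_size()`. A sketch through the Django connection (`myapp_record` and `payload` are placeholder table/column names for my model):

```python
# Compare the logical character length of a stored value with the
# bytes PostgreSQL actually keeps on disk for it. If TOAST compression
# kicks in, pg_column_size() should report far fewer bytes.
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute(
        "SELECT length(payload), pg_column_size(payload) "
        "FROM myapp_record LIMIT 5"
    )
    for logical_len, stored_bytes in cursor.fetchall():
        print(f"{logical_len} chars -> {stored_bytes} bytes on disk")
```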