Notes:
I have a Django model on PostgreSQL with the field:
raw_data = models.TextField(_("raw_data"), default='')
It just stores raw data, which can be anywhere from 1 KB to 200 KB per row.
I have 50 million rows (so, assuming an average of roughly 20 KB per row, on the order of 1 TB of raw data).
Requirement:
- I need to decrease the size of the data in the database.
Questions:
1. How can I tell what consumes the most space across the whole database? (A sketch of how I have been checking is below.)
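For context, this is roughly how I have been listing the biggest tables via Django's connection (a minimal sketch; it uses PostgreSQL's pg_total_relation_size, which counts the table plus its indexes and TOAST data):

from django.db import connection

# Minimal sketch: list the biggest tables in the current database.
# pg_total_relation_size counts the table plus its indexes and TOAST data.
def biggest_tables(limit=10):
    with connection.cursor() as cursor:
        cursor.execute("""
            SELECT c.relname,
                   pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
            FROM pg_class c
            JOIN pg_namespace n ON n.oid = c.relnamespace
            WHERE n.nspname = 'public' AND c.relkind = 'r'
            ORDER BY pg_total_relation_size(c.oid) DESC
            LIMIT %s
        """, [limit])
        return cursor.fetchall()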
2. Should I compress the strings before storing the data?
2.1 I saw here: Text compression in PostgreSQL, that it gets compressed anyway; is that true? (I tried to check it myself with the sketch below.)
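I have been trying to verify this by comparing the stored size of the column with its logical byte length on a sample of rows (a sketch; myapp_mymodel is a placeholder for the real table name):

from django.db import connection

# Sketch: pg_column_size reports the on-disk (possibly TOAST-compressed)
# size, octet_length the uncompressed byte length. A big gap means TOAST
# is already compressing the column. 'myapp_mymodel' is a placeholder.
with connection.cursor() as cursor:
    cursor.execute("""
        SELECT avg(pg_column_size(raw_data))::bigint AS stored_bytes,
               avg(octet_length(raw_data))::bigint AS logical_bytes
        FROM (SELECT raw_data FROM myapp_mymodel LIMIT 1000) AS sample
    """)
    stored, logical = cursor.fetchone()
    print("avg stored bytes:", stored, "avg logical bytes:", logical)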
2.2 I compressed the data with some Python code, but I am not sure whether converting the bytes to a string type can cause data loss:
import json
import sys
import zlib

def shrink_raw_data(username):
    follower_data = get_string_from_database(username)
    # Serialize to JSON and encode to bytes
    text = json.dumps(follower_data).encode('utf-8')
    # Checking size of the original text (note: sys.getsizeof includes
    # Python object overhead; len(text) would be the raw payload size)
    text_size = sys.getsizeof(text)
    print("\nsize of original text", text_size)
    # Compressing the text (question 2.3 below is about this str() call)
    compressed = str(zlib.compress(text, 9))
    # store the string in the database
    # Checking size of the text after compression
    csize = sys.getsizeof(compressed)
    print("\nsize of compressed text", csize)
    # Decompressing the text
    decompressed = zlib.decompress(compressed)
    # Checking size of the text after decompression
    dsize = sys.getsizeof(decompressed)
    print("\nsize of decompressed text", dsize)
    print("\nDifference of size= ", text_size - csize)
    # Verify the round trip restores the original data
    follower_data_reload = json.loads(decompressed)
    print(follower_data_reload == follower_data)
2.3 Since my data is stored as a string in the DB, is the line str(zlib.compress(text, 9)) valid?
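For comparison, the only variant I found that gives a real string and still round-trips is to base64-encode the compressed bytes (a sketch; note that base64 adds roughly 33% on top of the compressed size):

import base64
import json
import zlib

def compress_to_str(data):
    # Compress the JSON bytes, then base64-encode so the result
    # is a plain ASCII string that is safe to store in a TextField
    raw = json.dumps(data).encode('utf-8')
    return base64.b64encode(zlib.compress(raw, 9)).decode('ascii')

def decompress_from_str(s):
    # Reverse the steps: base64-decode, decompress, parse JSON
    return json.loads(zlib.decompress(base64.b64decode(s)))

# Round-trip check
original = {"followers": list(range(100))}
assert decompress_from_str(compress_to_str(original)) == original

Would that be the correct approach here?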