
Notes:

  • I have a Django model on PostgreSQL with:

    raw_data = models.TextField(_("raw_data"), default='')

  • It just stores some raw data, which can be 1 KB to 200 KB per row.

  • I have 50 million rows.

Req:

  • I need to decrease the size of the data in the database.

Questions:

  1. How can I tell what consumes the most space across the whole database?

  2. Should I use string compression before storing the data?

    2.1 I saw here: Text compression in PostgreSQL that the text gets compressed anyway. Is that true?

    2.2 I did some compression in Python code; I am not sure whether converting the bytes to a string type can cause data loss:

import json
import sys
import zlib

def shrink_raw_data(username):

    follower_data = get_string_from_database()
    text = json.dumps(follower_data).encode('utf-8')  # produces bytes

    # Checking size of text
    text_size = sys.getsizeof(text)
    print("\nsize of original text", text_size)

    # Compressing text
    compressed = str(zlib.compress(text, 9))

    # store the string in the database

    # Checking size of text after compression
    csize = sys.getsizeof(compressed)
    print("\nsize of compressed text", csize)

    # Decompressing text
    # NOTE: this raises TypeError, because `compressed` is a str, not bytes
    decompressed = zlib.decompress(compressed)

    # Checking size of text after decompression
    dsize = sys.getsizeof(decompressed)
    print("\nsize of decompressed text", dsize)

    print("\nDifference of size= ", text_size - csize)

    follower_data_reload = json.loads(decompressed)

    print(follower_data_reload == follower_data)

    2.3 Since my data is stored as a string in the db, is this line "str(zlib.compress(text, 9))" VALID?
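A note that may help frame 2.2 and 2.3: `str()` on a bytes object returns its printable repr (the text `b'...'` with `\x..` escapes), which `zlib.decompress` cannot consume and which re-inflates the compressed payload. A minimal sketch of a lossless, text-safe round trip, assuming base64 as the text encoding (the helper names `compress_to_text` / `decompress_from_text` are illustrative, not from the original code):

```python
import base64
import json
import zlib

def compress_to_text(obj):
    """Compress a JSON-serialisable object into an ASCII-safe string
    suitable for a TextField (base64 adds ~33% on top of zlib)."""
    raw = json.dumps(obj).encode('utf-8')
    return base64.b64encode(zlib.compress(raw, 9)).decode('ascii')

def decompress_from_text(text):
    """Inverse of compress_to_text: stored string -> original object."""
    raw = zlib.decompress(base64.b64decode(text.encode('ascii')))
    return json.loads(raw.decode('utf-8'))

follower_data = {"followers": ["alice", "bob"] * 1000}

# str() on bytes yields the repr "b'...'" -- not the original bytes --
# so zlib.decompress() cannot consume it, and the escapes inflate it again
broken = str(zlib.compress(json.dumps(follower_data).encode('utf-8'), 9))
assert broken[0] == "b" and broken[1] in "'\""

stored = compress_to_text(follower_data)              # safe for a TextField
assert decompress_from_text(stored) == follower_data  # lossless round trip
assert len(stored) < len(json.dumps(follower_data))   # still a real saving
```

If the base64 overhead matters at 50 million rows, storing the raw compressed bytes in a `BinaryField` (PostgreSQL `bytea`) avoids the string conversion entirely.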

    This ought to be split up, since I count 4 distinct questions. | #1 has nothing to do with compression, django or strings. On the other hand it lacks information about which specific database you use, as that will likely determine how to obtain that information. – Dan Mašek Sep 24 '21 at 18:15
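On question 1 specifically: PostgreSQL exposes per-relation sizes through catalog functions, so the heaviest tables can be listed directly. A sketch under stated assumptions (`top_tables` is a made-up helper; it assumes a live DB-API-style connection, e.g. `django.db.connection`, with permission to read `pg_class`):

```python
# Rank relations by total on-disk size. pg_total_relation_size() includes
# indexes and the TOAST side table, which is where large TextField values
# actually live (PostgreSQL already compresses TOASTed values transparently).
TABLE_SIZES_SQL = """
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;
"""

def top_tables(connection):
    """Run the query on any DB-API-style connection and return
    (table_name, pretty_size) rows, largest first."""
    with connection.cursor() as cur:
        cur.execute(TABLE_SIZES_SQL)
        return cur.fetchall()
```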
