0

I'm working with a database column that has already been implemented as a varchar(MAX). I would like to compress the string data we normally save in this field to save space and increase load times when working with the database.

I used the zlib library in python to do something like this:

compressed_string = zlib.compress(original_string.encode()) # The data type is now converted to bytes
database_string   = str(compressed_string)                  # Converts the compression data into a string

I can save this data to the database within the pre-existing vachar(MAX) column when I use str(), but I can't find a way to get back to my starting point after casting to a string.

If I do something like compressed_string.decode() to try and get the string representation of the compression bytes, I get errors saying 'utf-8' codec can't decode byte 0x9c in position 1: invalid start byte I also tried using ASCII encoding to no avail.

My end goal would be to get something like this to happen:

original_string = zlib.decompress(database_string).decode() # database_string needs to be a bytes-like object though

Does anyone know how I can store the bytes-like string representation of the compression and then reverse the process later?

Andew
  • 321
  • 1
  • 9
  • If this isn't a great approach, could anyone recommend other libraries for string compression to accomplish the task? Or would I need to create my own method of string compression? – Andew Oct 04 '22 at 17:10
  • 2
    Put simply, `varchar(max)` is the wrong data type for this. Depending on your database's default collation, or the column's specific collation, it will result in data corruption. If you must store compressed data in the database (and I'd question your requirement for this) you should use `varbinary(max)`. – AlwaysLearning Oct 04 '22 at 21:20

0 Answers0