I have a column in postgres DB of data type bytea and encoding is of compressed_json. How can I read the values of this column and get a proper json back in Python?
Asked
Active
Viewed 979 times
-2
-
The JSON is compressed how? How are you accessing the database in Python? – Adrian Klaver Jul 19 '20 at 21:19
-
I am accessing the postgres db using Psycopg2 in my python code. Also, I am not sure but I think the compressed json is basically just a gzipped json – Sumeet Ambastha Jul 19 '20 at 21:37
-
Well you will need to know for sure what compression is being used or you will not be getting far. Can you find out how the data is being entered into the column? – Adrian Klaver Jul 20 '20 at 00:43
-
Sure. I just found it. It is being compressed using a GZipOutputStream in java. Not sure if this helps though. – Sumeet Ambastha Jul 20 '20 at 02:05
2 Answers
0
But the method gzip.compress and decompress is not showing. Maybe its a version issue with my gzip module. Anyways, I was able to decompress using below way :
Convert and store the column as byte array :-
payload = bytearray(row[0])
For decompressing, used below code snippet :
fileobj = cStringIO.StringIO(payload) gzf = gzip.GzipFile('dummy-name', 'rb', 9, fileobj) decomprJson = gzf.read()

Sumeet Ambastha
- 15
- 4
-1
I don't have time at the moment to give a complete working example. Instead I will provide an outline:
- Read column from database
data = cur.fetchone()[0]
- Use Python gzip module.
json_data = gzip.decompress(data)
Where json_data is a bytes object.
- Use Python json module to read json_data.
UPDATE. Example:
create table js_compressed(js_fld bytea);
js_str = b'[{"one": 1, "two": 2}, {"three": 3, "four": 4}]'
js_cmp = gzip.compress(js_str)
con = psycopg2.connect("dbname=test user=aklaver host=localhost")
cur = con.cursor()
cur.execute("insert into js_compressed(js_fld) values(%s)", (psycopg2.Binary(js_cmp,),))
cur.execute("select js_fld from js_compressed limit 1")
js_fld = cur.fetchone()[0]
dc_fld = gzip.decompress(js_fld).decode()
json.loads(dc_fld)
[{'one': 1, 'two': 2}, {'three': 3, 'four': 4}]

Adrian Klaver
- 15,886
- 2
- 17
- 28