I receive messages in an async loop and parse a row (a dictionary) from each message. I would like to write these rows to a Parquet file. To do this, I use the following:
import pyarrow as pa
import pyarrow.parquet as pq

fields = [('A', pa.float64()), ('B', pa.float64()), ('C', pa.float64()), ('D', pa.float64())]
schema = pa.schema(fields)
pqwriter = pq.ParquetWriter('sample.parquet', schema=schema, compression='gzip')

# async loop starts here
async for message in messages:
    # from_pydict expects column -> list of values, so each scalar is wrapped in a list
    row = {'A': [message[1]], 'B': [message[2]], 'C': [message[3]], 'D': [message[4]]}
    table = pa.Table.from_pydict(row, schema=schema)
    pqwriter.write_table(table)
# end of async loop

pqwriter.close()
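One thing I wondered about: as far as I understand, every write_table call produces its own row group with its own metadata, and here each row group holds a single row. A batched variant I could try might look like the sketch below (this is only a sketch; the BATCH_SIZE value and the flush helper are my own assumptions, not part of my actual code):

BATCH_SIZE = 1000  # assumed buffer size, tune as needed

buffer = []  # accumulates row dicts before flushing

def flush(writer, rows, schema):
    # Build one multi-row table per batch, so each write_table call
    # produces one reasonably sized row group instead of one per row.
    columns = {name: [r[name] for r in rows] for name in schema.names}
    writer.write_table(pa.Table.from_pydict(columns, schema=schema))

async for message in messages:
    buffer.append({'A': message[1], 'B': message[2], 'C': message[3], 'D': message[4]})
    if len(buffer) >= BATCH_SIZE:
        flush(pqwriter, buffer, schema)
        buffer = []

if buffer:  # flush any remaining rows after the loop ends
    flush(pqwriter, buffer, schema)
pqwriter.close()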
Everything works, but the resulting Parquet file is about 5 MB, whereas if I write the same data to a CSV file instead (roughly as in the sketch below), the file is only about 200 KB. I have checked that the data types match: the CSV columns are floats, and the Parquet columns are floats.
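For reference, the CSV version is essentially this (a simplified sketch of what I do; the exact code is not important):

import csv

with open('sample.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['A', 'B', 'C', 'D'])  # header
    async for message in messages:
        writer.writerow([message[1], message[2], message[3], message[4]])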
Why is my Parquet file so much larger than the CSV with the same data?