
I have a question about the storage size calculation/estimation for a table loaded into HAWQ.

I have a 30MB table in Hive which I am trying to load into HAWQ using PXF, for example:

create table t2 tablespace data as select * from hcatalog.default.afs_trvn_mktscn_population;

The table in HAWQ consumes 369MB of storage, irrespective of how many HAWQ segments I have and what the HAWQ DFS.replica factor or HDFS replication factor is. In my case, whether with 4 HAWQ segments or 1 HAWQ segment, the size of the table after loading comes out to 369MB.

I can understand that the minimum block size is 128MB, so even 30MB would use at least 128MB, but why more than 300MB?

Can you share some information on this?
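For reference, one quick way to check the table's size from within HAWQ is with the standard PostgreSQL size functions, which HAWQ inherits (I am assuming here that they report meaningfully for HAWQ append-only tables):

select pg_size_pretty(pg_relation_size('t2'));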

1 Answer


Your Hive table is probably stored as ORC with Snappy compression, while your HAWQ table isn't compressed at all. You should use this in your HAWQ table:

with (appendonly=true, orientation=parquet, compresstype=snappy) 
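For example, a compressed version of the create-table-as statement from the question might look like this (a sketch only; the table and source names are taken from the question, and the storage options are the ones suggested above):

create table t2
with (appendonly=true, orientation=parquet, compresstype=snappy)
tablespace data
as select * from hcatalog.default.afs_trvn_mktscn_population;

You can also verify how the Hive side is stored by running DESCRIBE FORMATTED afs_trvn_mktscn_population in Hive; the output shows the table's InputFormat (e.g. ORC) and compression settings.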
Jon Roberts