
Can someone please help me understand why writing this table takes so long when the table is very small? (See the attached screenshot.)

Alex Ott
Umer

1 Answer


As advised here, you shouldn't partition on a column with high cardinality (a large number of unique values). As can be seen in the screenshot, the orderDate column has 753 unique values. Under the covers that means 753 folders have to be created, and each folder would hold on average only ~1.2 records in a parquet file (assuming an even distribution of dates). The overhead of creating that many tiny files is what makes the write so slow.

You should consider extracting the year, or the year and month, from the orderDate column and partitioning on that instead.

o_O
  • This was the sample file that we used for testing. The actual data size is more than 800 GB, roughly 400 GB for each year, and we need to load the last 5 years of data. What do you recommend now? – Umer Jan 22 '23 at 11:22
  • Well, 5 years' worth of data makes it even worse: that means as many as 1825 daily partitions could be created. You should consider partitioning by year and month, or just by year, depending on your query patterns – o_O Jan 22 '23 at 13:31
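The partition counts discussed in this thread follow from simple arithmetic, assuming one folder per distinct partition value and 365-day years:

```python
# Rough partition-count arithmetic for 5 years of daily data.
DAYS_PER_YEAR = 365
years = 5

by_day = years * DAYS_PER_YEAR  # one folder per distinct orderDate
by_month = years * 12           # one folder per (year, month) pair
by_year = years                 # one folder per year

print(by_day, by_month, by_year)  # 1825 60 5
```

Going from 1825 partitions down to 60 or 5 drastically reduces the number of small files the write has to produce.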