6

AWS Redshift has recently released their own new encoding format AZ64, for which they say:

Compared to ZSTD encoding, AZ64 consumed 5–10% less storage, and was 70% faster

When I use an ANALYZE COMPRESSION my_table I still receive ZSTD as an encoding format for all of its columns.

So is it really recommended as an encoding format over ZSTD? Shall I naively prefer AZ64 whenever possible to use it?

Vzzarr
  • 4,600
  • 2
  • 43
  • 80
  • 3
    Same here. I also run ANALYZE COMPRESSION on multiple tables after AZ64 encoding was released, and it was not suggested to any of columns. Note that AZ64 can be used only on numeric, date and timestamp datatypes. – Edgars T. Nov 22 '19 at 10:07
  • @EdgarsT. I got a reply from AWS Support ;) – Vzzarr Nov 22 '19 at 10:59

3 Answers3

6

I got a reply from AWS Support on this question:

TL;DR

About your question preferring AZ64 instead of ZSTD were possible, yes you can do it.

Given AZ64 provide better performance compared to ZSTD

For further exaplanation:

Yes, AZ64 is better than ZSTD. It has comparable compression, when compared with ZSTD but greatly better performance, which is something you would have already learned using. As of now ANALYZE COMPRESSION command doesn't support AZ64, also I don't have an ETA on when AZ64 will be available with ANALYZE COMPRESSION. I will suggest you to keep an eye on

for any updates on AWS Redshift. I have verified this with the internal service team.

ANALYZE COMPRESSION is an advisory tool it recommends the optimal column encoding depending upon the columns.

Vzzarr
  • 4,600
  • 2
  • 43
  • 80
2

When ZSTD first came out it also took a while to be added to the analyze compression command.

ZSTD can be used on any datatype although some won't benefit from it as much as others. You can naively apply it to everything and it works fine.

AZ64 can only be applied to these datatypes:

SMALLINT
INTEGER
BIGINT
DECIMAL
DATE
TIMESTAMP
TIMESTAMPTZ

I ran an experiment to test the compression factor. I was surprised to discover it doesn't always make things smaller.

Steps

  1. Generated the create table DDL for the original table
  2. Changed the name of the table and the encoding for valid columns
  3. created the table Inserted into the new table from the old table ran
  4. VACUUM FULL <tablename> TO 99 PERCENT for both old and new table
  5. ran ANALYZE <tablename> for both old and new table

Query I used to check column sizes borrowed from https://stackoverflow.com/a/33388886/1335793

Results

enter image description here

  • The id column is a primary key, so has a very large cardinality, perhaps that helps?
  • The sort_order column has values in the range 0-50 with more values closer to 0
  • The created_at timestamp ranges over many years with more data in recent times
  • completed_step is similar to sort order but the median is closer to 0

Edit: I haven't done any performance comparison so this is only part of the story. Overall the size of the table is smaller, even if some fields weren't.

Community
  • 1
  • 1
Davos
  • 5,066
  • 42
  • 66
  • 1
    interesting analysis, even though I reckon the big advantage of AZ64 over ZSTD is not in the compression ratio but in the fastness of retrieving results: https://aws.amazon.com/about-aws/whats-new/2019/10/amazon-redshift-introduces-az64-a-new-compression-encoding-for-optimized-storage-and-high-query-performance/ – Vzzarr Dec 13 '19 at 09:46
  • 1
    @Vzzarr I hope so, I am yet to do any performance tests with this. It will need careful preparation. I've edited the answer to make it clear that I haven't done performance testing. – Davos Dec 13 '19 at 09:53
0

As Davos indicated, AZ64 may provide a significant reduction in storage used.

I did a basic test with identical datasets in two tables. One using ZSTD, and the other using AZ64 where possible. I saw no gains in performance. Overall, I saw the average runtime of queries take longer on the tables using AZ64.

The figure below shows the total execution time taken across all queries. AZ64 is substantially slower. This was for my use case of Redshift, there may be situations where AZ64 is actually faster. But I was not able to find any.

Chart comparing AZ64 vs ZSTD execution time

Complete information is available here on my blog: http://www.hydrogen18.com/blog/redshift-az64-performance-vs-zstd.html

Eric Urban
  • 3,671
  • 1
  • 18
  • 23