Amazon Redshift Tuning – 7 Golden Rules You Must Never Forget!

DWH Pro Admin

February 11, 2020

minutes reading time


The golden rules for Amazon Redshift optimization

Customize Workload Management

Workload management in Redshift means optimizing the queues. This includes the correct selection of queues, the assignment of queries, and the allocation of memory.

Although automated workload management and concurrency scaling have greatly improved workload management, workload management is a critical point in ensuring a scalable system (and one of the weak points of Redshift)

Since Redshift Workload Management is primarily based on queuing queries, very unstable runtimes can be expected if misconfigured. Consider switching from manual WLM to automatic WLM, in which queues and their queries can be prioritized.

Consider whether concurrency scaling (for read-only workload) or short query acceleration makes sense in your environment.

Use Column Encoding for compression

Consider compressing columns. This reduces the storage space and I/Os needed by queries.

Do not use Redshift for OLTP queries

Redshift is entirely unsuitable for this kind of workload. Not only because it is a column store but also because no indexes can be used (as in Teradata columnar).

Choose the proper method to distribute the data.

Use the DISTKEY to help Amazon Redshift join large tables.
Use EVEN if no suitable DISTKEY exists to avoid skew.
Copy lookups to all slices.

Maintenance of your Amazon Redshift statistics

Only if the statistics are correct will memory be reserved for the query plan created in the correct size.
If too much memory is reserved, the other queries in the same queue are missing and are delayed.
If too little memory is reserved, it is possible that the memory must be buffered. The query will then be much slower.
Execute the ANALYZE command regularly.

Release space regularly

As Rows’ space in Amazon Redshift is not automatically freed after a DELETE or UPDATE this must be done manually. This is done with the VACUUM FULL command. Space is released, and the rows are sorted by SORTKEY. This is regularly necessary to maintain performance.

Write better queries

Here are two examples:
Since we are dealing with a column store, you should always select only the required columns to minimize the I/Os.
Also, restrict your queries with a WHERE condition whenever possible to minimize the scans.

  • Avatar
    Joe Harris says:

    The advice in this post is substantially out of date.
    1. Use AutoWLM with query priorities
    2. Compression is applied automatically
    3. {correct!}
    4. Use DISTKEY if possible, use DISTSTYLE AUTO otherwise
    5. Redshift automatically runs ANALYZE in the background
    6. Redshift automatically runs VACUUM DELETE.
    7. {correct!}

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >