The golden rules for Amazon Redshift optimization

Customize Workload Management

Workload management in Redshift means optimizing the queues. This includes the correct selection of queues, the assignment of queries and the allocation of memory.

Automatic workload management and concurrency scaling have made this much easier, but workload management remains critical to a scalable system, and it is one of the weak points of Redshift.

Because Redshift workload management is primarily based on queuing queries, a misconfigured setup can lead to very unstable runtimes. Consider switching from manual WLM to automatic WLM, in which queues and their queries can be prioritized.

Consider whether concurrency scaling (for read-only workload) or short query acceleration makes sense in your environment.
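One way queries get assigned to a queue is through a query group label set in the session. The sketch below is a minimal illustration, assuming a queue has already been configured in WLM to match the label; the helper name `route_to_queue` and the label `etl` are hypothetical.

```python
def route_to_queue(sql, query_group):
    """Wrap one statement so it runs under a WLM query group label.

    Assumes a WLM queue is configured to match `query_group`.
    Returns the statements to execute in order on one session.
    """
    if "'" in query_group:
        raise ValueError("query group label must not contain quotes")
    statement = sql.strip().rstrip(";") + ";"
    return [
        f"SET query_group TO '{query_group}';",  # route following queries
        statement,
        "RESET query_group;",  # fall back to default routing
    ]


if __name__ == "__main__":
    for line in route_to_queue("SELECT COUNT(*) FROM sales", "etl"):
        print(line)
```

The statements have to run on the same session, because the query group setting is scoped to the session.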

Use Column Encoding for compression

Consider compressing columns. This reduces the storage space required and also reduces the I/Os required by queries.
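As a rough illustration, column encodings can be declared per column in the table definition. The helper below only builds the DDL text; the table `sales`, its columns and the chosen encodings are assumptions made for the example, not recommendations for any particular data set.

```python
# Compression encodings accepted by Redshift's ENCODE clause.
KNOWN_ENCODINGS = {
    "RAW", "AZ64", "BYTEDICT", "DELTA", "DELTA32K", "LZO",
    "MOSTLY8", "MOSTLY16", "MOSTLY32", "RUNLENGTH",
    "TEXT255", "TEXT32K", "ZSTD",
}


def create_table_ddl(table, columns):
    """Build CREATE TABLE text from (name, sql_type, encoding) tuples."""
    lines = []
    for name, sql_type, encoding in columns:
        if encoding.upper() not in KNOWN_ENCODINGS:
            raise ValueError(f"unknown encoding: {encoding}")
        lines.append(f"    {name} {sql_type} ENCODE {encoding.upper()}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


if __name__ == "__main__":
    print(create_table_ddl("sales", [
        ("sale_id", "BIGINT", "AZ64"),
        ("note", "VARCHAR(256)", "ZSTD"),
    ]))
```

For an existing table, running `ANALYZE COMPRESSION sales;` reports suggested encodings based on a sample of the data.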

Do not use Redshift for OLTP queries

Redshift is completely unsuitable for this kind of workload, not only because it is a column store, but also because it has no indexes that queries could use (unlike, for example, Teradata Columnar).

Choose the right method to distribute the data

Use a DISTKEY to help Amazon Redshift join large tables.
Use EVEN if no suitable DISTKEY exists, to avoid skew.
Use ALL to copy small lookup tables to every node.
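The three choices above map onto the distribution clause of CREATE TABLE. The following sketch builds that clause as text; the function name `distribution_clause` and the column `customer_id` are hypothetical.

```python
def distribution_clause(style, distkey=None):
    """Return the CREATE TABLE distribution clause for a given style."""
    style = style.upper()
    if style == "KEY":
        if not distkey:
            raise ValueError("DISTSTYLE KEY needs a distkey column")
        # Rows with the same key value land on the same slice,
        # which lets large tables be joined without redistribution.
        return f"DISTSTYLE KEY DISTKEY ({distkey})"
    if style in ("EVEN", "ALL", "AUTO"):
        if distkey:
            raise ValueError(f"DISTSTYLE {style} takes no distkey")
        return f"DISTSTYLE {style}"
    raise ValueError(f"unknown distribution style: {style}")


if __name__ == "__main__":
    print(distribution_clause("key", "customer_id"))  # large fact table
    print(distribution_clause("even"))                # no suitable key
    print(distribution_clause("all"))                 # small lookup table
```

The clause is appended after the closing parenthesis of the column list, for example `CREATE TABLE orders (...) DISTSTYLE KEY DISTKEY (customer_id);`.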

Maintenance of your Amazon Redshift statistics

Only if the statistics are correct will the right amount of memory be reserved for the query plan.
If too much memory is reserved, it is unavailable to the other queries in the same queue, which are then delayed.
If too little memory is reserved, intermediate results may have to spill to disk, and the query will be much slower.
Run the ANALYZE command regularly, especially after large loads. Redshift also runs ANALYZE automatically in the background.
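To decide which tables need an ANALYZE, the `stats_off` column of the `SVV_TABLE_INFO` system view can serve as a guide: 0 means the statistics are current and larger values mean they are more stale. The sketch below turns rows from that view into ANALYZE statements; the threshold of 10 and the sample rows are assumptions made for illustration.

```python
# Query to run against Redshift to fetch statistics staleness per table.
STALE_STATS_SQL = 'SELECT "schema", "table", stats_off FROM svv_table_info;'


def analyze_statements(rows, threshold=10.0):
    """Return ANALYZE statements for tables whose stats_off exceeds threshold.

    `rows` are (schema, table, stats_off) tuples as returned by
    STALE_STATS_SQL; stats_off may be None.
    """
    return [
        f"ANALYZE {schema}.{table};"
        for schema, table, stats_off in rows
        if stats_off is not None and stats_off > threshold
    ]


if __name__ == "__main__":
    sample = [  # hypothetical result rows
        ("public", "sales", 35.0),
        ("public", "dim_date", 0.0),
    ]
    for stmt in analyze_statements(sample):
        print(stmt)
```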

Release space regularly

The space occupied by rows in Amazon Redshift is not freed immediately after a DELETE or UPDATE. Redshift runs an automatic VACUUM DELETE in the background, but after heavy changes you may still need to reclaim space manually with the VACUUM FULL command, which releases the space and re-sorts the rows by SORTKEY. Doing this regularly helps maintain performance.
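A manual vacuum can be limited to one table and given a threshold. The helper below is a small sketch that builds the statement text; the table name `sales` and the threshold of 75 percent are illustrative assumptions.

```python
# Only the modes that accept an optional "TO n PERCENT" threshold.
VACUUM_MODES = ("FULL", "SORT ONLY", "DELETE ONLY")


def vacuum_statement(table, mode="FULL", threshold_percent=None):
    """Build a VACUUM statement for one table."""
    mode = mode.upper()
    if mode not in VACUUM_MODES:
        raise ValueError(f"unsupported vacuum mode: {mode}")
    stmt = f"VACUUM {mode} {table}"
    if threshold_percent is not None:
        if not 0 <= threshold_percent <= 100:
            raise ValueError("threshold must be between 0 and 100")
        stmt += f" TO {threshold_percent} PERCENT"
    return stmt + ";"


if __name__ == "__main__":
    print(vacuum_statement("sales"))                     # reclaim and re-sort
    print(vacuum_statement("sales", "DELETE ONLY", 75))  # reclaim space only
```

DELETE ONLY skips the sort phase, which is cheaper when the goal is only to get the space back.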

Write better queries

Here are two examples:
Since we are dealing with a column store you should always select only the required columns to minimize the I/Os.
Also, whenever possible, restrict your queries with a WHERE condition to minimize the scans.
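Both points can be combined in one query shape: name the columns explicitly and always filter. The sketch below builds such a query as text and refuses `SELECT *`; the table, columns and date filter are hypothetical.

```python
def narrow_select(table, columns, where):
    """Build a SELECT that names its columns and always has a WHERE clause."""
    if not columns or "*" in columns:
        raise ValueError("list the required columns explicitly")
    if not where or not where.strip():
        raise ValueError("a WHERE condition is required")
    return f"SELECT {', '.join(columns)} FROM {table} WHERE {where.strip()};"


if __name__ == "__main__":
    # Reads only two column blocks and restricts the scanned rows.
    print(narrow_select("sales", ["sale_id", "amount"],
                        "sale_date >= '2024-01-01'"))
```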

  • Joe Harris says:

    The advice in this post is substantially out of date.
    1. Use AutoWLM with query priorities
    2. Compression is applied automatically
    3. {correct!}
    4. Use DISTKEY if possible, use DISTSTYLE AUTO otherwise
    5. Redshift automatically runs ANALYZE in the background
    6. Redshift automatically runs VACUUM DELETE.
    7. {correct!}
