The golden rules for Amazon Redshift optimization
Customize Workload Management
Workload management in Redshift means optimizing the queues. This includes the correct selection of queues, the assignment of queries, and the allocation of memory.
Although automatic workload management and concurrency scaling have improved things considerably, workload management remains critical to a scalable system (and is one of Redshift's weak points).
Since Redshift Workload Management is primarily based on queuing queries, very unstable runtimes can be expected if misconfigured. Consider switching from manual WLM to automatic WLM, in which queues and their queries can be prioritized.
Consider whether concurrency scaling (for read-only workload) or short query acceleration makes sense in your environment.
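A quick way to see whether queries are piling up in queues (a sign WLM needs tuning) is to query the WLM state system view. A minimal sketch; the service class numbers map to your configured queues:

```sql
-- Count queries per WLM queue (service_class) and state;
-- many rows in state 'QueuedWaiting' indicate queueing pressure.
SELECT service_class, state, COUNT(*) AS query_count
FROM stv_wlm_query_state
GROUP BY service_class, state
ORDER BY service_class;
```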
Use Column Encoding for compression
Consider compressing columns. This reduces the storage space and I/Os needed by queries.
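Redshift can recommend an encoding per column, or you can declare one explicitly at creation time. A minimal sketch; the table and column names are assumptions for illustration:

```sql
-- Let Redshift suggest encodings for an existing table
ANALYZE COMPRESSION sales;

-- Or declare encodings explicitly (hypothetical table)
CREATE TABLE sales (
  sale_id   BIGINT       ENCODE az64,
  sale_date DATE         ENCODE az64,
  comment   VARCHAR(256) ENCODE lzo
);
```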
Do not use Redshift for OLTP queries
Redshift is entirely unsuitable for this kind of workload: not only is it a column store, but it also offers no indexes (unlike Teradata columnar, for example).
Choose the proper method to distribute the data
Use the DISTKEY to help Amazon Redshift join large tables.
Use EVEN if no suitable DISTKEY exists to avoid skew.
Use DISTSTYLE ALL to copy small lookup tables to every node.
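The three distribution styles above can be sketched as follows; all table and column names are assumptions for illustration:

```sql
-- KEY: co-locate rows of two large tables on the join column
CREATE TABLE orders (
  order_id    BIGINT,
  customer_id BIGINT,
  order_total DECIMAL(12,2)
)
DISTSTYLE KEY
DISTKEY (customer_id);

-- EVEN: no suitable join key, spread rows round-robin to avoid skew
CREATE TABLE clickstream (
  event_time TIMESTAMP,
  url        VARCHAR(2048)
)
DISTSTYLE EVEN;

-- ALL: replicate a small lookup table to every node
CREATE TABLE country_codes (
  code CHAR(2),
  name VARCHAR(64)
)
DISTSTYLE ALL;
```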
Maintenance of your Amazon Redshift statistics
Only with accurate statistics can the optimizer reserve the right amount of memory for a query plan.
If too much memory is reserved, it is unavailable to the other queries in the same queue, which are delayed.
If too little memory is reserved, intermediate results may have to spill to disk, and the query becomes much slower.
Execute the ANALYZE command regularly.
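A sketch of a routine statistics check using the svv_table_info system view; the table name and the 10 percent staleness threshold are assumptions to tune for your environment:

```sql
-- Refresh statistics for a single table
ANALYZE sales;

-- Find tables whose statistics are more than 10% stale
SELECT "table", stats_off
FROM svv_table_info
WHERE stats_off > 10
ORDER BY stats_off DESC;
```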
Release space regularly
Row space in Amazon Redshift is not automatically freed after a DELETE or UPDATE; this must be done manually with the VACUUM FULL command. It releases the space and re-sorts the rows by SORTKEY, which is regularly necessary to maintain performance.
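A minimal sketch (the table name is an assumption), including a check for tables with the most unsorted rows:

```sql
-- Reclaim space from deleted rows and re-sort by SORTKEY
VACUUM FULL sales;

-- See which tables are most in need of a vacuum
SELECT "table", unsorted, empty
FROM svv_table_info
ORDER BY unsorted DESC;
```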
Write better queries
Here are two examples:
Since we are dealing with a column store, you should always select only the required columns to minimize I/O.
Also, restrict your queries with a WHERE condition whenever possible to minimize the scans.
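Both rules in one hedged sketch; the table and column names are assumptions for illustration:

```sql
-- Bad: scans every column (and, without a filter, every row)
-- SELECT * FROM orders;

-- Better: only the needed columns, restricted by a WHERE condition
SELECT order_id, order_total
FROM orders
WHERE order_date >= '2024-01-01';
```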