The golden rules for Amazon Redshift optimization
- Customize Workload Management
Workload management in Redshift means optimizing the queues. This includes the correct selection of queues, the assignment of queries and the allocation of memory.
Although automatic workload management and concurrency scaling have greatly improved the situation, workload management remains critical to a scalable system (and is one of the weak points of Redshift).
Since Redshift workload management is primarily based on queuing queries, very unstable runtimes can be expected if it is configured incorrectly. Consider switching from manual WLM to automatic WLM, in which queues and their queries can be prioritized.
Consider whether concurrency scaling (for read-only workloads) or short query acceleration makes sense in your environment.
- Use Column Encoding for compression
Consider compressing columns. This reduces the storage space required and also reduces the I/Os required by queries.
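To see why an encoded column needs less storage and fewer I/Os, a toy model of run-length encoding (Redshift offers it as the RUNLENGTH encoding) may help. This is only a Python sketch of the idea, not of how Redshift lays out its blocks, and the column values are made up.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

# Hypothetical column: order status for 10 rows, stored in sorted order.
status_column = ["open"] * 3 + ["paid"] * 5 + ["shipped"] * 2

encoded = run_length_encode(status_column)
print(encoded)  # [('open', 3), ('paid', 5), ('shipped', 2)]
print(len(status_column), "values stored as", len(encoded), "runs")
```

The fewer runs a column has, the less has to be read from disk. In Redshift itself, the ANALYZE COMPRESSION command reports a suggested encoding per column of an existing table.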
- Do not use Redshift for OLTP queries
Redshift is completely unsuitable for this kind of workload. It is a column store, and unlike Teradata Columnar it offers no indexes that could be used.
- Choose the right method to distribute the data
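Use a DISTKEY on the join column so that matching rows of large tables are stored on the same slice and can be joined without redistributing data.
Use EVEN distribution if no suitable DISTKEY exists, to avoid skew.
Copy small lookup tables to every node (DISTSTYLE ALL).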
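The effect of the distribution choice on skew can be illustrated with a small simulation. The sketch below is Python, not Redshift: the slice count, the table contents, and the use of SHA-256 as a stand-in hash are all assumptions, since Redshift's real hash function is internal.

```python
import hashlib
from collections import Counter

NUM_SLICES = 4  # assumption: a small cluster with four slices

def slice_for_key(key, num_slices=NUM_SLICES):
    """Stand-in hash (SHA-256); Redshift's real hash function differs."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_slices

def distribute_by_key(keys, num_slices=NUM_SLICES):
    """KEY-style distribution: a row's slice is derived from its key value."""
    return Counter(slice_for_key(key, num_slices) for key in keys)

def distribute_even(row_count, num_slices=NUM_SLICES):
    """EVEN-style distribution: rows are dealt out round-robin."""
    return Counter(i % num_slices for i in range(row_count))

def skew_ratio(rows_per_slice, num_slices=NUM_SLICES):
    """Largest slice divided by the smallest; infinite if a slice is empty."""
    counts = [rows_per_slice.get(s, 0) for s in range(num_slices)]
    return float("inf") if min(counts) == 0 else max(counts) / min(counts)

# Hypothetical table of 10,000 orders.
customer_ids = range(10_000)                   # high-cardinality candidate key
countries = ["US"] * 9_000 + ["DE"] * 1_000    # low-cardinality, lopsided key

by_customer = distribute_by_key(customer_ids)
by_country = distribute_by_key(countries)
even = distribute_even(10_000)

print("customer_id:", skew_ratio(by_customer))
print("country:    ", skew_ratio(by_country))
print("EVEN:       ", skew_ratio(even))
```

With only two distinct values, the country column can occupy at most two of the four slices, so the others stay empty and the slice holding the "US" rows does most of the work. The round-robin variant is perfectly balanced, but it gives up the co-location that makes joins on the key cheap.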
- Maintain your Amazon Redshift statistics
Memory for a query plan is only reserved in the right size if the statistics are correct.
If too much memory is reserved, it is missing for the other queries in the same queue, which are then delayed.
If too little memory is reserved, intermediate results may have to spill to disk. The query will then be much slower.
Execute the ANALYZE command regularly.
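One way to decide which tables need an ANALYZE is the stats_off column of the SVV_TABLE_INFO system view, where 0 means current statistics and 100 means fully out of date. The following Python sketch turns such rows into ANALYZE statements; the threshold, the function name, and the sample rows are assumptions for illustration, and running the query against the cluster is left out.

```python
STATS_OFF_THRESHOLD = 10.0  # assumption: tune this to your own tolerance

def analyze_statements(table_stats, threshold=STATS_OFF_THRESHOLD):
    """Return ANALYZE statements for tables whose statistics look stale.

    table_stats: iterable of (schema, table, stats_off) tuples, for example from
    SELECT "schema", "table", stats_off FROM svv_table_info;
    """
    statements = []
    for schema, table, stats_off in table_stats:
        # Skip rows without a value instead of guessing.
        if stats_off is not None and stats_off > threshold:
            statements.append(f'ANALYZE "{schema}"."{table}";')
    return statements

# Hypothetical rows as they might come back from svv_table_info.
sample_rows = [
    ("public", "orders", 42.5),
    ("public", "customers", 0.0),
    ("staging", "events", None),
]

for statement in analyze_statements(sample_rows):
    print(statement)  # ANALYZE "public"."orders";
```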
- Release space regularly
Because the space occupied by rows in Amazon Redshift is not automatically freed after a DELETE or UPDATE, this must be done manually with the VACUUM FULL command. It releases the space and re-sorts the rows by SORTKEY. Run it regularly to maintain performance.
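The behavior described above can be pictured with a toy model in Python. It is a deliberately simplified sketch, not Redshift's storage engine: the class and its methods are hypothetical, deletes only mark rows, and a full vacuum drops the marked rows and restores the sort order.

```python
from dataclasses import dataclass, field

@dataclass
class ToyTable:
    """Toy model: rows are (sort_key, payload) tuples in physical order."""
    rows: list = field(default_factory=list)
    deleted: set = field(default_factory=set)  # positions marked as deleted

    def insert(self, sort_key, payload):
        # New rows land unsorted at the end of the table.
        self.rows.append((sort_key, payload))

    def delete(self, sort_key):
        # The row is only marked; its space is not released here.
        for position, (key, _) in enumerate(self.rows):
            if key == sort_key:
                self.deleted.add(position)

    def vacuum_full(self):
        # Drop the marked rows and re-sort the rest by sort key.
        live = [row for pos, row in enumerate(self.rows) if pos not in self.deleted]
        self.rows = sorted(live, key=lambda row: row[0])
        self.deleted.clear()

table = ToyTable()
for key in (3, 1, 2):
    table.insert(key, f"row-{key}")

table.delete(1)
size_before = len(table.rows)
print(size_before)  # 3: the deleted row still occupies space

table.vacuum_full()
print(table.rows)   # [(2, 'row-2'), (3, 'row-3')]
```

In Redshift, the corresponding statement is VACUUM FULL followed by the table name.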
- Write better queries
Here are two examples:
Since we are dealing with a column store, you should always select only the required columns to minimize the I/Os.
Also, whenever possible, restrict your queries with a WHERE condition to minimize the scans.
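A back-of-the-envelope calculation shows how much both rules can save. The Python sketch below uses a hypothetical table with made-up column widths and row counts, ignores compression, and assumes that a selective WHERE condition lets the engine skip the blocks that cannot contain matching rows.

```python
# Hypothetical table: column name -> bytes per value (uncompressed).
COLUMN_WIDTHS = {
    "order_id": 8,
    "customer_id": 8,
    "order_date": 4,
    "amount": 8,
    "comment": 200,
}
ROW_COUNT = 1_000_000

def bytes_scanned(columns, rows_read=ROW_COUNT):
    """Bytes read by a column store: only the referenced columns are touched."""
    return sum(COLUMN_WIDTHS[column] for column in columns) * rows_read

# SELECT * reads every column of every row.
select_star = bytes_scanned(COLUMN_WIDTHS)

# SELECT order_date, amount reads just two narrow columns.
two_columns = bytes_scanned(["order_date", "amount"])

# The same query with a WHERE condition that, by assumption, leaves 10% of the rows to read.
filtered = bytes_scanned(["order_date", "amount"], rows_read=100_000)

print(select_star)  # 228000000
print(two_columns)  # 12000000
print(filtered)     # 1200000
```

In this made-up example, naming the two required columns cuts the data read by a factor of 19, and the WHERE condition cuts it by another factor of 10.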