Database systems aim to minimize I/O operations.
Teradata optimizes performance by choosing an effective primary index and, where needed, secondary indexes. Secondary indexes, however, increase storage requirements and add maintenance overhead to every DML statement. Join indexes, Teradata's equivalent of materialized views, serve a different purpose and are not covered here.
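As a rough sketch of the trade-off, here is hypothetical Teradata DDL (table and column names are invented for illustration) showing a primary index for row distribution and a non-unique secondary index that must be maintained on every DML statement:

```sql
-- Hypothetical example: a sales table distributed by its primary index.
CREATE TABLE sales (
    sale_id     INTEGER,
    customer_id INTEGER,
    sale_date   DATE
)
PRIMARY INDEX (sale_id);

-- Non-unique secondary index (NUSI) on customer_id: it can speed up
-- lookups, but costs extra storage and is updated on every
-- INSERT/UPDATE/DELETE against the table.
CREATE INDEX (customer_id) ON sales;
```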
The Teradata approach
Teradata’s row partitioning reduces I/O by organizing the physical rows in the file system so that a query can read only the partitions it needs, avoiding a full table scan without requiring an index structure.
When creating a row partitioned table in Teradata, it is crucial to define a partition key that matches the WHERE conditions or join columns of the expected queries. Choosing the right row partitioning is therefore a vital part of physical data modeling.
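A minimal sketch of such a table, assuming a hypothetical sales table partitioned by day so that date-filtered queries touch only the matching partitions:

```sql
-- Hypothetical example: one partition per day via RANGE_N.
CREATE TABLE sales (
    sale_id     INTEGER,
    customer_id INTEGER,
    sale_date   DATE NOT NULL
)
PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N (
    sale_date BETWEEN DATE '2024-01-01' AND DATE '2024-12-31'
              EACH INTERVAL '1' DAY
);

-- Only the 31 January partitions are read, not the whole table:
SELECT *
FROM   sales
WHERE  sale_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31';
```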
Snowflake takes a different approach
Snowflake tables are divided into micro-partitions whose metadata includes, among other things, the minimum and maximum value of each column. Snowflake’s column store design sets it apart from Teradata: although Teradata offers a columnar storage format, it is not a columnar database at its core, and its SQL engine must reconstruct rows during plan execution.
Snowflake’s partition elimination does not require a predefined partition key. The metadata of each micro-partition already contains the information needed for pruning, such as the minimum and maximum value of every column. As a result, partition elimination works automatically for every column of every micro-partition.
Compared to Teradata row partitioning, this is a clear advantage: the WHERE condition of a query only has to be checked against each micro-partition’s min/max metadata, and every micro-partition that cannot contain matching rows is eliminated.
In a Teradata table partitioned on a date column, partition elimination happens only when the query contains a WHERE condition on that date column. In Snowflake, by contrast, only the micro-partitions whose metadata shows they may contain rows matching the WHERE condition are scanned, regardless of which column is filtered.
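To illustrate, assume the same hypothetical sales table exists in Snowflake with no partition key declared at all; pruning still applies to any filtered column:

```sql
-- Hypothetical example: Snowflake prunes micro-partitions using the
-- min/max metadata of ANY column referenced in the WHERE clause,
-- even though no partition key was ever defined on this table.
SELECT *
FROM   sales
WHERE  customer_id = 4711;
-- Micro-partitions whose stored customer_id min/max range
-- cannot include 4711 are skipped without being scanned.
```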
Snowflake can automatically narrow the scan to specific micro-partitions when an additional column filter is added to the WHERE clause. A Teradata table partitioned only on the date column cannot do this. To achieve a similar effect, the Teradata table would have to be re-partitioned with a multi-level partition key that combines the date column with the additional filter column used in the WHERE clause.
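Such a multi-level partitioned primary index (MLPPI) might look like the following hypothetical sketch, combining the date column with an invented region_id filter column:

```sql
-- Hypothetical sketch: a multi-level partition key in Teradata,
-- approximating Snowflake's per-column pruning for these two columns.
CREATE TABLE sales (
    sale_id     INTEGER,
    region_id   INTEGER NOT NULL,
    sale_date   DATE    NOT NULL
)
PRIMARY INDEX (sale_id)
PARTITION BY (
    RANGE_N (sale_date BETWEEN DATE '2024-01-01' AND DATE '2024-12-31'
             EACH INTERVAL '1' DAY),
    RANGE_N (region_id BETWEEN 1 AND 100 EACH 10)
);

-- A query filtering on both columns can now eliminate partitions
-- on both levels; filters on any other column still cannot.
SELECT *
FROM   sales
WHERE  sale_date = DATE '2024-03-15'
AND    region_id = 42;
```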
Snowflake has a limitation
In Teradata, the partition key definition guarantees that the rows of a partition are stored contiguously in the file system, which allows them to be copied efficiently from mass storage into main memory via cylinder reads. In Snowflake, however, pruning efficiency depends on how the data was loaded: loading data in daily batches, one day after another, naturally produces well-clustered micro-partitions and good partition elimination. Where data is not loaded this way, the efficiency of partition elimination in Snowflake can decline.
For large Snowflake tables it is therefore advisable to use the clustering feature, which reorganizes the data in the micro-partitions according to a chosen cluster key. This process runs automatically in the background, but it incurs compute costs, so DML statements against a large clustered table should be kept to a minimum, since every change can trigger re-clustering.
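A minimal sketch, again using the hypothetical sales table, of defining a cluster key and checking its effect:

```sql
-- Hypothetical example: let Snowflake's background Automatic
-- Clustering service keep the table ordered on sale_date.
-- The service bills for the reorganization work it performs.
ALTER TABLE sales CLUSTER BY (sale_date);

-- Inspect how well the table is currently clustered on that key:
SELECT SYSTEM$CLUSTERING_INFORMATION('sales', '(sale_date)');
```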