How Advanced Is The Teradata Columnar Solution?
A Column Store is defined by three distinct properties:
- Improved compression compared to Row Stores.
This results from considerably better compression rates that we can achieve by packing columns into data blocks. Since the values of a column come from a domain, the entropy is much lower.
- Columnar projection
Only the columns that are required are read.
- Columnar procession
This is only reserved for actual Column Stores. Teradata forms rows from the columns and then operates on them using the traditional row-based engine. It is, therefore, not a pure column store.
Why Is The Teradata Solution Great Even Though It Does Not Offer Columnar Processing?
Assuming a standard query, a date range selection is frequently made with a WHERE condition imposing a date restriction.
However, only a portion of the table’s current rows are frequently necessary.
Teradata offers simultaneous column and row partitions.
Teradata uses partitions to define subsets of rows and columns. A pure column store stores data per column in a partition. However, Teradata permits multiple columns to be designated as a column partition and partitioned by rows simultaneously.
The Advantages Of Teradata Columnar
Teradata provides two options for storing columnar data. The first option is the COLUMN format, which offers numerous benefits of a pure column store:
- Columns can be considered arrays with all the advantages, such as direct access to elements via the Index, simultaneous access to multiple array elements via Single Instruction Multiple Data (SIMD) operations, etc.
- Significant compression due to low entropy of column values (Dictionary Compression, Run-Length Encoding)
But saving in COLUMN format also has disadvantages since Teradata must search for the desired Index for a specific row in the column.
Therefore Teradata offers the ROW format as an alternative. Here the ROWID is stored with each column value. This allows Teradata to access the value directly as usual.
ROW Or COLUMN Format?
The following rules apply:
When selecting multiple columns, the ROW format is preferable, and the cost of storing the ROWID for each column value is insignificant.
Selecting a few columns typically warrants using the COLUMN format due to its superior compression rate and ability to perform SIMD processing.
Primary Index Or Primary AMP Index Access
Teradata’s hybrid solution offers the added benefit of accessing data through both the Primary Index and Primary AMP Index. The former enables direct row access through hashing, while the latter restricts processing to only the necessary AMPs.
Should I Use From Now On Only Partitioned Column Tables?
Column partitioning is not recommended for DML statements due to its lower performance than Row-Partitioned Tables. However, it is well-suited for enhancing reading access.
What Is The Biggest Disadvantage Of The Teradata Solution For Column Partitioning?
Creating tables presents numerous possibilities, which can be enticing. However, it necessitates a considerable amount of expertise to optimize its performance. Common inquiries include:
- Row partitioning, column partitioning, or both?
- Column partitioning using ROW or COLUMN format?
- Maybe indexing with USI, NUSI, or Join Index is better?
- Column partitioning with NOPI, Primary Index, or Primary AMP Index?
Optimizing a Teradata system can greatly enhance its performance, though it necessitates considerable expertise due to its complexity.
Compared to a pure column store like Snowflake, optimization options include creating a virtual warehouse cluster for parallel query execution or scaling up an existing EC2 instance to improve query speed. It is necessary to consider a more appropriate database system in such cases carefully.
Modern cloud databases prioritize elasticity over optimization. Therefore, the need for optimization is eliminated and one must rent additional computing power when necessary.
Ultimately, you exchange capital expenditures for operational ones by considering the total cost of ownership.
Be Careful With The Cloud
My primary concern regarding cloud databases is the uncertainty surrounding the longevity of their pricing models, such as billing by the minute. The evolution of mobile tariffs and the resourcefulness of operators serve as cautionary examples.
Check out this excellent piece on Teradata’s columnar technology.