Longer texts, such as descriptions of other columns, are stored in VARCHAR() columns. The maximum character capacity for storage is 64000 for the LATIN character set and 32000 for the UNICODE character set. When considering the impact of longer texts on query performance, it is important to note that the number of rows per data block decreases with longer text inputs, resulting in more IO-intensive full table scans.
The Teradata CLOB Datatype
CLOBs are a superior option for storing long strings as they are stored in a distinct sub-table. By separating CLOBs from the main table, queries accessing the table with a full table scan can be significantly enhanced. This is because Teradata reads fewer data blocks when the CLOB column is not selected, resulting in fewer IOs.
Below is an example table with detailed descriptions in each row’s “Desc” column.
CREATE TABLE Varchar_Table
(
ID INTEGER NOT NULL,
COl1 CHAR(01),
Desc VARCHAR(30000)
) PRIMARY INDEX (ID);
The absence of a secondary index requires copying all data blocks from the disk to the FSG cache, regardless of our interest in only the “ID” column content. Given that all rows contain extensive textual information in the “Desc” column, this results in a substantial amount of IOs.
SELECT ID FROM Varchar_Table WHERE COl1 = 'X' ;
Changing the data type to CLOB from VARCHAR() can significantly reduce resource consumption by allowing more rows to be stored per data block and requiring less disk transfer.
CREATE TABLE CLOB_Table
(
ID INTEGER NOT NULL,
COl1 CHAR(01),
Desc CHARACTER LARGE OBJECT
) PRIMARY INDEX (ID);
SELECT ID FROM Varchar_Table WHERE COl1 = 'X'; -- fewer blocks have to be moved from disk to FSG cache
Selecting a CLOB column does not confer any performance advantage. Instead, the base table and the sub-table containing the CLOB data must be read.
SELECT ID,Desc FROM Varchar_Table WHERE COl1 = 'X' ;
I feel like I’m missing something. The text explaining they benefit of the CLOB is very clear and understandable. But none of your examples reference the CLOB table. And you talk about joining to the CLOB table but no example is provided, and was what I was looking for!!!
The CBLOB sub-table is used automatically by Teradata. It is maintained by the system and not visible to us. Whenever your SELECT column list contains the CLOB column, it is taken from the sub-table, otherwise, only the base table has to be accessed.
I hope this made it clearer
So, in this fairly specific scenario, you get ‘columnar’ by the back door. I like it! I useful little trick, will have to see how it works in practice but this is definitely worth considering. One possible issue that comes to mind is the handling of CLOBs. They are not straight forward, but if the tool you’re using to query the data is ok with them then this should be fine.
Even without that issue, this is a nice idea.