We store longer texts such as descriptions of other columns in columns of the VARCHAR() data type. The maximum number of characters that can be stored is 64000 for character set LATIN and 32000 for character set UNICODE. If we consider long texts and their effect on the performance of queries, we must take into account that the fewer rows can be stored in a data block, the longer the texts are. Full table scans become IO more intensive with longer texts.
The Teradata CLOB Datatype
Character large objects (CLOB) can be an excellent alternative for storing long strings. The data of a CLOB column is stored as a separate sub-table. Separating CLOBs from the base table can significantly improve performance for queries accessing the table with a full table scan. Teradata reads fewer data blocks if the CLOB column is not selected. Fewer data blocks mean fewer IOs.
Here is an example table. We are assuming that the column “Desc” of each table row contains lengthy textual information:
CREATE TABLE Varchar_Table
(
ID INTEGER NOT NULL,
COl1 CHAR(01),
Desc VARCHAR(30000)
) PRIMARY INDEX (ID);
Without a secondary index, all data blocks must be copied from the disk to the FSG cache, even if we are only interested in the column “ID” content. As we are assuming that all rows contain lengthy textual information in column Desc this will cause a lot of IOs:
SELECT ID FROM Varchar_Table WHERE COl1 = 'X' ;
By changing the data type from VARCHAR() to CLOB, resource consumption can be decreased significantly; many more rows can be packed into each data block, and less disk transfer is required:
CREATE TABLE CLOB_Table
(
ID INTEGER NOT NULL,
COl1 CHAR(01),
Desc CHARACTER LARGE OBJECT
) PRIMARY INDEX (ID);
SELECT ID FROM Varchar_Table WHERE COl1 = 'X'; -- fewer blocks have to be moved from disk to FSG cache
No performance advantage is given if the CLOB column is part of the select list. In this case, the base table plus the sub-table holding the CLOB data have to be read:
SELECT ID,Desc FROM Varchar_Table WHERE COl1 = 'X' ;
I feel like I’m missing something. The text explaining they benefit of the CLOB is very clear and understandable. But none of your examples reference the CLOB table. And you talk about joining to the CLOB table but no example is provided, and was what I was looking for!!!
The CBLOB sub-table is used automatically by Teradata. It is maintained by the system and not visible to us. Whenever your SELECT column list contains the CLOB column, it is taken from the sub-table, otherwise, only the base table has to be accessed.
I hope this made it clearer
So, in this fairly specific scenario, you get ‘columnar’ by the back door. I like it! I useful little trick, will have to see how it works in practice but this is definitely worth considering. One possible issue that comes to mind is the handling of CLOBs. They are not straight forward, but if the tool you’re using to query the data is ok with them then this should be fine.
Even without that issue, this is a nice idea.