CLOB

Wide textual information often is stored in columns of data type VARCHAR(). This data type can hold up to 32,000 (UNICODE) or 64,000 (LATIN) characters.
The impact of storing comprehensive text is that only a small number of rows fits into each data block.

The effect of storing large text is that only a relatively limited number of rows fits into each data block: In this case, full table scans are more expensive(more data blocks are required to store the whole set of data).

Luckily, there is another possibility to store extensive textual information:

The Teradata CLOB

Character large objects (data type CLOB) can be a great alternative for storing comprehensive textual information. The data of a CLOB column is stored as a separate sub-table. Separating them from the base table can give a significant performance advantage for queries which are accessing the table with a full table scan:  More rows can fit into each data block.

As a consequence, fewer data blocks have to be moved from disk to FSG cache, vastly reducing the number of disk IOs.

Here is an example table, we are assuming that column “Desc” of each table row contains extensive textual information:

CREATE TABLE Varchar_Table
(
ID INTEGER NOT NULL,
COl1 CHAR(01),
Desc VARCHAR(30000)
) PRIMARY INDEX (ID);

In the absence of a secondary index, all data blocks have to be moved from disk to FSG cache, even if we are only interested in the content of column ID. As we are a assuming that all rows contain extensive textual information in column Desc this will be expensive from a resource consumption point of view:

SELECT ID FROM Varchar_Table WHERE COl1 = ‘X’  ;

By changing the data type from VARCHAR() to CLOB, resource consumption (disk IOs) can be decreased significantly; much more rows are packed into each data block. Therefore fewer blocks have to be moved from the drive into the FSG cache:

CREATE TABLE CLOB_Table
(
ID INTEGER NOT NULL,
COl1 CHAR(01),
Desc CHARACTER LARGE OBJECT
) PRIMARY INDEX (ID);

SELECT ID FROM Varchar_Table WHERE COl1 = ‘X’;  — fewer blocks have to be moved from disk to FSG cache

No performance advantage is measurable if the CLOB column is part of the select list. In this case, the base table plus the sub-table holding the CLOB data have to be accessed:

SELECT ID,Desc FROM Varchar_Table WHERE COl1 = ‘X’  ;

 

Our Reader Score
[Total: 9    Average: 4.1/5]
Why you should use the Teradata CLOB instead of VARCHAR() written by Roland Wenzlofsky on June 12, 2015 average rating 4.1/5 - 9 user ratings

1 COMMENT

  1. So, in this fairly specific scenario you get ‘columnar’ by the back door. I like it! I useful little trick, will have to see how it works in practise but this is definitely worth considering. One possible issue that comes to mind is the handling of CLOBs. They are not straight forward, but if the tool you’re using to query the data is ok with them then this should be fine.
    Even without that issue this is a nice idea.

LEAVE A REPLY

Please enter your comment!
Please enter your name here