June 12


Why you should use the Teradata CLOB instead of VARCHAR()

By Roland Wenzlofsky

June 12, 2015

character large object, CLOB, data type, data types, sql

Wide textual information often is stored in columns of data type VARCHAR(). This data type can hold up to 32,000 (UNICODE) or 64,000 (LATIN) characters.
The impact of storing comprehensive text is that only a small number of rows fits into each data block.

The effect of storing large text is that only a relatively limited number of rows fits into each data block: In this case, full table scans are more expensive(more data blocks are required to store the whole set of data).

Luckily, there is another possibility to store extensive textual information:

The Teradata CLOB

Character large objects (data type CLOB) can be a great alternative for storing comprehensive textual information. The data of a CLOB column is stored as a separate sub-table. Separating them from the base table can give a significant performance advantage for queries which are accessing the table with a full table scan:  More rows can fit into each data block.

As a consequence, fewer data blocks have to be moved from disk to FSG cache, vastly reducing the number of disk IOs.

Here is an example table, we are assuming that column “Desc” of each table row contains extensive textual information:

CREATE TABLE Varchar_Table
COl1 CHAR(01),
Desc VARCHAR(30000)

In the absence of a secondary index, all data blocks have to be moved from disk to FSG cache, even if we are only interested in the content of column ID. As we are a assuming that all rows contain extensive textual information in column Desc this will be expensive from a resource consumption point of view:

SELECT ID FROM Varchar_Table WHERE COl1 = ‘X'  ;

By changing the data type from VARCHAR() to CLOB, resource consumption (disk IOs) can be decreased significantly; much more rows are packed into each data block. Therefore fewer blocks have to be moved from the drive into the FSG cache:

COl1 CHAR(01),

SELECT ID FROM Varchar_Table WHERE COl1 = ‘X';  — fewer blocks have to be moved from disk to FSG cache

No performance advantage is measurable if the CLOB column is part of the select list. In this case, the base table plus the sub-table holding the CLOB data have to be accessed:

SELECT ID,Desc FROM Varchar_Table WHERE COl1 = ‘X'  ;

Roland Wenzlofsky

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 20 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

You might also like

  • I feel like I’m missing something. The text explaining they benefit of the CLOB is very clear and understandable. But none of your examples reference the CLOB table. And you talk about joining to the CLOB table but no example is provided, and was what I was looking for!!!

    • The CBLOB sub-table is used automatically by Teradata. It is maintained by the system and not visible to us. Whenever your SELECT column list contains the CLOB column, it is taken from the sub-table, otherwise, only the base table has to be accessed.

      I hope this made it clearer

  • So, in this fairly specific scenario, you get ‘columnar’ by the back door. I like it! I useful little trick, will have to see how it works in practice but this is definitely worth considering. One possible issue that comes to mind is the handling of CLOBs. They are not straight forward, but if the tool you’re using to query the data is ok with them then this should be fine.
    Even without that issue, this is a nice idea.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    Never miss a good story!

     Subscribe to our newsletter to keep up with the latest trends!