Guest PostWhile observing one of the ADHOC workloads where the user was making use of TO_CHAR function available in TD 14.10. Teradata version the explain plan generated by the user SELECT was showing the “No Confidence.” The column used was TIMESTAMP column and utilized in the WHERE clause using TO_CHAR(Date_Col, ‘mm-dd-yyyy’)

Below is the simple query executed by the user:

SELECT Col_1, Col_2 FROM

DBNAME.TableNane WHERE TO_CHAR(Date_Col, ‘mm-dd-yyyy’) =’12-31-9999′

ORDER by 1

COLLECT STATS COLUMN TO_CHAR(ENT_END_TS, ‘mm-dd-yyyy’) as COL_DATE_UDF on DBNAME.Tablename;

Explain:

The below illustrate illustrates that in step 2 below the optimizer has selected “No Confidence” for the step despite having the stats on the columns using expressions and the rows estimated showing 1 row which is incorrect information. See the snapshot from explaining step2

Step 2:

Next we do an all-amp operations RETRIEVE step from dbname.tablename in view viewed.view_nm by way of an all row scan with a condition of “(TD_SYSFNLIB.To_Char(CAST((date_col in view viewing.viewnam.ENT_END_TD) as TIMESTAMP(6) WITH TIME ZONE) AT TIME ZONE INTERVAL ‘0:00’ HOUR TO MINUTE, ‘mm-dd-yyyy’)) =’12-31-9999′”) into Spool 1 (group_amps). which is built locally on the amps. Then we do a SORT to order Spool 1 by the sort key in spool field1. The Size of spool 1 is estimated with NO Confidence to be row 1

Now the above query has been re-written as below

SELECT Col_1, Col_2 FROM

DBNAME.TableNane WHERE CAST(CAST(ENT_END_TD as DATE FORMAT ‘MM-DD-YYYY’) as CHAR(10)) =’12-31-9999′ ORDER by 1;

COLLECT STATS COLUMN CAST(CAST(ENT_END_TD as DATE FORMAT ‘MM-DD-YYYY’) as CHAR(10)) as COL_DATE_UDF on DBNAME.Tablename;

Next we do an all-amp operations RETRIEVE step from dbname.tablename in view viewed.view_nm by way of an all row scan with a condition of “(TD_SYSFNLIB.To_Char(CAST((date_col in view viewing.viewnam.ENT_END_TD) as TIMESTAMP(6) WITH TIME ZONE) AT TIME ZONE INTERVAL ‘0:00’ HOUR TO MINUTE, ‘mm-dd-yyyy’)) =’12-31-9999′”) into Spool 1 (group_amps). which is built locally on the amps. Then we do a SORT to order Spool 1 by the sort key in spool field1. The Size of spool 1 is estimated with High Confidence to be row 2.

If you observe the above using the CAST on Timestamp column instead of using the TO_CHAR will help the optimizer to estimate the correct amount of rows and the High Confidence for a step which was predicted as No Confidence despite having the stats available on the column in the form of expression.

Conclusion: UDF should be used in a fairly smaller amount of data, using Teradata CAST vs. the UDF TO_CHAR is more reliable as it is helping the optimizer to estimate correct amount of rows.

 

Our Reader Score
[Total: 6    Average: 3.3/5]
Teradata TO_CHAR and CAST on Date Value written by sp3006 average rating 3.3/5 - 6 user ratings

3 COMMENTS

  1. This is a fine example why end users should be trained to write good SQL 🙂

    Instead of an explicit CAST/TO_CHAR one could simply use
    WHERE ENT_END_TD = DATE ’9999-12-31′

    This automatically casts the Timestamp to a Date.

    And changing the WHERE into a sargable condition would be even better:

    WHERE ENT_END_TD BETWEEN CAST(DATE ‘9999-12-31’ AS TIMESTAMP)
    AND TIMESTAMP ‘9999-12-31 23:59:59.999999’

    If it wasn’t for 9999-12-31 I would prefer:

    WHERE ENT_END_TD >= CAST(DATE ‘2016-01-08’ AS TIMESTAMP)
    AND ENT_END_TD < CAST(DATE '2016-01-08' +1 AS TIMESTAMP)

LEAVE A REPLY

Please enter your comment!
Please enter your name here