The Teradata TO_CHAR Function

Utilize the Teradata TO_CHAR function to:

  • Convert numeric expressions to character strings
  • Convert dates, times, interval data types, or timestamps to character strings

This is an example of a numeric expression argument, which can be any of the following data types: BYTEINT, SMALLINT, INTEGER, BIGINT, DECIMAL/NUMERIC, FLOAT/REAL/DOUBLE PRECISION, or NUMBER.

The format string, which must be of type CHAR or VARCHAR, represents the second argument (Further information on format strings can be found in the Teradata documentation: link).

SELECT TO_CHAR(123456789,'999999999');

If the query returns only ‘#’ characters, it indicates a conversion failure.

SELECT TO_CHAR(123456789,'99');
--> ###

An example of a date expression argument is provided below. The first argument must be a valid data type, including DATE, TIME, TIME WITH TIME ZONE, TIMESTAMP, TIMESTAMP WITH TIME ZONE, or an Interval data type. The second argument must be either CHAR or VARCHAR.

SELECT TO_CHAR(DATE'2019-11-09', 'DD-MON-YYYY' );
--> '09-NOV-2019'

Teradata TO_CHAR and Statistic Estimations

Teradata’s frequent release of new features is exciting, as it simplifies our tasks. Nevertheless, it is crucial to remember that functions may lack statistics, posing a risk to the optimizer.

This example illustrates such a case. Initially, we will utilize TO_CHAR on a date and inspect the execution plan.

SELECT Col_1, Col_2 FROM 
DBNAME.TableNane 
WHERE TO_CHAR(Date_Col, 'mm-dd-yyyy') ='12-31-9999'
ORDER by 1;

COLLECT STATS COLUMN TO_CHAR(ENT_END_TS, 'mm-dd-yyyy') AS COL_DATE_UDF 
ON DBNAME.Tablename;

The execution plan shows that in step 2, the optimizer lacks confidence despite having collected statistics on the WHERE condition column. The estimated number of rows is incorrect at 1, whereas the result is two.

Step 2:

Next we do an all-amp operations RETRIEVE step from dbname.tablename in view viewed.view_nm by way of an all row scan with a condition of "(TD_SYSFNLIB.To_Char(CAST((date_col in view viewing.viewnam.ENT_END_TD) as TIMESTAMP(6) WITH TIME ZONE) AT TIME ZONE INTERVAL '0:00' HOUR TO MINUTE, 'mm-dd-yyyy')) ='12-31-9999'") into Spool 1 (group_amps). which is built locally on the amps. Then we do a SORT to order Spool 1 by the sort key in spool field1. The Size of spool 1 is estimated with no confidence to be 1 row.

Rewrite the query using a CAST function.

SELECT Col_1, Col_2 FROM
DBNAME.TableNane WHERE CAST(CAST(ENT_END_TD as DATE FORMAT 'MM-DD-YYYY') as CHAR(10)) ='12-31-9999' ORDER by 1;

COLLECT STATS COLUMN CAST(CAST(ENT_END_TD as DATE FORMAT 'MM-DD-YYYY') as CHAR(10)) as COL_DATE_UDF on DBNAME.Tablename;
Next we do an all-amp operations RETRIEVE step from dbname.tablename in view viewed.view_nm by way of an all row scan with a condition of "(TD_SYSFNLIB.To_Char(CAST((date_col in view viewing.viewnam.ENT_END_TD) as TIMESTAMP(6) WITH TIME ZONE) AT TIME ZONE INTERVAL '0:00' HOUR TO MINUTE, 'mm-dd-yyyy')) ='12-31-9999'") into Spool 1 (group_amps). which is built locally on the amps. Then we do a SORT to order Spool 1 by the sort key in spool field1. The Size of spool 1 is estimated with High Confidence to be 2 rows.

The optimizer rates this step with “high confidence” as it possesses the necessary statistics. However, using the TO_CHAR function in our initial query renders the statistics unusable.

Conclusion

If possible, avoid using User-Defined Functions (UDFs) like the Teradata TO_CHAR function when simpler methods are available. Functions often prevent using existing statistics, except in cases like UPPERCASE. Consequently, the optimizer must rely on heuristics, leading to suboptimal execution plans.

sp3006

  • Avatar
    Dieter Nöth says:

    This is a fine example of why end-users should be trained to write good SQL 🙂

    Instead of an explicit CAST/TO_CHAR one could simply use
    WHERE ENT_END_TD = DATE ’9999-12-31′

    This automatically casts the Timestamp to a Date.

    And changing the WHERE into a sargable condition would be even better:

    WHERE ENT_END_TD BETWEEN CAST(DATE ‘9999-12-31’ AS TIMESTAMP)
    AND TIMESTAMP ‘9999-12-31 23:59:59.999999’

    If it wasn’t for 9999-12-31 I would prefer:

    WHERE ENT_END_TD >= CAST(DATE ‘2016-01-08’ AS TIMESTAMP)
    AND ENT_END_TD < CAST(DATE '2016-01-08' +1 AS TIMESTAMP)

    • Thank you, Dieter, agree with you, thank you again.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >