Improving Row Estimates With Teradata Check Constraints

Table of Contents

Without Teradata CHECK Constraint

When a non-indexed column is utilized in a retrieval operation (i.e. WHERE statement) and lacks Teradata Check Constraints, the Optimizer employs heuristics to approximate the number of rows in the resulting set by inferring missing statistics.

The estimated number of rows is 7,342, equivalent to 10% of the table’s total rows. This heuristic is utilized by the Teradata optimizer when statistics on the non-indexed column are not available.

CREATE TABLE Heuristics_Without_Check
(
	PK INTEGER NOT NULL,
	GENDER BYTEINT NOT NULL
) PRIMARY INDEX (PK);

INSERT INTO Heuristics_Without_Check
SELECT
	ROW_NUMBER() OVER (ORDER BY 1) AS PK,
	RANDOM(0,1)
FROM SYS_CALENDAR.CALENDAR;
-> 73,414 Rows inserted

--> Collect PI Statistics to have the real row number:

COLLECT STATISTICS COLUMN(PK) ON Heuristics_Without_Check;

SELECT * FROM Heuristics_Without_Check WHERE GENDER = 0;

Explain SELECT * FROM Heuristics_Without_Check WHERE GENDER = 0;

  1) First, we lock DWHPRO.Heuristics_Without_Check in TD_MAP1 for read
     on a reserved RowHash to prevent global deadlock.
  2) Next, we lock DWHPRO.Heuristics_Without_Check in TD_MAP1 for read.
  3) We do an all-AMPs RETRIEVE step in TD_MAP1 from
     DWHPRO.Heuristics_Without_Check by way of an all-rows scan with a
     condition of ("DWHPRO.Heuristics_Without_Check.GENDER = 0") into
     Spool 1 (group_amps), which is built locally on the AMPs.  The
     size of Spool 1 is estimated with no confidence to be 7,342 rows (
     190,892 bytes).  The estimated time for this step is 0.03 seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.03 seconds.

But with a simple trick, the estimates can be massively improved even if the non-indexed column statistics are missing. The Optimizer uses existing CHECK constraints to optimize the estimates.

With CHECK Constraint

The following formula is used for this:

Estimated Rows = Total Rows x (1 / Possible Unique Values)

Since the RANDOM function we use to load our test table only returns the values 0 and 1, we can add the following CHECK constraint to our test table:

CREATE TABLE Heuristics_With_Check
(
	PK INTEGER NOT NULL,
	GENDER BYTEINT NOT NULL CHECK (GENDER IN (0,1))
) PRIMARY INDEX (PK);

INSERT INTO Heuristics_With_Check
SELECT
	ROW_NUMBER() OVER (ORDER BY 1) AS PK,
	RANDOM(0,1)
FROM SYS_CALENDAR.CALENDAR;
-> 73,414 Rows inserted

--> Collect PI Statistics to have the real row number:

COLLECT STATISTICS COLUMN(PK) ON Heuristics_With_Check;

Applying the formula above produces the following estimated row count for the result set.

Estimated rows = 73,414 * (1 / 2) = 36,707 rows

Examining the Execution Plan:

SELECT * FROM Heuristics_With_Check WHERE GENDER = 0;

Explain SELECT * FROM Heuristics_With_Check WHERE GENDER = 0;

  1) First, we lock DWHPRO.Heuristics_With_Check in TD_MAP1 for read on
     a reserved RowHash to prevent global deadlock.
  2) Next, we lock DWHPRO.Heuristics_With_Check in TD_MAP1 for read.
  3) We do an all-AMPs RETRIEVE step in TD_MAP1 from
     DWHPRO.Heuristics_With_Check by way of an all-rows scan with a
     condition of ("DWHPRO.Heuristics_With_Check.GENDER = 0") into
     Spool 1 (group_amps), which is built locally on the AMPs.  The
     size of Spool 1 is estimated with low confidence to be 36,707 rows
     (954,382 bytes).  The estimated time for this step is 0.07 seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.07 seconds.

We execute the query on the table with the CHECK constraint and obtain a more accurate estimate of 36,707 rows. This estimate is vastly superior to the 10% heuristic. The number of rows closely aligns with the estimation because the RANDOM function generates a roughly equal number of zeros and ones. Adopting this approach can considerably enhance estimate precision.


SELECT COUNT(*) FROM Heuristics_With_Check WHERE GENDER = 0;
--> 36,711 rows

A Warning when using CHECK constraints

Do not define CHECK constraints solely to improve estimates. Keep in mind that validating the constraints requires extra processing and may trigger the creation of secondary indexes.

More information about Teradata CHECK constraints you can find here:
Check Constraints explained in Detail

Related Services

⚡ Need Help Optimizing Your Data Platform?

We cut data platform costs by 30–60% without hardware changes. 25+ years of hands-on tuning experience.

Explore Our Services →

📋 Considering a Move From Teradata?

Get a personalized migration roadmap in 2 minutes. We have migrated billions of rows from Teradata to Snowflake, Databricks, and more.

Free Migration Assessment →

Improving Row Estimates with Teradata Check Constraints

Without Teradata CHECK Constraint

With CHECK Constraint

A Warning when using CHECK constraints

📊 Data Platform Migration Survey

Stay Ahead in Data Warehousing

Leave a Comment Cancel reply