Teradata Development – Keep it simple, stupid!

Specialization within the workforce has led to a growing issue, which I have observed with greater occurrence. Specifically, the number of professionals skilled in all aspects of data warehousing has declined significantly. A decade ago, teams responsible for data warehousing were composed primarily of individuals with extensive backgrounds in the information technology industry. However, this is no longer the norm, as companies have begun to create highly specialized roles. While this may be assumed to reduce costs, I question its effectiveness and believe it contributes to many new problems.

One issue that has surfaced in multiple recent projects is the creation of excessively large SQL statements.

Teradata data warehouse developers are commonly expected to possess a skill set primarily around SQL proficiency and familiarity with ETL tools.

A streamlined development process typically appears as follows:

Business Analyst -> Technical Analyst -> Developer

The technical analyst typically condenses the business logic into a mapping document that is subsequently provided to the developers.

Due to limited development skills, some developers tend to create a single, excessively large SQL statement for each mapping without considering the possibility of redundancy. I have observed instances where developers blindly replicate the same code repeatedly.

Although it may appear advantageous for maintenance purposes, this method fails to create efficient SQL, leading to erroneous and superfluous code, ultimately resulting in an increased maintenance burden.

The Teradata optimizer is responsible for generating the execution plan of a SQL query. While it may enhance with each new release of Teradata, it is important to consider how the optimizer decomposes each SQL statement into a sequence of 2-table joins.

While the join of 3 tables allows for 3*2*1 = 6 different possibilities of consecutive 2-table joins, we already have more than 3 million possible 2-table joins (10*9*8*7*6*5*4*3*2*1) in case we have to join ten tables together!

As you can imagine, the optimizer has a substantial risk of making mistakes.

The likelihood of obtaining a favorable execution plan diminishes rapidly as the number of joins increases.

How can we reduce the risk of a bad execution plan?

Reducing the risk of poor performance can be achieved through three simple and effective methods:

  • Keep the number of joins small (the obvious solution)
  • Split your SQL statements into parts by working with temporary tables (volatile tables, global temporary tables, or even real tables)
  • Identify common business logic and maintain it in one place within your code.

Arguing against this idea might seem reasonable, as it means sacrificing the efficiency of having a single SQL statement per mapping. However, implementing the aforementioned tips yields quick dividends. Failing to adhere to these guidelines could result in someone else taking over your poorly performing implementation later.

  • Avatar
    Sreeraj Kalappurakkal says:

    This is the very essence and secret of High Perferforming Teradata ETL!

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like