In my experience, many data warehouse implementations fail because of wrong design choices made at the beginning of the project.
I am a passionate advocate of the normalized data warehouse. Unfortunately, I have to admit that keeping a normalized core data model is not always a simple task. Poor performance may force us to denormalize (and this is usually the only reason for me to consider denormalization).
In my opinion, it is always a good idea to start with a normalized core data model and deviate from this approach only later in the modeling process, for the reason mentioned above. Denormalization should always be based on sound design and should never be a merely intuitive decision.
In the following section, we will discuss some common denormalization techniques.
Repeating Groups
While you would avoid repeating groups in a normalized data model (as they violate 1NF), they can help save space and avoid joins. Assume you are designing a table used to compare your customers’ end-of-month account balances, i.e., each month the balance has to be compared with the previous month’s balance.
Of course, you could store the information in a normalized table (the preferred method), keeping one row per month and customer.
As you can imagine, to bring the required pair of rows together, you either need a self-join or an ordered analytic function. Bad performance might force you to introduce repeating groups: you would store exactly one row per customer, holding each month’s balance in a separate column.
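A minimal sketch of both variants (table and column names are illustrative, not from the original text); the `LAG` function shown requires a recent Teradata release, and an equivalent windowed aggregate can be substituted on older versions:

```sql
-- Normalized variant (preferred): one row per customer and month.
CREATE TABLE account_balance
(
  customer_id   INTEGER NOT NULL,
  balance_month DATE    NOT NULL,
  balance       DECIMAL(18,2)
) PRIMARY INDEX (customer_id);

-- Comparing each month with the previous one requires an ordered
-- analytic function (or, alternatively, a self-join on the month):
SELECT customer_id,
       balance_month,
       balance,
       LAG(balance) OVER (PARTITION BY customer_id
                          ORDER BY balance_month) AS prev_balance
FROM account_balance;

-- Denormalized variant with repeating groups: one row per customer,
-- one column per month. No join or window function needed,
-- but the model violates 1NF.
CREATE TABLE account_balance_denorm
(
  customer_id INTEGER NOT NULL,
  balance_m01 DECIMAL(18,2),
  balance_m02 DECIMAL(18,2),
  -- ... one column per remaining month ...
  balance_m12 DECIMAL(18,2)
) PRIMARY INDEX (customer_id);
```

The trade-off is visible in the DDL itself: the denormalized table fixes the number of months at design time and must be altered if the comparison window ever changes.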
Pre-Join of Tables
Pre-joining tables can be achieved in two different ways.
While you could do a manual pre-join in your ETL/ELT process and store the result in a table, Teradata offers another method: Join Indexes.
Both kinds of pre-joins come with their advantages and disadvantages. The problem with Join Indexes is that it often requires a lot of time and patience to convince the Teradata Optimizer to use them. Still, as Join Indexes are system-maintained, they can be an excellent choice to support your applications (“set and forget”).
I prefer to design manual pre-joins for my ETL/ELT tasks (as manual joins ensure stable performance) and to use Join Indexes to support the reporting tools (such as MicroStrategy, Business Objects, etc.).
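As a sketch, a pre-joining Join Index over hypothetical customer and account tables could look like this (all names are assumptions for illustration):

```sql
-- System-maintained pre-join: Teradata keeps the index in sync
-- with the base tables automatically ("set and forget").
CREATE JOIN INDEX customer_account_ji AS
SELECT c.customer_id,
       c.customer_name,
       a.account_id,
       a.balance
FROM customer c
JOIN account a
  ON a.customer_id = c.customer_id
PRIMARY INDEX (customer_id);
```

Note that queries are still written against the base tables; whether the Optimizer rewrites them to use the Join Index depends on its cost estimates, which is exactly why coaxing it can take time and patience.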
Pre-Calculated Data
Another common denormalization approach is to store pre-calculated data.
You would choose this approach if the calculation of the required information is expensive and needed often. For example, you might store the average account balance over the last 12 months together with the customer. This eliminates the expensive ordered analytic function otherwise needed to calculate this information.
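For illustration, the 12-month average could be computed once during loading and stored, instead of being recomputed by every query (table and column names are assumptions):

```sql
-- Expensive when run per query: moving 12-month average
-- via an ordered analytic function.
SELECT customer_id,
       balance_month,
       AVG(balance) OVER (PARTITION BY customer_id
                          ORDER BY balance_month
                          ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)
         AS avg_balance_12m
FROM account_balance;

-- Denormalized alternative: calculate once in the ETL/ELT process
-- and store the result per customer in a summary table.
INSERT INTO customer_summary (customer_id, avg_balance_12m)
SELECT customer_id,
       AVG(balance)
FROM account_balance
WHERE balance_month > ADD_MONTHS(CURRENT_DATE, -12)
GROUP BY customer_id;
```

The stored value must, of course, be refreshed by the load process each month, which is the redundancy-maintenance cost that comes with any pre-calculated column.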
While the above techniques can help solve performance issues, they should never be your first instrument of choice. Teradata offers many excellent tuning tools, such as row partitioning, secondary indexes, join indexes, etc. These tools should be used first to eliminate performance problems. Only if they cannot deliver the required result should classical denormalization be considered.
You should never denormalize your data simply because it is the most straightforward and convenient approach. The associated problems, such as data redundancy and the risk of inconsistencies, can cause enormous costs at a later stage of the project – and usually they will.
Many of the performance issues I had to solve during the last couple of years were caused by “intuitive denormalization” (i.e., denormalization as the “path of least resistance”).
I won’t judge which approach is preferable when it comes to Inmon versus Kimball.
I love normalized data models as a starting point because, in my experience, they help avoid many of the mistakes that can be made in the early design phase of a data warehouse.