PDM Archives - DWHPRO


9 Things to consider when creating a Teradata Table

Teradata Table Design Checklist

The creation of tables is an ever-recurring task, so I thought it was time to offer a checklist.

  • Primary Index or NOPI?
  • Is the Primary Index distributing rows evenly?
  • Is the correct character set used?
  • Are character set and data types matching across tables?
  • Can the Primary Index be unique?
  • Should the table be MULTISET?
  • Should secondary indices be available?
  • Can temporary tables be normalized?
  • Is the correct data block size used?

1. Primary Index or NOPI Table?

The Primary Index serves various purposes. It is usually optimized for even row distribution and to be used as the main access path to the data.

But there are scenarios where it can be beneficial to create a table without a primary index (NOPI).

Staging is one example:  

By defining stage tables without a primary index, load performance can be increased (for example, FastLoad will not need to sort and redistribute rows in its final loading phase).

Whether NOPI tables improve the overall load performance depends, of course, on the later steps of the ETL process and the impact of a missing primary index.

You can find more details about the usage of NOPI tables here:  NOPI Tables on Teradata
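As a minimal sketch (the table and column names are made up for illustration), a staging table without a primary index is declared like this:

CREATE MULTISET TABLE stg_customer_load      -- hypothetical staging table
(
   customer_id   INTEGER,
   customer_name VARCHAR(200)
)
NO PRIMARY INDEX;

FastLoad can then append the incoming rows without sorting them by row hash; the hash distribution only happens later, when the data is moved into the final target table.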

2. Is the Primary Index skewed?

The goal is to have a Primary Index which distributes the table rows evenly across all AMPs.  Consequently, it is necessary to know your data.

Don't rely on small amounts of test data as a base for your primary index decision.
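A quick sanity check against production-sized data is to count rows per AMP for the candidate primary index column (a sketch; table and column names are placeholders):

SELECT
   HASHAMP(HASHBUCKET(HASHROW(customer_id))) AS amp_no,   -- AMP the row hash maps to
   COUNT(*)                                  AS row_cnt
FROM customer
GROUP BY 1
ORDER BY 2 DESC;

A large gap between the biggest and smallest row_cnt indicates a skewed primary index candidate.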

Data demographics of production data can change significantly. I experienced this several times; here is one real-life example (my client was a European telco):

The main customer table's primary index was mapped 1:1 from the corresponding source system column; let's call it “customer_id”.

During the development phase of the project, several files with customer test data were delivered, supporting the chosen primary index. As rows were distributed evenly across all AMPs, load performance was excellent.

The whole physical data model was optimized to support call-center employees looking up customer value in real time during their calls (i.e., tactical workload).

Furthermore, the customer table could be joined very efficiently with other related tables, such as call details, to support reporting.

Everything looked perfect until we entered the production phase. On the first day, the load times for the customer table increased from about 30 minutes to 6 hours with a subsequent “no more disk space” error.

What had happened?

During the test phase, all delivered files were related to “contract” customers. No one ever thought about testing “prepaid” customers. It turned out that all prepaid customers were assigned to a single “customer_id” in the source system…

The result was a massive redesign, a project delay of several months, and a complete loss of reputation.

In rare cases, you might want to “design” a skewed primary index on purpose, for example, if you want to force all rows into one data block on one AMP.

By doing this, you can achieve row-hash locking instead of table locks, among other advantages. For details, please read our article All Rows on one AMP by Design.

3. Which CHARACTER SET to use?

While it saves time to leave out such considerations, in the end, it never pays off. Of course, you could define every single character column as UNICODE.

But such an approach comes with two disadvantages: you will waste valuable disk space and decrease the performance of requests related to these tables.

Such a sloppy approach can lead to disaster. Here is one example from real life; it happened recently at a major European bank:

We had to deal with tables, each of them carrying several hundred columns. Load times (FastLoad) were horrible: 6 hours for a ridiculous 50GB of data!

When I was called to the project to analyze these performance issues, it turned out that every single column was defined with character set UNICODE.

Some simple calculations revealed how wide the data rows were: only 1-2 rows fit into each data block.

A redesign of dozens of tables was required. By changing all columns to character set LATIN and keeping only a few UNICODE columns, the load times were reduced drastically, as more rows could be packed into each data block.
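As a sketch (hypothetical table; the column split is purely illustrative), the intent is to reserve UNICODE for the columns that really need it:

CREATE MULTISET TABLE customer_address       -- hypothetical example
(
   address_id   INTEGER NOT NULL,
   country_code CHAR(2)      CHARACTER SET LATIN,     -- plain ASCII codes
   street_name  VARCHAR(200) CHARACTER SET LATIN,
   city_name    VARCHAR(100) CHARACTER SET UNICODE    -- may contain non-Latin characters
)
PRIMARY INDEX (address_id);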

This article shows you how significant the impact of wrongly chosen character sets can be: Teradata Tuning Success – The Best ever

4. Are Data Types and CHARACTER SETS matching across Tables?

Data type consistency is a crucial point, but it is often overlooked.

Data Type and Character Set Conversions during query executions are expensive operations (CPU time) and prevent AMP local activities (such as row-key based merge joins). Rows from different tables joined on their primary index are only co-located on the same AMP if data types match. The same holds true for character sets.
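As a small illustration (hypothetical tables), the join below can stay AMP-local only because both primary index columns are defined identically:

-- customer: PRIMARY INDEX (customer_id INTEGER)
-- sales:    PRIMARY INDEX (customer_id INTEGER)
SELECT c.customer_id, s.sales_amount
FROM customer c
JOIN sales s
  ON c.customer_id = s.customer_id;
-- If sales.customer_id were DECIMAL(18,0), or a character key with a different
-- character set, the matching rows would not be co-located on the same AMP and
-- the Optimizer would have to convert and redistribute rows before the join.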

5. Can a Unique Primary Index be defined?

If possible, always define your primary index as unique. It's a very sloppy way of developing if you skip uniqueness just because it's more convenient (“no problem with duplicates”).

A unique primary index may not always be possible, for example in the case of row partitioned tables, if the partition column is not included in the primary index. In this case, you can achieve uniqueness by adding a USI (unique secondary index).

There are several (often not obvious) disadvantages if we skip uniqueness. Here are some examples:

  • NUPI tables drastically reduce the possibilities of the Teradata Optimizer to use the Nested Join strategy.
  • No duplicate row checks (see point 6) are needed in the case of SET tables, as the UPI guarantees uniqueness.
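Here is a minimal sketch (hypothetical table) of the row-partitioned case mentioned above: the partition column is not part of the primary index, so the primary index cannot be unique, and uniqueness is enforced with a USI instead:

CREATE MULTISET TABLE sales_fact             -- hypothetical example
(
   sales_id BIGINT NOT NULL,
   sales_dt DATE   NOT NULL,
   amount   DECIMAL(18,2)
)
PRIMARY INDEX (sales_id)
PARTITION BY RANGE_N(sales_dt BETWEEN DATE '2020-01-01' AND DATE '2025-12-31'
                     EACH INTERVAL '1' MONTH)
UNIQUE INDEX usi_sales_id (sales_id);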

6. Can MULTISET tables be used?

If you don't have to ensure that duplicate rows are rejected, always use MULTISET tables.

To understand what duplicate row checks are, you need to know how Teradata distributes the data over the AMPs using hashing.

Each time a row is inserted into a table, the parsing engine determines a hash value and uses the hashmap to determine the AMP that should write the row to its logical disk.

Therefore a message consisting of TableId (to uniquely identify the table), RowHash (the hash value), and the content of the Primary Index Columns is sent to the AMP.

Why is the value of the primary index columns sent?

This is because two different Primary Index values can generate the same row hash. It's rare, but it happens.

This means that in order to sort a row into the data block at the right place, not only the hash value must be compared, but also the content of the primary index columns.

Since no two identical rows may be inserted into a SET table, each new row to be inserted must be compared against all existing rows with the same row hash (“duplicate row check”).

The number of required comparisons grows rapidly with the number of rows per row hash and, with skewed INSERT statements, finally leads to one or more AMPs being overloaded.

If you cannot define a MULTISET table, then a duplicate row check can be prevented by creating a unique secondary index!

You can find more details about SET and MULTISET tables here: Teradata Multiset versus Set Tables
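As a minimal sketch (hypothetical table names) of the options discussed above: either a MULTISET table, or a SET table whose UPI makes the duplicate row check unnecessary:

-- Option 1: MULTISET table, no duplicate row checks at all
CREATE MULTISET TABLE stg_call_detail
(
   call_id BIGINT,
   call_ts TIMESTAMP(0)
)
PRIMARY INDEX (call_id);

-- Option 2: SET table, but the UPI already guarantees uniqueness,
-- so no duplicate row check is required
CREATE SET TABLE dim_customer
(
   customer_id INTEGER NOT NULL
)
UNIQUE PRIMARY INDEX (customer_id);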

7. Should a Secondary Index be defined?

Remove secondary indexes and join indexes from tables before loading them with a large number of rows, even if the loading method does not require this (e.g., transactional load methods such as BTEQ or TPump).

It is usually better to remove the indexes, load the tables, and then create the indexes.
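A sketch of this pattern (the index and table names are made up):

-- 1. Drop the NUSI before the mass insert
DROP INDEX nusi_last_name ON customer;

-- 2. Load the table
INSERT INTO customer
SELECT * FROM stg_customer;

-- 3. Recreate the NUSI afterwards
CREATE INDEX nusi_last_name (last_name) ON customer;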

Here is an example of why this approach is advantageous.

Once I had to solve the following problem:

A user had started an INSERT statement which had not finished after 2 hours. Finally, the user became impatient and aborted the query.

This led to a rollback. I was notified the following day, when the query had already been in “ABORTED” status for another 10 hours.

I analyzed the situation:

  • The inserted primary index values were skewed and all rows ended up on the same AMP (mostly NULL values)
  • The table was a SET table; this further worsened the situation because of the “duplicate row checks” being done
  • There were 4 NUSIs defined on the table

I decided to cancel the rollback because this query had a massive impact on the performance of the Teradata system.

This is always an emergency solution because the table is then in an inconsistent state!

The combination of the SET table, 4 secondary indices, and the enormous skew had led to this catastrophic result.

The huge number of duplicate row checks performed by only one AMP was a major problem. But of course, the maintenance of 4 NUSIs additionally had an enormous negative effect on the performance of the system in this situation.

This problem could have been prevented with the method described above.

8. Can Columns be normalized on the fly with Temporary Tables?

If you are using temporary tables (volatile, global temporary or physical tables), you should at first consider improving all areas mentioned above (data types, character sets, primary index, etc.) to support your workload in the best possible way.

It is also a good practice to normalize columns in temporary tables. This prevents functions from having to be applied to these columns to obtain the required part of the information.

Functions applied to columns usually create a problem in joins as well as in WHERE conditions:

  • They often prevent the use of statistics on these columns.
  • They require a redistribution of the rows in joins.

Make the columns atomic. Splitting will help to improve join performance, as you can avoid join conditions such as “ON TABLE1.PK = SUBSTRING(TABLE2.PK,1,5)”.
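A minimal sketch of this idea (hypothetical tables and columns), splitting a composite code into atomic parts while materializing a volatile table:

CREATE VOLATILE TABLE vt_transaction AS
(
   SELECT
      transaction_id,
      SUBSTRING(product_code FROM 1 FOR 4) AS product_segment,   -- atomic part 1
      SUBSTRING(product_code FROM 5)       AS product_level,     -- atomic part 2
      sales_amount
   FROM transaction_stage
)
WITH DATA
PRIMARY INDEX (transaction_id)
ON COMMIT PRESERVE ROWS;

COLLECT STATISTICS ON vt_transaction COLUMN (product_segment);

Subsequent joins and WHERE conditions can then reference product_segment and product_level directly, without wrapping the columns in functions.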

9. Does the Data Block Size support your type of Workload?

Starting with Teradata 14 it is possible to create data blocks up to a size of 1MB.

Please note the following:

The larger the data blocks, the more data can be copied from the mass memory to the main memory in one read operation. However, more main memory is needed per data block.

In the worst case, Teradata reads a 1MB block of data to return a single row.

For tactical workload on a Teradata system, smaller data blocks are usually recommended, as individual rows are queried.

Larger data blocks are recommended for strategic workloads (which often include a full table scan and aggregations).
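The data block size can be set per table in the CREATE TABLE statement (a sketch; the table is hypothetical, and the maximum allowed value depends on your release and DBS Control settings):

CREATE MULTISET TABLE sales_history ,NO FALLBACK ,
     DATABLOCKSIZE = 1048576 BYTES      -- 1MB blocks for a scan-heavy, strategic workload
(
   sales_id BIGINT,
   sales_dt DATE
)
PRIMARY INDEX (sales_id);

For a tactical, single-row workload, a smaller DATABLOCKSIZE would be chosen instead.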


Improvements for skewed queries – The Teradata Space Limit Feature

With Teradata Version 16.00, some interesting features in the area of space management were introduced, which can bring enormous improvements to the daily administration and loading of the data warehouse.

In my current environment, it regularly happens that the daily loading of the data warehouse is interrupted because there is not enough permanent space available in the databases.

As we all know, the maximum available space is defined when a database is created. Here is the syntax to create a database with 80 GB of perm space:

CREATE DATABASE Customer AS PERM = 80000000000;

The entire available space is then evenly allocated to all AMPs. For simplicity's sake, we assume 8 AMPs. In this case, each AMP has 10GB of permanent space available.

Limits for Skewed Tables

This even distribution of the available PermSpace to all AMPs is based on the assumption that all rows of all tables are distributed evenly using the primary index. But what happens if this is not the case?

Let's assume that we want to load a table with 11GB of data, and the designer of the table structure has chosen the Primary Index so clumsily that all rows are assigned to exactly one AMP. Since the space limit on each AMP is 10GB, we will receive the following error message:

2644: No more room in database Customer.

This means that although the remaining 7 AMPs each have 10GB available, the table cannot be loaded.

The maximum available space of a database is shown in the table DBC.DataBaseSpace in the MaxPermSpace column. The currently occupied space can be seen in the CurrentPermSpace column. CurrentPermSpace, however, shows only the total occupied space of the database, not the occupied space per AMP.

I've often seen the surprised faces of developers when, despite there apparently being enough space (MaxPermSpace - CurrentPermSpace), the error message “2644: No more room in database Customer” appeared.
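To see the per-AMP picture behind those totals, you can query the DBC.DiskSpaceV view, which reports space per Vproc (a sketch using the example database name):

SELECT
   Vproc,            -- the AMP number
   MaxPerm,
   CurrentPerm
FROM DBC.DiskSpaceV
WHERE DatabaseName = 'Customer'
ORDER BY CurrentPerm DESC;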

In order to reduce or completely prevent such problems, Teradata 16.00 offers the possibility to globally define the available space.

Therefore, the new DBC table GlobalDBSpace, as well as an extended syntax for creating a database, was introduced:

CREATE DATABASE Customer AS PERM = 80000000000 SKEW = 10 PERCENT;

The 10 PERCENT Skew given when creating the database means the following:

The total PermSpace available is still 80GB; that does not change. However, each of the 8 available AMPs is now allowed to hold up to 11GB of data, as long as the absolute upper limit of 80GB for the database is not exceeded.

In our example from above, in which the clumsy selection of the primary index would place all rows on exactly one AMP, the error “2644: No more room in database Customer” would no longer occur. However, such a situation is recorded in the Software Event Log.

While the feature described above simplifies the handling of skewed tables, another feature was introduced with Teradata 16.00 whose purpose is to prevent loads from failing when not enough database space has been defined:

Global Space Soft Limit.

It is defined via the DBS Control setting GlobalSpaceSoftLimit and specified as a percentage.

In principle, a percentage is defined here by which the defined PermSpace may be exceeded without the error “2644: No more room in database Customer” occurring.

For example, if the database is defined on an 8 AMP system with 80GB PermSpace, and a soft limit of 10%, up to 11GB PermSpace can be used per AMP.

Again, an event is triggered and written to the Software Event Log. This allows database administrators to analyze the situation afterward and take appropriate action.

Unnecessary IO elimination – Teradata View Optimization

GUEST POST BY ARTEMIY KOZYR

Summary

Today I am going to show you how to identify the root cause when tuning a query that consumes an unnecessary amount of IO, what the UII indicator is, and how a WHERE clause placed on a VIEW is evaluated.

This case shows how a WHERE clause placed on a VIEW over two partitioned tables resulted in excessive block reads, with the WHERE condition applied only afterward. You need to be fully aware of the amount of data a query reads versus the amount of data the query really needs.

Performance tuning boosted performance fivefold: the query completes 4-5 times faster and utilizes 4-5 times fewer system resources:

[Figure: query performance metrics before and after tuning]

1+ Billion rows INSERTED

Here is the target query for today, which populates a large fact table of the data mart:

  • INSERT INTO … SELECT FROM …
  • The query runs on a daily basis
  • Over 1,000M rows inserted (1 billion)

INSERT INTO PRD3_1_db_dmcrmuakb.t_crm_ft_campaign_membership
(
   …
)
SELECT …
FROM PRD3_1_db_dmcrmuakb.v_crm_ft_campaign_membership
WHERE (crm_wave_launch_dt >= (TD_SYSFNLIB.OADD_MONTHS(DATE, -1)))       /* Today - 1 month */
  AND (crm_campaign_enter_dt >= (TD_SYSFNLIB.OADD_MONTHS(DATE, -1)))    /* Today - 1 month */
;

Poor performance detected

Query performance statistics stored in the Teradata query log indicate that the query is suspicious and performing badly in terms of unnecessary IO, excessive spool space, and CPU utilization.

Take a look at Teradata performance metrics:

[Figure: Teradata performance metrics of the original query]

What knowledge can we discover from this data?

What is particularly wrong and how can we tune and improve the query?

A closer look at IO and the source view DDL

First of all, let us describe what UII (Unnecessary IO indicator) is and what it can tell us.

UII is calculated as SumIO / (SumCPU * 1000). The metric gives an idea of query efficiency in terms of consumed CPU and IO resources. If UII is relatively high, it can mean that many data blocks are read, but only a relatively small proportion of them is actually processed.
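As a sketch (assuming DBQL query logging is enabled; the columns follow the standard DBC.QryLogV view), UII can be computed per query like this:

SELECT
   QueryID,
   AMPCPUTime,
   TotalIOCount,
   TotalIOCount / NULLIFZERO(AMPCPUTime * 1000) AS UII   -- SumIO / (SumCPU * 1000)
FROM DBC.QryLogV
WHERE CAST(CollectTimeStamp AS DATE) = CURRENT_DATE
ORDER BY UII DESC;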

Let us examine source View DDL through which the target table is populated.

REPLACE VIEW SBX.V_CRM_FT_CAMPAIGN_MEMBERSHIP AS
LOCKING ROW ACCESS
SELECT …
FROM SBX.T_CRM_CAMPAIGN_MEMBER CCM
JOIN SBX.T_MD_CRM_CAMPAIGN CP
   ON (
        …
        AND CP.WAVE_LAUNCH_DATE >= OADD_MONTHS(CURRENT_DATE, -6)
        AND CCM.CAMPAIGN_ENTER_DATE >= OADD_MONTHS(CURRENT_DATE, -6)
      );

A-ha! Here we see that the latest 6 months are extracted through the view.

The tables are partitioned by columns used in WHERE clause:

CREATE MULTISET TABLE SBX.T_CRM_CAMPAIGN_MEMBER ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
(
   …
)
PRIMARY INDEX (MEMBER_ID, WAVE_ID)
PARTITION BY RANGE_N(CAMPAIGN_ENTER_DATE BETWEEN DATE '2014-12-01' AND DATE '2025-12-31'
                     EACH INTERVAL '1' MONTH);

But only the last month's data is inserted into the target table! The INSERT statement is as follows:

INSERT INTO PRD3_1_db_dmcrmuakb.t_crm_ft_campaign_membership
(
   …
)
SELECT …
FROM PRD3_1_db_dmcrmuakb.v_crm_ft_campaign_membership
WHERE (crm_wave_launch_dt >= (TD_SYSFNLIB.OADD_MONTHS(DATE, -1)))       /* Today - 1 month */
  AND (crm_campaign_enter_dt >= (TD_SYSFNLIB.OADD_MONTHS(DATE, -1)))    /* Today - 1 month */
;

So the query simply reads 5 months of data in vain, for no particular reason. The thing is that before the final query applies any WHERE clause, all the data behind the view is read.

It means every data block under this VIEW is read first, and only then is non-relevant data eliminated. But why read unnecessary data? Let us figure out how to force Teradata to extract only the relevant partitions.

Here is the answer:

1. We might put the WHERE clause from the final query INSIDE the source VIEW DDL.

2. We might create an entirely new VIEW for this ETL process, in case there are other users who do not want any changes to the existing VIEW.
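A sketch of option 2 (the ETL-only view name is made up; the column elisions from the original DDL are kept as-is): the filter on the partitioning columns moves inside the view, so the Optimizer can eliminate partitions before reading the data.

REPLACE VIEW SBX.V_CRM_FT_CAMPAIGN_MEMBERSHIP_ETL AS   -- hypothetical ETL-only view
LOCKING ROW ACCESS
SELECT …
FROM SBX.T_CRM_CAMPAIGN_MEMBER CCM
JOIN SBX.T_MD_CRM_CAMPAIGN CP
   ON (
        …
        AND CP.WAVE_LAUNCH_DATE >= OADD_MONTHS(CURRENT_DATE, -1)       /* only the last month */
        AND CCM.CAMPAIGN_ENTER_DATE >= OADD_MONTHS(CURRENT_DATE, -1)   /* partition elimination */
      );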

Improved query performance

This optimization gave us a fivefold boost in performance. The query completes 4-5 times faster and utilizes 4-5 times fewer system resources.

Detailed query metrics below:

[Figure: detailed query metrics after tuning]

 

Artemiy Kozyr is a Data Engineer at Sberbank, Moscow, with a Master's degree in Computer Science.
He has 5 years of experience in data warehousing, ETL, and visualization for financial institutions.

Contacts:

[email protected]
http://linkedin.com/in/artemiykozyr/

The Costs of Decomposable Columns

What are Decomposable Columns?

Decomposable columns are columns that can be split into smaller units. The presence of decomposable columns means that the physical data model is not fully normalized.

If we don't use a decomposable column for data retrieval, there is no negative impact on performance.

Still, if the column is used for data access, it should be decomposed into several columns which can be utilized independently for indexed access. You will get increased performance because there will be fewer full table scans.

Additionally, if the columns created from a decomposable column are NUSI columns, the possibilities for NUSI bitmapping are increased:

Here is an example:

-- Select all Private Gold products being sold by our company
SELECT * FROM Transaction WHERE ProductCode LIKE 'PRIVGOLD%';

-- Select all Private products being sold by our company
SELECT * FROM Transaction WHERE ProductCode LIKE 'PRIV%';

In both cases typically a full table scan is required to access the data.

By decomposing the column ProductCode, we give the Optimizer a chance to use existing NUSIs:

CREATE INDEX (ProductSegment) ON Transaction;
CREATE INDEX (ProductLevel) ON Transaction;

SELECT * FROM Transaction WHERE ProductSegment = 'PRIV' AND ProductLevel = 'Gold';

Often we can't change the physical data model, and we have to find workarounds to avoid the negative performance impact of decomposable columns. Here is a simple but efficient solution:

SELECT p.ProductDescription, t.ProductCode
FROM Transaction t
INNER JOIN Product p
   ON t.ProductCode = p.ProductCode
WHERE p.ProductCode LIKE '%Gold%';

Table Product is just a lookup containing all distinct ProductCodes. The Optimizer will probably do a full table scan on the Product table with a subsequent merge join to the Transaction table. As the Product table is small, the cost of the full table scan is negligible.

Still, the first choice is to avoid the implementation of decomposable columns whenever possible.

Questions?
If you have any questions about all this, please ask in the comments! I’ll be paying close attention and answering as many as I can. Thank you for reading. Whatever this blog has become, I owe it all to you.

8 Teradata Data Access Paths Explained

The Teradata Data Access Paths

Full Table Scan
  • Advantages: Touches each row only once and provides access for any combination of columns in the WHERE condition predicate. A good choice if all rows need to be touched.
  • Disadvantages: Reads each data block and usually requires a spool table as large as the base table.

Unique Primary Index (UPI)
  • Advantages: The cheapest access path, used when the SQL statement contains the primary index columns in the WHERE condition predicate. Requires only one AMP. No spool file is created. Allows row-hash locking, the most granular locking method available.
  • Disadvantages: None.

Nonunique Primary Index (NUPI)
  • Advantages: Provides efficient access and involves just one AMP. It allows row-hash locking and may or may not require a spool file, depending on the number of rows returned.
  • Disadvantages: A NUPI may cause bad INSERT performance on SET tables (duplicate row check).

Unique Secondary Index (USI)
  • Advantages: Provides an efficient access path if the WHERE condition predicates contain the USI columns. Typically involves only two AMPs. No spool table is created.
  • Disadvantages: Causes additional overhead for INSERT, UPDATE, MERGE, and DELETE statements.

Nonunique Secondary Index (NUSI)
  • Advantages: Provides efficient access when the number of distinct values is low, although it requires all AMPs and may create a spool table.
  • Disadvantages: Causes overhead for INSERT, UPDATE, MERGE, and DELETE statements. It will not be used by the Optimizer if the number of data blocks accessed is high, as it is an all-AMP access and a full table scan is often cheaper.

Single-Table Join Index (JI) and Hash Index
  • Advantages: Ideal when the index is restricted to frequently used columns, as it can reduce the number of IOs. Can have a different PI than the base table.
  • Disadvantages: Causes overhead for INSERT, UPDATE, MERGE, and DELETE statements.

Multitable Join Index (JI)
  • Advantages: Eliminates the need to perform joins and aggregates, and may be able to satisfy a query without referencing the base tables. It can have a different PI than the base tables and can replace a NUSI or a USI.
  • Disadvantages: Causes overhead for INSERT, UPDATE, MERGE, and DELETE statements on any of the base tables being part of the multitable join index. A bad choice for tables with a large number of changes.

Sparse Join Index (JI)
  • Advantages: Limits the number of rows being stored compared with a normal join index, which limits the overhead for INSERT, UPDATE, MERGE, and DELETE statements. If restricted to highly selective column values, the Optimizer can use the join index to access less common values.
  • Disadvantages: Causes overhead for INSERT, UPDATE, MERGE, and DELETE statements.

See also:
The Teradata Join Index Guide – We leave no Questions unanswered!


Data Modeling and the Teradata FSLDM

Overview

  • What is A Data Model?
  • What are the Major Types of Data Models?
  • What is FS-LDM?
  • How is FS-LDM Implemented?

Dimensional Modeling

  • Dimensional Modeling (DM) is a modeling technique in data warehousing, visually represented as a fact table surrounded by dimension tables.
  • Some of the terms commonly used in this type of modeling:
    • Dimension: A category of information. For example, the time dimension.
    • Attribute: An individual level within a dimension. For example, Month is an attribute in the Time Dimension.
    • Hierarchy: The specification of levels that represents the relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year → Quarter → Month → Day.
    • Fact Table: A fact table is a table that contains the measures of interest. For example, sales amount would be such a measure.
    • Lookup Table: The lookup table provides the detailed information about the attributes. For example, the lookup table for the Quarter attribute would include a list of all of the quarters available in the data warehouse.

SCHEMAS

  • In designing data models for data warehouses/data marts, the most commonly used schema types are Star Schema and Snowflake Schema.
  • Star Schema: In the star schema design, a single object (the fact table) sits in the middle and is radially connected to other surrounding objects (dimension lookup tables) as a star. A star schema can be simple or complex. A simple star consists of one fact table; a sophisticated star can have more than one fact table.
  • Snowflake Schema: The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. The main advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and joining smaller lookup tables. The main disadvantage of the snowflake schema is the additional maintenance efforts needed due to the increasing number of lookup tables.

Relational Data Model

  • The relational data model is entirely based on the predicate logic and set theory of mathematics.
  • The data is arranged in relations, which are visually represented as two-dimensional tables. The data is inserted into the table in the form of tuples (which are nothing but rows). A tuple is formed by one or more attributes, which are used as basic building blocks in the formation of various expressions that are used to derive meaningful information. There can be any number of tuples in a table, but all tuples contain the same fixed set of attributes with varying values.
  • The relational model is implemented in the database, where a table represents a relation, a row represents a tuple, and a column represents an attribute.

All the relational algebra operations, such as Select, Intersection, Product, Union, Difference, Project, Join, Division, Merge, etc., can also be performed on the relational database model. Operations on the relational database model are facilitated with the help of different conditional expressions, various key attributes, pre-defined constraints, etc. An example is selecting the information of customers who have been living in a particular city for more than 20 years.

  • Following are few terms used in relational database model:
    Candidate Key: Any field or a combination of fields that identifies a record uniquely is called a Candidate Key. The Candidate Key cannot contain a NULL value and should always provide a unique value.
    Primary Key: Primary key is nothing but a candidate key that identifies a record uniquely.
    Foreign Key: A Foreign Key references the primary key of another table, where it uniquely identifies a record. A Foreign Key defines the relation between two (or more) tables. A Foreign Key can contain a NULL value.

Constraints: Constraints are logical rules that are used to ensure data consistency or to prevent certain unacceptable operations on the data.

Relational Model: Summary

  • A tabular representation of data.
  • Straightforward and intuitive, currently the most widely used.
  • Integrity constraints can be specified by the DBA, based on application semantics. DBMS checks for violations.
    • Two critical ICs: primary and foreign keys
    • Also, we always have domain constraints.
  • Powerful and natural query languages exist.

Introduction to the Teradata FSLDM

LOGICAL DATA MODEL

  • A logical data model is a graphical representation of how data is organized. It specifically defines which individual data elements are stored and how they relate to each other.
  • The data model ultimately defines which business questions can be answered and thus determines the business value of the entire strategic and tactical data warehouse environment.

Data Modeling Notations

Why Do a Logical Data Model

  • The Logical Data Model is a representation of the things of significance to an enterprise and the relationships between those things. A Logical Data Model reflects the rules of the business.
  • A business rule, in turn, indicates how a company does business and declares its operating principles and policies.
  • Two examples are: “An account may belong to more than one individual” and “An individual can have one or more addresses.”
  • The Logical Data Model is a way to show the rules of the business graphically. It can function as a means of Communication and as a data management tool.
  • Also: It serves as a roadmap for achieving data integration in an organization. It is a guide for development over the long term.
  • It shows interlocking parts. Understanding all of the interdependent parts makes it easier to expand the model for future enhancements.
  • It forms a foundation upon which to build applications or business views.
  • It allows you to recognize data redundancy and to control it. Data redundancy can lead to inaccurate and inconsistent reporting of business information.
  • It serves as a starting point for developing the physical model during physical database design.
  • It aids communication between the analyst and the business user.
  • As a rigorous technique, it imposes discipline on the warehouse development process and leads to the development of stable, robust, long-term, and reliable solutions.
  • As a communication tool, it allows you to understand what you are building before you build it. Moreover, it allows you to see what you made after you built it.

What is the Teradata FSLDM?

  • The Teradata Financial Services Logical Data Model, also known as the Teradata FSLDM, is a third normal form logical data model, defined down to the specific attribute level.
  • It can be used as a starting point for financial institutions and insurance companies who are interested in a rapid approach for achieving an organized and integrated view of their business data.
  • The term “Financial Services” encompasses securities/investments, insurance, retail banking, commercial banking, credit cards, and other financially related products and services.

This data model can be used by any business that sells the products and services above, including banks, insurance companies, credit card companies, and brokerages.

What are the benefits of using the Teradata FSLDM?

  • It is a flexible and dynamic model that you can use for the long term.
  • This means that as your business changes (e.g., you change your organizational structure or product offerings), there is minimal or no impact on the data model. You have a solid, extendable database design.
  • It jump-starts the data modeling step in the development life cycle. It is a good starting point when developing your logical data model, and provides checkpoints to make sure you have covered the necessary areas in your scope.
  • It is the assimilation of Teradata's best-practice knowledge, gained from many years of designing logical data models for financial institutions.
  • It is the result of analyzing many customer-based data models that Teradata has developed for other industries.
  • It has been validated at customer engagements.
  • It saves time and therefore cost: by using the experience built into this model, much less effort is needed.
  • It is cross-functional and integrated: it provides one customer view across all products and functions.
  • It is a roadmap for future integration and enhancement. You can start small but grow into the model knowing that it will integrate in the future.
  • It is broad in scope, robust, flexible, and customer-oriented. The model tracks relationships with any party of interest to the financial institution. It supports a customer management vision: an environment that enables financial institutions to develop comprehensive knowledge about their customers and to optimize the profitability of each relationship.
  • It is a data warehouse data model—ideal for use as a basis for Customer Relationship Management (CRM).
  • It is third normal form. The design is pure and unconstrained by operational or physical considerations, and it can be implemented as near third normal form on Teradata. A third normal form model has one fact in one place: the right place. It has no redundant data elements. (A model with redundant data elements can lead to misinformation.)
  • It is not based on a star schema and hence is not limited to a finite collection of business questions. A third normal form model is flexible in answering unanticipated business questions, whereas with a star schema design you are limited in the type of business questions you can ask.
  • It accommodates currency changes, as was the case with the Euro.

FSLDM Implementation on Teradata

  • The FS-LDM implementation on Teradata consists of the following steps/phases:
    • Implementation of the Business Rules in the Logical Model
    • Mapping the Source Data columns to the Destination Data columns
    • Converting the Logical Model into a Physical Data Model using modeling tools (ERwin)
    • Implementation of the Physical Model into the Database (Object Creation, PI, PPI Decisions, Compression Analysis, and Implementation)
    • Development and implementation of the ETL Jobs on the Tables
    • Application Development / Access of the Data

Implementation of the Business Rules in the Logical Model

  • Usual Methods
    • Discussions with the data design architects and business stakeholders, resulting in detailed business rules for implementation
    • Workshops for the design, etc.
  • The output is the full set of business rules for implementation and a sign-off by all involved parties.

A comprehensive implementation plan is then drawn up by the design architects.

Creation of Logical Data Modeling

  • The Data Architect will then create the Logical Data model, using the Various LDM
  • The structures of the tables, relationships (business rules), etc. are logically designed and signed off for physical implementation
  • The most commonly used tool for this is ERwin
  • The Logical Data Model is stored on a common project server; access is restricted to the data modeling team

Mapping the Source Data columns to the Destination Data columns

  • This is usually the task of the data modeling team, in consultation with the LDM and the business users
  • In-depth discussions with the business users are held to learn about the various input sources, source files, source columns, etc.
  • The mapping process also looks at the destination table structure and decides on the destination table columns, etc.
  • Note that the mapping is done on the LDM, and the details are passed on to the ETL team. None of the physical implementation is done at this stage

Converting the Logical Model into Physical Data Model by Modeling Tools

  • The Logical Data Model is then transformed into the Physical Model
  • This process is usually done using a data modeling tool (such as ERwin)
  • The basic object definitions, along with primary and secondary keys, relationships, etc., are defined at this stage in the physical model

Implementation of Physical Data Model

  • The DBAs implement and convert the LDM into a physical model.
  • This includes:
    • Object Creation
    • Object Access
    • Data Integrity Process
    • Primary Key, and Secondary Key Implementation
    • Load Jobs Creation
    • Control Tables implementation
    • Compression Analysis and implementation
    • ETL Jobs
  • Once the implementation of the model is completed, the development team will either re-create or develop from scratch the ETL jobs involved in loading the tables
  • Phase-wise implementations are also handled in a similar fashion

Application Development and Data Access

  • Once the ETL jobs are developed and implemented, applications and end users are given access to the data tables, as required by the application teams
  • The reporting team looks at the implementation of reports
  • The Tactical Workload Team implements the Required TTW Workload on the Tables

See also:
Teradata Denormalization – Don’t trust your intuition!
First the PDM, then Compression – Measures to reduce Resource Usage
The Primary Index Choice

 
