Teradata Intelligent Memory

Many arguments have been put forward in favor of in-memory databases. In principle, if we had unlimited memory, holding all data in memory would allow the fastest possible access. I think there is no doubt about this.

As we all know, there has always been, and will continue to be, a huge difference in access times between memory and hard disks or solid-state drives (SSDs).

While pure in-memory databases like SAP HANA offer the fastest access to the data, Teradata decided to go for the 80-20 rule. It states that only 20% of the data is accessed very frequently, and that it is good enough (from a cost perspective) to keep this very frequently accessed data in memory. The less frequently accessed data is made available on slower storage. (In reality, it is not exactly 80-20, but you get the idea.)
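
To make the cost argument concrete, here is a rough back-of-the-envelope sketch. It assumes the usual reading of the 80-20 rule (the hot 20% of the data receives about 80% of the accesses) and uses order-of-magnitude latencies that are purely illustrative, not Teradata measurements.

```python
# Illustrative latencies in seconds. These are assumed order-of-magnitude
# figures, not measurements taken on a Teradata system.
MEMORY_LATENCY = 100e-9   # about 100 nanoseconds
DISK_LATENCY = 10e-3      # about 10 milliseconds


def average_access_time(memory_hit_ratio: float) -> float:
    """Expected latency when a fraction of accesses is served from memory."""
    if not 0.0 <= memory_hit_ratio <= 1.0:
        raise ValueError("memory_hit_ratio must be between 0 and 1")
    return (memory_hit_ratio * MEMORY_LATENCY
            + (1.0 - memory_hit_ratio) * DISK_LATENCY)


all_on_disk = average_access_time(0.0)
hot_in_memory = average_access_time(0.8)  # assumed: 80% of accesses hit memory
speedup = all_on_disk / hot_in_memory
print(f"speedup with the hot data in memory: {speedup:.2f}x")
```

With these assumed numbers, the average access becomes close to five times faster, even though only a fraction of the data lives in memory. The remaining disk accesses dominate the average, which is why the placement of the less frequently accessed data still matters.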

Teradata classifies data as cold, warm, hot, or very hot; the classification is tightly coupled to how frequently the data is accessed. The overall architecture is named Teradata Intelligent Memory.
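
The idea of deriving a temperature from access frequency can be sketched as follows. This is not Teradata's actual algorithm; the function name, the block identifiers, and the cut-off percentages are all hypothetical and only illustrate ranking blocks by how often they are accessed.

```python
from typing import Dict

# Hypothetical cut-offs: cumulative share of blocks, ranked from the most
# to the least frequently accessed. Teradata's real thresholds differ.
BANDS = [("very hot", 0.05), ("hot", 0.20), ("warm", 0.50), ("cold", 1.00)]


def classify_by_temperature(access_counts: Dict[str, int]) -> Dict[str, str]:
    """Rank blocks by access count and assign a temperature label to each."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    labels: Dict[str, str] = {}
    for rank, block_id in enumerate(ranked):
        percentile = (rank + 1) / len(ranked)
        for label, upper_bound in BANDS:
            if percentile <= upper_bound:
                labels[block_id] = label
                break
    return labels


# Example: 20 blocks, where block0 is accessed most often.
counts = {f"block{i}": 100 - i for i in range(20)}
labels = classify_by_temperature(counts)
```

In this example, the single most frequently accessed block is labeled very hot and the least accessed half of the blocks is labeled cold.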

Teradata Intelligent Memory is available starting with Teradata 14.10 and can be used on all systems running this release.

To understand the concepts behind “data temperature” and the relation to the storage type, we have to understand how data is stored in a Teradata system.

One requirement of the Teradata AMPs is that data blocks have to be in memory before any operation on their rows can take place. For this reason, each AMP has always had its own memory (the FSG cache). Starting with Teradata 14.10, AMPs additionally have memory that is used for the very hot data. Data blocks moved into this memory may stay there for days.

But how does Teradata handle the “not-so-hot” data, i.e., cold, warm, and hot data?

If solid-state drives are available, they are the next choice. We all know that solid-state storage is faster than hard disks.

But even on hard drives, it makes a big difference which cylinder the data blocks are located on.

Below you can see the structure of a hard drive. Each circle represents one disk cylinder.

As you may know, data blocks are stored in cylinders. When a drive completes one revolution, a full cylinder can be read into memory. As you will immediately recognize, many more blocks of data can be placed in the outer (red) cylinder than in the inner (blue) cylinder. Therefore, with one revolution, many more blocks can be read into memory from the outer cylinder than from the inner cylinder.
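
The geometry behind this can be checked with a small calculation. The sketch below assumes a constant recording density along the circumference, so the number of blocks read per revolution grows with the radius; the radii and the block length are made-up values for illustration only.

```python
import math


def blocks_per_revolution(radius_mm: float, block_length_mm: float) -> int:
    """Whole blocks that fit on one circle, assuming constant linear density."""
    circumference = 2 * math.pi * radius_mm
    return int(circumference // block_length_mm)


# Assumed radii for an outer and an inner position on the platter.
outer = blocks_per_revolution(radius_mm=45.0, block_length_mm=1.0)
inner = blocks_per_revolution(radius_mm=20.0, block_length_mm=1.0)
print(f"outer: {outer} blocks, inner: {inner} blocks per revolution")
```

Because the platter takes the same time for every revolution, the outer position delivers more than twice as many blocks per revolution as the inner one in this example.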

[Figure: Teradata Intelligent Memory]

Teradata takes advantage of this fact and puts the cold data on the inner cylinders, while hotter data is stored on the outer cylinders.

Teradata’s Intelligent Memory strategy follows a hierarchy from memory to SSD to disk (taking into account the position of the cylinders on the disk).
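
The hierarchy can be summarized as a simple lookup. The sketch below is only a mental model: the function name and the exact assignment of hot and warm data to SSD or outer cylinders are assumptions, since the real placement is decided internally by Teradata.

```python
def choose_tier(temperature: str, ssd_available: bool = True) -> str:
    """Map a data temperature to a storage tier (illustrative mapping only)."""
    if temperature == "very hot":
        return "memory"
    if temperature == "hot":
        # Assumption: SSD is used when present, otherwise the fastest
        # region of the hard disk.
        return "ssd" if ssd_available else "disk, outer cylinders"
    if temperature == "warm":
        return "disk, outer cylinders"
    if temperature == "cold":
        return "disk, inner cylinders"
    raise ValueError(f"unknown temperature: {temperature!r}")


for temperature in ("very hot", "hot", "warm", "cold"):
    print(temperature, "->", choose_tier(temperature))
```

Calling `choose_tier("hot", ssd_available=False)` shows the fallback on a system without SSDs, where the outer cylinders take over the role of the faster tier.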

While Teradata’s solution can probably never be as fast as true in-memory database solutions, it is a good tradeoff between the storage limits of in-memory databases and performance.

Roland Wenzlofsky
 

Roland Wenzlofsky is a graduate computer scientist and data warehouse professional who has worked with the Teradata database system for more than 15 years. He is experienced in the fields of banking and telecommunications, with a strong focus on performance optimization.

  • Timm Rüger says:

    That's more like catching up with the multi-temperature data management that DB2 10.1 already introduced in 2012.
    Now DB2 with BLU is a step further again: it offers true in-memory capability, as SAP HANA does. In-memory is more than just storing the records in RAM. It is also about applying intelligent features to reduce the data volume that has to be kept in memory or pushed through the execution pipelines of the CPUs. That's why data skipping, columnar storage, sort-order-preserving compression algorithms like Huffman encoding (IBM calls it “actionable compression”), and SIMD processing play a critical role in true in-memory databases. Unfortunately, Teradata is not there yet.

  • gaurav says:

    I have some basic questions.
    If you can help, that would be really helpful.

    Question 1
    Memory used by an AMP:
    Vdisks
    RAM
    Spool space
    FSG cache
    Please confirm.

    Question 2
    Is FSG cache memory the same as the spool space where data is moved when joins take place?
    Question 3
    Each AMP has its own RAM, so RAM is not FSG cache memory? Is its size fixed per AMP, or is it also decided by PDE, as is done for the FSG cache?
    Question 4
    Initially, data is distributed across the AMPs, so where is this data stored? Is it on Vdisks or Pdisks? Is a Vdisk a combination of Pdisks?
    Question 5
    If possible, can someone please provide an image of what is inside an AMP?
