Teradata Fastload with Python

DWH Pro Admin

January 24, 2021

minutes reading time

In this article, we assume that you already have basic knowledge about Python. We will show step by step how easy it is to load data using Python and the Teradata SQL Driver for Python using Fastload.

If you are using Windows, we recommend using WinPython if you don’t have Python installed yet. Because there is already a Jupyter Notebook included, we use it to show how easy it is to load data into Teradata.

You can download the latest version of WinPython here:

Download the latest version of WinPython

Below you can see the target table definition:

      PK  INTEGER,
      TheDate DATE FORMAT 'YY/MM/DD',
      TheInteger INTEGER,
      TheDecimal DECIMAL(10,2),

The Required Packages

We need the following packages to access Teradata:

  • teradatasql
  • pandas
  • NumPy
  • faker

The additional packages (pandas, NumPy, faker) are only used to create some fake and random data we can load:

import pandas as pd
import numpy as np
import teradatasql

import faker.providers as f
from faker import Faker
from datetime import datetime

In the next step, we connect to Teradata and open a cursor:

con = teradatasql.connect ('{"host":"","user":"dbc","passsword":"dbc","database":"DWHPRO"}')
cur = con.cursor ()

We make a helper function to have some random dates to insert:

def random_dates(start, end, size):
divide_by = 24 * 60 * 60 * 10**9
start_u = start.value // divide_by
end_u = end.value // divide_by
return pd.to_datetime(np.random.randint(start_u, end_u, size), unit="D").date

Function executing the Fastload

The below function uses a data frame for the random/fake data we want to load.

def fload(size_tran = 100000):
    """Uses Teradata FastLoad method to load data
        sRequest = "DROP TABLE DWHPRO.PythonFastload"  
        print (sRequest)
        cur.execute (sRequest)
    except Exception as ex:
        print ("Ignoring", str (ex).split ("\n") [0])

    sRequest = """CREATE MULTISET TABLE DWHPRO.PythonFastload  ( 
    ,TheDate DATE
    ,TheInteger INTEGER
    ,TheDecimal decimal(10,2)
    ,TheVarChar varchar(20)
    print (sRequest)
    cur.execute (sRequest)

    sRequest = "{fn teradata_nativesql}{fn teradata_autocommit_off}"
    print (sRequest)
    cur.execute (sRequest)

    size = 100000 
    start = datetime.now()
    tran = pd.DataFrame( )
    tran["PK"] = [x+1 for x in range(size_tran)]  
    tran["TheDate"] = random_dates(start=pd.to_datetime('2021-01-01'), end=pd.to_datetime('2021-12-31'), size=size_tran)
    tran["TheInteger"] = np.random.choice([x+1 for x in range(size)] , size =size_tran)
    tran["TheDecimal"]= [ np.random.randint(10, 100000) / 100 for x in range(size_tran)] 
    tran["TheVarChar"]=   np.random.choice([fake.iban()  for _ in range(1000)] , size = size_tran)

    tran_list = tran.values.tolist()
    parameter = ",?"*len(tran_list[0]) 
    parameter = parameter[1:]

    aaoValues= tran.values.tolist()

    sInsert = "{fn teradata_try_fastload}INSERT INTO DWHPRO.PythonFastload ("+parameter+")"

    print (sInsert)
    cur.execute (sInsert, aaoValues)

    print(f"transaction: data generated {datetime.now()-start}") 
    print(f"transaction: data inserted {datetime.now()-start}")

Finally, we call the function and execute the Fastload, close the cursor and the connection to Teradata:

fload(size_tran = 100000)


If you get an error message when playing with this code in Junyper notebook that too many load or unload tasks are active, you can quickly kill these sessions with the below command:

SELECT AbortSessions (1, 'DBC', 0, 'Y', 'N');

The second parameter kills all sessions of this user. If you want to kill a specific session instead, put the sessionid.

  • Can you please same for FastExport using python. Also, what is the approach to extract dml/ddl statement from the teradata using python ?

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like