In this article, we assume that you already have basic knowledge about Python. We will show step by step how easy it is to load data using Python and the Teradata SQL Driver for Python using Fastload.
If you are using Windows, we recommend using WinPython if you don’t have Python installed yet. Because there is already Jupyter Notebook included, we use it to show how easy it is to load data into Teradata.
You can download the latest version of WinPython here:
Download latest version of WinPython
Below you can see the target table definition:
CREATE MULTISET TABLE DWHPRO.PythonFastload
(
PK INTEGER,
TheDate DATE FORMAT 'YY/MM/DD',
TheInteger INTEGER,
TheDecimal DECIMAL(10,2),
TheVarchar VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC
) PRIMARY INDEX ( PK );
The Required Packages
We need the following packages to access Teradata:
- teradatasql
- pandas
- numpy
- faker
The additional packages (pandas, numpy, faker) are only used to create some fake and random data we can load:
import pandas as pd
import numpy as np
import teradatasql
import faker.providers as f
from faker import Faker
fake=Faker()
from datetime import datetime
In the next step we connect to Teradata and open a cursor:
con = teradatasql.connect ('{"host":"127.0.0.1","user":"dbc","passsword":"dbc","database":"DWHPRO"}')
cur = con.cursor ()
We create a helper function to have some random dates to insert:
def random_dates(start, end, size):
divide_by = 24 * 60 * 60 * 10**9
start_u = start.value // divide_by
end_u = end.value // divide_by
return pd.to_datetime(np.random.randint(start_u, end_u, size), unit="D").date
Function executing the Fastload
Below function uses a data frame for the random/fake data we want to load.
def fload(size_tran = 100000):
"""Uses Teradata FastLoad method to load data
"""
try:
sRequest = "DROP TABLE DWHPRO.PythonFastload"
print (sRequest)
cur.execute (sRequest)
except Exception as ex:
print ("Ignoring", str (ex).split ("\n") [0])
sRequest = """CREATE MULTISET TABLE DWHPRO.PythonFastload (
PK INTEGER
,TheDate DATE
,TheInteger INTEGER
,TheDecimal decimal(10,2)
,TheVarChar varchar(20)
)
"""#
print (sRequest)
cur.execute (sRequest)
sRequest = "{fn teradata_nativesql}{fn teradata_autocommit_off}"
print (sRequest)
cur.execute (sRequest)
size = 100000
start = datetime.now()
tran = pd.DataFrame( )
tran["PK"] = [x+1 for x in range(size_tran)]
tran["TheDate"] = random_dates(start=pd.to_datetime('2021-01-01'), end=pd.to_datetime('2021-12-31'), size=size_tran)
tran["TheInteger"] = np.random.choice([x+1 for x in range(size)] , size =size_tran)
tran["TheDecimal"]= [ np.random.randint(10, 100000) / 100 for x in range(size_tran)]
tran["TheVarChar"]= np.random.choice([fake.iban() for _ in range(1000)] , size = size_tran)
tran_list = tran.values.tolist()
parameter = ",?"*len(tran_list[0])
parameter = parameter[1:]
aaoValues= tran.values.tolist()
sInsert = "{fn teradata_try_fastload}INSERT INTO DWHPRO.PythonFastload ("+parameter+")"
print (sInsert)
cur.execute (sInsert, aaoValues)
con.commit()
print(f"transaction: data generated {datetime.now()-start}")
print(f"transaction: data inserted {datetime.now()-start}")
Finally, we call the function and execute the Fastload, close the cursor, and the connection to Teradata:
fload(size_tran = 100000)
cur.close()
con.close()
If you get an error message when playing with this code in Junyper notebook that too many load or unload tasks are active, you can easily kill these sessions with the below command:
SELECT AbortSessions (1, 'DBC', 0, 'Y', 'N');
The second parameter kills all sessions of this user. If you want to kill a specific session instead, put there the sessionid.
Can you please same for FastExport using python. Also, what is the approach to extract dml/ddl statement from the teradata using python ?
Great article!