This article assumes prior basic knowledge of Python and demonstrates the easy process of loading data using Python and the Teradata SQL Driver for Python with Fastload.
If you are using Windows, we recommend using WinPython if you don’t have Python installed yet. Because a Jupyter Notebook is included, we use it to show how easy it is to load data into Teradata.
Download the latest WinPython version from here.
Download the latest version of WinPython
The table definition is provided below:
CREATE MULTISET TABLE DWHPRO.PythonFastload
(
PK INTEGER,
TheDate DATE FORMAT 'YY/MM/DD',
TheInteger INTEGER,
TheDecimal DECIMAL(10,2),
TheVarchar VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC
) PRIMARY INDEX ( PK );
The Required Packages
The following packages are necessary to access Teradata:
- teradatasql
- pandas
- NumPy
- faker
The pandas, NumPy, and faker packages serve solely to generate fictitious and haphazard data, which we may subsequently import.
import pandas as pd
import numpy as np
import teradatasql
import faker.providers as f
from faker import Faker
fake=Faker()
from datetime import datetime
Next, we establish a connection to Teradata and initiate a cursor.
con = teradatasql.connect ('{"host":"127.0.0.1","user":"dbc","passsword":"dbc","database":"DWHPRO"}')
cur = con.cursor ()
We’ll create a helper function for generating random dates to insert:
def random_dates(start, end, size):
divide_by = 24 * 60 * 60 * 10**9
start_u = start.value // divide_by
end_u = end.value // divide_by
return pd.to_datetime(np.random.randint(start_u, end_u, size), unit="D").date
Function executing the Fastload
This function utilizes a data frame to load the simulated data.
def fload(size_tran = 100000):
"""Uses Teradata FastLoad method to load data
"""
try:
sRequest = "DROP TABLE DWHPRO.PythonFastload"
print (sRequest)
cur.execute (sRequest)
except Exception as ex:
print ("Ignoring", str (ex).split ("\n") [0])
sRequest = """CREATE MULTISET TABLE DWHPRO.PythonFastload (
PK INTEGER
,TheDate DATE
,TheInteger INTEGER
,TheDecimal decimal(10,2)
,TheVarChar varchar(20)
)
"""#
print (sRequest)
cur.execute (sRequest)
sRequest = "{fn teradata_nativesql}{fn teradata_autocommit_off}"
print (sRequest)
cur.execute (sRequest)
size = 100000
start = datetime.now()
tran = pd.DataFrame( )
tran["PK"] = [x+1 for x in range(size_tran)]
tran["TheDate"] = random_dates(start=pd.to_datetime('2021-01-01'), end=pd.to_datetime('2021-12-31'), size=size_tran)
tran["TheInteger"] = np.random.choice([x+1 for x in range(size)] , size =size_tran)
tran["TheDecimal"]= [ np.random.randint(10, 100000) / 100 for x in range(size_tran)]
tran["TheVarChar"]= np.random.choice([fake.iban() for _ in range(1000)] , size = size_tran)
tran_list = tran.values.tolist()
parameter = ",?"*len(tran_list[0])
parameter = parameter[1:]
aaoValues= tran.values.tolist()
sInsert = "{fn teradata_try_fastload}INSERT INTO DWHPRO.PythonFastload ("+parameter+")"
print (sInsert)
cur.execute (sInsert, aaoValues)
con.commit()
print(f"transaction: data generated {datetime.now()-start}")
print(f"transaction: data inserted {datetime.now()-start}")
We execute the Fastload by calling the function, then close the cursor and terminate the connection to Teradata.
fload(size_tran = 100000)
cur.close()
con.close()
If you encounter an error message in the Jupyter Notebook indicating an excessive amount of active load or unload tasks, you can promptly terminate these sessions using the following command:
SELECT AbortSessions (1, 'DBC', 0, 'Y', 'N');
To terminate a specific session, specify the sessionid parameter. Alternatively, using the second parameter will terminate all active sessions for the user.
Can you please same for FastExport using python. Also, what is the approach to extract dml/ddl statement from the teradata using python ?
Great article!