Optimizing Performance With Teradata ORDER BY: How Sorting Is Done In Parallel Without Bottlenecks

Teradata ORDER BY and Performance

To employ a basic sorting algorithm, all rows must be present in one location for sorting. However, this is not feasible in Teradata, where numerous AMPs each retain a segment of every table. Transporting all the rows to one AMP for sorting would result in a non-scalable and bottlenecked process.

Teradata optimizes the shared-nothing architecture, performing parallel sorting at each level without redistributing rows. Sorting constitutes the final step in the execution plan.

Here we present the algorithm employed in the ORDER BY statement of Teradata:

First, the table rows are sorted on each AMP where they reside. A higher number of AMPs permits higher sort performance, as all AMPs are sorted in parallel. The sorted rows are put into a spool table.
After AMPs finish their local sort step, Teradata will return the local spools to the requesting client. For performance reasons, Teradata creates a global buffer per node and on the parsing engine level and informs the client that the result set is available.
The client fetches as many rows as needed (recall that we often abort requests after having a specific number of rows available in SQL Assistant).
Each fetch request causes each AMP to move its top row into the buffer, which is merged globally in sort order. Subsequently, each AMP puts its next row into the buffer where the sort order merges them. This process continues until the global buffer is full and sent via the attached network to the client. The merge process itself is handled by the BYNET software (which manages the merge buffer on its own).
The above-described process continues as long as the client fetches rows from the result set, or until no more rows are available.

Teradata uses a sorting algorithm to arrange data in a specific order.

This sorting algorithm’s exceptional performance comes from the following qualities:

The AMP local pre-sorting step is done in parallel on all AMPs, and sort performance increases with the system size (good scalability).
Teradata must sort only the client-requested rows in the last global merge step. Usually, when the client requests the whole table (such as in SQL Assistant), we cancel the request after having a small number of rows on our screen. It would not make sense to perform the global sort for millions of rows that the user will never see.

Related Services

🏗️ Planning a Data Platform Migration?

Architecture-first approach: we design before a single line of code is written. Zero data loss across every migration delivered.

Our Migration Services →

3 thoughts on “Optimizing Performance with Teradata ORDER BY: How Sorting is Done in Parallel Without Bottlenecks”

BYNET will merge-sort only the rows which are fetched by the client. No sort of all rows is required in advance.

Nicely Explained.
One Question: generally in SQL assistant fetch row limit is set to 2000 or let say we mention SAMPLE 2000 unless you ask for full fetch.

So in that case when we are fetching 2000 rows, BYNET will sort the whole result or will give only first 2000 rows from ALL AMPS in node

Roland Wenzlofsky

06/30/2022 at 12:34 pm

Only the 2000 rows are sorted in the global buffer.
Reply

Optimizing Performance with Teradata ORDER BY: How Sorting is Done in Parallel Without Bottlenecks

Teradata ORDER BY and Performance

📊 Data Platform Migration Survey

Stay Ahead in Data Warehousing

3 thoughts on “Optimizing Performance with Teradata ORDER BY: How Sorting is Done in Parallel Without Bottlenecks”

Leave a Comment Cancel reply