Teradata ORDER BY and Performance
If we chose a simple sort method, all rows would have to be available in one place to be sorted. This is, of course, not possible in Teradata, as hundreds of AMPs hold a portion of each table. Copying all rows to one AMP to sort them would be a non-scalable operation and a bottleneck.
Teradata makes optimal use of the shared-nothing architecture, and sorting is done in parallel at each level without having to redistribute rows. Sorting is always the last step in the execution plan.
Below we describe the algorithm that is used in the Teradata ORDER BY statement:
- At first, the table rows are sorted on each AMP where they reside. A higher number of AMPs permits a higher sort performance as all AMPs are sorting in parallel. The sorted rows are put into a spool table.
- After AMPs finish their local sort step, Teradata will return the local spools to the requesting client. For performance reasons, Teradata creates a global buffer per node and on the parsing engine level and informs the client that the result set is available.
- The client fetches as many rows as needed (recall that we often abort requests after having a specific number of rows available in SQL Assistant).
- Each fetch request causes each AMP to move its top row into the buffer, where they are merged globally in sort order. Subsequently, each AMP puts its next row into the buffer where they are merged by the sort order. This process continues until the global buffer is full and sent via the attached network to the client. The merge process itself is done by the BYNET software (which handles the merge buffer on its own)
- The above-described process continues as long as the client fetches rows from the result set or no more rows are available.
The excellent performance of this sorting algorithm results from the following properties:
- The AMP local pre-sorting step is done in parallel on all AMPs, and sort performance increases with the system size (good scalability)
- Teradata must sort only the client-requested rows in the last global merge step.
Usually, when the whole table is requested from the client (such as SQL Assistant), we cancel the request after having a small number of rows on our screen. It would not make sense to do the global sort for millions of rows that never are noticed by the user.