Hive Archives

The 15-Year Detour: How the Data Industry Spent Billions Reinventing SQL

Updated 04/12/2026, published 03/04/2026 by Roland Wenzlofsky

Somewhere around 2020, the data world quietly arrived at a conclusion that Teradata engineers could have told you in 1984: SQL on a massively parallel architecture is a pretty good way to process large volumes of data. The path to get there was anything but quiet. It involved billions in capital, an entire generation of …

Introduction to Apache Spark: A Powerful Solution for Big Data Processing and Analytics

Updated 03/08/2026, published 05/05/2023 by Roland Wenzlofsky

Introduction: Processing and analyzing large volumes of data quickly and efficiently is essential in today’s data-driven world. Apache Spark, an open-source big data processing engine, is a leading solution for handling massive datasets and offers a fast, flexible alternative to traditional data processing frameworks such as Hadoop’s MapReduce. This article introduces Apache Spark, explores its …

Teradata Rollbacks: Understanding the Impact on Performance and How to Avoid Them

Updated 04/11/2026, published 04/28/2023 by Roland Wenzlofsky

How to Abort the Teradata Rollback: Executing a DML statement on a sizable table may trigger a prolonged ROLLBACK. In such cases, you must choose between waiting for the ROLLBACK to complete and terminating it. Cancelling a rollback avoids wasting additional resources, particularly when the system cannot run in parallel due to high skew, which …