Apache Spark Archives

You Migrated From Teradata to Spark and Threw Away the One Thing That Made It Fast

04/11/2026 | 03/08/2026 by Roland Wenzlofsky

If you have spent any amount of time working with Teradata, you know that the Primary Index is one of the most important design decisions you make. It determines how data is distributed across AMPs and whether your joins are fast or slow. Choosing the wrong Primary Index is one of the most common causes …

The 15-Year Detour: How the Data Industry Spent Billions Reinventing SQL

04/12/2026 | 03/04/2026 by Roland Wenzlofsky

Somewhere around 2020, the data world quietly arrived at a conclusion that Teradata engineers could have told you in 1984: SQL on a massively parallel architecture is a pretty good way to process large volumes of data. The path to get there was anything but quiet. It involved billions in capital, an entire generation of …

Introduction to Apache Spark: A Powerful Solution for Big Data Processing and Analytics

03/08/2026 | 05/05/2023 by Roland Wenzlofsky

Introduction: Processing and analyzing large volumes of data quickly and efficiently is essential in today’s data-driven world. Apache Spark, an open-source big data processing engine, is a leading solution for handling massive datasets and offers a fast, flexible alternative to traditional data processing frameworks such as Hadoop’s MapReduce. This article introduces Apache Spark, explores its …