Hadoop Archives

The 15-Year Detour: How the Data Industry Spent Billions Reinventing SQL

Updated 04/12/2026, first published 03/04/2026, by Roland Wenzlofsky

Somewhere around 2020, the data world quietly arrived at a conclusion that Teradata engineers could have told you in 1984: SQL on a massively parallel architecture is a pretty good way to process large volumes of data. The path to get there was anything but quiet. It involved billions in capital, an entire generation of …

Introduction to Apache Spark: A Powerful Solution for Big Data Processing and Analytics

Updated 03/08/2026, first published 05/05/2023, by Roland Wenzlofsky

Introduction: Processing and analyzing large volumes of data quickly and efficiently is essential in today's data-driven world. Apache Spark, an open-source big data processing engine, is a leading solution for handling massive datasets, offering a fast and flexible alternative to traditional data processing frameworks such as Hadoop's MapReduce. This article introduces Apache Spark, explores its …

Hadoop and Teradata Data Warehousing: A Comparison and Integration Perspective

Updated 04/11/2026, first published 05/05/2023, by Roland Wenzlofsky

Hadoop is a buzzword in the world of big data, but the hype can obscure its actual value. This article compares Teradata and Hadoop for data warehousing and highlights how Hadoop's scalability and preprocessing capabilities can be leveraged to improve Teradata's performance. However, the Hadoop implementations offered by the big database vendors may not be fully functional, so companies should proceed with caution before adopting new technologies.

Real-World Map-Reduce Implementations: Design and Fault Tolerance

Updated 04/11/2026, first published 05/03/2023, by Roland Wenzlofsky

Here is an illustration depicting the design of real-world map-reduce implementations such as Hadoop: The input files reside in a distributed file system, such as HDFS for Hadoop or GFS (the Google File System) in Google's implementation. Worker processes run either mapper or reducer tasks. Mappers read data from HDFS, apply the mapping function, and save the output to …
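To make the worker roles above concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases for a word count. It is an illustration under stated assumptions, not Hadoop's actual code: in-memory lists stand in for file splits and intermediate files, `zlib.crc32` is an arbitrary choice of partitioning hash, and the `fail_once` parameter is a hypothetical hook that makes selected map tasks fail on their first attempt so that re-execution, the basic fault-tolerance mechanism, can be shown.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Set, Tuple

MapFn = Callable[[str], Iterable[Tuple[str, int]]]
ReduceFn = Callable[[str, List[int]], Tuple[str, int]]
Partitions = List[Dict[str, List[int]]]


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.split():
        yield word.lower(), 1


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum all counts collected for one key."""
    return key, sum(values)


def partition_for(key: str, num_reducers: int) -> int:
    # Stable hash, so the same key always goes to the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(task_id: int, split: List[str], map_fn: MapFn,
                 num_reducers: int, fail_once: Set[int]) -> Partitions:
    # Hypothetical failure injection: a listed task crashes on its first attempt only.
    if task_id in fail_once:
        fail_once.discard(task_id)
        raise RuntimeError(f"map task {task_id} lost")
    # One bucket per reducer stands in for the mapper's partitioned output files.
    partitions: Partitions = [defaultdict(list) for _ in range(num_reducers)]
    for record in split:
        for key, value in map_fn(record):
            partitions[partition_for(key, num_reducers)][key].append(value)
    return partitions


def run_job(splits: List[List[str]], map_fn: MapFn, reduce_fn: ReduceFn,
            num_reducers: int = 2, fail_once: Optional[Set[int]] = None,
            max_attempts: int = 3) -> Dict[str, int]:
    fail_once = set(fail_once or ())  # copy so the caller's set is not mutated
    map_outputs: List[Partitions] = []
    for task_id, split in enumerate(splits):
        for attempt in range(1, max_attempts + 1):
            try:
                # A failed attempt leaves no output behind, so re-running a
                # deterministic map function produces the same result.
                map_outputs.append(
                    run_map_task(task_id, split, map_fn, num_reducers, fail_once))
                break
            except RuntimeError:
                if attempt == max_attempts:
                    raise
    results: Dict[str, int] = {}
    for r in range(num_reducers):
        # Shuffle: reducer r collects partition r from every map task.
        grouped: Dict[str, List[int]] = defaultdict(list)
        for partitions in map_outputs:
            for key, values in partitions[r].items():
                grouped[key].extend(values)
        for key in sorted(grouped):
            out_key, out_value = reduce_fn(key, grouped[key])
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    splits = [["the quick brown fox", "the lazy dog"], ["the fox jumps"]]
    # Map task 0 "crashes" once and is transparently re-executed.
    print(run_job(splits, map_words, reduce_counts, fail_once={0}))
```

The retry works only because the map function is deterministic and a failed attempt's partial output is discarded; that is the same property that lets a real framework simply reschedule a lost task on another worker.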

A Brief History of Parallel Database Architectures and Their Limitations

Updated 04/11/2026, first published 04/21/2023, by Roland Wenzlofsky

Discover the history of parallel database architectures – from shared memory to shared disk and shared-nothing. Learn about the advantages and limitations of each architecture and how fault tolerance is handled. Explore the shift towards big data and the trend of “Hadoop over SQL.”
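As a rough illustration of the shared-nothing idea mentioned above, the sketch below hash-distributes rows across simulated nodes, lets each node aggregate only its own rows, and merges the partial results. It assumes a hypothetical `(customer_id, amount)` schema and uses `zlib.crc32` as an arbitrary distribution hash; it is a generic example, not any vendor's actual hashing or execution scheme, and the nodes run sequentially here rather than in parallel.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical schema: (customer_id, amount) pairs.
Row = Tuple[str, int]


@dataclass
class Node:
    """One shared-nothing node: it owns its rows, and no other node reads them."""
    node_id: int
    rows: List[Row] = field(default_factory=list)

    def local_sum_by_key(self) -> Dict[str, int]:
        # Works only on this node's private data; no coordination needed.
        totals: Dict[str, int] = defaultdict(int)
        for key, amount in self.rows:
            totals[key] += amount
        return dict(totals)


def owner(key: str, num_nodes: int) -> int:
    # Hash distribution: the key alone decides which node stores a row.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def load(rows: List[Row], num_nodes: int) -> List[Node]:
    nodes = [Node(i) for i in range(num_nodes)]
    for row in rows:
        nodes[owner(row[0], num_nodes)].rows.append(row)
    return nodes


def parallel_sum_by_key(nodes: List[Node]) -> Dict[str, int]:
    # Each node aggregates locally (a real system would do this concurrently).
    # Every row for a given key lives on exactly one node, so the partial
    # results never overlap and can be merged with a plain union.
    result: Dict[str, int] = {}
    for node in nodes:
        result.update(node.local_sum_by_key())
    return result


if __name__ == "__main__":
    rows = [("c1", 10), ("c2", 5), ("c1", 7), ("c3", 1)]
    nodes = load(rows, num_nodes=3)
    print(parallel_sum_by_key(nodes))
```

The same property that makes this fast also creates the fault-tolerance problem: in this sketch, losing a node means losing its partition, so real shared-nothing systems need some form of replication or recovery on top.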