Dive into Vectorized Execution for Apache Cloudberry: Design, Challenges, and Performance Gains

Zhang Yue

Chinese Session, 2025-07-25 16:15 GMT+8 (Room: Mtn BaiWang Hall) #olap

As analytical workloads grow in both scale and complexity, the demand for high-performance data processing engines continues to rise. MPP architectures scale out well across many nodes, but PostgreSQL-based databases such as Greenplum and Apache Cloudberry are still held back on each node by PostgreSQL's tuple-at-a-time (Volcano-style) execution engine.

To overcome these constraints, we introduce a vectorized execution engine for Apache Cloudberry, designed to improve efficiency through batch processing and low-level instruction optimizations. In this session, we will take a deep dive into the design and implementation of Cloudberry's vectorized engine, outline the key engineering efforts behind it, and share insights from real-world use cases, including performance benchmarks, the bottlenecks we encountered, and future directions for further optimization.
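The core idea behind batch processing can be sketched in a few lines. This is a minimal, illustrative Python sketch, not Cloudberry's implementation: the function names and the batch size of 1024 are hypothetical. It contrasts a tuple-at-a-time scan, where the executor pays a per-row call overhead, with a batch scan that hands the downstream operator a contiguous slice of a column in each call.

```python
from typing import Iterator, List

BATCH_SIZE = 1024  # hypothetical batch size; real engines tune this value


def scan_rows(column: List[int]) -> Iterator[int]:
    """Tuple-at-a-time scan: each next() call yields a single value."""
    for value in column:
        yield value


def scan_batches(column: List[int], batch_size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Batch scan: each next() call yields a contiguous slice of the column."""
    for start in range(0, len(column), batch_size):
        yield column[start:start + batch_size]


def filter_sum_rowwise(column: List[int], threshold: int) -> int:
    # One iterator call and one predicate evaluation per row.
    total = 0
    for value in scan_rows(column):
        if value > threshold:
            total += value
    return total


def filter_sum_batched(column: List[int], threshold: int) -> int:
    # One iterator call per batch; the predicate runs in a tight inner loop
    # over contiguous values, the loop shape a compiled engine can map to SIMD.
    total = 0
    for batch in scan_batches(column):
        total += sum(v for v in batch if v > threshold)
    return total
```

For a 5,000-value column, both functions return the same result, but the row scan makes 5,000 iterator calls while the batch scan makes only 5. In interpreted Python both inner loops still run value by value, so the speedup only materializes in a compiled engine, where the per-batch loop can also use SIMD instructions.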

Speakers:

Zhang Yue: HashData Corporation, Software Engineer

Software Engineer at HashData Corporation.