Data Warehouse Virtualization Technology Based on Apache Calcite

Jiajun Xie

Chinese Session #olap

As data warehouses grow, the number of indicators and dimensions keeps increasing, maintenance costs rise, and pressure on storage resources mounts. How can indicators be managed so that they cost less to maintain? How should a data warehouse model be designed to reduce storage costs? To address these problems, the Douyin Group Data Platform Team has built a complete “Data Warehouse Virtualization” solution based on Apache Calcite and Apache Hive, which includes the following technologies:

- Virtual columns and virtual associated columns
- SQL Define Function and Parameterized View
- Virtual partition (view of partition)

Together, these capabilities make data analysis metrics easier to manage and help reduce storage costs. Typical cases and implementation principles will be covered in the presentation.

Speakers:


ApacheCon Asia speaker in 2022, 2023, and 2024; Apache Calcite Committer