Optimization and practice of Doris on Paimon in Xiaomi

Wang Long

Chinese Session #olap
  1. How does Doris on Paimon integrate with Xiaomi’s OLAP architecture system and integrate with OLAP multi engines (Spark, Presto) a. Unified SQL gateway proxy access: Implement traffic control, unified authentication, and simplify Doris' access through a unified proxy layer protocol and client SDK b. SQL automatic routing: By analyzing RBO and CBO rules, the most suitable SQL is routed to Doris for execution, achieving the effect of accelerating and reducing costs c. SQL rewriting and syntax compatibility: A unified SQL parsing layer converts incompatible SQL into Doris dialect, solving syntax compatibility issues

  2. Optimization of Doris on Paimon a. Doris supports Paimon aggregation table data merge optimization: Paimon’s data merge on read performance is poor, so pushing its merge operation to Doris skips merge sort and unnecessary merge operations. At the same time, utilizing Doris' large-scale parallel ability for aggregation can greatly improve Paimon’s read speed b. Doris supports Paimon real-time materialized view: supports Paimon changelog level real-time materialized view updates, optimizes business query speed by twice+ c. Snapshot metadata cache optimization: Increase Paimon snapshot scheduled cache refresh capability to solve metadata cache latency and slow metadata retrieval from HDFS

  3. Doris on Paimon’s Landing Effect a. Compared to before optimization, query performance has improved by 5 times

Speakers:


Senior Development Engineer of OLAP Engine Group at Xiaomi, mainly responsible for the kernel development of open source projects Doris, Spark, and Kyuubi