Practice of Flink Memory Governance at ByteDance

Yiheng Tang

Chinese Session #streaming

As the demand for streaming tasks continues to grow within ByteDance, Flink has been widely adopted at scale across various business domains. Among the resource costs of these large-scale tasks, memory is a significant contributor, especially heap memory: the total memory allocated to all tasks has reached tens of thousands of terabytes, yet JVM heap utilization remains below 50% and container-level memory usage is under 70%. Against the backdrop of company-wide cost reduction and efficiency improvement, we have carried out a series of memory optimizations focused on heap memory usage prediction, off-heap memory usage tracking, and simplifying Flink’s memory model. These efforts have been rolled out across ByteDance and have yielded memory savings of more than a thousand terabytes. In this talk, we will present the key optimizations jointly driven by the Flink and JVM teams at ByteDance and share the results we have achieved.
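For context on the memory model the talk refers to: in open-source Flink, a TaskManager's memory is split into several independently sized components (task heap, managed memory, network buffers, JVM metaspace and overhead), and budget reserved for one component cannot be borrowed by another, which is part of why end-to-end utilization can stay low. A minimal flink-conf.yaml sketch using the standard options (values are illustrative only, not ByteDance's production settings):

```yaml
# Total size of the TaskManager process: JVM heap + off-heap + JVM overhead.
taskmanager.memory.process.size: 4096m

# JVM heap reserved for user code running in tasks.
taskmanager.memory.task.heap.size: 1536m

# Off-heap memory managed by Flink (e.g. RocksDB state backend, batch sorting).
taskmanager.memory.managed.size: 1024m

# Fraction of total Flink memory used for network buffers (data exchange between tasks).
taskmanager.memory.network.fraction: 0.1

# Headroom for JVM metaspace and other JVM/native overhead.
taskmanager.memory.jvm-metaspace.size: 256m
taskmanager.memory.jvm-overhead.fraction: 0.1
```

A "unified memory pool" (agenda item 4 below) is one way to let such per-component budgets be shared rather than fixed at startup; how this is done at ByteDance is the subject of the talk.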

Agenda:

  1. Background
  • Current memory usage status of Flink at ByteDance
  • Key challenges we are facing
  2. Heap memory usage prediction
  3. Off-heap memory usage tracking
  4. Simplification of the Flink memory model: Unified memory pool
  5. Implementation and benefits
  6. Future plans
  • Exploration of further optimization directions

Speakers:


Yiheng Tang
Flink Runtime Developer at ByteDance