Practice of Flink Memory Governance at ByteDance
Yiheng Tang, Shaojun Wang
Chinese Session 2025-07-26 14:30 GMT+8 (Room: YuanMing Hall) #streaming
As the demand for streaming tasks continues to grow within ByteDance, Flink has been widely adopted across various business domains at scale. Among the resource costs of these large-scale tasks, memory stands out as a significant contributor, especially heap memory. The total allocated memory for all tasks has reached tens of thousands of terabytes, yet the JVM heap utilization remains below 50%, and container-level memory usage is under 70%. Against the backdrop of company-wide cost reduction and efficiency improvement, we have carried out a series of memory optimizations focused on heap memory usage prediction, off-heap memory usage tracking, and simplifying Flink’s memory model. These efforts have been successfully rolled out across ByteDance, resulting in memory savings of over a thousand terabytes. In this talk, we will present the key joint optimizations led by the Flink and JVM teams at ByteDance, and share the results we’ve achieved.
Agenda:
- Background
  - Current memory usage status of Flink at ByteDance
  - Key challenges we are facing
- Heap memory usage prediction
- Off-heap memory usage tracking
- Simplification of the Flink memory model: Unified memory pool
- Implementation and benefits
- Future plans
  - Exploration of further optimization directions
Speakers:
Yiheng Tang: ByteDance Infrastructure Engineer
- Flink Runtime Developer at ByteDance
Shaojun Wang: ByteDance, China, Programming Language Engineer
- A PPMC member of the Apache incubating project Teaclave
- A Gopher since 2017
- A Programming Language Engineer @ByteDance