Practice of Flink Memory Governance at ByteDance
Yiheng Tang
Chinese Session | #streaming

As the demand for streaming tasks continues to grow within ByteDance, Flink has been widely adopted at scale across various business domains. Among the resource costs of these large-scale tasks, memory stands out as a significant contributor, heap memory in particular. The total memory allocated to all tasks has reached tens of thousands of terabytes, yet JVM heap utilization remains below 50% and container-level memory usage is under 70%. Against the backdrop of company-wide cost reduction and efficiency improvement, we have carried out a series of memory optimizations focused on heap memory usage prediction, off-heap memory usage tracking, and simplification of Flink's memory model. These optimizations have been rolled out across ByteDance, saving more than a thousand terabytes of memory. In this talk, we will present the key joint optimizations led by the Flink and JVM teams at ByteDance and share the results we have achieved.
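For context, the heap-utilization figure cited above is the kind of metric that can be sampled inside a running JVM through the standard java.lang.management API. The sketch below is purely illustrative and is not the mechanism described in the talk; the class name HeapUtilizationSampler is made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical, minimal example: sample JVM heap and non-heap usage in-process.
// It only illustrates where a metric such as "heap utilization below 50%"
// could be read from, e.g. once per TaskManager.
public class HeapUtilizationSampler {

    public static void main(String[] args) {
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();

        MemoryUsage heap = memoryBean.getHeapMemoryUsage();
        MemoryUsage nonHeap = memoryBean.getNonHeapMemoryUsage();

        // heap.getMax() reflects -Xmx; it can be -1 if no limit is defined.
        long heapMax = heap.getMax();
        double heapUtilization = heapMax > 0 ? (double) heap.getUsed() / heapMax : Double.NaN;

        System.out.printf("Heap: used=%d MiB, committed=%d MiB, max=%d MiB, utilization=%.1f%%%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heapMax >> 20,
                heapUtilization * 100);
        System.out.printf("Non-heap (metaspace, code cache, ...): used=%d MiB, committed=%d MiB%n",
                nonHeap.getUsed() >> 20, nonHeap.getCommitted() >> 20);
    }
}
```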
Agenda:
- Background
  - Current memory usage status of Flink at ByteDance
  - Key challenges we are facing
- Heap memory usage prediction
- Off-heap memory usage tracking
- Simplification of the Flink memory model: Unified memory pool
  - Implementation and benefits
- Future plans
  - Exploration of further optimization directions
Speakers:
Yiheng Tang, Flink Runtime Developer at ByteDance