Panoramic Observability: LoongCollector for Large-Scale Apache Flink and Spark Clusters

Runqi Lin

Chinese Session #observability

In today’s era of AI and big data, enterprises face rapidly growing data volumes and increasingly diverse processing needs. By combining Apache Flink and Apache Spark, two popular distributed computing engines, enterprises can build flexible real-time and batch data processing pipelines. However, observability in large-scale stream processing clusters faces several challenges, chiefly dynamically detecting elastic tasks, handling large volumes of observability data, and meeting strict real-time requirements. As a full-stack observability data collector, LoongCollector helps users efficiently collect and process logs, metrics, and traces. In this talk, we will focus on best practices for achieving enterprise-grade observability for Flink and Spark with LoongCollector.

Outline:
● Challenges in observability for large-scale Apache Flink/Spark clusters
● LoongCollector's observability capabilities and architecture in detail
● Best practices for using LoongCollector to observe large-scale Apache Flink/Spark clusters
● Future prospects

Speakers:


Worked on the Alibaba Cloud Simple Log Service team, focusing mainly on observability data collectors, gateway service monitoring, and large-scale data access management.

Personal Links:
GitHub: https://github.com/linrunqi08
Alibaba Cloud Developer Community: https://developer.aliyun.com/profile/xfmif6w7kun52

Previous speaking experience: 2022 QECon “Cloud Service API Observability Construction Practice” https://max.book118.com/html/2022/0824/8076007016004132.shtm