Apache Gravitino (incubating), the answer to metadata management in the AI era

Xiaojing Fang

Chinese Session #ai

Metadata management has become a cornerstone of the AI era. This talk explores how Apache Gravitino enables the management of unstructured data and models at scale, and presents Xiaomi’s real-world use of Gravitino for large language model (LLM) data processing and model lifecycle management.

Outline:

  1. The challenges of dataset and model management in AI workflows, and how Gravitino addresses them with its Fileset Catalog for structured AI dataset governance and its Model Catalog for unified model lifecycle management.
  2. Leveraging Gravitino’s tagging system, lineage tracking, and credential vending capabilities to maximize operational efficiency and governance compliance.
  3. Fileset in practice in Xiaomi’s data processing: In AI scenarios, data processing involves multiple stages such as downloading, extraction, filtering, deduplication, and training. Fileset improves pipeline efficiency between data engines and AI engines, enables end-to-end dataset management, and establishes a unified metadata view.
  4. Xiaomi’s AI large model management: How Xiaomi manages large model metadata and deploys model services, and our future plans for integrating with Gravitino.

Speakers:


Apache Gravitino PPMC member, interested in data and AI infrastructure systems.