Catalogs as Context: Using metadata to power and govern the next wave of AI development

Lisa N. Cao

English Session #ai

Developing powerful AI tooling has been our theme of the year, with agents and foundation models picking up steam across the board. One question remains, though: how do we serve data so that these applications work effectively? What about at enterprise scale? What even is context? In this talk we discuss the current big data landscape, the challenges of building data platforms for AI, and why data catalogs and metadata are the only viable path forward to effective, governed AI development. We use the open source framework Apache Gravitino as a key example of why such a solution needs vendor neutrality.

Speaker:

Lisa is a data engineer, product manager, and speaker in open source data infrastructure and DataOps fields. Through her work at Datastrato, creators of Apache Gravitino, she is redefining the data cataloging space for generative AI use cases and end-to-end data integrations. She currently serves on the Linux Foundation’s Outreach committee, leads the Open Platform for Enterprise AI’s (OPEA) Developer Experience Working Group, and leads the Continuous Delivery Foundation’s (CDF) DataOps Initiative.

Lisa is also a Google Women Techmakers Ambassador, founder and 3x chair of the Vancouver Datajam, and former lead maintainer of the BiocSwirl project. She is a Terry Fox Gold Medal award recipient (2021) and a Linux Foundation LiFT recipient for Women in Open Source (2021). Meetups she has organized include SF’s Data for AI meetup, Data Engineer Things Bay Area, and RLadies Vancouver.