John is a co-founder & tech lead at Acryl Data, the company behind the open source metadata platform DataHub. Prior to Acryl, John was a senior engineer at LinkedIn, where he worked on systems to detect and prevent large-scale abuse such as spam, scraping, & fake accounts. In his spare time, he enjoys hiking, cooking & reading.
Central Data Engineering and Data Platform teams face the unique challenge of bearing responsibility for the storage, transformation, and movement of data within an organization, without necessarily being responsible for (or knowledgeable about) the relevance, quality, or intended use of that data.
Introducing a central data catalog helps data consumers discover what data exists, what it represents, and how it is meant to be used. Unfortunately, those details rarely come for free and often require a coordinated governance initiative.
So, now that you’ve cataloged tens of thousands of data entities from multiple systems, tools, and platforms, how should you prioritize which assets are worth documenting, categorizing, annotating, and maintaining indefinitely?
In this talk, we will present a practical approach to supercharging your data governance initiative by surfacing and leveraging metadata that is readily available within your data ecosystem. We’ll take a close look at which metadata elements serve as meaningful signals, so you can cut through the noise and govern the assets that matter most. You’ll walk away from this talk with clear next steps for managing the health and governance of your ever-growing data stack.