Nick Schrock is the founder and CEO of Elementl, a company aiming to reshape the data management ecosystem, and the creator of Dagster, a new programming model for data processing. Previously, Nick was a Principal Engineer and Director of Engineering at Facebook. During that time, he co-created GraphQL and led its implementation and adoption across the entire organization and product line. He also formed the Product Infrastructure group, whose engineers created GraphQL, React, React Native, and many other developer technologies that are broadly used both inside Facebook and across the technology industry at large.
We introduce Dagster, an open source Python library for building ETL processes, ML pipelines, and similar software systems, all of which we call data applications.
Data applications are graphs of functional computations that consume and produce data assets. Dagster models the semantics of these applications through a unified type system, a data dependency graph, a configuration system, a structured API for emitting events such as data quality tests and materializations, and high-quality developer tools built on those abstractions. Builders can use the tools they know -- e.g., Spark jobs for data engineers, SQL statements for analysts, Python for data scientists -- and the application can be deployed to arbitrary orchestration engines -- such as Airflow, Dask, or Kubernetes-based execution -- in a pluggable fashion.
The result is more reliable, testable, and understandable data systems that leverage the existing tools that already work and that can be deployed to your own infrastructure.
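To make the idea of a data application concrete, here is a minimal sketch in plain Python of a graph of functional computations with declared output types, per-step configuration, and structured events. It deliberately does not use Dagster's API: `Step`, `run_graph`, the event name `type_check_passed`, and the example functions are all hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Step:
    """One functional computation in the graph (hypothetical, not Dagster's API)."""
    name: str
    fn: Callable[..., Any]
    output_type: type
    inputs: Tuple[str, ...] = ()  # names of upstream steps this step consumes


def run_graph(steps: List[Step], config: Dict[str, Dict[str, Any]]):
    """Run steps in dependency order, checking output types and recording events."""
    by_name = {step.name: step for step in steps}
    # Map each step to the set of steps it depends on, then sort topologically.
    order = TopologicalSorter(
        {step.name: set(step.inputs) for step in steps}
    ).static_order()

    results: Dict[str, Any] = {}
    events: List[Tuple[str, str]] = []
    for name in order:
        step = by_name[name]
        upstream = [results[dep] for dep in step.inputs]
        # Per-step configuration is passed as keyword arguments.
        value = step.fn(*upstream, **config.get(name, {}))
        if not isinstance(value, step.output_type):
            raise TypeError(
                f"{name} produced {type(value).__name__}, "
                f"expected {step.output_type.__name__}"
            )
        events.append(("type_check_passed", name))
        results[name] = value
    return results, events


def load_rows(path: str) -> list:
    # Stand-in for reading a data asset; returns fixed rows and ignores the path.
    return [{"user": "a", "amount": 3}, {"user": "b", "amount": 4}]


def total_amount(rows: list) -> int:
    return sum(row["amount"] for row in rows)


steps = [
    Step("total", total_amount, int, inputs=("load",)),
    Step("load", load_rows, list),
]
results, events = run_graph(steps, {"load": {"path": "rows.csv"}})
```

Because the dependencies, types, and configuration are declared as data rather than buried in the function bodies, the same graph description could in principle be inspected by tooling or handed to a different execution engine, which is the property the abstract attributes to Dagster.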