Crunch Data Conference

Piotr Findeisen

Co-founder at Starburst

Bio:

Piotr is a Software Engineer at Starburst and a member of the company's founding team. He contributes to the Presto code base and is also active in the community. He has been involved in significant features such as the cost-based optimizer, spill to disk, and correlated subqueries, as well as a plethora of smaller enhancements. Before Starburst, Piotr worked at Teradata and became the top external Presto committer of the year. Prior to that, he was a Team Leader at Syncron (a provider of cloud services for supply chain management), responsible for the product's technical foundation and performance. Piotr holds an M.S. in Computer Science and a B.Sc. in Mathematics from the University of Warsaw.

Workshop:

Presto: SQL-on-Anything, hands-on workshop

Topics:

data engineering

SQL

ETL

data warehouse

analytics

Level:

Beginner

Description

Presto has become the ubiquitous open source software for SQL on anything. Presto is heavily used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and many others for low-latency querying of large amounts of data, wherever it resides (Hadoop, AWS S3, Cassandra, Postgres, etc.). Presto was engineered from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from GBs to PBs.

Join Wojciech Biela for this full-day workshop to learn about Presto's concepts and architecture, and to explore its many use cases and best practices you can implement today. Learn how to set up and use Presto through various hands-on exercises (those who don't want to participate in the exercises can follow along).

Target audience

Roles: data engineers, data architects, software engineers, and IT professionals

Prerequisite knowledge

A basic understanding of SQL, databases, Hadoop, and distributed systems.
Basic command line (Bash) skills.

Materials or downloads needed in advance

A laptop with a browser.

Agenda

Rough outline of the training, including slides and labs (hands-on exercises):

Presto architecture and technical concepts
Lab 1 - Manual Presto deployment
Presto query execution
Presto Ecosystem, Connectors and Connectivity
Migrating from Hive
Administering Presto
Presto in cloud environments
Lab 2 - Query S3 Data using Presto
Lab 3 - Query PostgreSQL using Presto
Lab 4 - Query Federation using Presto
Instructor lab demonstrations:

Lab 5 - Using Presto with AWS Glue Data Catalog
Lab 6 - Scaling Presto on AWS

Lab 7 - Presto and BI tools (connecting from Superset)
Query Performance, Cost-Based Optimizer
Lab 8 - Cost-Based Optimizer in Action
Security in Presto
Joining the Presto community