Transparency, auditability, and stability of predictive models and results are typically key differentiators in effective machine learning applications. Patrick will share tips and techniques learned through implementing interpretable machine learning solutions in industries like financial services, telecom, and health insurance. Using a set of publicly available and highly annotated examples, he teaches several holistic approaches to interpretable machine learning. The examples use the well-known University of California Irvine (UCI) credit card dataset and popular open source packages to train constrained, interpretable machine learning models and to visualize, explain, and test more complex machine learning models in the context of an example credit-risk application. Along the way, Patrick draws on his applied experience to highlight crucial success factors and common pitfalls not typically discussed in blog posts and open source software documentation, such as the importance of both local and global explanation and the approximate nature of nearly all machine learning explanation techniques.
Who is this presentation for?
Researchers, scientists, data analysts, predictive modelers, business users, and anyone else who uses or consumes machine learning techniques
Prerequisite knowledge
A working knowledge of Python, widely used linear modeling approaches, and machine learning algorithms
Materials or downloads needed in advance
A laptop with a recent version of the Firefox or Chrome browser installed. (This tutorial will use an Aquarium environment.) As a backup, tutorial materials are available on GitHub: https://github.com/jphall663/interpretable_machine_learning_with_python
What you'll learn
The audience will learn several practical machine learning interpretability techniques and how to use them with Python, along with best practices for applying these techniques and common pitfalls to avoid.
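To give a flavor of the local-versus-global distinction mentioned above, here is a minimal sketch (not the tutorial's own code) using scikit-learn on synthetic data; the tutorial itself works with the UCI credit card dataset. Permutation importance stands in for a global explanation, and a LIME-style local linear surrogate stands in for a local one. Both are, as the description notes, approximations.

```python
# A minimal sketch of global vs. local explanation using scikit-learn on
# synthetic data (illustrative only; not taken from the tutorial materials).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Global explanation: permutation importance ranks features by their
# average effect on model performance across the whole dataset.
global_imp = permutation_importance(model, X, y, n_repeats=5, random_state=0)

# Local explanation (LIME-style sketch): fit a simple linear surrogate to
# the model's predicted probabilities in a small neighborhood of one row.
# The surrogate's coefficients approximate the model's behavior near that
# row only -- they can disagree with the global ranking.
row = X[0]
neighborhood = row + np.random.RandomState(0).normal(scale=0.1, size=(200, 5))
surrogate = LinearRegression().fit(
    neighborhood, model.predict_proba(neighborhood)[:, 1])

print(global_imp.importances_mean.round(3))  # one global score per feature
print(surrogate.coef_.round(3))              # one local coefficient per feature
```

A useful habit the tutorial emphasizes: check both views, since a feature that matters globally may be irrelevant to an individual prediction, and vice versa.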
This course offers a thorough, hands-on overview of deep learning and its integration with Apache Spark.
This course covers the fundamentals of neural networks and how to build distributed TensorFlow models on top of Spark DataFrames. Throughout the class, you will use Keras, TensorFlow, Deep Learning Pipelines, and Horovod to build and tune models. This course is taught entirely in Python.
Objectives
Upon completion, students will be able to:
Build a neural network with Keras
Explain the difference between various activation functions and optimizers
Track experiments with MLflow
Apply models at scale with Deep Learning Pipelines
Perform transfer learning
Build distributed models with Horovod
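As a taste of the activation-function comparison in the objectives above, here is a minimal NumPy sketch (not course material; in Keras these are available as the built-in "relu", "sigmoid", and "tanh" activations on any layer):

```python
# A minimal NumPy sketch of three common activation functions.
import numpy as np

def relu(x):
    # Zeroes out negatives; a common default for hidden layers.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squashes values into (0, 1); typical for binary-classification outputs.
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))      # [0. 0. 2.] -- negatives clipped to zero
print(sigmoid(x))   # values squashed into (0, 1)
print(np.tanh(x))   # values squashed into (-1, 1)
```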
Audience
Primarily directed towards the practicing data scientist who is eager to get started with deep learning and its integration with Apache Spark
Prerequisites
Python (numpy and pandas)
Apache Spark™ for Machine Learning and Data Science or equivalent experience
In this workshop, we will cover why so many companies decide to take the plunge, start a Growth team, and invest in a Growth mindset. Let's brainstorm whether your business knows what Growth means for you (acquisition, conversion, engagement, retention?): it is all about the data! How do you tell which metrics are doing well and which need improvement? What does Growth mean within marketing and product teams? What does a Growth team mean for your organizational structure? How do you ensure a Growth team is successful: leadership buy-in, data availability, and alignment. And, most importantly, when is the right time to start one (or not)?
Description
Presto has become the ubiquitous open source software for SQL on anything. Presto is heavily used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and many others for low-latency querying of large amounts of data, wherever it resides (Hadoop, AWS S3, Cassandra, Postgres, etc.). Presto was engineered from the ground up for fast, interactive SQL analytics against disparate data sources ranging in size from GBs to PBs.
Join Wojciech Biela for this full-day workshop to learn about Presto’s concepts and architecture and to explore its many use cases and the best practices you can implement today. Learn how to set up and use Presto through various hands-on exercises (those who don’t want to participate in the exercises can follow along).
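To illustrate the kind of interactive, ANSI-SQL analytics the workshop covers, here is a purely illustrative query, executed against Python's built-in SQLite (not Presto) so the snippet is self-contained; the table and column names are made up. In Presto, the same statement could target Hive, S3, Cassandra, or Postgres tables, wherever the data resides.

```python
# Illustrative only: a standard SQL aggregation of the sort you would run
# interactively in Presto, executed here against in-memory SQLite so the
# snippet needs no server. Table/column names are hypothetical.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE orders (region TEXT, amount REAL);
    INSERT INTO orders VALUES ('EU', 120.0), ('EU', 80.0), ('US', 300.0);
""")
rows = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('US', 300.0), ('EU', 200.0)]
```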
Target audience
Roles: data engineers, data architects, software engineers, and those in IT
Prerequisite knowledge
A basic understanding of SQL, databases, Hadoop, and distributed systems.
Basic command line (Bash) skills.
Materials or downloads needed in advance
A laptop with a browser.
Agenda
Rough outline of the training, including slides and labs (hands-on exercises):
This 1-day course is for data engineers, analysts, architects, data scientists, software engineers, IT operations staff, and technical managers interested in a brief hands-on overview of Apache Spark.
The course provides an introduction to the Spark architecture, some of the core APIs for using Spark, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs. The class is a mixture of lecture and hands-on labs.
Each topic includes lecture content along with hands-on labs in the Databricks notebook environment. Students may keep the notebooks and continue to use them with the free Databricks Community Edition offering after the class ends; all examples are guaranteed to run in that environment.