Crunch Data Conference

Rich Caruana

Senior Principal Researcher at Microsoft Research

Bio:

Rich Caruana is a senior principal researcher at Microsoft Research. Before joining Microsoft, Rich was on the faculty in the Computer Science Department at Cornell University, at UCLA’s Medical School, and at CMU’s Center for Learning and Discovery. Rich’s Ph.D. is from Carnegie Mellon University, where he worked with Tom Mitchell and Herb Simon. His thesis on Multi-Task Learning helped create interest in a new subfield of machine learning called Transfer Learning. Rich received an NSF CAREER Award in 2004 (for Meta Clustering), best paper awards in 2005 (with Alex Niculescu-Mizil), 2007 (with Daria Sorokina), and 2014 (with Todd Kulesza, Saleema Amershi, Danyel Fisher, and Denis Charles), co-chaired KDD in 2007 (with Xindong Wu), and serves as area chair for NIPS, ICML, and KDD. His current research focus is on learning for medical decision making, transparent modeling, deep learning, and computational ecology.

Talk:

Interpretable and Differentially Private Machine Learning: Don’t Practice Data Science in Healthcare Without It

Level:

General

In machine learning, tradeoffs must often be made between accuracy, privacy, and intelligibility: the most accurate models usually are not very intelligible or private, and the most intelligible models usually are less accurate. This can limit the accuracy of models that can safely be deployed in mission-critical applications such as healthcare, where being able to understand, validate, edit, and ultimately trust models is important. EBMs (Explainable Boosting Machines) are a recent learning method based on generalized additive models (GAMs) that are as accurate as full-complexity models, more intelligible than linear models, and can be made differentially private with little loss in accuracy. EBMs make it easy to understand what a model has learned and to edit the model when it learns inappropriate things. In the talk I’ll present case studies where EBMs discover surprising patterns in data that would have made deploying black-box models risky. I’ll also show how DP-EBMs allow us to train models that are differentially private without sacrificing intelligibility or editability. And if there’s time, I’ll show how we’re using these models to uncover and mitigate bias in models where fairness and transparency are important. Every data set is flawed in surprising ways: you need intelligibility to uncover the hidden secrets and find the gold.
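
To make the additive structure behind the abstract concrete, here is a minimal sketch of a GAM-style model in plain Python. It is not the EBM implementation discussed in the talk: real EBMs use boosted shallow trees, bagging, and optional pairwise terms, and DP-EBMs add calibrated noise during training. The function names (`fit_additive`, `predict`, `bin_index`), the equal-width binning, and the synthetic data are all assumptions made for illustration. The sketch shows only the two properties the abstract relies on: each feature contributes through its own one-dimensional shape function that can be inspected, and a learned shape function can be edited without touching the rest of the model.

```python
import random


def bin_index(model, j, value):
    """Map a raw value of feature j to one of n_bins equal-width bins."""
    lo, hi, n_bins = model["lo"][j], model["hi"][j], model["n_bins"]
    if hi == lo:
        return 0
    b = int((value - lo) / (hi - lo) * n_bins)
    return min(max(b, 0), n_bins - 1)  # clamp values outside the training range


def predict(model, x):
    """Prediction = intercept + sum of one per-feature contribution each."""
    return model["intercept"] + sum(
        model["shapes"][j][bin_index(model, j, v)] for j, v in enumerate(x)
    )


def fit_additive(X, y, n_bins=8, rounds=200, lr=0.1):
    """Fit y ~ intercept + sum_j f_j(x_j) with piecewise-constant f_j.

    Features are visited round-robin; each visit nudges one feature's
    per-bin scores toward the mean residual of the samples in that bin.
    """
    n, d = len(X), len(X[0])
    model = {
        "lo": [min(row[j] for row in X) for j in range(d)],
        "hi": [max(row[j] for row in X) for j in range(d)],
        "n_bins": n_bins,
        "intercept": sum(y) / n,
        "shapes": [[0.0] * n_bins for _ in range(d)],
    }
    bins = [[bin_index(model, j, row[j]) for j in range(d)] for row in X]
    pred = [model["intercept"]] * n

    for _ in range(rounds):
        for j in range(d):
            sums, counts = [0.0] * n_bins, [0] * n_bins
            for i in range(n):
                b = bins[i][j]
                sums[b] += y[i] - pred[i]
                counts[b] += 1
            # Shrunken step toward the mean residual in each non-empty bin.
            step = [lr * s / c if c else 0.0 for s, c in zip(sums, counts)]
            for b in range(n_bins):
                model["shapes"][j][b] += step[b]
            for i in range(n):
                pred[i] += step[bins[i][j]]
    return model


# Synthetic data (an assumption for this sketch): one linear effect and
# one U-shaped effect, plus a little noise.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(500)]
y = [2.0 * a + 4.0 * (b - 0.5) ** 2 + random.gauss(0.0, 0.05) for a, b in X]

model = fit_additive(X, y)
mse = sum((predict(model, x) - t) ** 2 for x, t in zip(X, y)) / len(X)
print("training MSE:", round(mse, 4))

# Intelligibility: the learned effect of feature 1 is just a short list
# of per-bin scores that can be printed or plotted.
print("shape of feature 1:", [round(v, 2) for v in model["shapes"][1]])

# Editability: if that learned effect were judged inappropriate, it can
# be flattened directly; predictions then ignore feature 1 entirely.
model["shapes"][1] = [0.0] * model["n_bins"]
```

Because the prediction is a plain sum, removing or reshaping one term changes the output in a way that can be reasoned about feature by feature, which is the property the abstract refers to as editability.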