Crunch Data Engineering and Analytics Conference Budapest October 18-20, 2017

Tickets

CRUNCH is a use-case-heavy conference for people interested in building the finest data-driven businesses. No matter the size of your venture or your job description, you will find exactly what you need at the two-track CRUNCH conference. A data engineering track and a data analytics track will serve diverse business needs and levels of expertise.

If you are a Data Engineer, Data Scientist, Product Manager or simply interested in how to use data to develop your business, this conference is for you. No matter the size of your company or the volume of your data, come and learn from the biggest players of Big Data, get inspiration from their practices, successes and failures, and network with other professionals like you.

18
October
WORKSHOP DAY

Our full-day workshops will be announced soon. You need to buy separate workshop tickets to attend them.

19
October
CONFERENCE DAY #1, THURSDAY

The day will start at 9AM and the last talk will end around 6PM. After the sessions there will be a Crunch party at the conference venue.

20
October
CONFERENCE DAY #2, FRIDAY

The day will start at 9AM and the closing ceremony will end around 6PM.


Speakers

Charles Smith

Manager - Big Data Platform Architecture, Netflix
Working hard to build an easy data platform at Netflix

Here is a problem: You would like to buy the next great show for Netflix. The dream is that, given your data and a question, you can find the next House of Cards with a click of the mouse. But is that the reality? Why does it seem like data engineers and analysts spend so much time talking about memory requirements and stack traces? This talk will explore the past, present, and some of the future of the Netflix data platform, as well as how we are prioritizing work that will make it easier to focus on data problems rather than the complexities of the platform.

Bio

Charles Smith leads the Big Data Platform Architecture team at Netflix, whose mission is to make using data easy and efficient. He and his team are responsible for envisioning how the data platform allows data scientists to make Netflix's service even better.

Gyula Fóra

Data Warehouse Engineer, King
Real-time analytics at King

This talk gives a technical overview of the different tools and systems we are using at King to process and analyse over 30 billion events in real-time every day.
The core topic of this talk is RBEA (Rule-Based Event Aggregator), the scalable real-time analytics platform developed by King’s Streaming Platform team. RBEA is a streaming-as-a-service platform built on top of Apache Flink and Kafka which allows developers and data scientists to write analytics scripts in a high-level DSL and deploy them on the live event streams in a matter of a few clicks.
The distinguishing feature of this platform is that new analytics jobs are not deployed as independent Flink programs; instead, a fixed number of continuously running jobs serve as backends for the RBEA platform. By streaming both the events and new scripts to the backends, scripts share both the incoming data and the state they may build up when analyzing user activity in the games. This design makes new deployments very lightweight and the whole architecture highly efficient without sacrificing expressivity.
We push the Apache Flink framework to its full potential in order to provide highly scalable stateful and windowed processing logic for the analytics applications. We will show how we have built a high-level DSL on the abstractions provided by Flink that is more approachable to developers without stream-processing experience and how we use code generation to execute the programs efficiently at scale.
In addition to our streaming platform we will also introduce other tools that we have developed in order to make deployment and monitoring of real-time applications as simple as possible at scale.
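
To make the design described in this abstract more concrete, here is a minimal sketch of the idea that scripts are streamed to an already running backend rather than deployed as separate jobs. It is written in plain Python and is not King's implementation, RBEA's DSL, or any Flink API. The names Event, Deploy and run_backend, and the two example scripts, are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple, Union


@dataclass
class Event:
    user_id: str
    name: str


@dataclass
class Deploy:
    script_id: str
    # Hypothetical script signature: (event, per-user state) -> optional output
    script: Callable[[Event, dict], object]


def run_backend(stream: Iterable[Union[Event, Deploy]]) -> List[Tuple[str, object]]:
    """One long-running job: scripts arrive on the same stream as the events."""
    scripts: Dict[str, Callable[[Event, dict], object]] = {}
    state: Dict[str, dict] = defaultdict(dict)  # per-user state shared by all scripts
    outputs: List[Tuple[str, object]] = []
    for item in stream:
        if isinstance(item, Deploy):
            # Deploying a script is only a registration; no new job is started.
            scripts[item.script_id] = item.script
            continue
        # Every registered script sees the same event and the same user state.
        for script_id, script in scripts.items():
            result = script(item, state[item.user_id])
            if result is not None:
                outputs.append((script_id, result))
    return outputs


def count_logins(event: Event, user_state: dict):
    if event.name == "login":
        user_state["logins"] = user_state.get("logins", 0) + 1
    return None


def report_purchase(event: Event, user_state: dict):
    # Reads state that another script has built up for the same user.
    if event.name == "purchase":
        return (event.user_id, user_state.get("logins", 0))
    return None


stream = [
    Deploy("count_logins", count_logins),
    Event("u1", "login"),
    Event("u1", "login"),
    Deploy("report_purchase", report_purchase),  # added while the job is running
    Event("u1", "purchase"),
    Event("u2", "purchase"),
]
results = run_backend(stream)
```

In this toy version the second script is registered mid-stream and immediately reuses the login counts accumulated by the first one, which is the kind of sharing of data and state the abstract refers to. A real system would also need partitioning, fault tolerance and sandboxing, none of which is shown here.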

Bio

Gyula is a Data Warehouse Engineer in the Streaming Platform team at King, working hard on shaping the future of real-time data processing. This includes researching, developing and sharing awesome streaming technologies. Gyula grew up in Budapest where he first started working on distributed stream processing and later became a core contributor to the Apache Flink project. Among his everyday fun and challenges you will find endless video game battles, super spicy foods and thinking about stupid bugs at night.
Gyula has been a speaker at numerous big data related conferences and meetups, talking about stream processing technologies and use-cases.

Shirshanka Das

Principal Staff Software Engineer, LinkedIn
Bio

Shirshanka is a Principal Staff Software Engineer and the architect for LinkedIn’s Data & Analytics team. He was among the original authors of a variety of open and closed source projects built at LinkedIn, including Databus, Espresso, and Apache Helix. He is currently working with his team on simplifying the big data analytics space at LinkedIn through a multitude of mostly open-source projects: Pinot, a high-performance distributed OLAP engine; Gobblin, a data lifecycle management platform for Hadoop; WhereHows, a data discovery and lineage platform; and Dali, a data virtualization layer for Hadoop.

Justin Bozonier

Lead Data Scientist, Finance & Analytics, GrubHub
Science the shit out of your business

The mission of my data science team is to make a science out of our business at GrubHub. We work on understanding how every initiative our company undertakes affects our bottom line. I will discuss how we analyze every feature shipped to production, marketing programs, customer service, and more using a variety of statistical, machine learning, and decision theoretic tools and techniques. Most importantly, I will cover how we have learned to tune these tools, not with just abstract or theoretical scores, but by connecting model error with bottom line impact.

Bio

Justin Bozonier is the author of Test-Driven Machine Learning (published by Packt) and Lead Data Scientist in GrubHub's Financial Planning & Analytics group. The founding data scientist of GrubHub's split testing efforts, his team runs the company's experiment analysis platform, develops experiments and models to tune larger business operations, and data mines experiments and operational data to look for new business opportunities and value existing programs. He has spoken previously at PyData Seattle, Kellogg at Northwestern, PyData Chicago's monthly meetup, and more.
He lives in Lake Villa, IL (just outside the greater Chicago area) with his wife Savannah and, soon, their first child. In his spare time he studies math and video game development and enjoys running.

Evan Miller

Statistician, Programmer, author of the Wizard statistical analyzer
Learning competitors' secrets with math
Bio

A graduate of Williams College and the University of Chicago, and a recognized name in Silicon Valley for applying math to business problems, Evan Miller works at the intersection of programming, statistics, and visualization techniques. His algorithms for sorting by average rating are in use at some of the most recognizable destinations on the Internet, and his articles on A/B testing are widely read throughout the industry. Evan's current project is Wizard Pro, a desktop statistics program that takes the pain out of predictive modeling.

Sean Kross

Programmer Analyst, The Johns Hopkins Bloomberg School of Public Health
Lessons from teaching data science to over a million people

My colleagues and I saw the demand for data scientists ballooning and we decided to do something about it. In this talk I will explain how the Johns Hopkins Data Science Lab leveraged the latest statistical, computational, and open source methods in order to create over a million new data scientists. We'll talk about what happens as you take data newbies through their first serious programming experiences, rigorous mathematical training, and the creation of their first data products. We'll discuss the data we collected about how students handle these challenges and how you can take our insights to implement better data science training and understanding in your organization.

Bio

Sean Kross is a PhD student at the University of California San Diego where he studies data science, human-computer interaction, and distributed education. Sean formerly worked in the Johns Hopkins Data Science Lab where he and his colleagues developed The Data Science Specialization on Coursera.org. Sean is the author of Mastering Software Development in R, Developing Data Products, and The Unix Workbench. He blogs less often than he would like at seankross.com and you can find him on Twitter @seankross.

Gio Fernandez-Kincade

Co-Founder @ RelatedWorks.io. Formerly Staff Engineer @ Etsy
AI in Production

Read enough Hacker News and you will quickly become convinced that building AI products looks something like:

  1. Fire up TensorFlow
  2. Choose your favorite network architecture (or better yet, generate one!)
  3. Pipe in tons of data
  4. Profit

That couldn’t be farther from the truth. In this talk, we’ll figure out what it really takes to ship AI products in production.

Bio

Gio has been working with data, architecting systems, and leading teams of engineers for over a decade. He’s currently a co-founder at Related Works, which aims to build simple, intelligent products that help cultural institutions share their collections with the world. Previously he worked as a Staff Engineer at Etsy, where he led the Search Ranking and Search Experience teams. He focused on Search from the ground up: infrastructure, ranking and machine-learned relevance, diversity, fairness, query understanding, autosuggest, faceting, navigation, experimentation, etc. Prior to working at Etsy, Gio worked at CapitalIQ where he designed, built, and maintained a multi-terabyte database, real-time processing system, and search engine for globally-sourced financial reports.

Cassandra Jacobs

Data Scientist, Stitch Fix
Imposing structure on unstructured text at Stitch Fix

At Stitch Fix, we have a wealth of text data related to each Fix we send out to clients. Fixes contain 5 apparel and non-apparel fashion items, ranging anywhere from blouses to leggings to shoes. Stylists are shown algorithmically scored pieces and ultimately use their own discretion to decide what to send to a client. After they’ve picked everything, stylists write notes detailing the items they selected, and once clients have received their Fix, they leave feedback on the pieces that we sent them. These notes and feedback can be leveraged to learn about our inventory so we can explore what occasions an item is good for or learn features that might not be in the descriptions of an item in our databases that function like a knowledge base. Ultimately we can use this information to make recommendations to stylists about what to write about an item if they’re suffering from writer’s block, automatically make suggestions about what the client might like given a request note they’ve written, or even help stylists find better similar items to ones they are considering sending.

Unfortunately, our text data is largely unstructured – stylists can talk about anything they send and in any order and clients don’t necessarily talk about the item’s prints or fabrics, or occasions that an item is good for. I will discuss a technique I have developed that builds upon a number of existing information extraction methods in natural language processing that allows us to impose structure on these notes and comments. This way we can find out how a stylist talks about an item even if we don’t know where it’s mentioned. The technique results in a network that defines words and items in a common space that we can use to make recommendations about how to talk about an item in a note, or for finding the right item in our inventory.

Bio

Cassandra Jacobs is a data scientist at Stitch Fix. A lover of unstructured data, she works primarily on natural language processing systems for recommendation algorithms, helping expert stylists pick the right pieces to send to clients. After earning her BA in Linguistics at the University of Texas, she earned her PhD in Cognitive Psychology and an MS in Computer Science at the University of Illinois at Urbana-Champaign. In her spare time, she likes going on backpacking trips, reading literary science fiction, and learning foreign languages.

Thomas in’t Veld

Head of Data Science, Peak Labs
Event Driven Growth Hacking

Peak acquired more than 25 million users in two years by combining event analytics with marketing attribution and predictive modelling. In this talk, I will take you on a journey through what makes this tick, how we built it and why it is one of the best ways to grow a new business. Event analytics is the cool new thing used by everyone from Facebook to your second cousin's dog's start-up, but why are so many people doing it wrong? And what will be the next step?

Our mission here at Peak is to make lifelong progress enjoyable. We believe there’s always a little room for improvement, and we should strive to better ourselves bit by bit. That’s why we use a combination of neuroscience, technology and fun to get those little grey cells active and striding purposefully towards their full potential. Peak is the number one brain training app on mobile and, since it launched in 2014, has been downloaded more than 25 million times. It has been recognised by both Apple and Google as one of the best apps available, winning Best of 2014, Best of 2015 and Best of 2016 awards as well as Editors’ Choice on both the App Store and Play Store.

Bio

I am a theoretical physicist turned data scientist, and after building shiny data tools for Sky and The Guardian I joined Peak in 2015 to build a data science team. My continuing mission: making sure that every decision at Peak is made with as much data as possible.

Dirk Gorissen

Senior Engineer, Oxbotica
Beyond Ad-Click Prediction

We all know machine learning is great for helping you tag friends on Facebook, suggesting what brand of toothpaste will improve your smile, and picking the ad most likely to unlock your wallet. In this talk, however, I hope to show you that there are some interesting applications you may not have thought of, such as detecting landmines from drone-mounted radar, finding orangutans in the Bornean jungle, or helping a car avoid pedestrians.

Bio

Dirk Gorissen has a background in Computer Science & AI and has worked in academic and commercial research labs across Europe and the US. His interests span machine learning, robotics, and computational engineering as well as their application in the humanitarian and development areas. He has been a regular consultant for the World Bank in Tanzania and closely involved with a number of drone-related startups. He is currently a senior engineer at the self-driving car company Oxbotica and on the side is an active STEM Ambassador and organiser of the London Big-O Algorithms & Machine Learning meetups.

Maxime Beauchemin

Data Engineer, Airbnb
Advanced Data Engineering Pattern with Apache Airflow

Analysis automation and analytic services are the future of data engineering! Apache Airflow's DSL makes it natural to build complex DAGs of tasks dynamically, and Airbnb has been leveraging this feature in intricate ways, creating a wide array of services as dynamic workflows. In this talk, we'll explain the mechanics of dynamic pipeline generation using Apache Airflow, and present advanced use cases that have been developed at Airbnb.
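
As a rough illustration of what building a DAG of tasks dynamically can mean, the sketch below generates a task graph from a configuration list. It is plain Python using only the standard library and deliberately does not use Airflow's API or reflect Airbnb's pipelines. The table names, task names and the build_pipeline helper are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical configuration: one entry per table to process.
TABLES = ["bookings", "listings", "reviews"]


def build_pipeline(tables):
    """Return {task_name: set of upstream task names}, generated from config."""
    graph = {"start": set()}
    for table in tables:
        extract, load = f"extract_{table}", f"load_{table}"
        graph[extract] = {"start"}   # each extract waits for the start marker
        graph[load] = {extract}      # each load waits for its own extract
    # A final task that depends on every generated load task.
    graph["publish_report"] = {f"load_{t}" for t in tables}
    return graph


pipeline = build_pipeline(TABLES)
# One valid execution order that respects all dependencies.
order = list(TopologicalSorter(pipeline).static_order())
```

Adding a table to the list adds two tasks and one dependency edge without any other change to the code, which is the property that makes generated workflows attractive. In a workflow engine the same loop would create operator objects instead of dictionary entries.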

Bio

Maxime Beauchemin works at Airbnb as part of the "Analytics & Experimentation Products" team, developing open source products that reduce friction and help generate insight from data. He is the creator and a lead maintainer of Apache Airflow [incubating] (a workflow engine) and Superset (a data visualization platform), and is recognized as a thought leader in the data engineering field. Before Airbnb, Maxime worked at Facebook on computation frameworks powering engagement and growth analytics, on clickstream analytics at Yahoo!, and as a data warehouse architect at Ubisoft.

Melanie Warrick

Senior Developer Advocate, Google
Machine Learning with Containers and Cloud

Machine learning (ML) has gained significant attention because of its impact in areas ranging from automated medical diagnosis to unique product interactions and advertising for individual users. At its core, it's a set of algorithms used for pattern matching and prediction, and it plays a prominent role in AI development.
ML in production doesn't happen in a vacuum. That's where containers and cloud systems can help. Containers create isolated environments to easily set up servers and safely run the software. Cloud systems give flexible access to hardware resources without the cost and pain of building and maintaining it all. This talk will walk through an example of how to implement a machine learning algorithm using containers in the cloud. The goal is to give you an understanding of how the tools work together, and how you can apply these concepts.

Bio

Melanie Warrick is a Senior Developer Advocate at Google. Previous experience includes work as a founding engineer on Deeplearning4J as well as implementing machine learning in production at Change.org. Prior experience also covers business consulting and large enterprise technology implementations for a wide variety of companies. Over the last couple of years, she's spoken at many conferences about artificial intelligence, and her passions include working on machine learning problems at scale.

More speakers will be announced soon

If you want to be one of the speakers at Crunch 2017, submit your application via Papercall. The deadline for submission is 15th of May, 2017.


Workshops

Workshops will be announced soon


Location

Meet Budapest, a really awesome city

Here are a few reasons why you need to visit Budapest

MAGYAR VASÚTTÖRTÉNETI PARK

BUDAPEST, TATAI ÚT 95, 1142

The Magyar Vasúttörténeti Park (Hungarian Railway History Park) is a railway museum located in Budapest, Hungary at a railway station and workshop of the Hungarian State Railways. Located on the site of the former north depot of the Hungarian State Railways (MÁV), the Hungarian Railway Museum is Europe’s first interactive museum of its kind. The north depot’s roundhouse, home to the museum, was built in 1911 and is also part of Hungarian railway history. There are over a hundred vintage trains, locomotives, cars and other types of railroad equipment on display, including a steam engine built in 1877, a railcar from the 1930s and a dining car built in 1912 for the famous Orient Express.


Sponsors

Platinum

Gold

Silver

CRUNCH is a non-profit conference. We are looking for sponsors to help us make this conference happen.
Take a look at our sponsor packages and contact us at hello@crunchconf.com


Contact

Crunch Conference is organized by

Ádám Boros
Marketing Intern, Prezi
Attila Balogi
Event manager, Prezi
Attila Petróczi
R&D and Data Science Manager, Realeyes
Balázs Szakács
Business Intelligence Manager, IBM Budapest Lab
Dániel Molnár
Senior Data & Applied Scientist, Microsoft Deutschland GmbH / Wunderlist Team
Katalin Marosvölgyi
Travel and accommodation manager, Prezi
Medea Baccifava
Head of conference management, Prezi
Tamás Imre
Lead Analyst, Prezi
Tamás Németh
Data Engineer, Prezi
Zoé Rimay
Software Developer, Morgan Stanley
Zoltán Prekopcsák
VP Big Data, RapidMiner
Zoltán Tóth
Big Data and Hadoop expert, Datapao; Teacher, CEU Business School
Ryan McCabe
Data Analyst, Prezi
Gergely Krasznai
Data Analyst, Prezi

Questions? Drop us a line at hello@crunchconf.com