Crunch Data Engineering and Analytics Conference Budapest October 5-7, 2016

CRUNCH is a use-case-heavy conference for people interested in building the finest data-driven businesses. No matter the size of your venture or your job description, you will find exactly what you need at the two-track CRUNCH conference. A data engineering track and a data analytics track will serve diverse business needs and levels of expertise.

Workshop day: October 5
Conference days: October 6-7

If you are a Data Engineer, Data Scientist, Product Manager or simply interested in how to use data to develop your business, this conference is for you. No matter the size of your company or the volume of your data, come and learn from the biggest players of Big Data, get inspiration from their practices, successes and failures, and network with other professionals like you.


Speakers

Alex Dean

Co-founder, SnowPlow

Ben Yoskovitz

Founding Partner, Highline BETA

Product Management: Data + Guts

Bio

Benjamin Yoskovitz is an entrepreneur, investor and author. He recently launched Highline BETA, a startup co-creation company. Previously he was VP Product at VarageSale and GoInstant (acq. $CRM). He’s made 15+ angel investments, and founded an accelerator, Year One Labs. Ben is the co-author of Lean Analytics (published by O’Reilly), a book that combines Lean Startups and analytics to help startups and large companies build better businesses and products faster. He’s an active blogger at http://instigatorblog.com. You can also find Ben on Twitter @byosko.

Casey Stella

Principal Architect, Hortonworks

Data Preparation for Data Science: A Field Guide

Any data scientist who works with real data will tell you that the hardest part of any data science task is the data preparation. From cleaning dirty data to understanding where your data is missing and how it is shaped, the care and feeding of your data is a prime task for the working data scientist.

I will describe my experiences in the field and present an open source utility written with Apache Spark to automate some of the necessary but insufficient things that I do every time I'm presented with new data. In particular, we'll talk about discovering missing values, values with skewed distributions, and likely errors within your data.

Bio

I am a committer and PMC member on the Apache Metron project in the engineering team at Hortonworks. In the past, I've worked as an architect and senior engineer at a healthcare informatics startup spun out of the Cleveland Clinic, as a developer at Oracle and as a Research Geophysicist in the Oil & Gas industry.

I specialize in writing software and solving problems where there are scalability concerns due to either large amounts of traffic or large amounts of data. I have a particular passion for data science problems or anything mathematical.

Dan McKinley

Bio

After starting his career in finance, Dan McKinley freaked out and moved to Brooklyn. He stumbled into a fledgling Etsy.com in 2007, and spent his first years there trying to stop overwhelming traffic from reducing the site to its constituent elements. In the long summer that followed he worked on activity feeds, search, recommendations, experimentation, and analytics. Dan worked at Stripe for a while, before moving on to co-found Skyliner.io along with Coda Hale and Marc Hedlund.

Danny Yuan

Software Engineer, Uber

Realtime Stream Processing @Uber

This talk will discuss how stream processing is used in Uber's realtime system to solve a wide range of problems, including but not limited to revealing and visualizing dynamics of Uber's marketplace, performing complex computation on geospatial temporal data, and extracting patterns from data streams. This talk will also present the architecture of the stream processing pipeline with a focus on how and why the architecture has evolved into its current form.

Bio

Danny Yuan is a software engineer at Uber. He's currently working on data systems for Uber's logistics platform. Prior to joining Uber, he worked on building Netflix's cloud platform. His work includes predictive autoscaling, a distributed tracing service, a real-time data pipeline that scaled to process hundreds of billions of events every day, and Netflix's low-latency crypto services.

Dirk Duellmann

Section Leader Analysis and Design, CERN

Understanding the computing for the Large Hadron Collider at CERN

The physics community at CERN has been analysing large volumes of physics data for many decades.
More recently statistical methods and machine learning are also applied to computing infrastructure metrics to better understand and optimise the complex and distributed computing systems used for the Large Hadron Collider.
This presentation will give an overview of established and new techniques and tools for supporting these analysis activities.

Bio

Dirk Duellmann leads the Analytics and Development section of CERN's Storage group. He is responsible for the design and evolution of CERN's high performance disk pools for physics data analysis and he chairs the working group for Infrastructure Analytics of CERN's IT department. Previously he led the Worldwide LHC Computing Grid (WLCG) projects for persistency framework development and for distributed database deployment.

Dirk joined CERN in 1995 after receiving a PhD in High Energy Physics from the University of Hamburg. Before that, he worked at several software companies on the development of database management systems and applications.

Elena Verna

VP of Growth and Analytics, SurveyMonkey

Pricing Page Optimization

The pricing page belongs to one of the most important funnels on your site and should be closely monitored and optimized. Learn what you need to know about user behavior on the pricing page to decide where to focus your A/B testing resources.

Bio

Elena Verna is the VP of Growth and Analytics at SurveyMonkey. She leads both the Analytics and Growth teams where she focuses on understanding user behavior and driving monetization through funnel optimizations and user targeting.

Jeroen Janssens

Assistant Professor of Data Science, Tilburg University

The Polyglot Data Scientist

Bio

Jeroen Janssens is an assistant professor of data science at Tilburg University. As an independent consultant and trainer, Jeroen helps organizations make sense of their data. Previously, he was a data scientist at Elsevier in Amsterdam and the startups YPlan and Outbrain in New York City. Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University. He is the author of Data Science at the Command Line, published by O’Reilly Media. He blogs at jeroenjanssens.com and tweets as @jeroenhjanssens.

Mike Olson

Board Chairman and Chief Strategy Officer, Cloudera

Bio

Mike Olson cofounded Cloudera in 2008 and served as its CEO until 2013, when he took on his current role of chief strategy officer. As CSO, Mike is responsible for Cloudera’s product strategy, open source leadership, engineering alignment, and direct engagement with customers. Prior to Cloudera, Mike was CEO of Sleepycat Software, makers of Berkeley DB, the open source embedded database engine. Mike spent two years at Oracle Corporation as vice president for embedded technologies after Oracle’s acquisition of Sleepycat. Prior to joining Sleepycat, Mike held technical and business positions at database vendors Britton Lee, Illustra Information Technologies, and Informix Software. Mike has a bachelor’s and a master’s degree in computer science from the University of California, Berkeley.

Shirshanka Das

Principal Staff Software Engineer, LinkedIn

Bio

Shirshanka is a Principal Staff Software Engineer and the architect for LinkedIn’s Data & Analytics team. He was among the original authors of a variety of open and closed source projects built at LinkedIn, including Databus, Espresso, and Apache Helix. He is currently working with his team on simplifying the big data analytics space at LinkedIn through a multitude of mostly open-source projects: Pinot, a high-performance distributed OLAP engine; Gobblin, a data lifecycle management platform for Hadoop; WhereHows, a data discovery and lineage platform; and Dali, a data virtualization layer for Hadoop.

Wouter de Bie

Data architect, Spotify

Changing the tires of a moving car - Moving Spotify's Data Infrastructure from on-premise to Google Cloud Platform

During the presentation we will have a look at Spotify's move of 12,000+ servers from 4 data centers to Google's Cloud Platform. Even though Spotify is still in the midst of the migration, we already have a ton of lessons that we can share. Obviously we will look at the "why", "what" and "how" of this enormous migration.

Bio

Wouter started his career at an early age as a Linux consultant in the Netherlands during the dot-com era. In 2009 Wouter decided to move to Sweden for personal reasons and worked as a Ruby developer and system administrator at Delta Projects, one of Sweden’s biggest online ad serving companies, before he decided to join Spotify in 2011.
Wouter is currently working as a data architect at Spotify where he helps teams in building the next iteration of the big data platform and migrating from Spotify's on-premise infrastructure to Google's Cloud Platform.

Yash Nelapati

Pinterest

Bio

Yash Nelapati is the founding engineer of Pinterest. He built the initial version of Pinterest and scaled it to millions of users over the last 5 years. During this 5-year journey he worked on various problems related to user growth, infrastructure scalability and product design. Outside of work he spends a lot of time shooting landscapes.

More speakers to be announced soon.


Call for Presentations

Want to be a speaker at Crunch? Submit your CFP application here.
The deadline for submissions is 11:59 p.m. (CET) on June 15th, 2016.


Workshops

5 Oct, Wed

Ben Yoskovitz

Lean Analytics: How Data Drives Business Success

Ben Yoskovitz, Founding Partner, Highline BETA

In this hands-on workshop, Ben Yoskovitz will take participants through a process of first identifying what makes a good metric, and then showing how good metrics, and their measurement and use, can be applied to different types of businesses. Participants will learn the basic principles of Lean Analytics, including the Lean Analytics Stages, Lean Analytics Cycle and more. Participants will be actively involved in mapping business models and discussing which metrics matter and how data helps drive business success.

Jeroen Janssens

Crunching Data at the Command Line

Jeroen Janssens, Assistant Professor of Data Science, Tilburg University

Zoltan C. Toth

Apache Spark Essentials

(official Databricks workshop)

Zoltan C. Toth, Spark instructor, Databricks

Apache Spark Essentials will help you get productive with the core capabilities of Spark, as well as provide an overview and examples for some of Spark’s more advanced features. This full-day course features hands-on technical exercises so that you can become comfortable applying Spark to your datasets. In this class, you will get hands-on experience with ETL, exploration, and analysis using real world data.

Prerequisites:
This class doesn't require any Spark knowledge. Some experience in Python and some familiarity with big data or parallel processing concepts are helpful.

Csaba Kassai

A Big Data adventure in Google Cloud Platform

Csaba Kassai, Software Architect, Doctusoft

If you work with a huge amount of data, from either the analyst or the developer side, and you are always looking at what’s next regarding Big Data technologies, come join us and explore Google Cloud Platform’s comprehensive Big Data solution at Doctusoft’s full-day workshop.

  • Learn how to build, execute, and visualize your Big Data projects more easily and in less time with Google Cloud Platform’s four main Big Data products: BigQuery, Pub/Sub, Dataflow, and Datalab.
  • See how these products link to each other by following an example project participants work on together.
  • Learn about real business use cases and project experiences.
  • Get the full picture about how these Big Data products differ from other well-known solutions and know which one to choose to suit your business needs or the technological requirements you work with.

Whether you come from a small start-up or a big multinational company, this workshop is useful for anyone who wants to learn first-hand how to deal with a Big Data project on Google Cloud Platform.

Participants should have a technology background, a basic understanding of their current business model, and be open to sharing their thoughts and questions.

The workshop is not only a brief introduction to Google’s Big Data solutions; it also covers several topics from the Google Cloud Platform - Qualified Data Analyst (CPE201) certificate exam.

Participants will need to bring their own laptops and have a Google account. Further information about the technical environment will be communicated after registration.


Bio

Csaba has been a software architect at Doctusoft ‒ the only Google Cloud Platform partner in Hungary ‒ for 5 years. He has participated in several Big Data projects, solving the problems of different retail, telecommunication, and start-up companies using Google and Hadoop technologies. He has also worked for one of the biggest banks in Hungary on Big-Data-focused projects such as optimizing the query time of the transaction history database with ElasticSearch. Csaba’s main professional interests are Google’s Big Data products and their related programming languages and database technologies.

Location

Millenáris Park

Address: Budapest, Fény utca 22, 1024

Meet Budapest, a really awesome city

Here are a few reasons why you need to visit Budapest

Conference venue

Millenáris Park

The conference will take place at Millenáris, a modern cultural complex in Buda surrounded by a large park, with excellent public transport accessibility. The location is the reconstructed site of the one-time Ganz Electric Works, and you can still see parts of the machinery that was used here.

Need accommodation?

Make sure to book an extra day or two and explore Budapest after the conference.


Tickets

Code of Conduct

tl;dr: Be excellent to each other

All attendees, speakers, sponsors and volunteers at our conference are required to agree to the following code of conduct. Organisers will enforce this code throughout the event. We expect cooperation from all participants to help ensure a safe environment for everybody.

Need Help?

Contact us at hello@crunchconf.com.

The Quick Version

Our conference is dedicated to providing a harassment-free conference experience for everyone, regardless of gender, age, sexual orientation, disability, physical appearance, body size, race, or religion (or lack thereof). We do not tolerate harassment of conference participants in any form. Sexual language and imagery is not appropriate for any conference venue, including talks, workshops, parties, Twitter and other online media. Conference participants violating these rules may be sanctioned or expelled from the conference without a refund at the discretion of the conference organisers.

The Less Quick Version

Harassment includes offensive verbal comments related to gender, age, color, national origin, genetic information, sexual orientation, disability, physical appearance, body size, race, religion, sexual images in public spaces, deliberate intimidation, stalking, following, harassing photography or recording, sustained disruption of talks or other events, inappropriate physical contact, and unwelcome sexual attention.

Participants asked to stop any harassing behavior are expected to comply immediately.

Sponsors are also subject to the anti-harassment policy. In particular, sponsors should not use sexualised images, activities, or other material. Booth staff (including volunteers) should not use sexualised clothing/uniforms/costumes, or otherwise create a sexualised environment.

If a participant engages in harassing behavior, the conference organisers may take any action they deem appropriate, including warning the offender or expulsion from the conference with no refund.

If you are being harassed, notice that someone else is being harassed, or have any other concerns, please contact a member of conference staff immediately. Conference staff can be identified as they'll be wearing branded t-shirts.

Conference staff will be happy to help participants contact hotel/venue security or local law enforcement, provide escorts, or otherwise assist those experiencing harassment to feel safe for the duration of the conference. We value your attendance.

We expect participants to follow these rules at conference and workshop venues and conference-related social events.


Sponsors

Platinum

Gold

Other sponsors & media partners

CRUNCH is a non-profit conference. We are looking for sponsors to help us make this conference happen.
Take a look at our sponsor packages and contact us at hello@crunchconf.com.


Contact

Crunch Conference is organized by

Ádám Boros
Marketing Intern, Prezi
Attila Balogi
Event manager, Prezi
Attila Petróczi
Research & Data Science Project Manager, Realeyes
Balázs Szakács
Business Intelligence Manager, Ustream
Bernadett Otterbein
Sponsor manager, Ustream
Dániel Molnár
Senior Data & Applied Scientist, Microsoft Deutschland GmbH / Wunderlist Team
Julianna Göbölös-Szabó
Data Engineer, Secret Sauce Partners
Katalin Marosvölgyi
Travel and accommodation manager, Prezi
Mihály Hazag
Engineering Manager, Ustream
Medea Baccifava
Head of conference management, Prezi
Tamás Imre
Lead Analyst, Prezi
Tamás Németh
Data Engineer, Prezi
Zoé Rimay
Front-end Developer, RapidMiner
Zoltán Prekopcsák
VP Big Data, RapidMiner
Zoltán Tóth
Big Data and Hadoop expert, Datapao; Teacher, CEU Business School

Questions? Drop us a line at hello@crunchconf.com

Organizing partners

Our Friends