Crunch, Data Engineering and Analytics Conference, October 29-31, 2018 Budapest
Philipp Krenn

Infrastructure | Developer Advocate at Elastic

Bio:

Philipp lives to demo interesting technology. Having worked as a web, infrastructure, and database engineer for more than ten years, Philipp is now a developer advocate at Elastic, the company behind the open source Elastic Stack consisting of Elasticsearch, Kibana, Beats, and Logstash. Based in Vienna, Austria, he is constantly traveling across Europe and beyond to speak about and discuss open source software, search, databases, infrastructure, and security.

Talk:

Make Your Data FABulous

Topics:
data engineering
accuracy
tradeoffs
Level:
Intermediate

The CAP theorem is widely known for distributed systems, but it’s not the only tradeoff you should be aware of. For datastores there is also the FAB theory, and just like with the CAP theorem you can only pick two: will it be fast, accurate, or big, and where are the tradeoffs?

What are the tradeoffs in the FAB theory?

- Fast: Results are fast enough so that people can have a seamless interaction.
- Accurate: Answers are accurate and don’t have a margin of error.
- Big: Dozens or hundreds of systems are involved in calculating the result.

Most SQL databases are in the FA space, whereas Hadoop and related systems are generally AB systems. Elasticsearch is an example of a system optimized for FB.
While Fast and Big are relatively easy to understand, Accurate is a bit harder to picture. This talk shows some concrete examples of the accuracy tradeoffs Elasticsearch makes for terms aggregations, cardinality aggregations with HyperLogLog++, and the IDF part of full-text search, and how to trade some speed or distribution for more accuracy.
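
To picture the terms aggregation tradeoff, here is a minimal Python sketch. It is a toy model, not Elasticsearch's implementation: the function names and the shard data are hypothetical, and the `size` and `shard_size` arguments only echo the parameters of the same name on the terms aggregation. Each shard reports just its local top terms, so a term that is second everywhere can be missed entirely.

```python
from collections import Counter


def _top(counts, n):
    """Return the n most frequent (term, count) pairs, ties broken by term."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def exact_top_terms(shards, size):
    """Accurate: sum every term count from every shard, then rank globally."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return _top(total, size)


def approximate_top_terms(shards, size, shard_size):
    """Fast and big: each shard sends only its local top `shard_size` terms,
    and the coordinator merges those partial lists."""
    merged = Counter()
    for shard in shards:
        for term, count in _top(shard, shard_size):
            merged[term] += count
    return _top(merged, size)


# Hypothetical per-shard term counts: "b" is never first locally,
# but it is the clear winner globally.
shards = [
    {"a": 10, "b": 9},
    {"c": 10, "b": 9},
    {"d": 10, "b": 9},
]

print(exact_top_terms(shards, size=1))                      # [('b', 27)]
print(approximate_top_terms(shards, size=1, shard_size=1))  # [('a', 10)]
print(approximate_top_terms(shards, size=1, shard_size=2))  # [('b', 27)]
```

Raising `shard_size` in this sketch makes every shard ship more data to the coordinator, which is the "trade some speed for more accuracy" direction; putting all data on a single shard would be the "trade distribution" direction.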