György Móra is principal data scientist at Ekata (formerly Whitepages Pro), where he works on the machine learning solutions powering Ekata’s Identity Check Confidence Score and Transaction Risk Score. He has a background in software engineering and natural language processing research. His main interest in ML is how to deliver value to users.
At Ekata, we evaluate identity information in online shopping transactions so that our customers (vendors) can protect themselves against fraud. We commit to delivering pre-purchase verification predictions with very low latency.
When we built the ML system powering the Identity Check Confidence Score, we faced the following challenges:
In this talk, I discuss how we met these challenges by building a Spark- and XGBoost-based training and experimentation pipeline that delivers our models as a custom predictor library, which allows seamless integration with the production environment. Our embeddable library enables very fast predictions in real time.
There are existing formats for exporting and storing ML models so that they can be loaded into the production system to deliver predictions. In these formats, however, feature extraction and normalization are usually not part of the model description and need to be reimplemented in the production system. Our solution encapsulates feature extraction, normalization and prediction into one unit, which gives the data science team a lot more flexibility to make changes and makes integration easy for engineering teams.
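To make the idea of one unit concrete, here is a minimal Python sketch, not Ekata's actual library. Every name in it (`Predictor`, `email_length`, `name_in_email`, the record fields, and all numeric values) is hypothetical, and a simple logistic scorer stands in for the XGBoost model. The point it illustrates is that production code passes in a raw record and never reimplements feature extraction or normalization.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Record = Dict[str, str]


def email_length(record: Record) -> float:
    # Hypothetical feature: number of characters in the email address.
    return float(len(record.get("email", "")))


def name_in_email(record: Record) -> float:
    # Hypothetical feature: 1.0 if any part of the name occurs in the email.
    parts = record.get("name", "").lower().split()
    email = record.get("email", "").lower()
    return 1.0 if any(part in email for part in parts) else 0.0


@dataclass(frozen=True)
class Predictor:
    """Bundles feature extraction, normalization and scoring in one unit."""

    extractors: Sequence[Callable[[Record], float]]
    means: Sequence[float]    # normalization statistics fixed at training time
    stds: Sequence[float]
    weights: Sequence[float]  # stand-in linear model (assumption, not XGBoost)
    bias: float

    def features(self, record: Record) -> List[float]:
        # Extract raw features, then apply the stored normalization.
        raw = [extract(record) for extract in self.extractors]
        return [(x - m) / s for x, m, s in zip(raw, self.means, self.stds)]

    def predict(self, record: Record) -> float:
        # Logistic score in (0, 1) computed from the normalized features.
        z = self.bias + sum(w * x for w, x in zip(self.weights, self.features(record)))
        return 1.0 / (1.0 + math.exp(-z))


# The training pipeline would produce this object; the values are made up.
predictor = Predictor(
    extractors=(email_length, name_in_email),
    means=(20.0, 0.5),
    stds=(5.0, 0.5),
    weights=(-0.1, 1.5),
    bias=0.0,
)

# Production code only ever sees the raw record and the score.
record = {"name": "Ada Lovelace", "email": "ada@example.com"}
score = predictor.predict(record)
```

Because the extractors and normalization statistics travel with the model, changing a feature in this sketch means shipping a new `Predictor` rather than coordinating a code change in the production system.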