Nuria Ruiz

Nuria Ruiz

Nuria Ruiz

Engineer at, Wikipedia volunteer


Nuria has been writing software for 20 years, she works currently at, a startup in the edtech space. Most recently Nuria was in charge of Data Engineering efforts at Wikipedia, where she spent a lot of time thinking about privacy conscious ways to compute metrics. She has worked in social, mobile and performance. She learned a lot in Amazon's (and AWS) early years and was part of the original team that launched Mechanical Turk. Nuria is a physicist by training and started her programming days in a Physical Oceanography Lab in Seattle. A long time ago. When Big Data was just called "science".


Wikipedia's Lean Data Diet and Lessons Learned

lean data
free knowledge

Privacy is one of the lesser known charms of Wikipedia. Wikipedia’s stand on privacy allows users to access and modify a wiki in anonymity, without fear of giving away personal information, editorship or browsing history. As of this writing, readers and editors are sending more than 2000 custom analytics events per second to the Wikipedia analytics pipeline and constantly feeding 200+ data sets. That is in addition to the 10 billion (US) web request logs that are ingested daily into the Hadoop cluster and are used to populate several important tools like public analytics APIs [1][2][3]. The long term existence of this data is key to the work the foundation does to assess product efforts but not only that, Wikipedia public data fees are used by researchers all over the world and, most importantly, community members need data to better target their edition efforts. Is it possible to retain value from these data sets when they are controlled by strict privacy policies? [1] [2] [3]