Differential privacy made easy… and open-source!
Today, we’re excited to share what we’ve been working on for a few years: Tumult Analytics, a Python library making it easy and safe to use differential privacy, is now open-source!
This is the same library that is running in production at institutions like the U.S. Census Bureau, Wikimedia, and the Internal Revenue Service. It helps such organizations publish sensitive data with strong, provable privacy guarantees. If this sounds like something you could use, don’t hesitate to give it a try, or to join our Slack and chat with us about your use case!
What does Tumult Analytics do?
Tumult Analytics enables organizations to safely release statistical summaries of sensitive data.
It’s difficult to publish statistics in a way that keeps individual information private. Ad hoc de-identification techniques are not safe, and reconstruction attacks can retrieve original records from aggregated statistics with surprising efficacy. But the need to share data in a privacy-safe way has not gone away!
Differential privacy provides a solution to this problem. Its guarantees are mathematically proven and quantifiable: organizations can measure the privacy impact of each data release, and evaluate the overall risk across multiple data publications.
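To make "quantifiable" a bit more concrete, here is a minimal sketch in plain Python of the textbook Laplace mechanism for counts, together with a simple budget tracker based on basic sequential composition. This is not the Tumult Analytics API: the names `laplace_noise`, `noisy_count`, and `BudgetTracker` are hypothetical, the data is made up, and the sketch assumes each person contributes exactly one record. It is for illustration only; naive floating-point noise sampling like this is known to have privacy weaknesses and should not be used in production.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a count with Laplace noise calibrated to epsilon.

    Assumption: adding or removing one person changes the count by at most 1
    (sensitivity 1), so noise with scale 1 / epsilon is used.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


class BudgetTracker:
    """Tracks total epsilon spent, assuming basic sequential composition."""

    def __init__(self, total_epsilon: float):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        # Refuse any release that would push the total past the budget.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise ValueError("privacy budget exceeded")
        self.spent += epsilon


rng = random.Random(0)
ages = [23, 35, 41, 52, 67, 29, 70, 18]  # made-up example data
budget = BudgetTracker(total_epsilon=1.0)

# Two releases, each charged against the same overall budget.
budget.charge(0.5)
seniors = noisy_count(ages, lambda age: age >= 65, epsilon=0.5, rng=rng)

budget.charge(0.5)
minors = noisy_count(ages, lambda age: age < 21, epsilon=0.5, rng=rng)

print(f"noisy seniors: {seniors:.2f}, noisy minors: {minors:.2f}")
print(f"epsilon spent: {budget.spent} of {budget.total_epsilon}")
```

Each release costs a known amount of epsilon, and the costs add up: that running total is what lets an organization reason about cumulative risk across publications. A third call to `budget.charge` here would raise an error, because the whole budget has been spent.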
Differential privacy is widely considered to be the gold standard for privacy-preserving data publication. Yet, so far, using it in practice has been challenging, especially for non-experts. This is where Tumult Analytics comes in: it encapsulates years of scientific research and makes it available to any data scientist or engineer via a familiar, easy-to-use software library.
What makes Tumult Analytics special?
We built Tumult Analytics with four core design principles in mind.
- Ease of use. Our goal is to make it easy for everyone to deploy differential privacy. To this end, we developed a Python interface that will feel familiar to anyone who has done data analysis with tools such as pandas or PySpark. We invested time and effort into writing documentation and tutorials that help new users get started with our software, even without prior experience with differential privacy.
- Scalability and performance. All individual components are built on top of Spark, which allows Tumult Analytics to painlessly scale to datasets with billions of rows. We work hard to keep its performance comparable to using Spark directly.
- Safety. Your differential privacy guarantee is only as strong as the software that implements it. We prioritize simplicity and auditability, invest in privacy vulnerability research, and develop provably safe alternatives to avoid common weaknesses and ensure our users get the protection they need.
- Feature-richness. The underlying privacy framework already supports complex features that our users need, such as private and public joins, advanced privacy accounting, and adaptive mechanisms. Thanks to its extensible architecture, it’s easy for us to add more functionality over time.
We think that the combination of these strengths is unique among existing software libraries for differential privacy. Tumult Analytics is already ideally positioned to address real-world use cases, and we can’t wait to iterate and improve it even more.
Where do we go from here?
This depends on you! If you give Tumult Analytics a try, join our Slack server: we’d love to hear your questions, feedback, or feature requests.
Who are we?
We’re Tumult Labs, a startup whose mission is to make it easy for any organization to share and publish sensitive data using provably safe privacy technology. Our team is composed of differential privacy experts with decades of cumulative experience deploying this technology to solve real-world use cases. Our clients include the U.S. Census Bureau, Wikimedia, and the Internal Revenue Service. For more information, consult our website.