Benchmarking differentially private synthetic data generation algorithms

Which synthetic data generation algorithms for tabular datasets offer the best privacy/utility trade-offs? Tumult Labs did the research. Read the results below.

Research

Michael Hay

Summary:

A synthetic dataset is a collection of records generated to match the properties of an original data source. It is an appealing option for a range of data-sharing settings, and differential privacy is the best way to guarantee that synthetically generated data protects the privacy of the original source data.

In this paper, we benchmark twelve published methods for generating differentially private tabular synthetic data to see which most accurately preserve the properties of the source data. The benchmark is systematic: it evaluates the utility of the synthetic data by measuring how well it preserves the distributions of individual attributes and of pairs of attributes, the pairwise correlations between attributes, and the accuracy of an ML classification model. In a comprehensive empirical evaluation, we identify the top-performing algorithms and those that consistently fail to beat baseline approaches.
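To make the attribute-distribution part of that evaluation concrete, here is a minimal sketch of how one might compare marginals of a real and a synthetic table. It is an illustration only, not the paper's code: the use of total variation distance as the error measure, the helper names (`marginal`, `total_variation`, `marginal_errors`), and the toy records are all assumptions introduced here.

```python
from collections import Counter
from itertools import combinations


def marginal(records, columns):
    """Empirical distribution of the given columns, as {value tuple: probability}."""
    counts = Counter(tuple(row[c] for c in columns) for row in records)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def marginal_errors(real, synthetic, columns, order):
    """Distance for every marginal over `order` columns (1 = single attributes, 2 = pairs)."""
    return {
        combo: total_variation(marginal(real, combo), marginal(synthetic, combo))
        for combo in combinations(columns, order)
    }


# Toy records, invented for illustration only (assumption, not benchmark data).
real = [
    {"age": "young", "income": "low"},
    {"age": "young", "income": "low"},
    {"age": "old", "income": "high"},
    {"age": "old", "income": "high"},
]
synthetic = [
    {"age": "young", "income": "high"},
    {"age": "young", "income": "low"},
    {"age": "old", "income": "low"},
    {"age": "old", "income": "high"},
]

one_way = marginal_errors(real, synthetic, ["age", "income"], order=1)
two_way = marginal_errors(real, synthetic, ["age", "income"], order=2)
print(one_way)  # both single-attribute distances are 0.0
print(two_way)  # the (age, income) pair distance is 0.5
```

In this toy case the synthetic table reproduces each attribute's distribution exactly yet loses the relationship between the two attributes, which is one reason an evaluation would look at pairs of attributes and not only at individual ones.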


Read paper

Other research articles

Research

PrivateSQL: Reimagining and designing a new differentially private SQL query engine

Read the paper co-authored by Tumult Labs founders on building a differentially private relational database system that takes into account the complexity of multi-relational schemas and constraints.

Research

Differentially private algorithms for detailed race and ethnicity in the 2020 census

Tumult Labs designed a novel differentially private algorithm that the U.S. Census Bureau is using to publish the Detailed Demographic and Housing Characteristics (DHC) Race & Ethnicity tabulations as part of the 2020 Census.

Research

AIMing Higher: A Smarter Approach to Privacy-Preserving Synthetic Data

Learn how the AIM algorithm, co-invented by Tumult Labs CEO Gerome Miklau, improves upon existing algorithms for synthetic data generation by adapting to the user’s analysis needs and capturing key patterns in the input data.

Research

A Winning Approach to Generating Synthetic Data

A scientific paper, co-authored by our CEO Gerome Miklau, introduces a cutting-edge method for generating differentially private synthetic data.

Research

Evaluating the usability of differential privacy tools with data practitioners

Researchers at the University of Vermont ran a usability study to compare various differential privacy tools. Can you guess which platform study participants found easiest to use correctly?
