This Marie Curie career integration project considered the whole data release process, from the data owner to the data user. It laid out a set of principles for privacy tool design that highlight the requirements for interoperability, extensibility and scalability. The aim of the project, reflected in its name, was Delivering Anonymization Practically, Privately, Effectively and Reusably (DAPPER). It produced published results under the following four themes:
- Synthetic Private Data. New methods were developed for providing synthetic data in the form of (social) networks, based on anonymized versions of real data under the strong privacy guarantee of differential privacy. The fellow also proposed a new privacy definition called personalized differential privacy (PDP), a generalization of differential privacy in which users specify a personal privacy requirement for their data, and introduced several novel mechanisms for achieving PDP.
- Correlated Data Modelling. Many analysis and machine learning tasks require marginal statistics on multidimensional datasets, and these must be released with strong privacy guarantees for the data subjects. Applications for these statistics range from finding correlations in the data to fitting sophisticated prediction models. The fellow provided a set of algorithms for materializing marginal statistics under the strong model of local differential privacy, and developed PrivBayes, a differentially private method for releasing high-dimensional data.
- Data Utility Enhancement. The fellow worked on the core problem of count queries, and designed randomized mechanisms to release count data associated with a group of individuals. The fellow also gave new algorithms to provide statistical information about graphs based on a ‘ladder’ distribution.
- Trajectory Data. The fellow presented DPT, a system to synthesize mobility data based on raw GPS trajectories of individuals while ensuring strong privacy protection in the form of ε-differential privacy.
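
To make the ε-differential privacy guarantee and the count queries mentioned in the themes above more concrete, the following is a minimal sketch of the standard Laplace mechanism applied to a single count query. It is a textbook illustration only, not one of the mechanisms developed in the project; the function name `private_count`, its parameters and the toy data are assumptions introduced for this example.

```python
import random


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    Illustrative textbook Laplace mechanism (hypothetical helper, not the
    project's own method). A count query has sensitivity 1, because adding
    or removing one individual changes the count by at most 1, so Laplace
    noise with scale 1/epsilon gives epsilon-differential privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # The difference of two independent Exponential(rate=epsilon) draws
    # is Laplace-distributed with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


if __name__ == "__main__":
    ages = [18, 25, 40, 67, 70]  # toy data, invented for this example
    noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0)
    print(f"noisy count of people aged 40 or over: {noisy:.2f}")
```

A smaller `epsilon` gives stronger privacy but noisier answers, which is the privacy and utility trade-off that the Data Utility Enhancement theme addresses.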