A Model for Metricising Privacy and Legal Compliance

Abstract: For a dataset to comply, in some sense, with privacy laws such as the General Data Protection Regulation (GDPR), steps must be taken to remove data that might reveal personal information. This is typically achieved by removing information content or semantics from the dataset; done incorrectly, this process can leave the dataset in violation of such laws. Machine learning provides techniques for analysing the dependencies and correlations within a dataset, and these can be used to measure its information content within the bounds of the dependency estimators employed. Utilising this, we can measure the effects of anonymisation upon a dataset and thus the efficacy of the anonymisation functions applied. If we additionally characterise what anonymisation means in terms of information loss and construct classification functions over these measurements, we obtain a framework in which the decision over whether an anonymisation is sufficient can be made. This can then be extended to an automation scenario in which it becomes potentially possible to render texts such as the GDPR as such classification functions.
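
As a minimal illustration of the kind of measurement the abstract describes (not the paper's actual method), the sketch below uses mutual information as one possible dependency estimator. It compares the statistical dependency between a quasi-identifier (a postcode) and a sensitive attribute (a diagnosis) before and after a simple generalisation-style anonymisation; the drop between the two scores is a crude metric of the information removed. The records, the generalise function, and the choice of estimator are all hypothetical.

```python
# Sketch: measuring information loss caused by an anonymisation function,
# using mutual information as the dependency estimator. Toy data only.
from sklearn.metrics import mutual_info_score

# Hypothetical records: (postcode, diagnosis)
records = [
    ("SW1A 1AA", "flu"), ("SW1A 2BB", "flu"),
    ("NW1 4RY", "asthma"), ("NW1 6XE", "asthma"),
    ("SW1A 3CC", "diabetes"), ("NW1 2DB", "flu"),
]
postcodes = [p for p, _ in records]
diagnoses = [d for _, d in records]

# Hypothetical anonymisation function: generalise a full postcode
# to its outward code, coarsening the quasi-identifier.
def generalise(postcode: str) -> str:
    return postcode.split()[0]

anonymised = [generalise(p) for p in postcodes]

# Mutual information between the quasi-identifier and the sensitive
# attribute, before and after anonymisation. The decrease quantifies
# the information content removed by generalise().
mi_before = mutual_info_score(postcodes, diagnoses)
mi_after = mutual_info_score(anonymised, diagnoses)
print(f"MI before anonymisation: {mi_before:.3f} nats")
print(f"MI after  anonymisation: {mi_after:.3f} nats")
```

In this framing, a classification function deciding whether the anonymisation is sufficient could, for example, threshold the residual dependency; the estimator itself is interchangeable, subject to the bounds it imposes on what information it can detect.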