Frequently Asked Questions

    What is Diffix?

    Diffix is an algorithm for anonymizing structured data. It was jointly developed by Aircloak GmbH and the Max Planck Institute for Software Systems. Diffix combines the three most common anonymization mechanisms: generalization, noise, and low-count suppression. It applies these mechanisms automatically, as needed, on a query-by-query basis to minimize noise while ensuring strong anonymity.
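
    As a rough illustration of what the three mechanisms do, here is a toy sketch in Python. It is not the Diffix algorithm: the function name, bin width, suppression threshold, and noise level are all hypothetical values chosen for the example, and the way Diffix itself selects and applies these mechanisms is not shown.

```python
import random
from collections import Counter


def anonymized_histogram(ages, bin_width=10, min_count=3, noise_sd=1.0, seed=0):
    """Toy illustration of the three mechanisms; NOT the Diffix algorithm.

    All parameter values are assumptions made up for this example.
    """
    rng = random.Random(seed)

    # Generalization: replace each exact age with the lower edge of its bin.
    bins = Counter((age // bin_width) * bin_width for age in ages)

    result = {}
    for lower_edge, count in sorted(bins.items()):
        # Low-count suppression: drop bins that describe too few people.
        if count < min_count:
            continue
        # Noise: perturb the true count, never reporting less than zero.
        noisy = count + rng.gauss(0, noise_sd)
        result[lower_edge] = max(0, round(noisy))
    return result


ages = [21, 23, 25, 27, 29, 31, 33, 35, 38, 64]
print(anonymized_histogram(ages))
```

    In this sketch the single person aged 64 falls into a bin of their own, so that bin is suppressed, while the two larger bins are reported with slightly perturbed counts.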

    What is Open Diffix?

    Open Diffix is a project to make Diffix anonymization free and open. The Open Diffix project develops two software products: a stand-alone desktop application and a PostgreSQL extension. Diffix for Desktop aims for ease of use, while Diffix for PostgreSQL targets higher complexity and scale. Both are strongly anonymous and satisfy the GDPR definition of anonymity.

    When will Open Diffix releases be available?

    Diffix for Desktop beta was released on Nov. 2, 2021. We expect the first full release by the end of Nov. 2021. We are targeting mid to late 2022 for the first version of Diffix for PostgreSQL. You may sign up for our newsletter to get release announcements.

    How does Diffix compare with Differential Privacy and k-anonymity?

    K-anonymity uses generalization and low-count suppression. Systems based on Differential Privacy use noise and often use generalization. Diffix uses all three, and so combines the benefits of both k-anonymity and Differential Privacy without formally adhering to either model. While Diffix does not offer the mathematical guarantees of Differential Privacy, it also does not have the drawback of a privacy budget.

    What kinds of analytics does Diffix support?

    Diffix supports descriptive analytics over structured data such as relational databases or CSV files: selecting columns, requesting counts or sums over those columns, putting data in bins of different sizes, and so on. Descriptive analytics is used to produce visualizations such as bar graphs, scatter plots, or heat maps. Diffix does not support machine learning, synthetic data generation, data masking, pseudonymization, image fuzzing, or anonymization of free-form text (redacting).
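
    To make "descriptive analytics" concrete, the following Python sketch computes counts and sums over a selected column, optionally grouping numeric values into bins. It only shows the kind of query meant here; no anonymization is applied, it does not use any Diffix interface, and the CSV data, column names, and the helper count_and_sum are invented for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV data, made up for this example.
CSV_TEXT = """age,city,salary
24,Berlin,41000
37,Berlin,52000
31,Munich,48000
29,Munich,45000
"""


def count_and_sum(rows, group_col, value_col, bin_width=None):
    """Return {group: (count, sum of value_col)} for the given rows.

    If bin_width is set, group_col is treated as an integer and each
    value is replaced by the lower edge of its bin before grouping.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        key = row[group_col]
        if bin_width is not None:
            key = (int(key) // bin_width) * bin_width
        totals[key][0] += 1
        totals[key][1] += float(row[value_col])
    return {key: (count, total) for key, (count, total) in totals.items()}


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
by_city = count_and_sum(rows, "city", "salary")
by_age_bin = count_and_sum(rows, "age", "salary", bin_width=10)
print(by_city)
print(by_age_bin)
```

    Results of this shape (one count or sum per group or bin) are what typically feed a bar graph or heat map.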

    What about data quality?

    All anonymization mechanisms reduce data quality by generalizing or distorting the data, and Diffix is no exception. The data quality of Diffix, however, usually far exceeds that of k-anonymity and Differential Privacy. Diffix for Desktop displays the amount of distortion, both as summary statistics and by showing the original and anonymized data side by side. This way, you can observe Diffix's data quality for yourself.

    Why are there both desktop and PostgreSQL extension releases?

    Descriptive analytics over structured data covers a wide range of use cases. At one extreme, a non-technical user may wish to release simple summary statistics over data from a CSV file on their own machine. Diffix for Desktop satisfies this use case. At the other extreme, someone may wish to stream summaries of dynamic data covering millions of users into an SQL-based dashboard application. For this, Diffix for PostgreSQL is appropriate.

    What is the trust model for users/analysts?

    Diffix has two modes of operation: Trusted Analyst Mode and Untrusted Analyst Mode. Trusted Mode protects against accidental release of personal data. Untrusted Mode protects against intentional, malicious exposure of personal data. A Trusted Mode analyst does not need any expertise in anonymization to safely release data queried through Diffix.

    Why wouldn't I always use Untrusted Mode?

    Trusted Mode is easier to use. It has more query features, and in Diffix for Desktop it allows an analyst to view the anonymized and original data side-by-side. In this way the analyst knows exactly how much the data is distorted through suppression and noise, and can more easily adjust column selection and generalization as needed.

    Is Open Diffix GDPR compliant?

    The short answer is yes. The longer answer is that the GDPR gives no concrete criteria for anonymity. Ultimately it is up to a Data Protection Officer (DPO) or Data Protection Authority (DPA) to make the call. Diffix as implemented by Aircloak was almost always evaluated as GDPR anonymous, and we expect the same to hold for Open Diffix releases.

    Can the Open Diffix project help with GDPR compliance?

    The Open Diffix project will provide supporting documentation with each of its releases. The documentation will describe the Diffix algorithms in detail, along with analysis of the anonymization properties against an exhaustive set of attacks. This documentation can be used as the basis of a GDPR (or other privacy standard) evaluation by DPOs and DPAs. For assistance in this process you can contact us at hello@open-diffix.org.

    Is Open Diffix Open Source?

    No. Open Diffix is released under the Business Source License 1.1 (BSL 1.1). Our license makes Diffix free for all use cases, including commercial ones, that do not resell Diffix software or interfaces.

    How is Open Diffix funded?

    For the first few years, Open Diffix is funded by the Max Planck Institute for Software Systems as a research initiative. Our goal is to become self-sustaining through sponsorships, consultancy, or licensing.

    I have a general question. Who can I contact?

    Please contact us at hello@open-diffix.org.