Frequently Asked Questions

    What is Diffix?

    Diffix is a bundled set of mechanisms for anonymizing structured data. It was jointly developed by Aircloak GmbH and the Max Planck Institute for Software Systems. Diffix exploits mechanisms that have been in use by national statistics offices for decades: aggregation, generalization, noise, suppression, and swapping. It automatically applies these mechanisms as needed on a query-by-query basis to minimize noise while ensuring strong anonymity. Here is a brief overview.
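
    As a rough illustration of how four of these mechanisms interact (swapping is left out), the following Python sketch bins ages, counts them, drops small bins, and perturbs the remaining counts. It is a toy and not the Diffix algorithm: the function name and the `bin_width`, `min_count`, and `noise_sd` parameters are assumptions made for this example only.

```python
import random
from collections import Counter


def anonymized_histogram(ages, bin_width=10, min_count=4, noise_sd=1.0, seed=0):
    """Toy illustration only; this is NOT how Diffix is implemented.

    bin_width, min_count, and noise_sd are hypothetical parameters.
    """
    rng = random.Random(seed)
    # Generalization: replace each exact age with the lower edge of its bin.
    binned = [(age // bin_width) * bin_width for age in ages]
    # Aggregation: count how many individuals fall into each bin.
    counts = Counter(binned)
    result = {}
    for bin_start, count in sorted(counts.items()):
        # Suppression: drop bins that contain too few individuals.
        if count < min_count:
            continue
        # Noise: perturb the true count with zero-mean Gaussian noise.
        noisy = count + rng.gauss(0, noise_sd)
        result[bin_start] = max(0, round(noisy))
    return result


ages = [21, 23, 25, 27, 29, 31, 33, 35, 36, 38, 39, 64]
print(anonymized_histogram(ages))
```

    With the sample data, the single person aged 64 sits alone in the 60 bin, so that bin is dropped, while the 20 and 30 bins are reported with slightly perturbed counts.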

    What is Open Diffix?

    Open Diffix is a project to make Diffix anonymization free and open. The Open Diffix project develops two Diffix query engine implementations, one based on .NET and the other a PostgreSQL extension called pg_diffix. As a PostgreSQL extension, pg_diffix offers the same benefits as PostgreSQL: scale, performance, deployment, and access control features. Open Diffix can also be run as a stand-alone desktop application, Diffix Dashboards. The .NET implementation serves primarily as a reference implementation, but is also used to support the legacy desktop application Diffix for Desktop. Both implementations are strongly anonymous and satisfy the GDPR definition of anonymity.

    What is Diffix Fir?

    Major versions of Diffix are named after trees. Diffix Aspen through Dogwood were developed by Aircloak GmbH. Diffix Elm was the first version developed by the Open Diffix project. Compared to earlier versions, Diffix Elm represented a kind of "complexity reset". It is much simpler, easier to use, and easier to analyze (though less feature-rich).

    The latest version is Diffix Fir, which adds several new features including sum, average, and simple WHERE expressions.

    Where can I learn about Diffix Fir?

    A full specification and privacy analysis of Diffix Fir is not yet complete. The full specification and privacy analysis of Diffix Elm is available on arXiv. It includes guidance for writing a risk assessment. Since Fir adds only a few new features to Elm, the Elm specification suffices for now.

    A good overview of Diffix Elm can be found here, with the additional features of Diffix Fir described here.

    How is Diffix Fir deployed?

    Open Diffix supports three implementations of Diffix Fir: a PostgreSQL extension (pg_diffix) and two stand-alone desktop applications.

    Diffix for PostgreSQL provides all the benefits of PostgreSQL, allowing development of scalable web back-ends, dashboards, and applications over a standard API with SQL, as well as the use of SQL clients.

    Diffix Dashboards is a stand-alone Windows desktop application with data visualization features designed to work with CSV files. It bundles pg_diffix with the open-source Business Intelligence tool Metabase, and offers both GUI-based query building and SQL.

    Diffix for Desktop is built on a .NET implementation of Diffix Fir. It is designed for extreme ease of installation and use. It supports CSV tables and a simple GUI (no SQL required). It is a legacy application and will probably be discontinued in favor of Diffix Dashboards.

    How does Diffix compare with Differential Privacy and k-anonymity?

    K-anonymity uses generalization and suppression. Systems based on Differential Privacy use noise and often use generalization. Diffix uses all three, and so combines the benefits of both k-anonymity and Differential Privacy without formally adhering to either model. In this respect, Diffix more closely follows the way national statistics offices approach anonymization. While Diffix does not offer the mathematical guarantees of low-epsilon Differential Privacy, it also does not have the drawback of a privacy budget.
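
    To make the two models being compared more concrete, here is a minimal textbook-style sketch in Python. It shows a k-anonymity check over already-generalized records, and a Laplace-noise counter that refuses further queries once its privacy budget is spent. Neither piece is part of Diffix; `is_k_anonymous`, `LaplaceCounter`, and the sample records are hypothetical names and data introduced for illustration.

```python
import random
from collections import Counter


def is_k_anonymous(records, k):
    """True if every combination of attribute values occurs at least k times."""
    return all(count >= k for count in Counter(records).values())


class LaplaceCounter:
    """Noisy counting under a fixed total privacy budget (hypothetical helper)."""

    def __init__(self, total_epsilon, seed=0):
        self.remaining = total_epsilon
        self.rng = random.Random(seed)

    def noisy_count(self, true_count, epsilon):
        # Each query spends part of the budget; epsilons add up across queries.
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon
        # A count has sensitivity 1, so the Laplace scale is 1 / epsilon.
        # The difference of two exponential samples with rate epsilon
        # is Laplace-distributed with that scale.
        noise = self.rng.expovariate(epsilon) - self.rng.expovariate(epsilon)
        return true_count + noise


# Records that have already been generalized to (age range, postcode prefix).
records = [
    ("20-29", "677"), ("20-29", "677"),
    ("30-39", "677"), ("30-39", "677"), ("30-39", "677"),
]
print(is_k_anonymous(records, 2), is_k_anonymous(records, 3))

counter = LaplaceCounter(total_epsilon=1.0)
print(counter.noisy_count(100, 0.5))
print(counter.noisy_count(100, 0.5))
try:
    counter.noisy_count(100, 0.5)
except RuntimeError as err:
    print("third query refused:", err)
```

    The third query is refused because the first two have used up the whole budget. This is the drawback the answer above refers to.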

    What kinds of analytics does Diffix Fir support (and not support)?

    Diffix supports descriptive analytics over structured data like relational databases or CSV files: selecting columns, requesting counts or sums over those columns, putting data in bins of different sizes, and so on. Descriptive analytics is used to produce visualizations like bar graphs or scatter plots or heat maps. Diffix does not support machine learning, synthetic data generation, data masking, pseudonymization, image fuzzing, or anonymization of free-form text (redacting).

    How much SQL does Diffix Fir support?

    Diffix Fir supports a very limited but useful subset of SQL. It supports numeric, text, and datetime data types. It lets you build multi-column histograms of counts, sums, and averages. It supports basic generalization functions (e.g. rounding of numeric columns, and substring selection of text columns). It supports JOIN and WHERE with AND logic.
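
    The kind of query described above can be sketched as follows. This example runs ordinary SQL against an in-memory SQLite database using Python's standard library, so it shows only the shape of a multi-column histogram query and applies no anonymization at all. The `purchases` table, its columns, and the specific WHERE conditions are assumptions for illustration; whether pg_diffix accepts each expression exactly as written should be checked against its documentation.

```python
import sqlite3

# Hypothetical sample data: (customer_id, city, age, amount).
ROWS = [
    (1, "Berlin", 25, 10.0),
    (2, "Berlin", 27, 30.0),
    (3, "Bern", 34, 20.0),
    (4, "Munich", 41, 50.0),
    (5, "Munich", 45, 70.0),
    (6, "Munich", 17, 5.0),
]

QUERY = """
SELECT
    substr(city, 1, 3) AS city_prefix,  -- generalize text by taking a prefix
    (age / 10) * 10    AS age_bin,      -- generalize numbers into bins of 10
    count(*)           AS n,
    sum(amount)        AS total,
    avg(amount)        AS mean
FROM purchases
WHERE age >= 18 AND amount > 0          -- conditions combined with AND only
GROUP BY substr(city, 1, 3), (age / 10) * 10
ORDER BY city_prefix, age_bin
"""


def run_histogram_example():
    """Build the sample table and return the raw (non-anonymized) histogram."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases "
        "(customer_id INTEGER, city TEXT, age INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in run_histogram_example():
    print(row)
```

    Each output row is one histogram bin defined by two generalized columns. In an anonymizing engine, a bin backed by a single person, such as the ("Ber", 30) bin here, is the kind of bin that suppression targets, and the reported counts and sums would carry noise.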

    What about data quality?

    All anonymization mechanisms reduce data quality, by generalizing or distorting, and Diffix is no exception. The data quality of Diffix is similar to data released by many national statistics offices (e.g. census data), and usually far exceeds that of k-anonymity and Differential Privacy.

    Diffix Dashboards allows the side-by-side comparison of anonymized and non-anonymized data. This way, you can observe Diffix's data quality for yourself. Diffix for PostgreSQL can display the magnitude of noise added to each output bin.

    What is the trust model for users/analysts?

    Diffix has two modes of operation, Trusted Analyst Mode and Untrusted Analyst Mode. Trusted Mode protects against accidental release of personal data. Untrusted Mode protects against intentional, malicious exposure of personal data. A Trusted Mode analyst does not require any expertise in anonymization in order to safely release data queried through Diffix.

    Why wouldn't I always use Untrusted Mode?

    Trusted Mode is easier to use. It has more query features, and in Diffix Dashboards it allows an analyst to view the anonymized and original data side-by-side. In this way the analyst knows exactly how much the data is distorted through suppression and noise, and can more easily adjust column selection and generalization as needed.

    Is Open Diffix GDPR compliant?

    The short answer is 'yes'. The longer answer is that there are no concrete criteria for GDPR anonymity. Ultimately it is up to a Data Protection Officer (DPO) or Authority (DPA) to make the call. Diffix as implemented by Aircloak was almost always evaluated as GDPR anonymous, and the same holds for Open Diffix releases.

    Can the Open Diffix project help us with GDPR compliance?

    The full specification of Diffix Elm is designed to support risk assessment by DPOs and DPAs for GDPR or any other privacy standard. It describes the anonymization mechanisms in detail, and gives an analysis of the anonymization properties against an exhaustive set of attacks. For assistance in this process you can contact us at hello@open-diffix.org.

    Is Open Diffix Open Source?

    Almost. Open Diffix operates under the Business Source License (BSL 1.1). Our license makes Diffix free for all use cases, including commercial ones, that do not resell Diffix software or interfaces.

    How is Open Diffix funded?

    For the first few years, Open Diffix is being funded by the Max Planck Institute for Software Systems as a research initiative. Our goal is to become self-sustaining through sponsorships, consultancy, or licensing.

    I have other questions. Who can I contact?

    Please contact us at hello@open-diffix.org.