Play with Diffix

    Introduction

    There are several ways to experiment with Diffix without setting up PostgreSQL and installing the pg_diffix extension. First and foremost, the Open Diffix project runs an online PostgreSQL server with pg_diffix and several representative databases. You can interact with this service in several ways, described in the sections below. Alternatively, you can download Diffix for Desktop and try it on the sample CSV datasets we provide, or on your own CSV dataset.

    As always, you can contact us at hello@open-diffix.org if you have questions or comments.

    Documentation

    How to use Diffix

    • Analyst Guide (for SQL usage of pg_diffix, the PostgreSQL Diffix extension).
    • Online instructions for operating Diffix for Desktop (also available in the application itself).
    • The online Training App has built-in usage documentation.

    How to configure Diffix

    How Diffix works

    Servers and Datasets

    The following databases running pg_diffix are available online:

    Databases with trusted_user and untrusted_user users are protected with Diffix (untrusted user mode has slightly stronger anonymity at the expense of less flexible SQL). Databases with direct_user users have no protections, and can be used to compare the raw data with Diffix' protected data.
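
    The comparison mentioned above can be scripted once you have the results of the same aggregate query from a direct_user connection and from a trusted_user or untrusted_user connection. The following is a minimal sketch, not part of the Open Diffix tooling: the function name compare_counts is hypothetical, and it assumes each result is a list of (bucket, count) rows. A bucket that appears in only one of the two results is reported with the missing count set to None.

```python
from typing import Any, Hashable, Iterable, Optional, Tuple

Row = Tuple[Hashable, int]


def compare_counts(raw_rows: Iterable[Row], anon_rows: Iterable[Row]) -> dict:
    """Compare per-bucket counts from a raw query and an anonymized query.

    Hypothetical helper: both inputs are (bucket, count) rows, for example
    the output of the same GROUP BY query run as two different users.
    """
    raw = dict(raw_rows)
    anon = dict(anon_rows)

    # Visit every bucket seen in either result, raw buckets first.
    buckets = list(raw) + [key for key in anon if key not in raw]

    report: dict = {}
    for bucket in buckets:
        raw_count: Optional[int] = raw.get(bucket)
        anon_count: Optional[int] = anon.get(bucket)
        both_present = raw_count is not None and anon_count is not None
        entry: dict[str, Any] = {
            "raw": raw_count,
            "anon": anon_count,
            # The absolute error is only defined when both sides have the bucket.
            "abs_error": abs(anon_count - raw_count) if both_present else None,
        }
        report[bucket] = entry
    return report
```

    Calling compare_counts with the two row lists returns a dictionary keyed by bucket, which makes it easy to see which counts differ and which buckets are missing from the anonymized output.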

    The databases contain the following tables and data:

    Online Training App

    The easiest and best way to get a quick feel for how Diffix operates is the online training app.

    The training app contains examples of each of Diffix' query features. It displays anonymized and original data side-by-side to show how Diffix distorts and hides data. The training app also lets you write your own SQL queries against both the Diffix and original data.

    The training app connects to the banking0, taxi, census0, and scihub databases. It takes only a few minutes to get a basic understanding of Diffix' query capabilities, and about 30 minutes to run through every example.

    Metabase

    Metabase is a popular open-source Business Intelligence (data visualization) tool. It uses SQL to access data and works with a variety of backend database technologies, including PostgreSQL and pg_diffix. It supports both individual visualizations and dashboards.

    We are running an online Metabase server that connects with the moers database. Login credentials are:

    Metabase offers two ways of writing queries: through a GUI query-builder and directly in SQL. The query-builder translates the user's selections into SQL. Because Diffix uses a restricted subset of SQL, not all of the query-builder works with the pg_diffix back end. Practically speaking, the user should expect to write SQL queries when using Metabase with Diffix.

    We have prepared brief instructions on how to do this.

    SQL Clients

    In principle, any SQL client that supports PostgreSQL should work with pg_diffix. In practice, each SQL client has its own idiosyncrasies in how it explores the database, so not all SQL clients work with pg_diffix.

    We have successfully used two SQL clients with pg_diffix: pgAdmin and DBeaver.

    Software APIs and Notebooks

    Any software with a PostgreSQL interface can work with pg_diffix. We have prepared a demo notebook (see here for a description) using psycopg2 and ipython-sql that you may use as a template for building your own notebook.
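
    As a starting point outside a notebook, the sketch below shows the general shape of a psycopg2 query against a pg_diffix database. It is an illustration under assumptions, not the demo notebook's code: the host name, the password, and the accounts table are placeholders to be replaced with real values, and connection_params and run_query are hypothetical helper names. Only psycopg2.connect, cursors, execute, and fetchall are actual psycopg2 API.

```python
# Sketch only. The host, password, and table name below are placeholders,
# not values published by Open Diffix.

# Hypothetical example query: a plain aggregate with GROUP BY.
# Replace "accounts" and "frequency" with a real table and column.
EXAMPLE_QUERY = "SELECT frequency, count(*) FROM accounts GROUP BY frequency"


def connection_params(dbname, user, password,
                      host="HOSTNAME_PLACEHOLDER", port=5432):
    """Collect the keyword arguments accepted by psycopg2.connect()."""
    return {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "password": password,
    }


def run_query(params, sql):
    """Open a connection, run one query, and return all rows."""
    # Imported here so the helper above can be used without the driver installed.
    import psycopg2

    conn = psycopg2.connect(**params)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        # psycopg2 does not close the connection automatically.
        conn.close()


# Example use (requires a reachable server and real credentials):
#   params = connection_params("banking0", "trusted_user", "PASSWORD_PLACEHOLDER")
#   rows = run_query(params, EXAMPLE_QUERY)
```

    Running the same query once as trusted_user and once as direct_user gives the anonymized and raw results for comparison.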

    Diffix for Desktop

    If you prefer to play with your own data, one quick and easy way to get started is with Diffix for Desktop, a standalone desktop application for Windows, Mac, and Linux that works with a local CSV file.

    Diffix for Desktop does not require SQL; instead, it works with a simple GUI query-builder. It does not offer the SQL features of pg_diffix, but it is very easy to use. If you don't have your own data to play with, we offer a couple of sample CSV datasets on the download page.

    A visual comparison

    We have prepared a Diffix demo in the form of a heatmap visualization of the NYC taxi data (you can find a description here). The demo displays the heatmap built from Diffix output side-by-side with the corresponding heatmap built from the raw data. It gives a concrete impression of the power and accuracy of Diffix anonymization.