There are several ways to experiment with Diffix without setting up PostgreSQL and installing the pg_diffix extension. First and foremost, the Open Diffix project runs an online PostgreSQL server with pg_diffix and several representative databases, and you can interact with this service in several ways. Alternatively, you can download Diffix for Desktop and try it on the sample CSV datasets we provide, or on your own CSV dataset.
As always, you can contact us at firstname.lastname@example.org if you have questions or comments.
The following databases running pg_diffix are available online:
Databases with trusted_user and untrusted_user users are protected with Diffix (untrusted user mode has slightly stronger anonymity at the expense of less flexible SQL). Databases with direct_user users have no protections, and can be used to compare the raw data with Diffix' protected data.
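One way to make this comparison concrete is to run the same grouped count query once as direct_user and once as trusted_user, then line up the two answers. The sketch below assumes both results are already available as (value, count) pairs. The helper name `compare_counts` and the sample numbers are made up for illustration and are not real query output.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (bucket value, count)


def compare_counts(raw_rows: List[Row], anon_rows: List[Row]) -> Dict[str, object]:
    """Compare per-bucket counts from a raw answer and an anonymized answer."""
    raw = dict(raw_rows)
    anon = dict(anon_rows)

    # Buckets present in the raw answer but missing from the anonymized
    # answer were hidden on the anonymized side.
    suppressed = sorted(k for k in raw if k not in anon)

    # For buckets present in both answers, record anonymized minus raw.
    distortion = {k: anon[k] - raw[k] for k in raw if k in anon}

    return {"suppressed": suppressed, "distortion": distortion}


# Hypothetical numbers, for illustration only.
raw_rows = [("A", 1200), ("B", 340), ("C", 2)]
anon_rows = [("A", 1198), ("B", 343)]
report = compare_counts(raw_rows, anon_rows)
print(report)
```

Keeping the comparison separate from the database access means the same helper can be reused with results from any client.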
The databases contain the following tables and data:
| Database | Description |
|---|---|
| banking0 | The banking0 database contains a set of banking transactions and other data from a Czech bank. It has seven different tables. The transactions table, for example, contains over 1.2M transactions across 5300 customers. |
| taxi | The taxi database contains four hours of New York City taxi rides (from Jan. 8, 2013, 8 AM to noon). It has over 95000 taxi rides driven by over 11000 drivers. It has 29 columns. |
| census0 | The census0 database is taken from the US Census of 2013. This dataset is already anonymized by the US Census Bureau through sampling, aggregation, and other means. It contains 120 columns and represents 250K individuals. |
| scihub | The scihub database contains one week’s worth of downloads from the Sci-Hub scientific papers free download system. The week is the first week of September 2015. It has 15 columns, and contains over 1.1M downloads from around 160K different pseudonymized IP addresses. |
| banking | This is a simplified subset of the banking0 data. |
| moers | The moers database contains traffic violations issued in the German city of Moers. Synthetic license plate numbers have been added. |
The easiest and best way to get a quick feel for how Diffix operates is the online training app.
The training app contains examples of each of Diffix' query features. It displays anonymized and original data side-by-side to show how Diffix distorts and hides data. The training app also lets you write your own SQL queries against both the Diffix and original data.
The training app has connections to the banking0, taxi, census0, and scihub databases. It takes only a few minutes to get an immediate understanding of Diffix' basic query capabilities, and about 30 minutes to run through every example.
Metabase is a popular open source Business Intelligence (data visualization) tool. It uses SQL to access data and works with a variety of backend database technologies including PostgreSQL and pg_diffix. Metabase supports data visualization and dashboards.
We are running an online Metabase server that connects with the moers database. Login credentials are:
Metabase has two means of writing queries: through a GUI query-builder and with SQL. The query-builder translates the user's selections into SQL. Because Diffix uses a restricted subset of SQL, not every query-builder feature works with the pg_diffix back end. Practically speaking, the user should expect to write SQL queries when using Metabase with Diffix.
We have prepared brief instructions on how to do this.
In principle, any SQL client that supports PostgreSQL should work with pg_diffix. In practice, each SQL client has its own idiosyncrasies in how it explores the database, and so not all SQL clients work with pg_diffix.
Any software with a PostgreSQL interface can work with pg_diffix. We have prepared a demo notebook (see here for a description) using psycopg2 and ipython-sql that you may use as a template for building your own notebook.
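As a rough starting point, a minimal psycopg2 connection could look like the sketch below. The host name `db.example.invalid`, the port, and the password are placeholders rather than the real service details, and the helper names `connection_params` and `run_query` are hypothetical. Substitute the values published for the online databases.

```python
from typing import Any, Dict, List, Tuple


def connection_params(dbname: str, user: str, password: str,
                      host: str = "db.example.invalid",  # placeholder host
                      port: int = 5432) -> Dict[str, Any]:
    """Collect the keyword arguments that psycopg2.connect() accepts."""
    return {"host": host, "port": port, "dbname": dbname,
            "user": user, "password": password}


def run_query(params: Dict[str, Any], sql: str) -> List[Tuple]:
    """Open a connection, run one query, and return all rows."""
    import psycopg2  # imported here so connection_params works without the driver

    conn = psycopg2.connect(**params)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()


# Example call (not executed here; needs real credentials and network access):
# params = connection_params("banking0", "trusted_user", "<password>")
# rows = run_query(params, "SELECT count(*) FROM transactions")
params = connection_params("banking0", "trusted_user", "<password>")
```

Because pg_diffix anonymizes on the server side, the client code is the same as for any other PostgreSQL database. Only the user you connect as determines whether the results are anonymized.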
If you prefer to play with your own data, one quick and easy way to get started is with Diffix for Desktop, a standalone desktop application for Windows, Mac, and Linux that works with a local CSV file.
Diffix for Desktop does not require SQL, but rather works with a simple GUI query-builder. It does not offer all of the SQL features of pg_diffix, but is very easy to use. If you don't have your own data to play with, we offer a couple of sample CSV datasets on the download page.
We have prepared a Diffix demo in the form of a heatmap visualization of the NYC taxi data. (You can find a description here.) This demo displays the heatmap built from Diffix side-by-side with the corresponding heatmap built from the raw data. It gives a concrete impression of the power and accuracy of Diffix anonymization.