6. GDPR Module

This module helps the data warehouse meet GDPR compliance requirements by anonymizing historical data. It consists of several components that work together to anonymize data relating to individuals that has been deleted or obscured in the source systems. To configure it correctly, a domain expert who knows the tables, their meaning, and their purpose is required. All configuration changes are historized so that they can be presented in an audit.

6.1. Process

The framework’s first component is a dedicated GDPR schema, which contains the metadata repository for configuring and controlling the anonymization. It records which tables and columns are to be anonymized and which anonymization rules apply to them.

Anonymization is rule-driven: each rule determines which values in the selected columns are replaced. The replacement value must fit the data type of the target column. If a column's data type changes over time, the change is detected and the run is stopped before it starts, unless the replacement value is NULL.
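The pre-run check described above could look roughly like the following Python sketch. The names ColumnType, Rule, value_fits, and check_rule are illustrative assumptions and not part of the framework, and only two data types are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ColumnType:
    """Data type of a target column (hypothetical, simplified model)."""
    name: str                         # e.g. "VARCHAR" or "INTEGER"
    max_length: Optional[int] = None  # only meaningful for character types


@dataclass(frozen=True)
class Rule:
    """Hypothetical anonymization rule: overwrite `column` with `new_value`."""
    column: str
    new_value: Optional[object]       # None stands for SQL NULL
    configured_type: ColumnType       # type recorded when the rule was configured


def value_fits(value: Optional[object], col_type: ColumnType) -> bool:
    """Return True if `value` can be stored in a column of `col_type`."""
    if value is None:
        return True  # NULL is treated as fitting any type in this sketch
    if col_type.name == "VARCHAR":
        return isinstance(value, str) and (
            col_type.max_length is None or len(value) <= col_type.max_length
        )
    if col_type.name == "INTEGER":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)
    return False  # unknown types are rejected rather than guessed


def check_rule(rule: Rule, current_type: ColumnType) -> List[str]:
    """Return the problems that should stop a run (empty list if none)."""
    if rule.new_value is None:
        return []  # a NULL replacement is exempt from the type checks
    problems = []
    if current_type != rule.configured_type:
        problems.append(f"{rule.column}: data type changed since configuration")
    if not value_fits(rule.new_value, current_type):
        problems.append(f"{rule.column}: replacement value does not fit column")
    return problems
```

In this sketch a run would only start if check_rule returns an empty list for every configured rule.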

The configuration also includes dedicated columns that represent the business key; their values are delivered through the normal historization process.

For example, the configuration could cover tables such as “Customers,” “Accounts,” and “Transactions,” with anonymization rules that replace customer IDs, bank account numbers, and other sensitive data with defined values.
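As a rough illustration, such a configuration could be modelled as follows in Python. The real metadata repository lives in the GDPR schema; the class names, column names, business keys, and replacement values here are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ColumnRule:
    """Hypothetical rule: replace the content of `column` with `replacement`."""
    column: str
    replacement: Optional[str]  # None stands for SQL NULL


@dataclass(frozen=True)
class TableConfig:
    """Hypothetical configuration entry for one table."""
    table: str
    business_key: Tuple[str, ...]  # columns that identify the affected entity
    rules: Tuple[ColumnRule, ...]


# Example entries; every column name below is invented for illustration.
CONFIG = (
    TableConfig(
        table="Customers",
        business_key=("customer_key",),
        rules=(ColumnRule("customer_id", "ANONYMIZED"),),
    ),
    TableConfig(
        table="Accounts",
        business_key=("account_key",),
        rules=(ColumnRule("account_number", "XXXXXXXX"),),
    ),
    TableConfig(
        table="Transactions",
        business_key=("transaction_key",),
        rules=(
            ColumnRule("customer_id", None),
            ColumnRule("account_number", None),
        ),
    ),
)


def replacements_for(table: str) -> Dict[str, Optional[str]]:
    """Return {column: replacement} for a configured table.

    Raises KeyError if the table is not part of the configuration.
    """
    for entry in CONFIG:
        if entry.table == table:
            return {rule.column: rule.replacement for rule in entry.rules}
    raise KeyError(table)
```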

The framework overwrites the anonymized fields, so the original values can only be restored from a data backup. Anonymization can also introduce inconsistencies with respect to existing logical constraints.
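The overwrite behaviour can be sketched with a small, self-contained Python example. It uses an in-memory SQLite database as a stand-in for the DWH; the table layout, the column names, and the anonymize function are assumptions for illustration and do not reflect the framework's actual implementation.

```python
import sqlite3


def anonymize(conn, table, key_column, key_value, replacements):
    """Overwrite the given columns in every historized row of one business key.

    Returns the number of rows that were updated.
    """
    # Identifiers cannot be bound as SQL parameters, so they are quoted here;
    # this sketch assumes they come from trusted configuration, not user input.
    set_clause = ", ".join(f'"{col}" = ?' for col in replacements)
    sql = f'UPDATE "{table}" SET {set_clause} WHERE "{key_column}" = ?'
    cur = conn.execute(sql, [*replacements.values(), key_value])
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_no TEXT, valid_from TEXT, name TEXT, iban TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        ("C1", "2020-01-01", "Name A (old)", "IBAN-A1"),  # older version of C1
        ("C1", "2022-06-01", "Name A (new)", "IBAN-A2"),  # current version of C1
        ("C2", "2021-03-15", "Name B", "IBAN-B1"),        # unrelated customer
    ],
)

# Overwrite the name with a fixed value and delete the IBAN (None becomes NULL).
updated = anonymize(
    conn, "customers", "customer_no", "C1", {"name": "ANONYMIZED", "iban": None}
)
```

After the update, all historized versions of customer C1 carry the replacement values and the previous content is gone from the table, which is why a backup is the only way back.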

Some benefits of the anonymizer for historical data in a DWH are:

  • Every configuration is auditable.

  • Flexible and configurable for any source.

  • Configuration checks before each run.

  • Column values are overwritten or deleted.