Anonymisation of Court Orders

Erik Nielsen

Editorial manager Erik Nielsen at Schultz states:

The anonymisation tool provides a huge increase in efficiency as well as an important quality assurance.

Schultz provides advanced information solutions for thousands of daily users in Danish municipalities, trade unions, unemployment insurance offices and private companies. One example is the web portal afgoerelsesdatabasen.dk, where many Danish court orders are published.

It is very important that all parties appear anonymously in the published documents. That is, names and other information that can identify people, companies and so on must be replaced in a way that does not change the meaning of the documents.

For a long time, employees at Schultz anonymised court orders manually. This large and very time-consuming task requires a lot of concentration so that nothing is missed and so that names etc. are replaced in the same way everywhere in a document.

Schultz contemplated using software to ease this task, and Progresso was contracted to analyse the possibilities and to design a solution.

System architecture and project management

Progresso’s analysis showed that a fully automated solution would be able to identify names of people etc. with a high degree of certainty, but not with absolute certainty. Since the documents contain very sensitive data, manual checks and post-editing would therefore be necessary.

Progresso therefore designed the solution with a fully automated module that performs most of the work, and a semi-automated module where anonymisations are adjusted by an editor:

Schematic for overall process design

Both of these modules have been incorporated into Schultz’ XML document pipeline from the document source (the courts) to the publication (web portals).

A strict API serves as the interface between the two modules. The automated module creates categories and annotates each named entity with a ‘certainty level’ for the identification, which is then colour coded in the semi-automated module. The semi-automated module presents all variables from the fully automated module in an overview pane, and several automated procedures are made available to the user. This ensures that post-editing can be performed very quickly without losing track or missing any details.
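The kind of record that could cross such an interface can be sketched as follows. This is only an illustration: the field names, the 0.0 to 1.0 certainty scale, the sample names and the `overview` helper are all assumptions, not the project's actual API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One named entity as an automated module might hand it over (hypothetical shape)."""
    text: str         # the string found in the court order
    category: str     # e.g. "person", "company", "city"
    certainty: float  # assumed to be a score between 0.0 and 1.0


def overview(annotations):
    """Group annotations by category, least certain first, as an overview pane might."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann.category].append(ann)
    return {
        category: sorted(items, key=lambda a: a.certainty)
        for category, items in grouped.items()
    }


# Made-up sample data for illustration only.
annotations = [
    Annotation("Jens Hansen", "person", 0.98),
    Annotation("Hansen Byg A/S", "company", 0.71),
    Annotation("J. Hansen", "person", 0.55),
]
pane = overview(annotations)
```

Sorting the least certain entries first is one possible design choice; it puts the items most likely to need an editor's attention at the top of each category.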

Progresso created all design and control documents for the project and performed the project management.

Fully automated module

Software stack
  1. Perl and Bracmat
  2. Windows Server

The Center for Language Technology (CST) produced the fully automated module according to Progresso’s specifications.

The module is based on linguistic research, and the developers at CST have seen to it that the module fits the language used in court orders.  Furthermore, CST has programmed the module so that it complies with Schultz’ guidelines for anonymisation.

The module finds most of the elements that should be anonymised.  It categorises these and consistently suggests substitutions such as person1, person2, company1, city1, account-number1 etc. for the same names, companies, cities and account numbers etc. throughout the entire XML document.
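The principle of consistent, numbered substitutions can be illustrated with a short Python sketch. It is not CST's implementation (which is written in Perl and Bracmat); the function names and sample names are hypothetical, and the entities are assumed to have been identified and categorised already.

```python
from collections import defaultdict


def assign_placeholders(entities):
    """Map each (category, name) pair to a stable placeholder such as person1."""
    counters = defaultdict(int)
    mapping = {}
    for category, name in entities:
        key = (category, name)
        if key not in mapping:
            # First time this name is seen: take the next number in its category.
            counters[category] += 1
            mapping[key] = f"{category}{counters[category]}"
    return mapping


def anonymise(text, mapping):
    """Replace every known name with its placeholder, longest names first."""
    ordered = sorted(mapping.items(), key=lambda item: len(item[0][1]), reverse=True)
    for (_, name), placeholder in ordered:
        text = text.replace(name, placeholder)
    return text


# Made-up sample data for illustration only.
entities = [
    ("person", "Jens Hansen"),
    ("company", "Hansen Byg A/S"),
    ("person", "Mette Olsen"),
    ("person", "Jens Hansen"),
]
mapping = assign_placeholders(entities)
result = anonymise(
    "Jens Hansen sued Hansen Byg A/S. Mette Olsen testified for Jens Hansen.",
    mapping,
)
```

Here `result` reads "person1 sued company1. person2 testified for person1.", so the same person receives the same placeholder at every occurrence.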

Computer aided post editing – semi-automated module

Progresso produced the semi-automated module, which can be used on its own or together with the fully automated module.

It consists of a graphical user interface that displays and controls all anonymisations, for instance those from the fully automated module. All changes are checked and numbered automatically, and automated overview lists assist the user in keeping track of the details in each XML document.

Software stack
  1. DHTML and ActiveX
  2. XMetaL Author
  3. Windows

Schultz requested a solution based on XMetaL Author.

The suggested anonymisations are colour coded according to the estimated precision of the automatic identification. This shows the user where to focus attention.
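A mapping from estimated precision to colour could look like the sketch below. The thresholds, the colour names and the 0.0 to 1.0 scale are assumptions chosen for illustration; the source does not state which colours or levels the tool uses.

```python
def colour_for(certainty, high=0.9, medium=0.6):
    """Return a display colour for a certainty score (assumed range 0.0 to 1.0)."""
    if not 0.0 <= certainty <= 1.0:
        raise ValueError(f"certainty out of range: {certainty}")
    if certainty >= high:
        return "green"   # very likely correct, a quick glance may suffice
    if certainty >= medium:
        return "yellow"  # plausible, worth a check
    return "red"         # uncertain, needs the editor's attention


colours = [colour_for(c) for c in (0.95, 0.7, 0.2)]
```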

A range of features lets the user search, replace, edit and renumber the anonymisations one by one or several at once.
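Renumbering can be pictured as closing the gaps that manual edits leave behind. The following is a minimal sketch that assumes placeholders are plain text, that the category list is fixed, and that numbers should follow the order of first appearance; the real module works on XML inside XMetaL Author and may behave differently.

```python
import re

# Assumed set of categories, taken from the examples in the text.
PLACEHOLDER = re.compile(r"\b(person|company|city|account-number)(\d+)\b")


def renumber(text):
    """Renumber placeholders per category in order of first appearance."""
    counters = {}
    mapping = {}

    def replace(match):
        old, category = match.group(0), match.group(1)
        if old not in mapping:
            # First occurrence of this placeholder: assign the next free number.
            counters[category] = counters.get(category, 0) + 1
            mapping[old] = f"{category}{counters[category]}"
        return mapping[old]

    return PLACEHOLDER.sub(replace, text)


edited = "person3 paid person1 from account-number2; person3 left city5."
renumbered = renumber(edited)
```

In this example `renumbered` becomes "person1 paid person2 from account-number1; person1 left city1.", so each category is numbered without gaps while identical placeholders stay identical.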

This module too follows Schultz’ guidelines for anonymisation rigorously, and all substitutions comply with the principle of categorising and numbering, like person1, person2, company1, city1, account-number1 etc.

See also the scientific paper based on the project.