Access to operational data from outside an organisation may be prohibited for a variety of reasons. There are significant challenges when performing collaborative data science work against such restricted data.
This report describes a range of causes and risks associated with restricted data, along with the social, environmental, data, and cryptographic measures that may be used to mitigate them. These measures are generally inadequate for restricted data contexts. We introduce the 'Data Airlock', secure infrastructure that facilitates eyes-off data-science workloads. After describing our use-case, we detail the architecture and implementation of a first, single-organisation version of this infrastructure. We conclude with lessons learned from this implementation, and outline requirements for a second, federated version.