TR22/03: The Data Airlock: infrastructure for restricted data informatics

Access to operational data from outside an organisation may be prohibited for a variety of reasons. There are significant challenges when performing collaborative data science work against such restricted data.

This report describes a range of causes and risks associated with restricted data along with the social, environmental, data, and cryptographic measures that may be used to mitigate such issues. These are generally inadequate for restricted data contexts. We introduce the ’Data Airlock’, secure infrastructure that facilitates eyes-off data-science workloads. After describing our use-case, we detail the architecture and implementation of a first, single-organisation version of this infrastructure. We conclude with learnings from this implementation, and outline requirements for a second, federated version.

Tech note
TN22/03: Law Enforcement Data Interoperability (Student thesis paper)

In law enforcement (LE), interoperability, i.e., the ability to exchange information between databases and systems, enhances the ability of agencies to detect and investigate crime. A fundamental way of improving interoperability is data integration, but integrating LE databases is often difficult due to heterogeneity of database types and the semantics of the data. In this study, an ontology-based and Linked Data approach for integrating heterogeneous LE databases is proposed.

The approach is evaluated for use in an operational setting by LE data domain experts. The evaluation feedback indicates that the approach has the potential to address some of the common challenges faced when integrating heterogeneous LE databases, and could provide benefit if used in an LE agency’s operational systems.

Law Enforcement Data Interoperability

Systemic interoperability within and between law enforcement agencies is vital to address the large scale technical challenges inherent in combatting […]

Read more
GovHack 2020: Conversations with Infosys and the AiLECS LAB

Together with our partner Infosys, we sponsored a community safety problem topic at the 2020 Australian GovHack competition.  This video discusses the project and our particular takes on the application of AI for social good.


Academic publications
Criminal motivation on the dark web: A categorisation model for law enforcement

Dalins, Janis, Campbell Wilson, and Mark Carman. “Criminal motivation on the dark web: A categorisation model for law enforcement.” Digital Investigation 24 (2018): 62-71.


Research into the nature and structure of ‘Dark Webs’ such as Tor has largely focused upon manually labelling a series of crawled sites against a series of categories, sometimes using these labels as a training corpus for subsequent automated crawls. Such an approach is adequate for establishing broad taxonomies, but is of limited value for specialised tasks within the field of law enforcement. Contrastingly, existing research into illicit behaviour online has tended to focus upon particular crime types such as terrorism. A gap exists between taxonomies capable of holistic representation and those capable of detailing criminal behaviour. The absence of such a taxonomy limits interoperability between agencies, curtailing development of standardised classification tools.


We introduce the Tor-use Motivation Model (TMM), a two-dimensional classification methodology specifically designed for use within a law enforcement context. The TMM achieves greater levels of granularity by explicitly distinguishing site content from motivation, providing a richer labelling schema without introducing inefficient complexity or reliance upon overly broad categories of relevance. We demonstrate this flexibility and robustness through direct examples, showing the TMM’s ability to distinguish a range of unethical and illegal behaviour without bloating the model with unnecessary detail.


The authors of this paper received permission from the Australian government to conduct an unrestricted crawl of Tor for research purposes, including the gathering and analysis of illegal materials such as child pornography. The crawl gathered 232,792 pages from 7651 Tor virtual domains, resulting in the collation of a wide spectrum of materials, from illicit to downright banal. Existing conceptual models and their labelling schemas were tested against a small sample of gathered data, and were observed to be either overly prescriptive or vague for law enforcement purposes – particularly when used for prioritising sites of interest for further investigation.


In this paper we deploy the TMM by manually labelling a corpus of over 4000 unique Tor pages. We found a network impacted (but not dominated) by illicit commerce and money laundering, but almost completely devoid of violence and extremism. In short, criminality on this ‘dark web’ is based more upon greed and desire, rather than any particular political motivations.




Academic publications
Monte-Carlo Filesystem Search – A crawl strategy for digital forensics

Dalins, Janis, Campbell Wilson, and Mark Carman. “Monte-Carlo Filesystem Search–A crawl strategy for digital forensics.” Digital Investigation 13 (2015): 58-71.


Criminal investigations invariably involve the triage or cursory examination of relevant electronic media for evidentiary value. Legislative restrictions and operational considerations can result in investigators having minimal time and resources to establish such relevance, particularly in situations where a person is in custody and awaiting interview. Traditional uninformed search methods can be slow, and informed search techniques are very sensitive to the search heuristic’s quality. This research introduces Monte-Carlo Filesystem Search, an efficient crawl strategy designed to assist investigators by identifying known materials of interest in minimum time, particularly in bandwidth constrained environments. This is achieved by leveraging random selection with non-binary scoring to ensure robustness. The algorithm is then expanded with the integration of domain knowledge. A rigorous and extensive training and testing regime conducted using electronic media seized during investigations into online child exploitation proves the efficacy of this approach.