Academic Publications

Effective, Explainable and Ethical: AI for Law Enforcement and Community Safety

Wilson, C., Dalins, J. & Rolan, G., 2020, 2020 IEEE / ITU International Conference on Artificial Intelligence for Good, AI4G 2020. Piscataway NJ USA: IEEE, Institute of Electrical and Electronics Engineers, p. 186-191, 6 p., 9311021.

PDQ & TMK + PDQF – A Test Drive of Facebook’s Perceptual Hashing Algorithms

Dalins, Janis, Campbell Wilson, and Douglas Boudry. “PDQ & TMK + PDQF – A Test Drive of Facebook’s Perceptual Hashing Algorithms.” arXiv preprint arXiv:1912.07745 (2019).

Laying foundations for effective machine learning in law enforcement. Majura – a labelling schema for child exploitation materials. 

The health impacts of repeated exposure to distressing concepts such as child exploitation materials (CEM, aka ‘child pornography’) have become a major concern to law enforcement agencies and associated entities. Existing methods for ‘flagging’ materials largely rely upon prior knowledge, whilst predictive methods are unreliable, particularly when compared with equivalent tools used for detecting ‘lawful’ pornography. In this paper we detail the design and implementation of a deep-learning based CEM classifier, leveraging existing pornography detection methods to overcome infrastructure and corpora limitations in this field. Specifically, we further existing research through direct access to numerous contemporary, real-world, annotated cases taken from Australian Federal Police holdings, demonstrating the dangers of overfitting due to the influence of individual users’ proclivities. We quantify the performance of skin tone analysis in CEM cases, showing it to be of limited use. We assess the performance of our classifier and show it to be sufficient for use in forensic triage and ‘early warning’ of CEM, but of limited efficacy for categorising against existing scales for measuring child abuse severity. We identify limitations currently faced by researchers and practitioners in this field, whose restricted access to training material is exacerbated by inconsistent and unsuitable annotation schemas.

Whilst adequate for their intended use, we show existing schemas to be unsuitable for training machine learning (ML) models, and introduce a new, flexible, objective, and tested annotation schema specifically designed for cross-jurisdictional collaborative use. This work, combined with a world-first ‘illicit data airlock’ project currently under construction, has the potential to bring a ‘ground truth’ dataset and processing facilities to researchers worldwide without compromising quality, safety, ethics and legality.

Dalins, J., Tyshetskiy, Y., Wilson, C., Carman, M. J., & Boudry, D. (2018). Laying foundations for effective machine learning in law enforcement. Majura – a labelling schema for child exploitation materials. Digital Investigation, 26, 40-54. https://doi.org/10.1016/j.diin.2018.05.004

Published: 2018

Criminal motivation on the dark web: A categorisation model for law enforcement
Monte-Carlo Filesystem Search – A crawl strategy for digital forensics

Technical Reports


Briefing Notes

BN22/02: Metior Telum – Measure the Weapon

This briefing note provides a broad introduction to the Metior Telum project for a general audience.


Technical Notes

TN22/03: Law Enforcement Data Interoperability (Student thesis paper)

In law enforcement (LE), interoperability, i.e., the ability to exchange information between databases and systems, enhances the ability of agencies to detect and investigate crime. A fundamental way of improving interoperability is data integration, but integrating LE databases is often difficult due to heterogeneity of database types and the semantics of the data. In this study, an ontology-based and Linked Data approach for integrating heterogeneous LE databases is proposed.

The approach is evaluated for use in an operational setting by LE data domain experts. The evaluation feedback indicates that the approach has the potential to address some of the common challenges faced when integrating heterogeneous LE databases, and could provide benefit if used in an LE agency’s operational systems.
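
The integration idea can be illustrated with a minimal sketch. It assumes two hypothetical agency databases that store the same facts under different column names, and it maps both onto a shared vocabulary as subject-predicate-object triples. The namespace, property names, and records below are invented for illustration and are not taken from the thesis.

```python
# Hypothetical shared vocabulary; the namespace and terms are illustrative only.
ONT = "http://example.org/le-ontology#"

# Per-source mappings from local column names to shared ontology properties.
MAPPINGS = {
    "agency_a": {"surname": ONT + "familyName", "dob": ONT + "birthDate"},
    "agency_b": {"last_name": ONT + "familyName", "birth_dt": ONT + "birthDate"},
}


def to_triples(source, record_id, record):
    """Convert one source record into (subject, predicate, object) triples."""
    subject = f"http://example.org/{source}/person/{record_id}"
    mapping = MAPPINGS[source]
    return [
        (subject, mapping[column], value)
        for column, value in record.items()
        if column in mapping
    ]


def query(triples, predicate, value):
    """Return the subjects that carry the given predicate/value pair."""
    return sorted(s for s, p, o in triples if p == predicate and o == value)


# Two made-up records with different schemas land in one triple store.
triples = []
triples += to_triples("agency_a", "17", {"surname": "Smith", "dob": "1980-01-02"})
triples += to_triples("agency_b", "x9", {"last_name": "Smith", "birth_dt": "1975-06-30"})

# A single query now spans both sources.
matches = query(triples, ONT + "familyName", "Smith")
```

Once both sources share predicates, one query covers both without either database changing its own schema. A production system would more likely use an RDF store and a formal ontology than in-memory tuples.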

TN22/05: Geolocation of Images (Student thesis paper)

In this paper, I propose and explore a method for image location classification. Most existing work concentrates on outdoor scenes, as scenery or an iconic landmark makes it easier to pinpoint the location. Few researchers have addressed indoor scenes. Although indoor images make geolocation more difficult, this gap needs to be addressed because many crimes happen indoors.

To address this problem, I propose a method for indoor image location classification that segments patterns from objects extracted from images. Specifically, I first extract objects from images. Then, based on the accuracy levels of the bounding boxes for specific kinds of objects, I crop only objects of those kinds from the original images. Next, I segment patterns from the extracted objects and crop those patterns using thresholding techniques. To classify images by these segmented patterns, I employ convolutional neural networks. Experimental results on a dataset of hotel rooms from across the globe show promising accuracy, which suggests that my method contributes to ultimately identifying the hotel chain to which an image in the hotel dataset belongs.
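
The pre-processing steps described above can be sketched roughly as follows. This is a toy illustration under stated assumptions: the "accuracy level" of a bounding box is read here as a detector confidence score, the `Detection` type, labels, cutoffs, and pixel values are all hypothetical, and the final convolutional classifier is omitted.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # object class reported by an (assumed) object detector
    score: float  # detector confidence in [0, 1]
    box: tuple    # (top, left, bottom, right); bottom and right are exclusive


def select_boxes(detections, wanted_label, min_score):
    """Keep detections of one object class whose confidence meets the cutoff."""
    return [d for d in detections if d.label == wanted_label and d.score >= min_score]


def crop(image, box):
    """Cut a rectangular region out of a greyscale image stored as nested lists."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def threshold(patch, cutoff):
    """Binarise a patch: 1 where the pixel is at least `cutoff`, else 0."""
    return [[1 if px >= cutoff else 0 for px in row] for row in patch]


# Toy 4x4 greyscale image and two made-up detections.
image = [
    [10, 20, 200, 210],
    [15, 25, 220, 90],
    [30, 35, 40, 45],
    [50, 55, 60, 65],
]
detections = [
    Detection("curtain", 0.91, (0, 2, 2, 4)),
    Detection("lamp", 0.40, (2, 0, 4, 2)),
]

# Crop only confident "curtain" boxes, then segment their pattern by thresholding.
patterns = [
    threshold(crop(image, d.box), 128)
    for d in select_boxes(detections, "curtain", 0.8)
]
```

In a full pipeline, each binarised pattern would then be passed to a convolutional neural network for classification; a fixed global cutoff stands in here for whichever thresholding technique the thesis actually uses.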

TN22/04: Cyber Threat Intelligence (Student thesis paper)

Cyber Threat Intelligence (CTI) sharing is a way for security professionals and threat analysts to freely access and share information to tackle emerging cyber threats. CTI information can be found in various textual sources such as threat reports, blog posts and online forums; however, there is increasing attention on automatic extraction and information retrieval of CTI knowledge. In this study, we evaluate existing ontologies that have worked towards automatic CTI extraction, then investigate the mechanisms used to extract CTI information automatically. Our contribution is a pipeline for developing a training dataset from disparate data sources for predicting tactics and techniques based on the MITRE ATT&CK framework.
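
As a rough illustration of what assembling such a training dataset might involve, the sketch below normalises records from two differently shaped sources and attaches ATT&CK labels. The record shapes, the keyword-based labelling rule, and the sample texts are assumptions made for this example and are not the method used in the thesis. The technique IDs and tactic names are real ATT&CK entries.

```python
# Illustrative keyword rules: phrase -> (ATT&CK technique ID, tactic name).
# Keyword matching is a stand-in for whatever labelling the real pipeline uses.
RULES = {
    "phishing": ("T1566", "Initial Access"),
    "powershell": ("T1059", "Execution"),
}


def normalise(source, raw):
    """Pull the free text out of a source-specific (hypothetical) record shape."""
    if source == "report":
        return raw["body"]
    if source == "forum":
        return raw["post"]["text"]
    raise ValueError(f"unknown source: {source}")


def label(text):
    """Return every (technique, tactic) pair whose keyword occurs in the text."""
    lowered = text.lower()
    return [ids for keyword, ids in RULES.items() if keyword in lowered]


def build_dataset(records):
    """Turn (source, raw_record) pairs into uniform labelled training rows."""
    rows = []
    for source, raw in records:
        text = normalise(source, raw)
        for technique, tactic in label(text):
            rows.append({"text": text, "technique": technique, "tactic": tactic})
    return rows


records = [
    ("report", {"body": "The actor sent phishing emails with a lure document."}),
    ("forum", {"post": {"text": "Payload runs an encoded PowerShell command."}}),
    ("forum", {"post": {"text": "Anyone selling accounts?"}}),
]
dataset = build_dataset(records)
```

Texts that match no rule are dropped, so the resulting rows share a single schema regardless of where they came from.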
