Resource centre - AiLECS Lab

We produce a wide range of material, including academic publications, technical notes, reports, white papers, videos, and code.

Academic publications

Effective, Explainable and Ethical: AI for Law Enforcement and Community Safety

Wilson, C., Dalins, J. & Rolan, G., 2020, 2020 IEEE/ITU International Conference on Artificial Intelligence for Good, AI4G 2020. Piscataway, NJ, USA: IEEE, Institute of Electrical and Electronics Engineers, pp. 186-191 (6 pages), article 9311021.

We describe the Artificial Intelligence for Law Enforcement and Community Safety (AiLECS) research laboratory, a collaboration between the Australian Federal Police and Monash University. The laboratory was initially motivated by work towards countering online child exploitation material. It now offers a platform for further research and development in AI that will benefit policing and mitigating threats to community wellbeing more broadly. We outline the work the laboratory has undertaken, results to date, and discuss our agenda for scaling up its work into the future.

2020

Academic publications

PDQ & TMK + PDQF – A Test Drive of Facebook’s Perceptual Hashing Algorithms

Dalins, Janis, Campbell Wilson, and Douglas Boudry. “PDQ & TMK + PDQF – A Test Drive of Facebook’s Perceptual Hashing Algorithms.” arXiv preprint arXiv:1912.07745 (2019).

Efficient and reliable automated detection of modified image and multimedia files has long been a challenge for law enforcement, compounded by the harm caused by repeated exposure to psychologically harmful materials. In August 2019 Facebook open-sourced their PDQ and TMK + PDQF algorithms for image and video similarity measurement, respectively. In this report, we review the algorithms’ performance on detecting commonly encountered transformations on real-world case data, sourced from contemporary investigations. We also provide a reference implementation to demonstrate the potential application and integration of such algorithms within existing law enforcement systems.

https://arxiv.org/pdf/1912.07745
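
As a rough illustration of the image similarity measurement mentioned in the abstract above: perceptual hashes such as PDQ are typically compared by counting the bits in which two hashes differ (the Hamming distance). The sketch below assumes 256-bit hashes encoded as 64 hexadecimal characters. The function names, the placeholder hash values, and the default threshold of 31 are assumptions made for this example; they are not taken from the paper or from its reference implementation, and a suitable threshold would need tuning on real data.

```python
def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Count the differing bits between two equal-length hex-encoded hashes."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 wherever the two hashes disagree; count those bits.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def is_probable_match(hex_a: str, hex_b: str, threshold: int = 31) -> bool:
    """Treat two hashes as similar if few bits differ.

    The default threshold is an assumption for this sketch, not a value
    reported in the paper.
    """
    return hamming_distance(hex_a, hex_b) <= threshold


# Placeholder hashes (not real data): 64 hex characters = 256 bits.
original = "f" * 64
modified = "f" * 62 + "00"   # same hash with the last 8 bits flipped
unrelated = "0" * 64         # every bit differs from `original`

print(hamming_distance(original, modified))
print(is_probable_match(original, modified))
print(is_probable_match(original, unrelated))
```

A small distance suggests that one image is a transformed copy of the other (for example resized or recompressed), while unrelated images tend to differ in roughly half of their bits.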

Academic publications

Laying foundations for effective machine learning in law enforcement. Majura – a labelling schema for child exploitation materials.

Dalins, J., Tyshetskiy, Y., Wilson, C., Carman, M. J., & Boudry, D. (2018). Laying foundations for effective machine learning in law enforcement. Majura – a labelling schema for child exploitation materials. Digital Investigation, 26, 40-54. https://doi.org/10.1016/j.diin.2018.05.004

The health impacts of repeated exposure to distressing concepts such as child exploitation materials (CEM, aka ‘child pornography’) have become a major concern to law enforcement agencies and associated entities. Existing methods for ‘flagging’ materials largely rely upon prior knowledge, whilst predictive methods are unreliable, particularly when compared with equivalent tools used for detecting ‘lawful’ pornography. In this paper we detail the design and implementation of a deep-learning based CEM classifier, leveraging existing pornography detection methods to overcome infrastructure and corpora limitations in this field. Specifically, we further existing research through direct access to numerous contemporary, real-world, annotated cases taken from Australian Federal Police holdings, demonstrating the dangers of overfitting due to the influence of individual users’ proclivities. We quantify the performance of skin tone analysis in CEM cases, showing it to be of limited use. We assess the performance of our classifier and show it to be sufficient for use in forensic triage and ‘early warning’ of CEM, but of limited efficacy for categorising against existing scales for measuring child abuse severity. We identify limitations currently faced by researchers and practitioners in this field, whose restricted access to training material is exacerbated by inconsistent and unsuitable annotation schemas.

Whilst adequate for their intended use, we show existing schemas to be unsuitable for training machine learning (ML) models, and introduce a new, flexible, objective, and tested annotation schema specifically designed for cross-jurisdictional collaborative use. This work, combined with a world-first ‘illicit data airlock’ project currently under construction, has the potential to bring a ‘ground truth’ dataset and processing facilities to researchers worldwide without compromising quality, safety, ethics and legality.

Academic publications

Criminal motivation on the dark web: A categorisation model for law enforcement

Dalins, Janis, Campbell Wilson, and Mark Carman. “Criminal motivation on the dark web: A categorisation model for law enforcement.” Digital Investigation 24 (2018): 62-71.

Research into the nature and structure of ‘Dark Webs’ such as Tor has largely focused upon manually labelling a series of crawled sites against a series of categories, sometimes using these labels as a training corpus for subsequent automated crawls. Such an approach is adequate for establishing broad taxonomies, but is of limited value for specialised tasks within the field of law enforcement. Contrastingly, existing research into illicit behaviour online has tended to focus upon particular crime types such as terrorism. A gap exists between taxonomies capable of holistic representation and those capable of detailing criminal behaviour. The absence of such a taxonomy limits interoperability between agencies, curtailing development of standardised classification tools.

We introduce the Tor-use Motivation Model (TMM), a two-dimensional classification methodology specifically designed for use within a law enforcement context. The TMM achieves greater levels of granularity by explicitly distinguishing site content from motivation, providing a richer labelling schema without introducing inefficient complexity or reliance upon overly broad categories of relevance. We demonstrate this flexibility and robustness through direct examples, showing the TMM’s ability to distinguish a range of unethical and illegal behaviour without bloating the model with unnecessary detail.

The authors of this paper received permission from the Australian government to conduct an unrestricted crawl of Tor for research purposes, including the gathering and analysis of illegal materials such as child pornography. The crawl gathered 232,792 pages from 7651 Tor virtual domains, resulting in the collation of a wide spectrum of materials, from illicit to downright banal. Existing conceptual models and their labelling schemas were tested against a small sample of gathered data, and were observed to be either overly prescriptive or vague for law enforcement purposes – particularly when used for prioritising sites of interest for further investigation.

In this paper we deploy the TMM by manually labelling a corpus of over 4000 unique Tor pages. We found a network impacted (but not dominated) by illicit commerce and money laundering, but almost completely devoid of violence and extremism. In short, criminality on this ‘dark web’ is based more upon greed and desire, rather than any particular political motivations.

Academic publications

Monte-Carlo Filesystem Search – A crawl strategy for digital forensics

Dalins, Janis, Campbell Wilson, and Mark Carman. “Monte-Carlo Filesystem Search – A crawl strategy for digital forensics.” Digital Investigation 13 (2015): 58-71.

Criminal investigations invariably involve the triage or cursory examination of relevant electronic media for evidentiary value. Legislative restrictions and operational considerations can result in investigators having minimal time and resources to establish such relevance, particularly in situations where a person is in custody and awaiting interview. Traditional uninformed search methods can be slow, and informed search techniques are very sensitive to the search heuristic’s quality. This research introduces Monte-Carlo Filesystem Search, an efficient crawl strategy designed to assist investigators by identifying known materials of interest in minimum time, particularly in bandwidth constrained environments. This is achieved by leveraging random selection with non-binary scoring to ensure robustness. The algorithm is then expanded with the integration of domain knowledge. A rigorous and extensive training and testing regime conducted using electronic media seized during investigations into online child exploitation proves the efficacy of this approach.
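
To give a feel for what "random selection with non-binary scoring" could look like in practice, here is a minimal sketch of a randomised crawl over a toy directory tree. It is not the algorithm published in the paper: the UCB1-style selection rule is borrowed from generic Monte-Carlo tree search, and the `Node` class, the `demo_tree` layout, the file names, and the relevance scores in [0, 1] are all hypothetical values invented for this example.

```python
import math
import random


class Node:
    """A directory (has children) or a file (no children) in a toy tree."""

    def __init__(self, name, children=None, score=0.0):
        self.name = name
        self.children = children or []  # an empty list is treated as a file
        self.score = score              # hypothetical relevance in [0, 1]
        self.visits = 0                 # examinations made beneath this node
        self.total = 0.0                # sum of scores found beneath this node
        self.exhausted = False          # True once nothing is left to examine


def select_child(node, rng, c=1.4):
    """Pick a child: unvisited ones at random, otherwise by a UCB1-style score."""
    candidates = [ch for ch in node.children if not ch.exhausted]
    unvisited = [ch for ch in candidates if ch.visits == 0]
    if unvisited:
        return rng.choice(unvisited)
    return max(
        candidates,
        key=lambda ch: ch.total / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def search(root, budget, rng):
    """Examine up to `budget` files, favouring directories that scored well."""
    found = []
    for _ in range(budget):
        if root.exhausted:
            break
        path, node = [root], root
        while node.children:            # descend until a file is reached
            node = select_child(node, rng)
            path.append(node)
        node.exhausted = True           # each file is examined only once
        found.append((node.name, node.score))
        for n in reversed(path):        # propagate the score back to the root
            n.visits += 1
            n.total += node.score
            if n.children and all(ch.exhausted for ch in n.children):
                n.exhausted = True
    return found


def demo_tree():
    """Build a small hypothetical filesystem with made-up relevance scores."""
    return Node("root", [
        Node("Pictures", [Node("a.jpg", score=1.0), Node("c.jpg", score=0.6)]),
        Node("Documents", [Node("b.txt", score=0.0), Node("d.doc", score=0.1)]),
        Node("e.jpg", score=0.3),
    ])


for name, score in search(demo_tree(), budget=3, rng=random.Random(0)):
    print(name, score)
```

Because scores are graded rather than pass/fail, a directory that yields partially relevant files still attracts further visits, while the random choice among unvisited children keeps the crawl from committing to a single branch too early.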
