Machine learning does all sorts of things for us, from the mundane to the extraordinary. But where does the underlying data that those algorithms and AI models train on come from?
Today’s blog post marks the official launch of the AiLECS lab’s My Pictures Matter campaign: a world-first crowdsourcing initiative to create an ethically sourced dataset of ‘safe’ childhood photos for machine learning research to counter child exploitation.
Sounds impressive, right? But what does this actually mean, and why is it so important? I’ll outline the what (the substance of the My Pictures Matter campaign) first, before coming back to the why to close out this post.
My Pictures Matter – what’s the campaign pitch?
If you are 18 or older, you can make an active difference in the fight against child abuse by visiting the My Pictures Matter website and contributing one or more childhood photos of yourself (at ages 0 to 17 years). We need 100 000 photos of children for a dataset that will be used to develop and improve technologies to counter online child sexual exploitation.
What kind of photos?
School photos, selfies, milestone celebrations, everyday moments from your childhood. ‘Safe’ photos only, please! (We are not collecting images that show any form of child nudity, illegal activity, or depictions of violence or abuse.)
What will the photos be used for?
The photos will be used to train algorithms to recognise ‘safe’ images of children, and we’ll also be researching how those technologies can be applied to AI that assesses whether digital files contain ‘unsafe’ imagery of children. Images will not be made public.
What’s an ethically sourced dataset?
Ethically sourced means consent-focused: people in the photos know about and agree to their images being included in the dataset and used for research.
Asking adults to contribute photos from their own childhood is our response to the problem of how to facilitate informed and meaningful consent for use of images of children (especially very young children). Instead of seeking proxy consent from parents or guardians to use pictures of individuals who are currently children, we’re providing a simple way for people who are now adults to share their own childhood photos.
Why do My Pictures Matter?
Your pictures matter because, as part of a dataset, they can help others. Developing computer models that can detect child exploitation material with reasonable accuracy is complex and technically difficult, and without quality data we cannot improve those models. Contributing photos to the research is a concrete step you can take to advance the efforts of law enforcement in safeguarding individuals who are experiencing, or who are survivors of, childhood sexual abuse and online exploitation.
Your pictures also matter because they represent part of your identity. If you choose to share photos with us, we will respect your choices regarding how you want your visual likeness to be treated, and recognise that consent is not an all-or-nothing proposition. Providing consent for one use does not mean you agree to your image being used for any other purpose. You are also free to withdraw consent if you no longer want your photo(s) used in the research.
Why do we care?
There’s a longstanding metaphor that posits “data is the new oil” – language that has been upgraded over time to position digital technologies as the “fourth industrial revolution”. Knowing the impacts that fossil fuel use and industrial revolutions have had on sustainability and the environment, and on the populations of people on whose labour such revolutions turn, such comparisons should give us pause.
The My Pictures Matter initiative is an example of how AiLECS is putting ethical principles into practice at both local and global scale. Creating a consentful dataset not only has context-specific benefits for our research (strengthening the veracity of data instances with regard to age estimation, for example) but is also an opportunity to influence broader research cultures. A successful campaign will demonstrate that creating large machine learning datasets while maintaining meaningful human consent at scale is not an insurmountable obstacle, but a valid choice and a viable alternative to more extractive approaches to data acquisition.
If you’ve been following the blog, or the activities of the lab, you’ll already know that ethics is a core focus for AiLECS. Not just articulating ethical principles and frameworks (although that’s important too!), but applying them to research and technology cultures in real terms: both in our own research, and extending outward to have wider impact. Being accountable to the people whose data underpins technology – which is to say, the people without whom no results would be possible – pushes back against the proposition that such data is an unencumbered asset open for the taking. In line with broader trends in data ethics, we want to challenge the (unfortunately) common view that if it’s online, it can be used arbitrarily for AI.
Being purposeful and creating technology ‘for good’ is one pillar of developing ethical AI for law enforcement and community safety; another is ensuring that such technologies work to their intended effect while minimising any adverse consequences. Respecting the agency of the people whose lives are represented in the data that fuels our work is a third pillar supporting ethical AI.
Author: Nina Lewis
June 2022