Cool idea. It's like decentralizing artificial intelligence by having volunteers act as separate neurons in a deep learning model. Doing it this way allows a privacy-preserving computation. https://t.co/GGNwcUV3rP
This makes no sense--it's wrong on several levels, but the most obvious one is sufficient reason to dismiss the idea.
He's suggesting using a convolutional neural network because he's treating the problem like it's an image classification task--a common-enough application of CNNs. You train the network on a bunch of examples, then give it images it's never seen before and ask it to classify them. But that's not actually the relevant problem in this domain! Screening for child porn is a matter of trying to determine whether a given image is a member of a known set of extant images.
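In fact, that's how deployed systems like PhotoDNA work: robust perceptual hashes matched against a database of known images. Here's a toy sketch of the shape of that problem--membership testing, not classification--using a crude average hash in place of a real robust hash:

```python
# Minimal sketch of set-membership screening via perceptual hashing.
# Production systems (e.g. PhotoDNA) use far more robust hashes; the toy
# "average hash" below just shows the structure: compare against a
# database of known images, don't classify.
from PIL import Image
import numpy as np

def average_hash(path, size=8):
    """Downscale, grayscale, and threshold against the mean: a 64-bit hash."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = np.asarray(img, dtype=np.float32)
    return (pixels > pixels.mean()).flatten()

def is_known(path, known_hashes, max_hamming=5):
    """Membership test: is this image a near-duplicate of a known one?"""
    h = average_hash(path)
    return any(np.count_nonzero(h != k) <= max_hamming for k in known_hashes)
```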
You can't treat it like an image classification problem because even experts can't tell a person's age just by looking at a picture of them--you need information that's not in the image itself, namely the subject's age at the time the picture was taken. Otherwise, it might just be an adult with a young-looking face or body. This famously happened in a case involving Lupe Fuentes: a man was arrested for possession of child porn, and an expert testified at trial that the actress in the video was definitely underage...but then the actress herself flew in and testified that it was in fact shot when she was of legal age.
So, he's proposing using a tool that doesn't work well for the actual problem domain, and then crippling it further by chopping the image up and splitting it amongst the nodes. The false positive rate would be astronomical.
There are other problems with the proposal as well. Take this breathtakingly dishonest claim, for example:
Just as we cannot read a person’s thoughts by looking at a brain scan, we do not know how ANNs make the decisions they make.
That's just not true. Interpretability is certainly a problem for machine learning, but it's not an insoluble one; there's a ton of research out there on exactly how you do it--saliency maps, feature visualization, probing, and more.
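For instance, here's a minimal sketch of one standard technique, a vanilla gradient saliency map, which shows which input pixels most affect a class score. `model` is a hypothetical differentiable PyTorch classifier, not anything from the proposal:

```python
import torch

def saliency_map(model, image, target_class):
    """Gradient of the class score w.r.t. the input pixels (CHW tensor)."""
    image = image.clone().requires_grad_(True)
    score = model(image.unsqueeze(0))[0, target_class]
    score.backward()
    # Per-pixel importance: max gradient magnitude across channels.
    return image.grad.abs().max(dim=0).values
```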
Or how about the fact that the "input layer" feeds unencrypted image chunks into the first layer of the CNN? The coordinator of the CNN can then trivially reconstruct the input image.
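Concretely, assuming (purely for illustration) the image is tiled into a row-major grid of equal chunks, reconstruction is a two-liner:

```python
# Trivial reconstruction sketch: if the "input layer" nodes receive raw
# pixel chunks, whoever coordinates them just pastes the pieces back
# together. Grid geometry here is an assumption for illustration.
import numpy as np

def reconstruct(chunks, grid_rows, grid_cols):
    """Reassemble an image from a row-major list of equal-size tiles."""
    rows = [np.hstack(chunks[r * grid_cols:(r + 1) * grid_cols])
            for r in range(grid_rows)]
    return np.vstack(rows)
```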
But let's just assume for a sec that his diagram is wrong, and the actual intent is to do feature extraction on independent nodes, passing only vectors of confidence scores about extracted features to the coordinated part of the network. You can't have it both ways: either those feature vectors carry enough information to reconstruct the image, in which case it's not a black box and the privacy claim is dead, or they don't, in which case the downstream network can only work with whatever the extractors kept and can't add any insight they threw away. Information theory's a bitch.
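To put rough numbers on the second horn (both sizes below are assumptions, chosen only for illustration):

```python
# Back-of-envelope: compare the information in a typical CNN input with
# the information in a feature vector small enough to plausibly be
# "non-reconstructible". Both dimensions are assumed for illustration.
image_bits = 224 * 224 * 3 * 8    # ~1.2 Mbit in a 224x224 RGB image
feature_bits = 1000 * 32          # 32 Kbit in 1,000 float32 scores
print(image_bits / feature_bits)  # ~37x: the coordinator sees only a
                                  # small fraction of the original
                                  # information and can't conjure the
                                  # rest back
```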
Finally, the performance characteristics of a system like this would be an absolute nightmare for the independent nodes: every forward pass means shipping activations over the open internet between volunteer machines, with a network round trip at every hop, multiplied across every image anyone sends.
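A back-of-envelope sketch with made-up but generous numbers:

```python
# Latency sketch for a forward pass spread across volunteer nodes.
# Both figures below are assumptions, and charitable ones.
layers_on_separate_nodes = 20   # assumed depth handled by distinct volunteers
wan_round_trip_s = 0.1          # assumed 100 ms RTT to each volunteer
print(layers_on_separate_nodes * wan_round_trip_s)
# => 2.0 seconds per image, serialized, before any actual compute,
#    node churn, retries, or the fact that this runs on every message
```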
TL;DR: Neural networks aren't magic; you can't just wave your hands and say "Oh, that hard bit? I'm pretty sure the ANN will take care of it."
Source? The latest research I could find had a 97.52% TPR at 1% FPR as the state of the art just for distinguishing pornographic from non-pornographic images. The lowest FPR tested on the ROC curve was 0.1%, and that came with a much, much lower TPR. Either one would be many, many orders of magnitude too weak to be useful, and that's just for detecting porn. Throw in age detection, a notoriously noisy problem, and it's impossible.
Bear in mind, when detecting rare events, you need an insanely low FPR for the system to be usable. Say I want a system where at least half of the images it flags as kiddie porn actually are kiddie porn. What FPR would I need? Well, first guess how many images on an average messaging platform are child porn: maybe one in 10 million? It's probably more like 1 in a billion, but let's start with the conservative assumption. At 100% TPR, you'd need an FPR of 0.00001%--equal to the prevalence--just to get 50% precision. The best reported FPR above, 0.1%, is 10,000 times too high at that prevalence, and a million times too high at the 1-in-a-billion figure. And this system is being proposed for a problem that's literally impossible to solve with 100% accuracy *even for humans*.
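Here's that arithmetic as code, so you can plug in your own guesses; it's just Bayes' rule:

```python
# Precision = share of flagged images that are actually positives:
#   precision = TPR * p / (TPR * p + FPR * (1 - p))
def precision(tpr, fpr, prevalence):
    tp = tpr * prevalence
    fp = fpr * (1.0 - prevalence)
    return tp / (tp + fp)

# 1-in-10-million prevalence, perfect TPR, FPR equal to the prevalence:
print(precision(1.0, 1e-7, 1e-7))  # ~0.5
# Same prevalence at the best reported FPR of 0.1%:
print(precision(1.0, 1e-3, 1e-7))  # ~0.0001, i.e. ~99.99% of flags
                                   # are innocent people
```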
But let's go a little farther. Remember that the goal of this system is to flag kiddie porn while preserving user privacy--that is, the system must not be able to reconstruct the image. So when it tells you something's kiddie porn, you can't just view the image to check whether it's a false positive: you've got to get a warrant to search the sender's phone and find the actual evidence. Are you really going to get a warrant based on a system that implicates innocent people half the time? What about one where 99.99% or more of the people it flags are innocent?
Why do that? If someone wants to send illegal content, they'll just use some other encrypted messaging app that doesn't have the content checks. And having chunks of your message sent to random volunteers isn't privacy.
What would the government do if there's no service provider? It'd have to either forbid encrypted protocols like SSH and HTTPS outright, which would make everything on the internet insecure, or issue licenses only to services that implement this. I'm not saying that's impossible, but it's a crazy scenario, and I don't think a government willing to go that far would allow for any privacy anyway.