AI and problems of scale — Benedict Evans
There’s a scene in one of Georges Simenon’s 1930s detective stories that I think about sometimes when talking about a certain kind of AI problem. Simenon’s hero, Inspector Maigret of the judicial police in Paris, scares a witness, and then goes across the street to a café and calls the telephone exchange. He tells them that someone will place a call from the ‘Pelican’ nightclub to Cannes: they are to hold the call until he gets there. Then he takes a taxi to the exchange, where they are indeed holding the call, and listens in.
I told this story to someone at a Three Letter Agency a few years ago, and got a wry smile – they can’t really do that now, but there are other things that they could do, and they could do them at the scale not of one phone call but of millions. That seems different. We accept the police listening to phone calls one at a time, with a warrant, but not listening to all of them, all of the time.
Something similar comes up when we talk about AI and face recognition by law enforcement today. We’re all (I think) comfortable with the idea of ‘Wanted’ posters. We understand that the police put them up in their offices, and maybe have some on the dashboard of their patrol car. In parallel, we have a pretty wide deployment today of licence plate recognition cameras for law enforcement (or just tolls), and no-one has really noticed. Similarly, public and private surveillance cameras have become a basic investigative tool, with a little more concern. But what if every police patrol car had a bank of cameras that scan not just every number plate but every face within a hundred yards against a national database of outstanding warrants? What if the cameras in the subway do that? All the connected cameras in the city? China is already trying to do this, and we seem to be pretty sure we don’t like that, but why? One could argue that there’s no difference in principle, only in scale, but a change in scale can itself be a change in principle.
We had a lot of these kinds of puzzles with the rise of databases in the 1960s and 1970s. Things that had always been possible in theory at a small scale became practical at a massive scale, and people wrote books about the threats this posed. There was some panic in this, but some of the arguments were entirely correct and remain concerns today. Automation changes things, though our reactions to that can be hard to predict. In United States v. Jones (2012), the Supreme Court held that the police cannot put a GPS tracker on a suspect’s car without a warrant, though they would not need a warrant to follow them around manually, the old-fashioned way. Is it that we don’t want it, or that we don’t want it to be too easy, or too automated? At the extreme, the US firearms agency is banned from storing gun records in a searchable database – everything has to be analogue, and searched by hand. There’s something about the automation itself that might change things. And there was a time when people told the music industry that Napster was the same as mix tapes – that didn’t stop recorded music revenue falling by more than half from 2000 to 2014 (before streaming changed everything).
Generative AI is now creating a lot of new examples of scale itself as a difference in principle. You could look at the emergent abuse of AI image generators, shrug, and talk about Photoshop: there have been fake nudes on the web for as long as there’s been a web. But when high-school boys can load photos of 50 or 500 classmates into an ML model and generate thousands of such images (let’s not even think about video) on a home PC (or their phone), that does seem like an important change. Faking people’s voices has been possible for a long time, but it’s new and different that any idiot can do it themselves. People have always cheated at homework and exams, but the internet made it easy and now ChatGPT makes it (almost) free. Again, something that has always been theoretically possible on a small scale becomes practically possible on a massive scale, and that changes what it means.
Part of the experience of databases, though, was that some things create discomfort only because they’re new and unfamiliar. Part of the ambivalence, for any given scenario, is the novelty, and that may settle and resettle. This might be a genuinely new and bad thing that we don’t like at all; or, it may be new and we decide we don’t care; we may decide that it’s just a new (worse?) expression of an old thing we don’t worry about; and, it may be that this was indeed being done before, even at scale, but somehow doing it like this makes it different, or just makes us more aware that it’s being done at all. Cambridge Analytica was a hoax, but it catalysed awareness of issues that were real.
Meanwhile, all the examples I’ve given so far involve systems working as designed, but of course half of the problems are actually when they break. Each new wave of automation creates new ways to do bad things at scale, or at least things we’re not sure about, but it also creates new ways to screw up at scale, and people have been screwing up and ruining people’s lives with databases for generations – the UK’s Post Office Scandal is just the latest and most obvious example. Machine learning and now generative AI create new ways to screw up, most obviously around ‘AI bias’ (which I wrote about here five years ago). Just as with databases, some of the answer is to train technologists to try to avoid this, but you can’t regulate away bugs, and a lot of the solution, again, has to be making sure people know that the computer can be wrong.
These problems might be new, or new expressions of old problems, but they may become entirely new kinds of problems through scale. Our reaction to that, however, is a matter of perception, culture and politics, not technology, and while we can easily agree on the extremes there’s a very large grey area in the middle where reasonable people will disagree. Fake nudes are bad, but should they be illegal? This will probably also be different in different places – a good illustration of this is in attitudes to compulsory national identity cards. The UK sees the very idea as a fundamental breach of civil liberties. France, the land of ‘liberté’, has them and doesn’t worry about it (but the French census does not collect ethnicity because the Nazis used this to round up Jews during the occupation), and the USA pretends not to have them, but demands for ID are everywhere. There’s not necessarily any right answer here and no way to get to one through any analytic process – this is a social, cultural and political question, with all sorts of unpredictable outcomes. The US bans a gun database, and yet the US also has a company that scans your driving licence against a private blacklist of tens of thousands of people (another database) shared across thousands of bars and nightclubs. And no privacy laws.