Take a look at the four paintings on this page. If you are acquainted with modern art, you will probably assume, at a quick glance, that they are four works by the Russian artist Wassily Kandinsky (1866-1944). However, whatever your knowledge of modern art, I suggest you look again, because not all of these works are by that great pioneer of abstract painting. More than one of them is an original image created by a computer model that was asked to produce a digital artwork in the style of Kandinsky.
Which are the fakes? I’ll give you the answer at the end of the article. Before we get there, you need to know how a computer can make such startlingly convincing imitations, and what it might mean for art.
This story begins with ‘GPT1’ (the initialism stands for ‘Generative Pre-trained Transformer’). Launched on the world in 2018, GPT1 was a computer model created by a company called OpenAI (part-funded by Elon Musk and closely associated with Microsoft). GPT1 was another stage in the attempt, by multiple scientists and software engineers worldwide, ‘to discover and enact the path to safe Artificial Intelligence’ (OpenAI’s description of its own goal).
With ‘117 million parameters’ – each one roughly equivalent to a synapse in the human brain, of which there are around a quadrillion (1,000 trillion) – GPT1 utilised a technique called Natural Language Processing. Essentially, the computer was fed lots of near-random everyday information, like a goose being stuffed with corn, in the hope of producing the foie gras of Artificial General Intelligence. The experiment was deemed intriguing but not particularly successful, causing only a modest stir.
A year later, OpenAI tried again with GPT2, which was ten times bigger (1.5 billion parameters); the fuss was likewise larger, but still limited to software circles. Finally, in 2020, OpenAI released GPT3. One hundred times bigger (175 billion parameters), GPT3 was trained on the entire internet, from Wikipedia to Reddit, from Google Books to the New York Times. And this time OpenAI’s assault on the Everest of true machine intelligence caught the attention of a wider world.
The reason for this is that GPT3, which remains the biggest and best ‘neural network’ on the planet (though others are working on their own larger equivalents), seems reasonably capable of almost anything. It can write code. It can write blog posts. It can write plausible Wikipedia entries for things that don’t exist. It wrote, with editorial assistance, a Guardian article. It produces music, scripts, poems, aperçus, advertising copy, philosophical profundities (‘the purpose of life is to increase the beauty in the universe’) and short stories in the style of Jerome K. Jerome. It does not always do this well, and seldom perfectly, but is often good enough.
How does it do this? Put simply, GPT3 is a galaxy-sized autocomplete machine. If you prompt it with a few words, it will run with what you’ve said and logically autocomplete it, in the same vein, for paragraph after paragraph (until it starts spewing gibberish – GPT3 often seems to run out of puff).
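The autocomplete idea can be caricatured in a few lines of Python. This is a toy sketch, not OpenAI’s code: a hand-written table of likely next words stands in for the 175 billion learned parameters, and the loop simply keeps picking a plausible continuation until it runs out of road.

```python
import random

# Toy stand-in for the model's learned statistics: which word tends to
# follow which. (GPT3 learns this from the whole internet; these are made up.)
FOLLOWERS = {
    "the": ["cat", "dog", "moon"],
    "cat": ["sat", "slept"],
    "dog": ["barked", "slept"],
    "sat": ["on"],
    "on": ["the"],
    "slept": ["on"],
    "barked": ["at"],
    "at": ["the"],
    "moon": ["rose"],
    "rose": ["slowly"],
}

def autocomplete(prompt: str, max_words: int = 10, seed: int = 0) -> str:
    """Repeatedly pick a plausible next word -- the essence of autocomplete."""
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(max_words):
        choices = FOLLOWERS.get(words[-1])
        if choices is None:   # no likely continuation: 'running out of puff'
            break
        words.append(rng.choice(choices))
    return " ".join(words)

print(autocomplete("the cat"))
```

The real model does this with probabilities over 50,000-odd word fragments rather than a tiny lookup table, but the loop – read what’s there, guess what comes next, repeat – is the same shape.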
At times, the quasi-self-aware output of GPT3 can appear uncannily like human intelligence – or at least a super-clever simulacrum of it. It is, for instance, a deeply unsettling sensation to be told a genuinely funny joke by a computer. Despite all this, most experts still firmly deny that GPT3 is ‘intelligent’, let alone ‘conscious’, though some have posited that all intelligence may be a form of autocomplete: reflexive responses to stimuli. If that is true, this machine is already intelligent – and also racist, sexist and prone to nihilism. It was, after all, fed all of Reddit and Twitter. This is why direct access to GPT3 is still restricted.
The story evolved last year, when someone tried asking GPT3 to create images from language prompts. Specifically, GPT3 was prompted to produce ‘an illustration of a baby daikon radish in a tutu walking a dog’. And then GPT3 did exactly that. Multiple times. This was astonishing, as no one expected GPT3 to have this capability. The programmers themselves could not understand how GPT3 succeeded at the task.
Pretty soon GPT3’s new capability was spun off into a model of its own, called Dall-e (a wordplay on the Pixar robot movie Wall-E and the artist Salvador Dalí). People with access to Dall-e reported that it could successfully design websites, logos and corporate brands, again by autocompleting.
Now, a year later, OpenAI has launched Dall-e 2. Although inspired by its predecessor, Dall-e 2 uses a different processing method, termed ‘diffusion’: given a natural language prompt like ‘make an image of teddy bears on the moon, who are using 1980s technology’, it starts from a field of random pixels and progressively boils off the noise, guided by its visual comprehension of each of these concepts – teddy bears, moon, 1980s tech – until a coherent image emerges. The image below, for example, is Dall-e 2’s response to one particular prompt: ‘A raccoon astronaut, with the cosmos reflecting on the glass of his helmet, dreaming of the stars.’
Credit: Andrew Mayne, Science Communicator for OpenAI
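Stripped of the neural network, the diffusion idea – begin with pure noise and nudge it, step by step, toward what the prompt describes – can be sketched as a toy. This is an illustrative caricature only: the real Dall-e 2 uses a learned denoiser conditioned on the text prompt, whereas here a known ‘target image’ (a short list of pixel values) stands in for what the model has understood.

```python
import random

def toy_diffusion(target, steps=50, seed=42):
    """Start from pure noise and progressively 'boil off' the randomness,
    blending each pixel a little closer to the target at every step."""
    rng = random.Random(seed)
    image = [rng.random() for _ in target]       # step 0: pure noise
    for step in range(steps):
        alpha = (step + 1) / steps               # trust the 'prompt' more each step
        image = [(1 - alpha) * px + alpha * tgt  # blend noise toward target
                 for px, tgt in zip(image, target)]
    return image

target = [0.0, 0.5, 1.0, 0.5]   # the 'image' the prompt describes
result = toy_diffusion(target)
print(result)                   # noise has been refined into the target
```

In the genuine system there is no target to peek at – the model has learned, from millions of captioned images, which direction ‘less noisy and more like teddy bears on the moon’ lies in – but the step-by-step refinement from static to picture is the same process.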
If you’re like me, your reaction to this entirely original image will be, first, sweet Holy Jesus, and second, what does this mean for every artist in the world? That a whirring computer can – in a few moments – answer a pictorial brief so accurately, and poetically?
My answer, in these very early days of Dall-e 2, is not encouraging for artists. Basically, don’t send your kids to art college. Because Dall-e 2 is possibly going to destroy many of the commercial art jobs in the world, i.e. the jobs of illustrators, cartoonists, graphic designers and the makers of book covers. The fine artists will survive because people will always want prestige works by famous humans, but the rest: ouch.
Dall-e 2 (and remember there will soon be Dall-e 3 and 4 and 5, each superior to the last) might also cause problems for auction houses, once the machine learns how to create in real life, via 3D printing. Because we can already see that Dall-e 2 has a superb capacity for ‘forgery’. Indeed Dall-e’s makers are hyperaware of the dangers of Dall-e creating deepfakes of real humans, which is why Dall-e is explicitly designed, like a racing car with a built-in speed limit, to be bad at rendering human faces and bodies. It is therefore better at abstracts.
Which brings us back to the four Kandinskys on this page. I put this test to quite a few people, and about four out of ten got it right. This suggests that computers have already passed a kind of ‘artistic Turing test’: humans cannot reliably differentiate between the output of a famous human artist and a data-crunching computer. In this case one image was created by a programme called Elbo AI, which launched in February as a competitor to OpenAI; the other by Dall-e 2.
And the answer to my original puzzle? The fake Kandinskys, created by mere machines, are the two on the left. The top left is by Elbo AI, and the bottom left is by Dall-e 2.
[Image credits: Top left, Elbo AI/Tensordock; bottom left, Dall-e 2; Kandinskys, Getty Images]