OpenAI, the San Francisco-based research company behind the breakthrough AI language generator GPT-3, has developed a new system that can create images from short text captions.
In a blog post, OpenAI said that DALL-E, whose name is a portmanteau of the artist Salvador Dalí and Pixar’s robot hero WALL-E, had demonstrated the ability to create images “for a wide range of concepts”. Among its illustrations is a picture of an armchair shaped like an avocado.
Images generated by neural networks, a type of machine learning that can spot patterns, are not new. Generative adversarial networks, which rely on a pair of neural networks — one creating content and the other assessing how close it is to a desired output — have been used to create images of realistic humans, cats, rental properties and snacks.
But DALL-E is notable for being able to produce images based on text inputs. The system is based on a version of GPT-3, the text-generation system that has been used to write poetry, news articles and text adventures.
Drawing on a training data set of paired text and images, the DALL-E system can produce new images based on prompts. It has created images including a baby radish in a tutu walking a dog and a cube made of porcupine. It has also shown the ability to create images in a variety of artistic styles.
OpenAI also revealed Clip, an image recognition system designed for more general use than current systems, which are largely specialised for a single task. It was trained on text-image pairs publicly available online.
The work still needs some refining. OpenAI noted that DALL-E cannot yet reliably count past three, and is sometimes confused by nouns with multiple meanings, such as “glasses”. The researchers also found that different phrasings of a caption could yield different outcomes.
Deeper issues also remain to be resolved. “We recognise that work involving generative models has the potential for significant, broad societal impacts,” OpenAI said, adding that potential future steps include studying the economic impact on professions, bias in outputs and the longer-term ethical challenges of the technology.
There have long been concerns around the abuse of AI-generated media, with neural networks being used to create fake videos, audio and images for unethical purposes ranging from political disinformation to attempted fraud.