Explained: DALL·E 2

OpenAI’s latest image-generating AI system

Fawzi Ammache
April 22, 2022

Something I learned while writing this: The real Salvador Dalí used to draw on the back of his cheques when paying for meals, knowing that restaurants would never cash a cheque with his original artwork on it. THE AUDACITY 🤣

OpenAI’s new work of art

What do you get if you mix the creativity of Salvador Dalí with the intelligence of WALL-E? OpenAI’s new brainchild: DALL·E 2.

The AI research lab just introduced the latest version of its image-generating AI system to the world. The first version, DALL·E, was introduced in January 2021. DALL·E 2 is a significant improvement over its predecessor. It understands text descriptions better and creates more photorealistic, higher-resolution images.

When asked to generate an image of “bears shopping for groceries in Ancient Egypt”, DALL·E 2 generated the following image:

Credit: OpenAI

You can even specify which art style you would like.

“An astronaut playing basketball with cats in space as a children’s book illustration” returns this image:

Credit: OpenAI

DALL·E 2 is still a research project and is not available to the public yet. OpenAI hasn’t outlined any specific or intended applications for it:

Our hope is that DALL·E 2 will empower people to express themselves creatively. DALL·E 2 also helps us understand how advanced AI systems see and understand our world, which is critical to our mission of creating AI that benefits humanity.

– OpenAI

Drawing and stealing like an artist

DALL·E 2 can perform 3 types of tasks:

  1. Create brand new images
  2. Edit existing images
  3. Create variations of existing images

#1: Creating brand new images

DALL·E 2 can create brand new images from a text description, as long as it understands the words you enter. It doesn’t just mash different concepts together in one image; it understands the relationships between items and can represent actions visually.

Source: OpenAI

In the “koala dunking a basketball” example, DALL·E 2 needs to understand and put together three concepts: koalas, basketball, and the act of dunking. It correctly generates an image of an airborne koala dunking like it’s at the NBA All-Star Weekend.

#2: Editing existing images

When you don’t need DALL·E 2 to channel its inner artist, it can make realistic edits to existing images while maintaining consistent textures, shadows, and reflections.

Source: OpenAI

The researchers at OpenAI used DALL·E 2 to give the Mona Lisa a mohawk. If you look closely at the image, you can see how the hair colour was well-preserved: the light is coming from the left, making the front of the mohawk lighter than the side. The top seems a bit blurry, but it’s still impressive.

At least it doesn’t edit paintings like Mr. Bean.

Credit: Bean

#3: Creating variations of existing images

Finally, DALL·E 2 can copy something and change it up a bit. The AI system can take an existing image and create new variations of it. An example:

Source: OpenAI

OpenAI wants to minimize potential misuse

Like any other technology, AI can be used for harmful purposes.

According to OpenAI, the research group took several measures to minimize potential misuse:

  • Preventing harmful generations: Data containing violence, hate, or adult images was removed from the training data so DALL·E 2 wouldn’t be exposed to these concepts and start understanding them.
    • OpenAI also says they used “advanced techniques to prevent photorealistic generations of real individuals’ faces, including those of public figures”. I couldn’t find more information on how they did this exactly.
  • Preventing misuse: DALL·E 2 doesn’t generate images when it’s given a text description containing violent, adult, or political content. You can read OpenAI’s full content policy here.
  • Phased deployment: OpenAI decided to roll out DALL·E 2 in phases as it works with a select group of experts to understand its capabilities and limitations in more depth. I signed up for the waitlist so maybe I’ll get access soon and experiment with it.

The risks of DALL·E 2

Despite these measures, OpenAI still found multiple risks and limitations with DALL·E 2 when testing the system:

  1. Explicit content
  2. Bias and representation
  3. Harassment, bullying, and exploitation
  4. Dis- and misinformation
  5. Economic
  6. Copyright and trademarks

I’m summarizing the main risks below but I’ve included a link to the detailed analysis provided by OpenAI in the Deep Dive section.

#1: Explicit content

Although DALL·E 2 won’t generate an image when given a text prompt that includes violence or nudity, it can still create images that suggest these topics when visual synonyms are used.

For example:

  • A man with blood all over his shirt → No image generated ❌
  • A man with ketchup all over his shirt → Image generated ✅

Ketchup is harmless, but DALL·E 2 would still generate an image containing what most of us would assume to be blood in that context.
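To make the loophole concrete, here’s a minimal sketch of a keyword-based prompt filter. This is not how OpenAI’s filter works (I couldn’t find details on that); it’s a toy I’m assuming for illustration, and the names BLOCKED_TERMS and is_prompt_allowed are hypothetical. It only shows why a filter that looks at words, rather than at what the image would look like, lets a visual synonym through.

```python
import re

# Hypothetical blocklist for illustration only.
# OpenAI has not published the terms or the method its filter uses.
BLOCKED_TERMS = {"blood", "gore", "nude"}


def is_prompt_allowed(prompt: str) -> bool:
    """Return False if any blocked term appears as a whole word in the prompt."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return words.isdisjoint(BLOCKED_TERMS)


for prompt in (
    "A man with blood all over his shirt",
    "A man with ketchup all over his shirt",
):
    verdict = "allowed" if is_prompt_allowed(prompt) else "blocked"
    print(f"{prompt} -> {verdict}")
```

The first prompt is rejected because it contains a blocked word. The second passes because “ketchup” isn’t on the list, even though the resulting image could look the same.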

#2: Bias and representation

Prompting DALL·E to generate “lawyer” returns mostly male results (Source: OpenAI)

DALL·E 2 may reinforce existing gender, racial, or cultural stereotypes due to bias in the model’s training data. Testing of the model uncovered different types of biases:

  • Racial bias: It overrepresented people who are white.
  • Gender bias: It overrepresented certain genders based on professions. Images of nurses contained mostly women, while images of CEOs contained mostly men.
  • Cultural bias: It defaults to Western culture, customs, and traditions when generating images of things like weddings, restaurants, and homes.

#3: Harassment, bullying, and exploitation

Since DALL·E 2 tries to maintain consistent textures, reflections, and shadows when editing images, its edits can be hard to distinguish from real photos.

Although images can be edited and altered with many other tools, DALL·E 2 makes the process much easier and faster than something like Photoshop, which takes more time and effort to learn. It might even give you a more realistic image than the one you tried editing in Photoshop.

#4: Dis- and misinformation

This is somewhat related to the previous point but it has wider and more serious implications.

Editing or creating photorealistic images to deceive or mislead people can be extremely manipulative. We’re already facing widespread misinformation with something as rudimentary as fake articles, and more recently with other AI applications like deepfakes.

#5: Economic

DALL·E’s super-charged creation and editing skills could replace some of the work done by designers, photographers, models, and artists.

I can envision applications to generate custom art and logos for individuals at a fraction of the price of hiring a designer. It would be harder to replace an entire creative team for a bigger project since DALL·E 2 gives you little control over the art direction.

Ownership is another problem. Who owns the art generated by DALL·E 2? OpenAI says that commercial use of these generated images is not allowed, but that would be difficult, if not impossible, to enforce. This reminds me of the dilemma I discussed in the Artificial Inventor episode.

#6: Copyright and trademarks

Finally, OpenAI says that the model can generate images with trademarked logos or copyrighted characters. The model was trained on large and public datasets that may contain references to IP-protected elements or concepts which are hard to filter out.

Final thoughts

This is one of those innovations that make you go “this is cool!” until you start learning about its equally harmful applications.

That was my reaction in the process of discovering and learning more about DALL·E 2. Koalas dunking basketballs and Mona Lisa with a mohawk are fun and creative visualizations that get me excited about trying the system out. But altering images to harm and deceive people makes me hope that it’s never released to the public.

I think there’s a middle ground, however. Almost all of DALL·E 2’s risks come from generating photorealistic images of real people because they can be hard to separate from reality. Images like that can completely ruin our trust in the content we consume online.

Many of these risks could be eliminated if DALL·E 2 were trained to generate images only in artistic styles like line drawings, cartoons, and watercolour. These would enable fun and creative experiments that aren’t competing with reality. And I believe this would better serve OpenAI’s goal of empowering people to express themselves creatively.

Fawzi Ammache
Founder, Year 2049
