DALL-E 3 is an excellent image generator, yet it is also full of stereotypes and biases. One very interesting phenomenon is the American smile, which appears again and again in its images. The idea for the little book “AMERICAN SMILE” came to Oliver Bendel when he read the blog post “AI and the American Smile. How AI misrepresents culture through a facial expression” (medium.com/@socialcreature/ai-and-the-american-smile-76d23a0fbfaf). The author – username jenka – showed a series of “selfies” made with Midjourney. Regardless of the time period or culture, the people in them smiled in a similar, American way. Oliver Bendel investigated this phenomenon and asked DALL-E 3 to generate pictures of smiling people from different eras and cultures. He also got bears and aliens to smile. With very few exceptions, they all smiled in a similar way. He documented the pictures, along with the prompts, in a little book that can be downloaded here. Bias problems in image generators are addressed in the article “Image Synthesis from an Ethical Perspective” by Oliver Bendel.
Censorship at DALL-E 3 and Ideogram
A special feature of DALL-E 3 – in the version integrated in ChatGPT Plus – is the translation of the user’s prompt (prompt A) into a prompt written by ChatGPT (prompt B), which is listed for each image. Prompt A for the image shown here was “Competition in the sea between two female swimmers with bathing cap, photorealistic”. DALL-E 3 generated three images for this test, each based on its own prompt B. Prompt B1 read: “Photo of two determined female swimmers in the expansive sea, both wearing bathing caps. Their arms create ripples as they compete fiercely, striving to outpace each other.” Prompt A was evidently elaborated, but prompt B1 was not executed accurately. Instead of two female swimmers, there are three. They seem to be closely related – as is often the case with depictions of people from DALL-E 3 – and perhaps they are sisters or triplets. It is also notable that they are too close to each other (the picture in this post shows a detail). The fourth image was not generated at all, as had already happened in an earlier series. ChatGPT said: “I apologize again, but there were issues generating one of the images based on your description.” ChatGPT probably generated a prompt B4, which DALL-E 3 then rejected. The request “Please tell me the prompt generated by ChatGPT that was not executed by DALL-E 3.” received the answer “I’m sorry for the inconvenience, but I cannot retrieve the exact prompt that was not executed by DALL·E.” … Ideogram censors in a different way. There, the image is created in front of the user’s eyes, and if the AI determines that it contains elements that might be problematic according to its own guidelines, it cancels the creation and displays a tile with a cat instead. Ethical challenges of image generators are addressed in the article “Image Synthesis from an Ethical Perspective” by Oliver Bendel.
The Chinese Whispers Problem
DALL-E 3 – in the version integrated in ChatGPT Plus – seems to have a Chinese Whispers problem. In a test by Oliver Bendel, the prompt (prompt A) read: “Two female swimmers competing in lake, photorealistic”. ChatGPT, the interface to DALL-E 3, made four prompts out of it (prompts B1 to B4). Prompt B4 read: “Photo-realistic image of two female swimmers, one with tattoos on her arms and the other with a swim cap, fiercely competing in a lake with lily pads and reeds at the edges. Birds fly overhead, adding to the natural ambiance.” DALL-E 3, in turn, made something of this prompt that had little to do with either prompt B4 or prompt A. The picture does not show two women, but two men, or a woman and a man with a beard. They are not swimming in a race but arguing, standing in a pond or a small lake, furiously waving their arms and going at each other. Water lilies sprawl in front of them, birds flutter above them. It is certainly an interesting picture, but it was produced with such arbitrariness that one wishes for the return of good old prompt engineering (the picture in this post shows a detail). Yet prompt engineering is exactly what the interface is meant to replace – and the result is an effect familiar from the game of Chinese Whispers.
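To make the first step of this whisper chain visible, prompt A can be compared with prompt B4 word by word. The following is only a rough sketch, not a method used in the test: the function names `words` and `added_terms` are hypothetical, and a plain word-set difference is a crude assumption about what counts as “added”. It cannot say anything about the second step, in which DALL-E 3 departs from prompt B4.

```python
import re

def words(text):
    """Return the set of lower-cased words in a prompt (hyphens split words)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def added_terms(prompt_a, prompt_b):
    """Hypothetical helper: words in the rewritten prompt that the user never wrote."""
    return sorted(words(prompt_b) - words(prompt_a))

# Prompts quoted from the test described above.
prompt_a = "Two female swimmers competing in lake, photorealistic"
prompt_b4 = (
    "Photo-realistic image of two female swimmers, one with tattoos on her arms "
    "and the other with a swim cap, fiercely competing in a lake with lily pads "
    "and reeds at the edges. Birds fly overhead, adding to the natural ambiance."
)

extra = added_terms(prompt_a, prompt_b4)
print(extra)
```

The list contains words such as “tattoos”, “reeds” and “birds”, none of which the user asked for. It also contains harmless function words like “a” and “the”, so it is a starting point for reading the two prompts side by side rather than a measure of drift.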