Introducing Visual ChatGPT

Researchers at Microsoft are working on a new application that builds on ChatGPT and image models such as Stable Diffusion. Visual ChatGPT is designed to let users generate images from text input and then edit individual elements. In their paper “Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models”, Chenfei Wu and his co-authors write: “We build a system called Visual ChatGPT, incorporating different Visual Foundation Models, to enable the user to interact with ChatGPT by 1) sending and receiving not only languages but also images 2) providing complex visual questions or visual editing instructions that require the collaboration of multiple AI models with multi-steps” and, not to forget, “3) providing feedback and asking for corrected results” (Wu et al. 2023). For example, a user can enter a prompt to create an image of a landscape with blue sky, hills, meadows, flowers, and trees. A second prompt then instructs Visual ChatGPT to make the hills higher and the sky more dusky and cloudy. The user can also ask the program what color the flowers are and recolor them with another prompt. A final prompt makes the trees in the foreground appear greener. The paper can be downloaded from arxiv.org.
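
The landscape example can be pictured as a chain of small steps, each handled by a different tool. The Python sketch below only illustrates that flow and is not the code from the paper: the "image" is a plain dictionary of scene elements, the three functions are hypothetical stand-ins for real image models, and the starting flower color (yellow) and the new one (red) are invented for the example.

```python
from typing import Dict

# Toy "image": each scene element maps to a short description.
# A real system would pass actual images between models.
Scene = Dict[str, str]


def generate_image(description: Scene) -> Scene:
    """Hypothetical stand-in for a text-to-image model."""
    return dict(description)


def edit_element(scene: Scene, element: str, new_look: str) -> Scene:
    """Hypothetical stand-in for an image-editing model.

    Changes one element and leaves the rest of the scene untouched.
    """
    if element not in scene:
        raise KeyError(f"no element named {element!r} in the scene")
    updated = dict(scene)
    updated[element] = new_look
    return updated


def answer_question(scene: Scene, element: str) -> str:
    """Hypothetical stand-in for a visual question answering model."""
    return scene[element]


def run_dialogue() -> Scene:
    """Replay the landscape dialogue from the text, one prompt per step."""
    # Prompt 1: create the landscape (flower color is an assumption).
    scene = generate_image({
        "sky": "blue",
        "hills": "low",
        "meadows": "green",
        "flowers": "yellow",
        "trees": "green",
    })
    # Prompt 2: make the hills higher and the sky more dusky and cloudy.
    scene = edit_element(scene, "hills", "higher")
    scene = edit_element(scene, "sky", "dusky and cloudy")
    # Prompt 3: ask what color the flowers are, then recolor them.
    print("Flowers are:", answer_question(scene, "flowers"))
    scene = edit_element(scene, "flowers", "red")
    # Prompt 4: make the trees appear greener.
    scene = edit_element(scene, "trees", "greener")
    return scene


if __name__ == "__main__":
    print(run_dialogue())
```

In the system the paper describes, ChatGPT itself decides which visual model to call for each prompt; here the order of calls is simply written out by hand to keep the sketch short.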