Text-based image synthesis models are appealing because users can verbally describe their intent. However, these models are hard to edit: an editing technique should preserve most of the original image, yet even a small modification of the text prompt often leads to a completely different outcome. One way to preserve the original is to provide a spatial mask that localizes the edit, but this ignores the original structure and content within the masked region.
The authors present a method for editing images that does not require a mask, and demonstrate how images can be edited by replacing or adding words in the text prompt.
Research Paper:
GitHub code: