Equipped with a robust model trained across a broad spectrum of tasks and guided by learned task embeddings, we explore few-shot adaptation to unseen tasks via task inversion. In this process, we keep the model weights frozen and update only a task embedding to fit the new task. Our experiments demonstrate that Emu Edit can swiftly adapt to new tasks, such as super-resolution and contour detection. This makes task inversion with Emu Edit particularly advantageous when labeled examples are scarce or the compute budget is limited.
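As a rough illustration of the procedure, the sketch below shows how task inversion could be set up in PyTorch: all model weights are frozen and only a single task-embedding vector is optimized on the few-shot examples. The model interface, data fields, and hyperparameters here are assumptions for illustration, not Emu Edit's actual implementation.

```python
import torch

def task_inversion(model, few_shot_loader, emb_dim=768, steps=1000, lr=1e-3):
    """Learn a task embedding for an unseen task while keeping the model frozen.
    `model` is a hypothetical diffusion editor that accepts a task embedding as
    conditioning and returns its training (denoising) loss for a batch."""
    # Freeze all model weights; only the new task embedding will be trained.
    for p in model.parameters():
        p.requires_grad_(False)

    # A single learnable embedding vector represents the new task.
    task_embedding = torch.nn.Parameter(torch.randn(1, emb_dim) * 0.02)
    optimizer = torch.optim.Adam([task_embedding], lr=lr)

    for _ in range(steps):
        for batch in few_shot_loader:
            loss = model(
                batch["input_image"],
                batch["instruction"],
                batch["target_image"],
                task_embedding=task_embedding,
            )
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # The learned embedding is plugged into the frozen model at inference time.
    return task_embedding
```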
To support rigorous and informed evaluation of instruction-based image editing models, we collect and publicly release a new benchmark that covers seven image editing tasks: background alteration (background), comprehensive image changes (global), style alteration (style), object removal (remove), object addition (add), localized modifications (local), and color/texture alterations (texture). Additionally, to allow proper comparison against Emu Edit, we release Emu Edit's generations on the dataset.
We extend our gratitude to the following people for their contributions (alphabetical order): Andrew Brown, Ankit Ramchandani, Guan Pang, Ishan Misra, Mannat Singh, Ning Zhang, Parveen Krishnan, Peizhao Zhang, Peter Vajda, Rohit Girdhar, Roshan Sumbaly, Tong Xiao, Vladan Petrovic, Xide Xia.
@inproceedings{Sheynin2023EmuEP,
  title={Emu Edit: Precise Image Editing via Recognition and Generation Tasks},
  author={Shelly Sheynin and Adam Polyak and Uriel Singer and Yuval Kirstain and Amit Zohar and Oron Ashual and Devi Parikh and Yaniv Taigman},
  year={2023},
  url={https://api.semanticscholar.org/CorpusID:265221391}
}