Release v1.0.0

Latest
@LinB203 released this 04 Jun 06:50
· 56 commits to main since this release
cc7c79f

🚀 UniWorld: a unified model that skips VAEs and uses semantic features from SigLIP! Using just 1% of BAGEL’s data, it outperforms BAGEL on image editing and excels in understanding & generation.
🌟 The data, model, and training & evaluation scripts are now open-source!

Key features:

  1. We observe that GPT-4o likely employs a non-mandatory VAE injection, making it difficult to preserve low-level features consistent with the reference image.
  2. We demonstrate remarkable image perception capabilities, surpassing those of GPT-4o.
  3. We used only 2.7M data samples (just 0.1% of BAGEL's), achieving high data efficiency. All data, training and evaluation code, and models have been fully open-sourced.

Future work:

  1. Continue collecting data and perform joint training with a VLM.
  2. Integrate higher-resolution semantic encoders or adopt VLM techniques to increase input-image resolution, such as multi-scale image gridding.
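
To make the multi-scale image gridding idea in item 2 concrete, here is a minimal sketch of how crop boxes could be computed. It is not UniWorld's code: the function name `multiscale_grid` and the `scales` parameter are hypothetical, and it assumes each crop is later resized to the encoder's fixed input size. At scale 1 the whole image is one view; at scale s the image is split into s × s cells, so fine detail survives the resize.

```python
from typing import List, Sequence, Tuple

# (left, top, right, bottom) in source-image pixels
Box = Tuple[int, int, int, int]


def multiscale_grid(width: int, height: int,
                    scales: Sequence[int] = (1, 2)) -> List[Tuple[int, Box]]:
    """Hypothetical helper: list the crop boxes for each grid scale.

    For scale s the image is divided into s x s cells. Integer division
    keeps adjacent cells contiguous and makes the last cell end exactly
    at the image border.
    """
    crops: List[Tuple[int, Box]] = []
    for s in scales:
        for row in range(s):
            for col in range(s):
                left = col * width // s
                right = (col + 1) * width // s
                top = row * height // s
                bottom = (row + 1) * height // s
                crops.append((s, (left, top, right, bottom)))
    return crops


# One global view plus a 2 x 2 grid gives five crops in total.
crops = multiscale_grid(1024, 768, scales=(1, 2))
for scale, box in crops:
    print(scale, box)
```

Each box uses the same (left, upper, right, lower) order that Pillow's `Image.crop` expects, so the crops could be cut out and resized before being passed to a semantic encoder. How the resulting per-crop features would be merged is left open here.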