
🌰 SEED Multimodal

[Project Homepage] [arXiv] [arXiv]

Powered by CV Center, Tencent AI Lab, and ARC Lab, Tencent PCG.


The repository provides the official implementation of SEED and SEED-LLaMA. For any inquiries, please email [email protected].

News

🍻 We are actively looking for self-motivated interns. Please feel free to reach out if you are interested. 🍻

  • 👀 We will release the checkpoints and code of the SEED-2 tokenizer and SEED-LLaMA-8B/14B, expected in late October.
  • 👀 We will soon release an online demo for SEED-LLaMA.
  • 2023-10-02 📎 We release the technical report of SEED-LLaMA on arXiv, which is empowered by the improved SEED-2 tokenizer.
  • 2023-07-29 :octocat: We release the checkpoint of the SEED tokenizer and its inference code. [Getting started]
  • 2023-07-16 📎 We release the technical report of SEED on arXiv.

Stay tuned for updates!

SEED Tokenizer v1

[arXiv]


SEED Tokenizer v1 for Image Reconstruction


SEED-OPT2.7B for Multimodal Comprehension


SEED-OPT2.7B for Multimodal Generation


Dependencies and Installation

Installation

  1. Clone repo

    git clone https://github.com/AILab-CVC/SEED.git
    cd SEED
  2. Install the dependencies

    sh install.sh

Model Weights

We release the pre-trained SEED Visual Tokenizer on Google Drive.
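Before running inference, it may help to verify that the downloaded checkpoints are in place. Below is a minimal sanity check; only the "pretrained" folder name comes from this README, and the file names in the snippet are hypothetical placeholders, not the actual names on Google Drive.

    from pathlib import Path

    # Only the "pretrained" folder name is given in this README; the
    # checkpoint file names below are illustrative placeholders, not the
    # actual names of the released files.
    expected = [
        Path("pretrained/seed_tokenizer.pt"),      # SEED Visual Tokenizer (placeholder)
        Path("pretrained/stable_diffusion.ckpt"),  # Stable Diffusion weights (placeholder)
    ]
    for path in expected:
        print(path, "found" if path.exists() else "MISSING")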

Inference

To discretize an image into 1D vision codes with causal dependency, and to reconstruct the image from those codes using the Stable Diffusion UNet:

  1. Download the pre-trained SEED Visual Tokenizer and the Stable Diffusion model from Google Drive and put them under the folder "pretrained".
  2. Run the inference code (a toy sketch of the round trip follows this list):
    python demo_recon.py
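For intuition, here is a minimal, self-contained sketch of the discretize-then-reconstruct round trip using a toy nearest-neighbour codebook lookup. It is not the repo's actual pipeline: the real encoder, learned codebook, and Stable Diffusion UNet decoder live behind demo_recon.py, and every shape and size below is a placeholder.

    import torch

    # Toy stand-ins: in SEED the codebook is learned and the encoder emits a
    # causally ordered 1D sequence of visual features.
    num_codes, code_dim, seq_len = 8192, 32, 32
    codebook = torch.randn(num_codes, code_dim)   # placeholder for the learned codebook
    features = torch.randn(seq_len, code_dim)     # placeholder for encoder output

    # Discretize: nearest codebook entry per position -> 1D vision codes.
    dists = torch.cdist(features, codebook)       # (seq_len, num_codes)
    codes = dists.argmin(dim=1)                   # (seq_len,) integer code indices

    # "Reconstruct": re-embed the codes. The real model instead conditions the
    # Stable Diffusion UNet on these embeddings to regenerate pixels.
    recon = codebook[codes]                       # (seq_len, code_dim)
    print("vision codes:", codes[:8].tolist())

Because the codes form a left-to-right sequence with causal dependency, a language model can consume and predict them like text tokens, which is the property SEED-LLaMA builds on.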

Citation

If you find the work helpful, please consider citing:

@article{ge2023making,
  title={Making LLaMA SEE and Draw with SEED Tokenizer},
  author={Ge, Yuying and Zhao, Sijie and Zeng, Ziyun and Ge, Yixiao and Li, Chen and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2310.01218},
  year={2023}
}

@article{ge2023planting,
  title={Planting a seed of vision in large language model},
  author={Ge, Yuying and Ge, Yixiao and Zeng, Ziyun and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2307.08041},
  year={2023}
}

The project is still in progress. Stay tuned for more updates!

License

SEED is released under Apache License Version 2.0.

Acknowledgement

We utilize Stable Diffusion to decode images from our visual codes, using the implementation and pre-trained model from https://github.com/CompVis/stable-diffusion. Our code is developed based on https://github.com/salesforce/LAVIS. Thanks for their wonderful work.
