Powered by CV Center, Tencent AI Lab, and ARC Lab, Tencent PCG.
This repository provides the official implementation of SEED and SEED-LLaMA. For any inquiries, please email [email protected].
🍻 We are actively looking for self-motivated interns. Please feel free to reach out if you are interested. 🍻
- 👀 We will release the checkpoints and code of the SEED-2 tokenizer and SEED-LLaMA-8B/14B, expected in late October.
- 👀 We will soon release an online demo for SEED-LLaMA.
- 2023-10-02 📎 We release the technical report of SEED-LLaMA on arXiv, which is empowered by the improved SEED-2 tokenizer.
- 2023-07-29 We release the checkpoint of the SEED tokenizer and its inference code. [Getting started]
- 2023-07-16 📎 We release the technical report of SEED on arXiv.
Stay tuned for updates!
- Python >= 3.8 (Anaconda is recommended)
- PyTorch >= 1.11.0
- NVIDIA GPU + CUDA
- Clone the repo

  ```bash
  git clone https://github.com/AILab-CVC/SEED.git
  cd SEED
  ```
- Install dependent packages

  ```bash
  sh install.sh
  ```
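After installation, a quick sanity check (a minimal sketch, not a script shipped with this repo) can confirm that the requirements above are met, i.e. that PyTorch and a CUDA GPU are visible:

```python
# Environment sanity check for the requirements above (PyTorch >= 1.11.0, CUDA GPU).
# Illustrative snippet; not part of the SEED repository.
import torch

print(f"PyTorch version: {torch.__version__}")
assert torch.cuda.is_available(), "No CUDA-capable GPU detected"
print(f"CUDA device: {torch.cuda.get_device_name(0)}")
```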
We release the pre-trained SEED Visual Tokenizer on Google Drive.
To discretize an image into 1D vision codes with causal dependency, and to reconstruct the image from those codes with the Stable Diffusion UNet:

- Download the pre-trained SEED Visual Tokenizer and the Stable Diffusion model from Google Drive and put them under the folder "pretrained".
- Run the inference code:

  ```bash
  python demo_recon.py
  ```
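For intuition, the sketch below shows the core operation behind such 1D vision codes: quantizing continuous visual features to discrete codebook indices by nearest-neighbor lookup. This is an illustrative stand-in, not SEED's actual tokenizer code; the codebook, shapes, and variable names are toy assumptions, and `demo_recon.py` remains the real entry point.

```python
# Toy illustration of turning visual features into a 1D sequence of discrete codes.
# Codebook, features, and shapes are random stand-ins, not SEED's real configuration.
import torch

num_codes, dim, seq_len = 8192, 768, 32     # assumed toy sizes for illustration
codebook = torch.randn(num_codes, dim)      # stand-in for a learned codebook
features = torch.randn(seq_len, dim)        # stand-in for causal 1D visual features

# Nearest-neighbor lookup: each feature vector maps to one discrete code index.
distances = torch.cdist(features, codebook)   # (seq_len, num_codes) pairwise L2
codes = distances.argmin(dim=-1)              # (seq_len,) integer vision codes
print(codes.tolist())                         # LLM-ready discrete tokens

# Reconstruction starts from the quantized embeddings looked up from the codebook;
# in SEED, such embeddings condition a Stable Diffusion UNet to regenerate pixels.
quantized = codebook[codes]                   # (seq_len, dim)
```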
If you find the work helpful, please consider citing:
```bibtex
@article{ge2023making,
  title={Making LLaMA SEE and Draw with SEED Tokenizer},
  author={Ge, Yuying and Zhao, Sijie and Zeng, Ziyun and Ge, Yixiao and Li, Chen and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2310.01218},
  year={2023}
}

@article{ge2023planting,
  title={Planting a {SEED} of Vision in Large Language Model},
  author={Ge, Yuying and Ge, Yixiao and Zeng, Ziyun and Wang, Xintao and Shan, Ying},
  journal={arXiv preprint arXiv:2307.08041},
  year={2023}
}
```
The project is still in progress. Stay tuned for more updates!
SEED is released under Apache License Version 2.0.
We utilize Stable Diffusion to decode images from our visual codes, using the implementation and pre-trained model from https://github.com/CompVis/stable-diffusion. Our code is developed based on https://github.com/salesforce/LAVIS. Thanks for their wonderful work.