Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback
TL;DR: We present Playmate2, a framework for generating high-quality audio-driven videos that effectively tackles two key challenges: temporal coherence in long sequences and multi-character animation. To the best of our knowledge, this is the first training-free approach that enables audio-driven animation for three or more characters without requiring additional data or model modifications.
2025/11/21: 🔥🔥🔥 We release the weights and inference code of Playmate2!
2025/11/10: 🎉🎉🎉 Our paper has been accepted and will be presented at AAAI 2026. We plan to release the inference code and model weights for both Playmate and Playmate2 in the coming weeks. Stay tuned, and thank you for your patience!
2025/10/15: 🚀🚀🚀 Our paper is now public on arXiv.
[Demo video gallery: multi-person dialogue and singing examples, including multi_persons_09-multiperson_30.mp4, female_song_01-female_55.mp4, and others.]
Explore more examples.
conda create -n playmate2 python=3.10
conda activate playmate2
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install -U xformers==0.0.29 --index-url https://download.pytorch.org/whl/cu124
pip install misaki[en]
pip install ninja
pip install psutil
pip install packaging
pip install flash_attn==2.7.4.post1 --no-build-isolation
pip install -r requirements.txt
conda install -c conda-forge ffmpeg
# or
sudo yum install ffmpeg ffmpeg-devel
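A quick, optional sanity check (our suggestion, not part of the official setup) confirms that PyTorch sees the GPU and that the compiled attention extensions import cleanly:
# Verify the GPU is visible and that flash-attn and xformers load.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import flash_attn, xformers; print(flash_attn.__version__, xformers.__version__)"
ffmpeg -version | head -n 1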
| Models | Download Link | Save Path |
|---|---|---|
| Wan2.1-I2V-14B-720P | Huggingface | pretrained_weights/Wan2.1-I2V-14B-720P |
| chinese-wav2vec2-base | Huggingface | pretrained_weights/chinese-wav2vec2-base |
| VideoLLaMA3-7B | Huggingface | pretrained_weights/VideoLLaMA3-7B |
| Our Pretrained Model | Huggingface | pretrained_weights/playmate2 |
Download models using huggingface-cli:
mkdir pretrained_weights
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir ./pretrained_weights/Wan2.1-I2V-14B-720P
huggingface-cli download TencentGameMate/chinese-wav2vec2-base --local-dir ./pretrained_weights/chinese-wav2vec2-base
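# fetch model.safetensors from the PR branch (refs/pr/1) of the wav2vec2 repo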
huggingface-cli download TencentGameMate/chinese-wav2vec2-base model.safetensors --revision refs/pr/1 --local-dir ./pretrained_weights/chinese-wav2vec2-base
huggingface-cli download DAMO-NLP-SG/VideoLLaMA3-7B --local-dir ./pretrained_weights/VideoLLaMA3-7B
huggingface-cli download PlaymateAI/Playmate2 --local-dir ./pretrained_weights/playmate2
We recommend using an A100 or higher GPU for inference.
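Optionally, you can verify that all four checkpoints landed in the save paths from the table above; the loop below is a convenience check we suggest, and it only tests that each directory exists and is non-empty:
# Check each save path from the table above.
for d in Wan2.1-I2V-14B-720P chinese-wav2vec2-base VideoLLaMA3-7B playmate2; do
  [ -d "./pretrained_weights/$d" ] && [ -n "$(ls -A ./pretrained_weights/$d)" ] \
    && echo "OK: $d" || echo "MISSING or empty: $d"
done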
- One person
# --gpu_num: 1 (single GPU) or 3 (multiple GPUs)
python inference.py \
--gpu_num 1 \
--image_path examples/images/01.png \
--audio_path examples/audios/01.wav \
--prompt_path examples/prompts/01.txt \
--output_path examples/outputs/01.mp4 \
--max_size 1280 \
--id_num 1
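To run several single-person examples back to back, the command loops naturally. A minimal sketch, assuming hypothetical ids 01-03 each have a matching image, audio, and prompt file (only 01 is shown above; adjust the list to what examples/ actually contains):
# Hypothetical batch run; edit the id list to match your files.
for id in 01 02 03; do
  python inference.py \
    --gpu_num 1 \
    --image_path examples/images/${id}.png \
    --audio_path examples/audios/${id}.wav \
    --prompt_path examples/prompts/${id}.txt \
    --output_path examples/outputs/${id}.mp4 \
    --max_size 1280 \
    --id_num 1
done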
- Multiple persons
# N denotes the number of persons
# --gpu_num: 1 (single GPU) or 3+N-1 (multiple GPUs)
python inference.py \
--gpu_num 1 \
--image_path examples/images/04.png \
--audio_path examples/audios/04 \
--mask_path examples/masks/04 \
--prompt_path examples/prompts/04.txt \
--output_path examples/outputs/04.mp4 \
--max_size 1280 \
--id_num 3
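Note that --audio_path and --mask_path point to directories here rather than single files. The layout sketched below is our assumption about how the per-person inputs are organized; names like person_1.wav are hypothetical, so check the bundled examples/ folder for the actual convention:
# Inspect the example inputs shipped with the repo.
ls examples/audios/04 examples/masks/04
# Assumed contents (hypothetical names):
#   examples/audios/04/  one audio track per person, e.g. person_1.wav ... person_3.wav
#   examples/masks/04/   one mask image per person marking their region in the frame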
If you find our work useful for your research, please consider citing the paper:
@article{ma2025playmate2,
title={Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback},
author={Ma, Xingpei and Huang, Shenneng and Cai, Jiaran and Guan, Yuansheng and Zheng, Shen and Zhao, Hanfeng and Zhang, Qiang and Zhang, Shunsi},
journal={arXiv preprint arXiv:2510.12089},
year={2025}
}
