- Tsinghua University
- Beijing
Popular repositories
- gated_attention (Public): The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
- Tuning-keys-v.s.-values (Public): Official PyTorch Implementation of Empirical Study on Updating Key-Value Memories in Transformer Feed-forward Layers [Tiny Paper @ ICLR 2024]
Python 4
