Twilight: Adaptive Attention Sparsity with Hierarchical Top-p Pruning
Published in Conference on Neural Information Processing Systems (NeurIPS), 2025
Twilight brings adaptive budget decisions to sparse attention for efficient long-context LLM inference.
Recommended citation: Chaofan Lin, Jiaming Tang, Shuo Yang, Hanshuo Wang, Tian Tang, Boyu Tian, Ion Stoica, Song Han, and Mingyu Gao. "Twilight: Adaptive Attention Sparsity with Hierarchical Top-p Pruning." Conference on Neural Information Processing Systems (NeurIPS), 2025.
