Twilight: Adaptive Attention Sparsity with Hierarchical Top-p Pruning

Published in Conference on Neural Information Processing Systems (NeurIPS), 2025

Twilight uses hierarchical top-p pruning to choose the sparse-attention budget adaptively, enabling efficient long-context LLM inference.

Recommended citation: Chaofan Lin, Jiaming Tang, Shuo Yang, Hanshuo Wang, Tian Tang, Boyu Tian, Ion Stoica, Song Han, and Mingyu Gao. "Twilight: Adaptive Attention Sparsity with Hierarchical Top-p Pruning." Conference on Neural Information Processing Systems (NeurIPS), 2025.