CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control
Published in International Conference on Machine Learning (ICML), 2026
CONCUR improves the throughput of agentic LLM batch inference by using congestion-based concurrency control to manage KV-cache pressure.
Recommended citation: Qiaoling Chen, Zhisheng Ye, Tian Tang, Peng Sun, Boyu Tian, Guoteng Wang, Shenggui Li, Yonggang Wen, Zhenhua Han, and Tianwei Zhang. "CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control." International Conference on Machine Learning (ICML), 2026.
