Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Published in International Conference on Learning Representations (ICLR), 2025
Fiddler is a resource-efficient inference system for running Mixture-of-Experts models in settings with limited GPU resources.
Recommended citation: Keisuke Kamahori*, Tian Tang*, Yile Gu, Kan Zhu, and Baris Kasikci. "Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models." International Conference on Learning Representations (ICLR), 2025.
*Equal contribution.
