Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Published in International Conference on Learning Representations (ICLR), 2025

Fiddler is a resource-efficient inference system for Mixture-of-Experts (MoE) models, designed for settings with limited GPU resources.

Recommended citation: Keisuke Kamahori*, Tian Tang*, Yile Gu, Kan Zhu, and Baris Kasikci. "Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models." International Conference on Learning Representations (ICLR), 2025.
*Equal contribution.
