- [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961)
- [Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer](https://arxiv.org/abs/1701.06538)
- [Mixtral of Experts](https://arxiv.org/abs/2401.04088) (Mistral AI's sparse MoE model, state of the art among open-weight models at release)
- [Mixtral of Experts blog post](https://mistral.ai/news/mixtral-of-experts/)
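
The common thread in these papers is a sparsely gated mixture-of-experts layer: a small router scores a set of expert feed-forward networks for each token, and only the top-k experts are actually evaluated. Below is a minimal PyTorch sketch of that idea, assuming top-2 routing (as in Mixtral; Switch Transformers use top-1). The class, parameter names, and expert sizes are illustrative and not taken from any of the papers' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    """Route each token to its top-k experts and mix their outputs by router weight."""

    def __init__(self, d_model: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: one linear layer producing a score per expert for each token.
        self.gate = nn.Linear(d_model, num_experts)
        # Experts: independent feed-forward networks (illustrative sizes).
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(d_model, 4 * d_model),
                    nn.ReLU(),
                    nn.Linear(4 * d_model, d_model),
                )
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.gate(x)                                # (num_tokens, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k scores per token
        weights = F.softmax(top_vals, dim=-1)                # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e                 # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = SparseMoELayer(d_model=16)
    tokens = torch.randn(10, 16)
    print(layer(tokens).shape)  # torch.Size([10, 16])
```

Real implementations add pieces this sketch omits, such as the auxiliary load-balancing loss and per-expert capacity limits described in the Switch Transformers paper, plus batched expert dispatch instead of the Python loops used here for clarity.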
Copyright claim: "Mixture of Experts model papers" was created by melonskin on 2025/07/12. Its copyright belongs to the author. Commercial use must be authorized by the author. The source must be cited for non-commercial use.
Link to the article: https://amelon.org/2025/07/12/moe-papers.html