Switch Transformers
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Precursor of Switch Transformers
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Mixtral SOTA
Mixtral of Experts
Adaptive mixtures of local experts
Hierarchical Mixtures of Experts and the EM Algorithm
Blackboard design pattern
Two complementary patterns for building multi-expert systems (both sketched below)
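To make the first pattern concrete, here is a minimal sketch of learned top-1 ("switch") gating in the spirit of the Switch Transformers paper: a router picks one expert per token and scales its output by the gate probability. The names (`SwitchFFN`, `n_experts`) are illustrative assumptions, not the authors' code, and this omits the paper's load-balancing loss and capacity limits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    """Top-1 routed feed-forward layer: each token is sent to one expert."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # learned gate
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model); routing distribution over experts
        probs = F.softmax(self.router(x), dim=-1)
        gate, idx = probs.max(dim=-1)                 # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                           # tokens routed to expert e
            if mask.any():
                # scaling by the gate keeps the router differentiable
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(8, 16)                           # 8 tokens, d_model=16
layer = SwitchFFN(d_model=16, d_ff=32, n_experts=4)
print(layer(tokens).shape)                            # torch.Size([8, 16])
```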
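And a minimal sketch of the second pattern, the blackboard: independent knowledge sources cooperate by reading and extending a shared state under a simple control loop, with no learned gate deciding who acts. All names here (`run_blackboard`, `KnowledgeSource`) are hypothetical, chosen only to illustrate the pattern.

```python
from typing import Callable

# A knowledge source is a (condition, action) pair: it fires only when the
# blackboard holds its inputs and has not yet produced its output.
KnowledgeSource = tuple[Callable[[dict], bool], Callable[[dict], None]]

def run_blackboard(state: dict, sources: list[KnowledgeSource]) -> dict:
    """Let any ready knowledge source post to the shared state,
    repeating until a full pass makes no progress."""
    progress = True
    while progress:
        progress = False
        for ready, contribute in sources:
            if ready(state):
                contribute(state)
                progress = True
    return state

sources: list[KnowledgeSource] = [
    # Each guard checks "output not present yet", so the loop terminates.
    (lambda s: "text" in s and "tokens" not in s,
     lambda s: s.update(tokens=s["text"].split())),
    (lambda s: "tokens" in s and "count" not in s,
     lambda s: s.update(count=len(s["tokens"]))),
]
print(run_blackboard({"text": "mixture of experts"}, sources))
# {'text': 'mixture of experts', 'tokens': ['mixture', 'of', 'experts'], 'count': 3}
```

The contrast is what makes the patterns complementary: the gated mixture learns which expert should answer, while the blackboard lets experts self-select opportunistically based on the current shared state.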