Expert Emergence in a Small Sparse MoE Transformer Trained on Code, Math, and Prose
Training a small MoE model on code, math, and prose to observe whether expert routing patterns emerge from domain structure.