Research
- Is Attention Enough? A LoRA Target-Module Study on Qwen3-8B Math SFT
A controlled comparison of attention-only and all-layer LoRA on Qwen3-8B using an OpenMathInstruct-2-derived math SFT dataset and GSM8K evaluation.
- Expert Emergence in a Small Sparse MoE Transformer Trained on Code, Math, and Prose
An experiment training a small sparse MoE transformer on code, math, and prose to observe whether expert routing patterns emerge from domain structure.