Research

Is Attention Enough? A LoRA Target-Module Study on Qwen3-8B Math SFT

A controlled comparison of attention-only and all-layer LoRA on Qwen3-8B using an OpenMathInstruct-2-derived math SFT dataset and GSM8K evaluation.

May 17, 2026
--
Expert Emergence in a Small Sparse MoE Transformer Trained on Code, Math, and Prose

Training a small MoE model on code, math, and prose to observe whether expert routing patterns emerge from domain structure.

Feb 27, 2026
--