
Your 70-Billion-Parameter Model Might Be 40% Wasted
Three papers from February 1–6, 2026 converge on a question the field has been avoiding since 2016: what if most transformer layers aren't doing compositional reasoning

















