ByteDance study reveals why reasoning models overthink and how to fix it
A ByteDance study found that large reasoning models frequently keep thinking well past the correct answer, wasting tokens on cross-checking and reformulating problems they have already solved. In 72% of cases where both correct and incorrect answers existed for the same problem, the longer answer was the wrong one.
The researchers developed SAGE, which identifies optimal reasoning paths hidden by standard inference. Models trained with SAGE-RL scored 2.1% higher while using 44.1% fewer tokens. The key finding: models actually know when they're done, but common sampling methods don't let them stop.
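To make the "models know when they're done" claim concrete, here is a minimal sketch of length-aware decoding: watch how much probability the model places on an end-of-reasoning token and stop once that signal is strong, instead of sampling past it. This is not the SAGE or SAGE-RL method from the paper; the model name, the `</think>` stop marker, and the confidence threshold are all assumptions for illustration.

```python
# Hypothetical sketch: stop reasoning early when the model itself signals completion.
# Not the paper's SAGE/SAGE-RL method; model, stop token, and threshold are assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed reasoning model
STOP_TOKEN = "</think>"                               # assumed end-of-reasoning marker
CONFIDENCE = 0.5                                      # assumed stopping threshold

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
stop_id = tok.convert_tokens_to_ids(STOP_TOKEN)

def generate_with_early_stop(prompt: str, max_new_tokens: int = 2048) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        # If the model already puts high probability on ending its reasoning,
        # accept that signal rather than sampling more verification steps.
        if probs[stop_id] > CONFIDENCE:
            ids = torch.cat([ids, torch.tensor([[stop_id]])], dim=-1)
            break
        next_id = torch.multinomial(probs, 1).unsqueeze(0)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=False)
```

Standard sampling would ignore the stop signal until the token is actually drawn; the point of the sketch is that the signal is often available much earlier in the trace.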