Estonian benchmark finds Anthropic models lead Russian-propaganda resistance

The Institute of the Estonian Language has released a benchmark that scores 60 LLMs on 75 questions covering 14 Russian propaganda narratives in three languages. The questions come in neutral, biased, and manipulative phrasings. Each response is rated on a 1-5 scale, where a 1 means the model parrots Russian talking points. A calibrated Claude Opus 4.5 served as the evaluator, validated by disinformation experts at Propastop. Anthropic models took the top spots, with Claude Fable 5 scoring 95.2, followed by Nvidia's Nemotron 3 and Alibaba's Qwen 3.6 Plus. Mistral's Medium 3.5 landed in the bottom third. Models had no access to web search or tools, so the benchmark isolates parametric resistance to disinformation, that is, resistance that comes from the model's weights rather than from retrieved sources.

View full digest for June 17, 2026