New DISBench benchmark: best AI model retrieves all matching photos in only 29% of cases

Researchers from Renmin University of China and Oppo Research Institute created DISBench, a benchmark that tests whether AI can retrieve specific photos from a personal collection using contextual clues spread across multiple images. Even the top performer, Claude Opus 4.5, identified the complete set of relevant images in only 29% of cases. Up to 50% of all errors stem from poor planning: models find the right context but stop searching too early or lose track of constraints. The finding highlights a gap between AI's improving visual recognition and its weakness at multi-step retrieval tasks that humans handle intuitively.

View full digest for February 23, 2026