AI PDF parsing remains one of the technology's 'unsexy failures'

Despite rapid progress in coding and physics, AI still can't reliably extract information from PDFs. According to a data company CEO, state-of-the-art models asked to extract PDF content will summarize it instead, confuse footnotes with body text, or hallucinate contents. The problem surfaced acutely when developers tried building search tools for the more than 3 million Epstein-related DOJ documents. Researcher Pierre-Carl Langlais places "PDF parsing is solved!" on his AI timeline shortly before AGI, highlighting the gap between headline-grabbing capabilities and mundane real-world utility.

View full digest for February 23, 2026