DeepSeek’s new artificial intelligence model that converts images into text is not just a document parsing tool but a potential preview of its next generation of large language models (LLMs), according to AI experts.
Released on Monday, DeepSeek-OCR is technically an optical character recognition (OCR) model – an AI system that uses computer vision to convert images into machine-readable text. Common applications include smart vehicles and document scanners.
The Hangzhou-based start-up cited the model’s industry-leading scores on OmniDocBench, a popular benchmark for evaluating AI models’ document parsing capabilities.
But this OCR label “can almost be ignored”, said Florian Brand, a PhD student at Germany’s Trier University and an expert on open-source models. Instead, he said, the model’s accompanying research paper hinted at its real purpose: improving the efficiency of DeepSeek’s flagship series of LLMs.
“The paper is mostly about compression,” said Brand. LLMs have been the main driver of the generative AI boom in recent years, from OpenAI’s ChatGPT to DeepSeek’s R1. LLMs process inputs by turning text into tokens, short chunks of text that typically represent words or parts of words.
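To illustrate how text becomes tokens, here is a toy greedy longest-match splitter. The vocabulary and function below are purely hypothetical examples, not DeepSeek’s actual tokenizer; production tokenizers learn their vocabularies from large corpora.

```python
# Hypothetical vocabulary of known subword pieces (illustrative only).
VOCAB = {"deep", "seek", "un", "break", "able", "token", "ize", "rs"}

def tokenize(word: str) -> list[str]:
    """Split a word into subword tokens by greedy longest match."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest remaining piece first; fall back to one character.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in VOCAB or j - i == 1:
                tokens.append(piece)
                i = j
                break
    return tokens

print(tokenize("unbreakable"))  # ['un', 'break', 'able']
print(tokenize("tokenizers"))   # ['token', 'ize', 'rs']
```

A word the model has never seen whole can still be represented as a sequence of familiar fragments, which is why token counts, not word counts, determine how much input an LLM actually processes.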

However, LLMs struggle with “long context”, or lengthy data inputs, because the mechanism they use to pay “attention” to, or look at, each token becomes more computationally costly as the number of tokens grows.
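The scaling problem can be seen with a back-of-the-envelope count. In standard self-attention, every token is compared against every other token, so the number of attention scores grows with the square of the context length. This is a sketch of that general behaviour, not of DeepSeek’s specific implementation:

```python
# Illustrative only: in standard self-attention, each of the n tokens
# attends to all n tokens, producing an n-by-n matrix of scores.
def attention_score_count(n_tokens: int) -> int:
    """Number of pairwise attention scores for a context of n tokens."""
    return n_tokens * n_tokens

for n in (1_000, 2_000, 4_000):
    print(f"{n:>6} tokens -> {attention_score_count(n):>12,} scores")
```

Doubling the context from 2,000 to 4,000 tokens quadruples the score count, which is why compressing long inputs into fewer tokens, as the DeepSeek-OCR paper explores, can cut the cost of long-context processing so sharply.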