
As access to advanced chips narrows, Chinese AI developers are focusing on fixing an algorithmic bottleneck at the heart of large language models (LLMs) – hoping that smarter code, not more powerful hardware, will help them steal a march on their Western rivals.
By experimenting with hybrid forms of “attention” – the mechanism that allows LLMs to process and recall information – start-ups such as Moonshot AI and DeepSeek aim to stretch limited computing resources, while keeping pace with global leaders.
Their work centres on redesigning the costly “full attention” process used by most LLMs, which compares every new token of data with all previous ones. As the number of tokens grows, this process becomes rapidly more demanding, with the cost rising roughly with the square of the context length.
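A rough illustration of that quadratic growth (the token counts below are arbitrary examples, not figures from any particular model):

```python
# Illustrative only: in full attention, each new token is compared against
# every token that came before it, so the total number of comparisons grows
# roughly with the square of the context length.
def full_attention_comparisons(num_tokens: int) -> int:
    return sum(range(num_tokens))  # 0 + 1 + ... + (n - 1), about n*n/2

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {full_attention_comparisons(n):,} comparisons")
```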
AI experts have identified this limited “attention budget” of LLMs as one of the key choke points in the development of powerful AI agents.
Chinese developers are now exploring hybrid “linear attention” systems that make comparisons with only a subset of tokens, dramatically reducing computational costs.
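As a loose, hypothetical sketch of the “subset of tokens” idea, a fixed-size sliding window is one simple scheme; actual linear-attention designs such as KDA use more sophisticated mechanisms:

```python
# Illustrative only: if each token is compared against at most a fixed window
# of recent tokens, the total number of comparisons grows linearly with the
# context length rather than quadratically.
def windowed_attention_comparisons(num_tokens: int, window: int = 512) -> int:
    return sum(min(i, window) for i in range(num_tokens))

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {windowed_attention_comparisons(n):,} comparisons")
```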
One of the latest examples is Moonshot AI’s Kimi Linear, released in late October, which introduced a “Kimi Delta Attention” (KDA) technique and combines full and linear attention layers in a hybrid architecture.
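A highly simplified sketch of how such a hybrid stack might interleave the two layer types; the function names and the mixing ratio below are illustrative assumptions, not Moonshot’s actual implementation:

```python
# Illustrative sketch of a hybrid attention stack: most layers use a cheaper
# linear-attention variant, with occasional full-attention layers retained
# for global recall. The names and the ratio here are assumptions.
from dataclasses import dataclass

@dataclass
class LayerSpec:
    kind: str   # "linear" or "full"
    index: int

def build_hybrid_stack(num_layers: int, full_every: int = 4) -> list[LayerSpec]:
    # Place a full-attention layer every `full_every` layers; the rest use
    # the cheaper linear-attention variant.
    return [
        LayerSpec(kind="full" if (i + 1) % full_every == 0 else "linear", index=i)
        for i in range(num_layers)
    ]

stack = build_hybrid_stack(num_layers=12)
print([layer.kind for layer in stack])
```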