With 600 billion parameters, V6 is China’s leading model in multimodal reasoning and the industry’s most cost-effective option for inference, according to the company.
Xu also said that V6 Reasoner outperformed OpenAI’s o1 and Google’s Gemini 2.0 Flash Thinking in multimodal reasoning. The advances are designed to address an industry-wide challenge: the depletion of high-quality text data for training large language models (LLMs).

Unlike traditional LLMs that focus primarily on text, multimodal LLMs integrate various modalities – such as images, audio and video – to improve comprehension and generation capabilities.
The industry’s initial strategy of expanding model parameters under the scaling law had “hit a wall”, Xu said in an interview in Shanghai on Thursday. “We’ve nearly exhausted all text data that can be collected from the internet,” he said.