With 600 billion parameters, V6 is China’s leading model in multimodal reasoning and the industry’s most cost-effective option for inference, according to the company.
Xu also said that V6 Reasoner outperformed OpenAI’s o1 and Google’s Gemini 2.0 Flash Thinking in multimodal reasoning. The advances are designed to address an industry-wide challenge: the depletion of high-quality text data for training large language models (LLMs).

Unlike traditional LLMs that focus primarily on text, multimodal LLMs integrate various modalities – such as images, audio and video – to improve comprehension and generation capabilities.
The industry’s initial strategy of expanding model parameters under the scaling law had “hit a wall”, Xu said in an interview in Shanghai on Thursday. “We’ve nearly exhausted all text data that can be collected from the internet,” he said.