A paper – published last week by Huawei’s Pangu team, which comprises 22 core contributors and 56 additional researchers – introduced the concept of Mixture of Grouped Experts (MoGE), an upgraded version of the Mixture of Experts (MoE) technique that has been instrumental in DeepSeek’s cost-effective AI models.
While MoE keeps execution costs low relative to a model’s large parameter count and offers enhanced learning capacity, it often results in inefficiencies, according to the paper, because its so-called experts are activated unevenly, which can hinder performance when the model runs across multiple devices in parallel.
In contrast, the improved MoGE “groups the experts during selection and better balances the expert workload”, researchers said.
In AI training, “experts” refer to specialised sub-models or components within a larger model, each designed to handle specific tasks or types of data. This allows the overall system to take advantage of diverse expertise to enhance performance.
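The grouping idea can be pictured with a small routing sketch. The snippet below is a minimal illustration, not code from the paper: it compares a plain global top-k router, where a few popular experts can attract most tokens, with a grouped router that picks a fixed number of experts from every group, so the per-group workload (and the per-device workload, if each device hosts one group) stays even. All names, shapes and parameter values here are invented for illustration.

```python
# Illustrative sketch (not the paper's implementation) of global top-k routing
# versus grouped top-k routing for a mixture-of-experts layer.
import numpy as np

rng = np.random.default_rng(0)

num_tokens = 8
num_experts = 8
num_groups = 4                      # e.g. one group per device
experts_per_group = num_experts // num_groups
k_per_group = 1                     # experts chosen inside each group
k_total = num_groups * k_per_group  # total experts activated per token

# Router scores: one logit per (token, expert) pair.
scores = rng.normal(size=(num_tokens, num_experts))

# Plain MoE: each token takes its global top-k experts, so popular experts
# can end up with far more tokens than others.
global_choice = np.argsort(scores, axis=1)[:, -k_total:]

# Grouped routing: reshape scores to (tokens, groups, experts_per_group) and
# take the top-k_per_group experts inside every group, so each group receives
# the same number of assignments from every token.
grouped = scores.reshape(num_tokens, num_groups, experts_per_group)
local_choice = np.argsort(grouped, axis=2)[:, :, -k_per_group:]
offsets = (np.arange(num_groups) * experts_per_group)[None, :, None]
grouped_choice = (local_choice + offsets).reshape(num_tokens, -1)

def load_per_expert(choice):
    # Number of tokens routed to each expert.
    return np.bincount(choice.ravel(), minlength=num_experts)

def load_per_group(choice):
    # Number of tokens routed to each group of experts.
    return np.bincount(choice.ravel() // experts_per_group, minlength=num_groups)

print("global top-k, load per group :", load_per_group(global_choice))
print("grouped top-k, load per group:", load_per_group(grouped_choice))
print("global top-k, load per expert :", load_per_expert(global_choice))
print("grouped top-k, load per expert:", load_per_expert(grouped_choice))
```

Running the sketch, the grouped router assigns exactly the same number of tokens to every group by construction, while the global router typically leaves some groups, and the devices holding them, under-used.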