Scaling Embeddings Outperforms Scaling Experts in Language Models

Hong Liu, Jiaqi Zhang, Chao Wang
Jan 30, 2026
Embedding Scaling, Mixture-of-Experts (MoE), N-gram Embeddings, LongCat-Flash-Lite, Speculative Decoding, N-gram Embedding Layer
