arXiv
Scaling Embeddings Outperforms Scaling Experts in Language Models
Hong Liu, Jiaqi Zhang, Chao Wang
Jan 30, 2026
Embedding Scaling · Mixture-of-Experts (MoE) · N-gram Embeddings · LongCat-Flash-Lite · Speculative Decoding · N-gram Embedding Layer