Mixture-of-Memories (MoM): The “Linear Attention” Breakthrough That Doesn’t Forget Long Contexts

MoM was released in May 2025 as a linear-attention sequence model that preserves long-term context instead of forgetting it. It breaks the single-memory bottleneck by maintaining multiple independent memory states: a trained router assigns each token to its top-k most relevant memory modules (typically top-k = 2). Each activated memory is updated by adding the outer product of the token’s key and value to its state, while a shared global memory is updated on every token. Because each step touches only fixed-size memory matrices, the per-token cost is constant and the total cost scales linearly with sequence length, unlike Transformers, whose self-attention is quadratic. In benchmarks,
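
The routing-and-update rule can be sketched in a few lines of PyTorch. The function below is a minimal illustration only, assuming simple linear query/key/value projections, a softmax gate over the selected memories, and a plain additive memory update; the names, shapes, and router details are assumptions for the sketch, not the released MoM implementation.

```python
import torch
import torch.nn.functional as F

def mom_step(x_t, memories, shared_memory, router, W_q, W_k, W_v, top_k=2):
    """One MoM-style recurrent step for a single token (illustrative sketch).

    x_t:           (d_model,)         current token embedding
    memories:      (n_mem, d_k, d_v)  independent memory matrices
    shared_memory: (d_k, d_v)         global memory updated on every token
    router:        callable mapping (d_model,) -> (n_mem,) routing logits
    W_q, W_k, W_v: projection matrices, shapes (d_model, d_k) / (d_model, d_v)
    """
    q, k, v = x_t @ W_q, x_t @ W_k, x_t @ W_v      # query/key/value projections

    scores = router(x_t)                           # routing logits over the memories
    top_scores, top_idx = scores.topk(top_k)       # keep only the top-k memories
    gates = F.softmax(top_scores, dim=-1)          # normalize gates over the selection

    outer = torch.outer(k, v)                      # rank-1 update k v^T, shape (d_k, d_v)
    for g, i in zip(gates, top_idx):
        memories[i] = memories[i] + g * outer      # update only the activated memories
    shared_memory = shared_memory + outer          # shared memory sees every token

    # Read out by querying the mixture of activated memories plus the shared one.
    mixed = shared_memory + sum(g * memories[i] for g, i in zip(gates, top_idx))
    y_t = q @ mixed                                # (d_v,) output for this token
    return y_t, memories, shared_memory
```

Because the state here is a fixed set of small matrices rather than a growing key-value cache, the work per step does not depend on how many tokens came before, which is where the linear scaling with sequence length comes from.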