China’s DeepSeek Unveils AI Model That Cuts Costs by 50% – The ‘Sparse Attention’ Revolution

  • New model announced: DeepSeek on Sept. 29 released its experimental LLM DeepSeek-V3.2-Exp, which introduces a novel “DeepSeek Sparse Attention” mechanism focusing computation on key tokens Techcrunch Hindustantimes.
  • Huge cost cuts: The startup says this approach slashes API inference costs by roughly 50% for long-text tasks Techcrunch Euronews. In early tests, the price of a typical call dropped by as much as half when processing very long contexts Techcrunch Reuters.
  • Open source release: V3.2-Exp is fully open-weight and available under an MIT license on developer platforms (Hugging Face/GitHub) Techcrunch Venturebeat, enabling anyone to download or self-host it.
  • How it works: The “sparse attention” uses a “lightning indexer” plus a fine-grained token selector to pick only the most relevant parts of a huge input Techcrunch Venturebeat. This cuts the quadratic compute needed by standard Transformers, preserving output quality while trimming energy and latency Venturebeat Venturebeat.
  • Performance: Reports say V3.2-Exp largely matches its predecessor (V3.1-Terminus) on key benchmarks Venturebeat, while cutting token costs from ~$0.07 to ~$0.028 per million (input cache hits) Venturebeat. However, DeepSeek’s models still rank below top-tier AIs like GPT-5 or Anthropic’s Claude on overall “intelligence” tests Euronews Venturebeat.
  • Strategic context: DeepSeek calls V3.2-Exp “an intermediate step toward our next-generation architecture” Reuters. Notably, the model is built to run on Chinese AI chips (e.g. Huawei Ascend, Cambricon) “right out of the box” Bloomberg Cryptopolitan, aligning with Beijing’s push for homegrown hardware amid U.S. export bans.
  • Expert views: Analysts welcome the cost savings. Futurum Group’s Nick Patience says the model “should make [AI] faster and more cost-effective … without a noticeable drop in performance” Cryptopolitan. But others, like BlankPage Capital’s Ekaterina Almasque, warn that sparse methods “cut out things you think are not important” – with no guarantee the model won’t drop truly relevant data Cryptopolitan.

DeepSeek’s New V3.2-Exp Model

Hangzhou-based DeepSeek burst onto the AI scene earlier in 2025 with its R1 model (a heavily RL-trained chatbot) Techcrunch. This time, DeepSeek’s announcement focuses on efficiency. On Sept. 29 the company published a post (on Hugging Face) unveiling DeepSeek-V3.2-Exp, an experimental large language model built on its V3 series Techcrunch Techcrunch. According to DeepSeek, V3.2-Exp maintains similar reasoning performance to V3.1 but uses far less compute for long inputs. The key innovation is a “DeepSeek Sparse Attention” (DSA) mechanism: rather than comparing every token to every other in a long document (the dense attention used by vanilla Transformers), DSA first uses a “lightning indexer” to pick out important excerpts, then a fine-grained selector to zoom in on the most salient words inside them Techcrunch Hindustantimes. This two-stage pruning means the model can “handle a large amount of data” more cheaply, processing tens of thousands of tokens without exploding costs Techcrunch Venturebeat.
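
To make the two-stage idea concrete, the sketch below (a toy PyTorch illustration, not DeepSeek’s published DSA code) scores every token with a cheap indexer, keeps only the top-k, and runs ordinary attention over that shortlist; the shapes, the scoring function, and the value of k are assumptions for illustration.

```python
# Illustrative sketch of two-stage sparse attention (not DeepSeek's actual DSA code).
# Stage 1: a cheap "indexer" scores every past token for the current query.
# Stage 2: full attention is computed only over the top-k selected tokens.
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, indexer_scores, top_k=256):
    """q: (d,), k/v: (seq_len, d), indexer_scores: (seq_len,) cheap relevance scores."""
    seq_len = k.shape[0]
    keep = min(top_k, seq_len)
    # Stage 1: keep only the indices the lightweight indexer ranks highest.
    idx = torch.topk(indexer_scores, keep).indices                    # (keep,)
    k_sel, v_sel = k[idx], v[idx]                                     # (keep, d)
    # Stage 2: ordinary scaled dot-product attention over the shortlist.
    attn = F.softmax(q @ k_sel.T / k_sel.shape[-1] ** 0.5, dim=-1)    # (keep,)
    return attn @ v_sel                                               # (d,)

# Toy usage: a 32K-token context, but attention only ever touches 256 tokens.
d, seq_len = 64, 32_768
q = torch.randn(d)
k, v = torch.randn(seq_len, d), torch.randn(seq_len, d)
scores = torch.randn(seq_len)   # stand-in for the "lightning indexer" output
out = sparse_attention(q, k, v, scores)
print(out.shape)                # torch.Size([64])
```

The point of this structure is that the expensive softmax-attention step only ever touches the shortlisted tokens, so its cost grows with k rather than with the full context length.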

DeepSeek’s announcement on Hugging Face explicitly calls V3.2-Exp an “intermediate step toward our next-generation architecture” Reuters. In practice, it built V3.2-Exp by adding DSA on top of its V3.1-Terminus model (itself a refinement of V3.1) Venturebeat. The company also released the full model weights and code under an open-source license (MIT) on Hugging Face and GitHub Techcrunch Venturebeat, continuing its commitment to transparency. As VentureBeat notes, anyone can now download, modify, or deploy V3.2-Exp without fees Venturebeat. DeepSeek even provides optimized kernels (via LMSYS and vLLM) to run the sparse model across contexts up to 128K tokens Venturebeat.
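
Because the weights are openly published, trying the model locally is mostly a matter of pulling the repository from Hugging Face. The snippet below is a minimal sketch using the huggingface_hub client; the repo id is an assumption based on the model’s name, so check DeepSeek’s announcement for the exact identifier and the hardware requirements.

```python
# Minimal sketch: download the open weights from Hugging Face.
# The repo id is an assumption (based on the model name); verify it against
# DeepSeek's announcement before use. The full checkpoint is very large.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V3.2-Exp",      # assumed repo id
    allow_patterns=["*.json", "*.safetensors"],   # configs and weight shards
)
print("Weights downloaded to:", local_dir)
```

From there, the announcement points to optimized kernels and serving support (e.g. in vLLM) for long contexts; the exact launch commands depend on the serving stack and GPU setup.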

How “Sparse Attention” Slashes Costs

Transformer models like ChatGPT normally pay a steep price for long texts. Classic self-attention scales quadratically with context length – doubling the text roughly quadruples the attention work. As a result, “longer sequences – tens of thousands or even over 100,000 tokens – cause costs to rise much faster than the token count alone would suggest” Venturebeat. Sparse Attention tackles this by effectively ignoring irrelevant content. DeepSeek describes DSA as using a lightning indexer to score chunks of the input, then loading only the most useful tokens into the attention window Techcrunch Venturebeat. In experiments, this “selective attention” cut the compute per token dramatically while preserving almost the same answer quality Venturebeat. As one report explains, “by reducing the compute burden per token at large context lengths, V3.2-Exp keeps the cost curve flatter and much lower” Venturebeat. In practice, this means tasks like summarizing a 100-page document or chatting with full history become far more affordable.
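
A stylized back-of-the-envelope calculation shows why the curve flattens. Counting dense attention as n² token pairs versus an indexer pass plus top-k attention (n + n·k pairs) is a simplification of the real kernels, and the k value below is an arbitrary assumption, but it captures the scaling argument:

```python
# Back-of-the-envelope comparison of attention work per sequence (stylized, not
# DeepSeek's exact numbers): dense attention scores all n*n token pairs, while a
# sparse scheme scores n tokens with a cheap indexer and then attends to only k
# tokens per position.
def dense_pairs(n: int) -> int:
    return n * n

def sparse_pairs(n: int, k: int = 2048) -> int:
    return n + n * min(k, n)   # indexer pass + top-k attention per position

for n in (8_000, 32_000, 128_000):
    print(f"n={n:>7,}  dense={dense_pairs(n):>17,}  sparse={sparse_pairs(n):>14,}  "
          f"ratio={dense_pairs(n) / sparse_pairs(n):.1f}x")
```

In this toy model the dense count at 128K tokens is roughly 60x the sparse one, which is the sense in which the cost curve stays “flatter and much lower” as contexts grow.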

DeepSeek’s new model uses this efficiency not only in inference but also in training and fine-tuning. The company’s published paper (linked on Hugging Face) details the indexer and token-selector design Hindustantimes. In effect, DSA causes the model to “skip irrelevant data,” as Hugging Face’s Adina Yakefu (Chinese community lead) notes, boosting speed and lowering energy use Cryptopolitan Cryptopolitan. Internally, the firm combined these architectural changes with more advanced distillation and reinforcement-learning steps, but the headline is that V3.2-Exp can process very long contexts (up to 128K tokens) without the runaway costs a normal Transformer would incur Venturebeat Venturebeat.

Performance and Cost Savings

Despite its radical new design, V3.2-Exp delivers nearly the same accuracy as its predecessor on standard benchmarks. VentureBeat reports that the model “mostly matches or slightly improves the benchmarks” of V3.1-Terminus Venturebeat. In held-out tests, scores on tasks like reasoning, coding, and Q&A were essentially flat compared to V3.1 Venturebeat. This implies DeepSeek achieved its goal: maintain performance while cutting resource use. (Notably, DeepSeek’s V3 series still trails leading AIs in raw capability; for example, V3.1 ranks behind OpenAI’s GPT-5 and Anthropic’s Claude in recent rankings Euronews.)

The real difference comes in price. DeepSeek publicly slashed its API pricing with V3.2-Exp. Under the new scheme, one million input tokens costs about $0.028 (for cache hits) versus $0.07 before Venturebeat – roughly a 60% cut. (Output tokens are also cheaper.) Reuters notes that DeepSeek’s official announcement claims API prices were cut “by 50%+” Reuters. In long-context applications, internal tests showed typical per-request costs falling by half or more Techcrunch Venturebeat. Industry comparisons now list DeepSeek’s API among the cheapest; only something like OpenAI’s tiny “GPT-5 Nano” (not full GPT-5) is lower per token Venturebeat.
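
As a worked example of what the new per-token prices mean in practice (the request size and call volume below are made up; the prices are the cache-hit figures quoted above):

```python
# Worked example with the prices quoted above ($ per million input tokens, cache hits).
# The request size and call count are hypothetical, for illustration only.
OLD_PRICE, NEW_PRICE = 0.07, 0.028   # USD per 1M input tokens
tokens_per_call = 100_000            # e.g. a long report in the context window
calls = 10_000

def cost(price_per_million: float) -> float:
    return price_per_million * tokens_per_call * calls / 1_000_000

print(f"old: ${cost(OLD_PRICE):,.2f}  new: ${cost(NEW_PRICE):,.2f}  "
      f"savings: {1 - NEW_PRICE / OLD_PRICE:.0%}")   # old: $70.00  new: $28.00  savings: 60%
```

The saving scales linearly, so the roughly 50-60% reduction applies however large the long-context workload grows.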

In practical terms, users can now afford to feed far longer texts to the model before costs spike out of control Venturebeat Venturebeat. For example, summarizing a 50-page report or maintaining a huge chat history is now “far more practical and affordable” Venturebeat. DeepSeek and venture analysts highlight that this could open powerful AI to smaller developers. As Futurum Group researcher Nick Patience tells CNBC, the innovation should make the model “faster and more cost-effective to use without a noticeable drop in performance” Cryptopolitan, expanding access to those who couldn’t afford pricier models.

China’s AI Push and Strategic Impact

The launch of V3.2-Exp comes amid a heated tech rivalry. China is pushing its firms to break free of foreign chips in AI, and DeepSeek is aligning with this policy. Bloomberg notes the startup said it’s working “with Chinese chipmakers on the model” Bloomberg. Indeed, DeepSeek confirmed V3.2-Exp runs natively on homegrown AI processors (such as Huawei’s Ascend and Cambricon) “right out of the box” Cryptopolitan. This matters because U.S. bans (by both the Biden and Trump administrations) have restricted exports of Nvidia’s top AI chips to China Euronews, forcing Chinese tech to rely on domestic semiconductors. By co-designing its model with local hardware, DeepSeek advances Beijing’s goal of AI self-sufficiency.

Strategically, the move also fuels a domestic price war among Chinese AI providers. DeepSeek’s dramatic price cuts (to roughly $0.03 per million input tokens) give it a competitive edge over other local models (e.g. Alibaba’s Qwen series) and even over some global offerings Hindustantimes Venturebeat. Comparisons in the tech press note that Chinese companies are keenly watching DeepSeek: its R1 earlier showed Chinese teams could train advanced LLMs cheaply Techcrunch, and now V3.2-Exp may teach even U.S. firms new tricks about efficiency Techcrunch. Authorities in Europe and the U.S. have even barred government use of DeepSeek due to security concerns, underscoring how seriously these models are taken Euronews. DeepSeek’s founder himself seems aware of the geopolitical angle: the blog post repeatedly frames the work as research into “more efficient transformer architectures” – a domain of intense global competition Euronews.

Importantly, DeepSeek is not alone in exploring sparse techniques. Even OpenAI experimented with sparse attention years ago Hindustantimes. But by shipping an open-source implementation at scale, DeepSeek ensures the community (and rivals) will test and improve on it. As one analyst puts it, “people will always go for what is cheap, reliable, and effective,” and DeepSeek seems determined to be that option Cryptopolitan. Huawei Cloud quickly announced it had already “completed the adaptation” of V3.2-Exp to its services Hindustantimes, signaling broad industry uptake.

Expert Perspectives and Outlook

Most experts applaud the reduced costs but urge caution. As Futurum’s Patience notes, cheaper inference “opens up powerful AI tools to developers who can’t afford more expensive models” Cryptopolitan. That democratization is attractive, but the flip side is risk. BlankPage’s Ekaterina Almasque warns that sparse attention “cuts out things you think are not important,” and there’s no guarantee it isn’t accidentally dropping really important details Cryptopolitan. In other words, efficiency gains may come at a cost in nuance. Early reports from third-party evaluations will be crucial to verify DeepSeek’s claims.

Some see V3.2-Exp as a tactical move. DeepSeek itself calls it “an intermediate step” Reuters. Cryptopolitan notes the company is “playing the long game” by continuing to feed the open-source community Cryptopolitan. Investors and users will watch for what comes next – perhaps a V3.3 or V4 that combines this cost-cutting with a capability boost. For now, DeepSeek-V3.2-Exp stands as a symbol of the shifting AI arms race: it shows that beyond raw power, efficiency and cost matter hugely. As one tech editor put it, even if V3.2-Exp doesn’t dethrone GPT-5, it might “teach U.S. providers some much needed tricks” for cheaper AI services Techcrunch.

Sources: DeepSeek’s own Hugging Face post and research paper Techcrunch Hindustantimes; reporting by TechCrunch, Reuters, Bloomberg, Euronews, WSJ/Hindustan Times, and VentureBeat Reuters Euronews Venturebeat Cryptopolitan; expert comments from CNBC and Cryptopolitan coverage Cryptopolitan Cryptopolitan. These sources detail the model’s design, claimed cost-savings, and industry reactions.
