China’s DeepSeek Unveils AI Model That Halves Costs – The ‘Sparse Attention’ Revolution
30 September 2025
7 mins read


  • New model announced: DeepSeek on Sept. 29 released its experimental LLM DeepSeek-V3.2-Exp, which introduces a novel “DeepSeek Sparse Attention” mechanism that focuses computation on key tokens (techcrunch.com, hindustantimes.com).
  • Huge cost cuts: The startup says this approach slashes API inference costs by roughly 50% for long-text tasks. In early tests, the price of a typical call dropped by as much as half when processing very long contexts.
  • Open-source release: V3.2-Exp is fully open-weight and available under an MIT license on developer platforms (Hugging Face/GitHub), enabling anyone to download or self-host it.
  • How it works: The sparse attention uses a “lightning indexer” plus a fine-grained token selector to pick only the most relevant parts of a huge input (techcrunch.com, venturebeat.com). This cuts the quadratic compute required by standard Transformers, preserving output quality while trimming energy use and latency (venturebeat.com).
  • Performance: Reports say V3.2-Exp largely matches its predecessor (V3.1-Terminus) on key benchmarks, while cutting input-token costs from ~$0.07 to ~$0.028 per million (on cache hits) (venturebeat.com). However, DeepSeek’s models still rank below top-tier AIs like GPT-5 or Anthropic’s Claude on overall “intelligence” tests (euronews.com, venturebeat.com).
  • Strategic context: DeepSeek calls V3.2-Exp “an intermediate step toward our next-generation architecture” (reuters.com). Notably, the model is built to run on Chinese AI chips (e.g. Huawei Ascend, Cambricon) “right out of the box” (bloomberg.com, cryptopolitan.com), aligning with Beijing’s push for homegrown hardware amid U.S. export bans.
  • Expert views: Analysts welcome the cost savings. Futurum Group’s Nick Patience says the model “should make [AI] faster and more cost-effective … without a noticeable drop in performance” (cryptopolitan.com). But others, like BlankPage Capital’s Ekaterina Almasque, warn that sparse methods “cut out things you think are not important” – with no guarantee the model won’t drop truly relevant data (cryptopolitan.com).

DeepSeek’s New V3.2-Exp Model

Hangzhou-based DeepSeek burst onto the AI scene earlier in 2025 with its R1 model (a heavily RL-trained chatbot) (techcrunch.com). This time, DeepSeek’s announcement focuses on efficiency. On Sept. 29 the company published a post (on Hugging Face) unveiling DeepSeek-V3.2-Exp, an experimental large language model built on its V3 series (techcrunch.com). According to DeepSeek, V3.2-Exp maintains similar reasoning performance to V3.1 but uses far less compute for long inputs. The key innovation is a “DeepSeek Sparse Attention” (DSA) mechanism: rather than comparing every token to every other in a long document (the dense attention used by vanilla Transformers), DSA first uses a “lightning indexer” to pick out important excerpts, then a fine-grained selector to zoom in on the most salient words inside them (techcrunch.com, hindustantimes.com). This two-stage pruning means the model can “handle a large amount of data” more cheaply, processing tens of thousands of tokens without costs exploding (techcrunch.com, venturebeat.com).
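The two-stage pruning described above can be sketched in a few lines of Python. This is purely illustrative – the block sizes, scoring rules, and function names here are invented for the sketch, not DeepSeek’s actual indexer or kernels – but it shows the shape of the idea: a cheap pass narrows the context to promising blocks, a finer pass picks tokens within them, and full attention runs only over the survivors.

```python
import numpy as np

def sparse_attention_sketch(q, keys, values, num_blocks=8, top_blocks=2, top_tokens=16):
    """Toy two-stage sparse attention for a single query vector."""
    seq_len, dim = keys.shape
    block = seq_len // num_blocks

    # Stage 1: cheap "indexer" -- one score per coarse block of the context
    # (here, the query dotted with the block's mean key).
    block_scores = np.array([
        float(q @ keys[i * block:(i + 1) * block].mean(axis=0))
        for i in range(num_blocks)
    ])
    keep_blocks = np.argsort(block_scores)[-top_blocks:]

    # Stage 2: fine-grained token selection inside the surviving blocks.
    cand = np.concatenate([np.arange(b * block, (b + 1) * block) for b in keep_blocks])
    token_scores = keys[cand] @ q
    keep = cand[np.argsort(token_scores)[-top_tokens:]]

    # Ordinary softmax attention, but only over the selected tokens.
    logits = keys[keep] @ q / np.sqrt(dim)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ values[keep], keep

rng = np.random.default_rng(0)
q = rng.standard_normal(64)
K = rng.standard_normal((1024, 64))
V = rng.standard_normal((1024, 64))
out, kept = sparse_attention_sketch(q, K, V)
print(out.shape, len(kept))  # the query attends to only 16 of 1024 tokens
```

The payoff is that the expensive softmax step touches 16 tokens instead of 1,024; the indexer pass is linear and cheap by design.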

DeepSeek’s announcement on Hugging Face explicitly calls V3.2-Exp an “intermediate step toward our next-generation architecture” (reuters.com). In practice, the company built V3.2-Exp by adding DSA on top of its V3.1-Terminus model (itself a refinement of V3.1) (venturebeat.com). It also released the full model weights and code under an open-source MIT license on Hugging Face and GitHub (techcrunch.com, venturebeat.com), continuing its commitment to transparency. As VentureBeat notes, anyone can now download, modify, or deploy V3.2-Exp without fees (venturebeat.com). DeepSeek even provides optimized kernels (via LMSYS and vLLM) to run the sparse model across contexts up to 128K tokens (venturebeat.com).

How “Sparse Attention” Slashes Costs

Transformer models like ChatGPT normally pay a steep price for long texts. Classic self-attention scales quadratically with context length – doubling the text quadruples the work. As a result, “longer sequences – tens of thousands or even over 100,000 tokens – cause costs to rise much faster than the token count alone would suggest” (venturebeat.com). Sparse attention tackles this by effectively ignoring irrelevant content. DeepSeek describes DSA as using a lightning indexer to score chunks of the input, then loading only the most useful tokens into the attention window (techcrunch.com, venturebeat.com). In experiments, this “selective attention” cut the compute per token dramatically while preserving almost the same answer quality (venturebeat.com). As one report explains, “by reducing the compute burden per token at large context lengths, V3.2-Exp keeps the cost curve flatter and much lower” (venturebeat.com). In practice, this means tasks like summarizing a 100-page document or chatting with full history become far more affordable.
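A back-of-the-envelope calculation shows why the cost curve flattens. Dense self-attention does on the order of L² token-pair comparisons for a context of L tokens, while a sparse scheme that caps each query at a fixed budget of k attended tokens does roughly L·k. The budget of 2,048 below is an arbitrary illustrative choice, not a figure from DeepSeek:

```python
def dense_cost(seq_len):
    # Dense self-attention: every token attends to every token.
    return seq_len ** 2

def sparse_cost(seq_len, budget=2048):
    # Sparse attention: each token attends to at most `budget` tokens.
    return seq_len * min(seq_len, budget)

for L in (1_000, 10_000, 100_000):
    ratio = dense_cost(L) / sparse_cost(L)
    print(f"{L:>7} tokens: dense/sparse compute ratio ~ {ratio:.0f}x")
```

At 1,000 tokens the two are identical (the budget isn’t binding), but at 100,000 tokens the dense model does nearly 50 times the work – which is why savings show up mainly on long-context calls.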

DeepSeek’s new model uses this efficiency not only in inference but also in training and fine-tuning. The company’s published paper (linked on Hugging Face) details the indexer and token-selector design (hindustantimes.com). In effect, DSA causes the model to “skip irrelevant data,” as Hugging Face’s Adina Yakefu (Chinese community lead) notes, boosting speed and lowering energy use (cryptopolitan.com). Internally, the firm combined these architectural changes with more advanced distillation and reinforcement-learning steps, but the headline is that V3.2-Exp can process very long contexts (up to 128K tokens) without the runaway costs a normal Transformer would incur (venturebeat.com).

Performance and Cost Savings

Despite its radical new design, V3.2-Exp delivers nearly the same accuracy as its predecessor on standard benchmarks. VentureBeat reports that the model “mostly matches or slightly improves the benchmarks” of V3.1-Terminus (venturebeat.com). In held-out tests, scores on tasks like reasoning, coding, and Q&A were essentially flat compared to V3.1 (venturebeat.com). This implies DeepSeek achieved its goal: maintain performance while cutting resource use. (Notably, DeepSeek’s V3 series still trails leading AIs in raw capability; for example, V3.1 ranks behind OpenAI’s GPT-5 and Anthropic’s Claude in recent rankings (euronews.com).)

The real difference comes in price. DeepSeek publicly slashed its API pricing with V3.2-Exp. Under the new scheme, one million input tokens cost about $0.028 (for cache hits) versus $0.07 before (venturebeat.com) – roughly a 60% cut. (Output tokens are also cheaper.) Reuters notes that DeepSeek’s official announcements claim API prices have been cut by 50% or more (reuters.com). In long-context applications, internal tests showed typical per-request costs falling by half or more (techcrunch.com, venturebeat.com). Industry comparisons now list DeepSeek’s API among the cheapest; only something like OpenAI’s tiny “GPT-5 Nano” (not full GPT-5) is lower per token (venturebeat.com).
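The pricing arithmetic can be checked directly from the reported per-million input-token rates ($0.07 before, $0.028 after, on cache hits); the 120K-token request below is an illustrative size, not a figure from the article:

```python
OLD_PRICE = 0.07   # USD per million input tokens (cache hit), prior pricing
NEW_PRICE = 0.028  # USD per million input tokens under V3.2-Exp pricing

cut = 1 - NEW_PRICE / OLD_PRICE
print(f"price cut: {cut:.0%}")  # -> 60%

# Cost of a single long-context request (~120K input tokens):
tokens = 120_000
before = OLD_PRICE * tokens / 1e6
after = NEW_PRICE * tokens / 1e6
print(f"before: ${before:.5f}, after: ${after:.5f}")
```

This is why the headline "50%" claim and VentureBeat's "60%" figure can coexist: the cache-hit input rate fell 60%, while typical mixed workloads land nearer half.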

In practical terms, users can now afford to run deep-learning tasks on far longer texts before costs spike out of control (venturebeat.com). For example, summarizing a 50-page report or maintaining a huge chat history is now “far more practical and affordable” (venturebeat.com). DeepSeek and venture analysts highlight that this could open powerful AI to smaller developers. As Futurum Group researcher Nick Patience tells CNBC, the innovation should make the model “faster and more cost-effective to use without a noticeable drop in performance” (cryptopolitan.com), expanding access to those who couldn’t afford pricier models.

China’s AI Push and Strategic Impact

The launch of V3.2-Exp comes amid a heated tech rivalry. China is pushing its firms to break free of foreign chips in AI, and DeepSeek is aligning with this policy. Bloomberg notes the startup said it’s working “with Chinese chipmakers on the model” (bloomberg.com). Indeed, DeepSeek confirmed V3.2-Exp runs natively on homegrown AI processors (such as Huawei’s Ascend and Cambricon) “right out of the box” (cryptopolitan.com). This matters because U.S. bans (under both the Biden and Trump administrations) have restricted Nvidia’s top AI chips from China (euronews.com), forcing Chinese tech to rely on domestic semiconductors. By co-designing its model with local hardware, DeepSeek advances Beijing’s goal of AI self-sufficiency.

Strategically, the move also fuels a domestic price war among Chinese AI providers. DeepSeek’s dramatic price cuts (to ~$0.03 per million input tokens) give it a competitive edge over other local models (e.g. Alibaba’s Qwen series) and even over some global offerings (hindustantimes.com, venturebeat.com). Industry comparisons note that Chinese companies are keenly watching DeepSeek: its R1 earlier showed Chinese teams could train advanced LLMs cheaply (techcrunch.com), and now V3.2-Exp may teach even U.S. firms new tricks about efficiency (techcrunch.com). Authorities in Europe and the U.S. have even barred government use of DeepSeek over security concerns, underscoring how seriously these models are taken (euronews.com). DeepSeek itself seems aware of the geopolitical angle: the blog post repeatedly frames the work as research into “more efficient transformer architectures” – a domain of intense global competition (euronews.com).

Importantly, DeepSeek is not alone in exploring sparse techniques. Even OpenAI experimented with sparse attention years ago (hindustantimes.com). But by shipping an open-source implementation at scale, DeepSeek ensures the community (and rivals) will test and improve on it. As one analyst puts it, “people will always go for what is cheap, reliable, and effective,” and DeepSeek seems determined to be that option (cryptopolitan.com). Huawei Cloud quickly announced it had already “completed the adaptation” of V3.2-Exp to its services (hindustantimes.com), signaling broad industry uptake.

Expert Perspectives and Outlook

Most experts applaud the reduced costs but urge caution. As Futurum’s Patience notes, cheaper inference “opens up powerful AI tools to developers who can’t afford more expensive models” (cryptopolitan.com). That democratization is attractive, but the flip side is risk. BlankPage’s Ekaterina Almasque warns that sparse attention “cuts out things you think are not important,” with no guarantee it isn’t accidentally dropping genuinely important details (cryptopolitan.com). In other words, efficiency gains may come at a cost in nuance. Early third-party evaluations will be crucial to verify DeepSeek’s claims.

Some see V3.2-Exp as a tactical move. DeepSeek itself calls it “an intermediate step” (reuters.com). Cryptopolitan notes the company is “playing the long game” by continuing to feed the open-source community (cryptopolitan.com). Investors and users will watch for what comes next – perhaps a V3.3 or V4 that combines this cost-cutting with a capability boost. For now, DeepSeek-V3.2-Exp stands as a symbol of the shifting AI arms race: it shows that beyond raw power, efficiency and cost matter hugely. As one tech editor put it, even if V3.2-Exp doesn’t dethrone GPT-5, it might “teach U.S. providers some much needed tricks” for cheaper AI services (techcrunch.com).

Sources: DeepSeek’s own Hugging Face post and research paper; reporting by TechCrunch, Reuters, Bloomberg, Euronews, WSJ/Hindustan Times, and VentureBeat; expert comments from CNBC and Cryptopolitan coverage. These sources detail the model’s design, claimed cost-savings, and industry reactions.
