Las Vegas, Jan 5, 2026, 15:39 (PST)
Nvidia CEO Jensen Huang said at CES in Las Vegas on Monday that the company’s next-generation Vera Rubin AI platform is in full production and can deliver five times the AI computing performance of its previous chips when running chatbots and other applications. He said Rubin relies on a proprietary data format to reach that gain, adding: “This is how we were able to deliver such a gigantic step up in performance.” The push comes as rivals such as Advanced Micro Devices, along with in-house chips from customers like Alphabet’s Google, compete more aggressively in the market for running trained AI models at scale. 1
Rubin is Nvidia’s next major data-center platform after Blackwell, and the CES launch is an early marker for cloud and enterprise buyers lining up 2026 capacity. Demand is shifting from training large models to inference — the stage where companies deploy AI to answer user queries in real time — and that is where latency and cost per response become central. 2
Nvidia has said Rubin-based products will be available through partners in the second half of 2026, giving customers a timetable as they plan data-center builds. Much of the attention around the launch has centered on Nvidia’s pledge to lower cost per token, the chunks of text AI systems generate and consume. 3
Nvidia said the Rubin platform combines six chips (the Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9 networking chip, BlueField-4 data processor and a Spectrum-6 Ethernet switch) designed as a single system rather than standalone parts. Its flagship NVL72 configuration links 72 graphics processing units, or GPUs, with 36 central processing units, or CPUs, in a rack-scale server. Nvidia said Rubin can cut inference token costs by up to 10 times and train mixture-of-experts models, which route tasks to specialized sub-models, with one-quarter as many GPUs as its Blackwell platform requires. 4
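To make the scale of those claims concrete, the short Python sketch below applies the stated factors to a baseline. The baseline cost and GPU count are hypothetical placeholders, not figures from Nvidia, and because the company framed the cost reduction as "up to" 10 times, the result is a best case rather than a forecast.

```python
import math

# Figures stated by Nvidia for the NVL72 configuration.
NVL72_GPUS = 72
NVL72_CPUS = 36
GPUS_PER_CPU = NVL72_GPUS / NVL72_CPUS  # two GPUs for every CPU in the rack


def best_case_projection(baseline_cost_per_million_tokens, baseline_gpus,
                         cost_reduction=10.0, gpu_reduction=4.0):
    """Apply Nvidia's claimed factors to a baseline (hypothetical inputs).

    cost_reduction: claimed "up to 10 times" cut in inference token cost.
    gpu_reduction:  claimed fourfold cut in GPUs for mixture-of-experts training.
    """
    return {
        "cost_per_million_tokens": baseline_cost_per_million_tokens / cost_reduction,
        # Round up: a training job cannot use a fraction of a GPU.
        "gpus_needed": math.ceil(baseline_gpus / gpu_reduction),
    }


if __name__ == "__main__":
    # Hypothetical baseline: $5.00 per million tokens, 1,024 GPUs for training.
    print(best_case_projection(5.0, 1024))
    print("GPUs per CPU in NVL72:", GPUS_PER_CPU)
```

Real-world savings would depend on workload, utilization and pricing, none of which the announcement specifies.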
In a technical briefing, Nvidia described Rubin as an “AI factory” design that treats the whole rack, not a single server, as the unit of computing to keep performance steady when systems are fully loaded. The company said it is building end-to-end encryption and other security features into the rack to protect proprietary data used for training and inference. 5
Nvidia also unveiled a BlueField-4-based storage platform aimed at “context memory” — the key-value cache that helps chatbots keep track of long conversations. The company said the system can boost tokens-per-second throughput and power efficiency by up to five times versus traditional storage by sharing that context across clusters of AI servers. 6
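For readers unfamiliar with the term, a key-value cache stores the per-token "key" and "value" vectors a model has already computed, so each new token only attends over stored entries instead of reprocessing the whole conversation. The toy Python sketch below illustrates that idea with plain lists; the class and function names are illustrative assumptions and it does not represent Nvidia's storage platform or any production inference engine.

```python
import math


class KVCache:
    """Toy key-value cache: one (key, value) vector pair per past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        # Store the vectors for a newly processed token; earlier entries
        # are reused as-is rather than recomputed.
        self.keys.append(key)
        self.values.append(value)


def attend(query, cache):
    """Scaled dot-product attention of one query over all cached tokens."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale
              for key in cache.keys]
    # Softmax over the scores (subtract the max for numerical stability).
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the cached value vectors.
    dim = len(cache.values[0])
    return [sum(w * v[i] for w, v in zip(weights, cache.values))
            for i in range(dim)]


if __name__ == "__main__":
    cache = KVCache()
    cache.append([1.0, 0.0], [2.0, 4.0])
    cache.append([1.0, 0.0], [4.0, 8.0])
    print(attend([1.0, 0.0], cache))
```

The cache grows with every token in a conversation, which is why long contexts put pressure on memory and why sharing that stored context across servers is pitched as a throughput gain.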
The company said its DGX SuperPOD reference architecture will serve as a blueprint for deploying Rubin-based systems across enterprise and research customers. Nvidia said DGX Rubin systems are designed to reduce the cost of inference token generation while supporting long-context reasoning workloads. 7
In autonomous vehicles, Nvidia said it is releasing the Alpamayo family of open AI models, simulation tools and datasets to tackle rare “long-tail” driving scenarios that are hard to cover with standard training data. The company said the package is aimed at helping developers build reasoning-based systems that can be tested in simulation before road deployment. 8
Mercedes-Benz said it will launch MB.DRIVE ASSIST PRO in the United States later this year, letting vehicles drive on city streets under driver supervision and challenging Tesla’s Full Self-Driving feature set. Mercedes put the price at $3,950 for three years and said the system uses about 30 sensors feeding a computer capable of 508 trillion operations per second. Nvidia said the new Mercedes-Benz CLA will use its DRIVE AV software and support over-the-air updates. 9
The next test for Rubin will be whether customers adopt Nvidia’s proprietary data format and pay for tightly integrated racks, rather than leaning further into in-house chips or cheaper alternatives. In cars, the technology is still constrained by the requirement that drivers stay alert and ready to intervene, which limits how quickly city-street automation can become a mass-market feature.