
The “HTML of the AI Era”? Microsoft’s NLWeb Aims to Put ChatGPT on Every Website

A New Vision for a Conversational Web

Imagine if every website you visited could talk back – answering your questions in natural language, like a built-in ChatGPT trained on that site’s content. That’s the bold idea behind NLWeb, an open protocol unveiled by Microsoft at its Build 2025 conference. NLWeb (short for Natural Language Web) promises to make the internet “conversational” by transforming ordinary websites into AI-powered chat interfaces news.microsoft.com theverge.com. With just a few lines of code, any site or app can deploy a custom chatbot that understands free-form questions and responds with relevant answers drawn from the site’s own data – essentially offering ChatGPT-level search on any website theverge.com theverge.com. Microsoft CTO Kevin Scott even likens the underlying technology to the web’s next big paradigm: “kind of like HTTP for agents,” a foundational layer for an emerging “agentic web” of AI applications infoworld.com simform.com. Some are calling NLWeb the “HTML of the AI era,” suggesting it could play a role as pivotal for AI-driven interactions as HTML did for the early World Wide Web news.microsoft.com greyhoundresearch.com.

What Is NLWeb and Why It Matters

NLWeb is an open-source project (available on GitHub) designed to simplify the creation of natural language interfaces for websites news.microsoft.com news.microsoft.com. In essence, NLWeb provides a standard protocol for asking a website questions in plain language and getting answers back in a structured format theverge.com. “It’s a protocol… a way of asking a natural-language question, and the answer comes back in structured form,” explains Ramanathan V. Guha, the Microsoft technical fellow who conceived NLWeb theverge.com. Guha – a veteran technologist known for co-creating the RSS and Schema.org standards – sees this as part of a new revolution in computing: one where we “communicate with applications… with free-form language” instead of clicking links and menus theverge.com theverge.com.

The motivation for NLWeb arises from a pressing concern in the AI era: who controls information access on the web. Currently, AI chatbots like ChatGPT or Bing Chat often scrape content from countless sites, then answer users’ questions without necessarily driving traffic back to those source websites. Guha warns that “too much of that new communication… is mediated by products like ChatGPT… which take all their knowledge and return no value” to the original content creators theverge.com. NLWeb offers an alternative future where website owners themselves deploy the conversational AI. Instead of hoping an external bot cites them, sites can run their own chatbot that directly engages users theverge.com theverge.com. “They don’t have to rely on some external AI product… They can run the bot themselves,” Guha emphasizes theverge.com. In Microsoft’s vision, this shift would “bring the benefits of AI that have transformed how people search directly to the websites themselves,” much as HTML once empowered anyone to publish information online news.microsoft.com.

How NLWeb Works: Turning Websites into Chatbots

At a technical level, NLWeb is designed to work alongside a website, indexing its content and exposing it to AI models in a structured way. It leverages the web’s existing semi-structured data – formats like HTML, RSS feeds, and Schema.org metadata – as the foundation for understanding a site’s content news.microsoft.com infoworld.com. Many modern websites already annotate pages with Schema.org tags (for SEO and search engine indexing), describing things like recipes, events, products, reviews, and more in a machine-readable form. NLWeb takes advantage of these embedded semantics: by wrapping content in standard Schema.org definitions or microdata, developers make their pages easily parsable by NLWeb’s AI infoworld.com infoworld.com. According to Microsoft, the best results with NLWeb come from sites structured as “lists of items” – think of a recipe site, an e-commerce catalog, or a travel listing, where each entry has clear attributes like name, description, price, location, etc., often already marked up with Schema.org microdata infoworld.com infoworld.com.
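
For a feel of what this looks like in practice, here is a minimal sketch – not NLWeb’s own tooling – of pulling a page’s embedded Schema.org JSON-LD with standard Python libraries. The URL is a placeholder, and the requests/BeautifulSoup choices are one arbitrary way to do it:

```python
# Sketch: extracting Schema.org JSON-LD from a page so it could feed an
# NLWeb-style index. Illustrative only; NLWeb ships its own data tools.
import json

import requests
from bs4 import BeautifulSoup  # pip install requests beautifulsoup4

def extract_schema_org(url: str) -> list[dict]:
    """Return every JSON-LD block embedded in the page at `url`."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    blocks = []
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            blocks.append(json.loads(tag.string or ""))
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the crawl
    return blocks

# e.g. a recipe page might yield {"@type": "Recipe", "name": ..., ...}
items = extract_schema_org("https://example.com/some-recipe")  # placeholder
```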

Once the site’s data is organized, NLWeb uses large language models (LLMs) to turn natural-language queries into answers. Any LLM or AI model can be used – NLWeb is completely model-agnostic and supports both proprietary and open-source models, whether hosted in the cloud or running locally news.microsoft.com greyhoundresearch.com. Developers also choose a vector database, in which the site’s content (e.g. text from pages or feed data) is stored as embeddings for the model to retrieve relevant information news.microsoft.com. When a user (or an AI agent) asks a question via NLWeb, here’s what happens behind the scenes:

  • The NLWeb service “understands” the query by using an LLM to interpret the user’s intent and possibly relevant concepts (for example, knowing that “spicy and crunchy appetizer” implies a type of recipe) theverge.com.
  • It searches the site’s indexed knowledge (in the vector database) for content that matches the query. Because the site’s data is structured and enriched with metadata, the system can more accurately find what the user wants (e.g. identifying all recipes that are spicy, crunchy, and suitable as appetizers) theverge.com theverge.com.
  • It returns results or answers in a structured format via the NLWeb protocol. Essentially, NLWeb defines a standard API (using JSON over REST) with a single core method called ask – you send a question and get back a response containing the answer or relevant links/information infoworld.com; a sketch of such an exchange appears after this list. The answer can include references to the site’s pages (for example, links to specific recipes or products that fulfill the query), often accompanied by brief summaries or data drawn from the site content.
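
To make the protocol concrete, the sketch below shows what an ask exchange might look like from a client’s point of view. The /ask path, the query parameter, and the response fields are illustrative assumptions rather than the published wire format – the NLWeb GitHub repository defines the actual details:

```python
# Hypothetical client for an NLWeb-style "ask" endpoint. The /ask path,
# the "query" parameter, and the response fields are assumptions made for
# illustration; consult the NLWeb repository for the real wire format.
import requests

def ask(site: str, question: str) -> dict:
    """Send a natural-language question to a site's NLWeb endpoint."""
    resp = requests.get(f"https://{site}/ask",
                        params={"query": question}, timeout=30)
    resp.raise_for_status()
    return resp.json()  # structured answer, e.g. Schema.org-typed items

result = ask("example.com", "spicy and crunchy appetizers")
for item in result.get("results", []):  # "results" field name assumed
    print(item.get("name"), "->", item.get("url"))
```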

In practice, this means the user experiences a conversational search box on the website, rather than a keyword-based search. Guha demonstrated how it works on a recipe site (Serious Eats): when asked “give me spicy and crunchy stuff I can use as an appetizer,” the NLWeb-powered search was able to interpret this creative request and suggest relevant recipe links theverge.com. When Guha added more context – saying it’s for Diwali and that he’s vegetarian – the system remembered those details and refined all further answers to only show vegetarian Diwali-appropriate dishes theverge.com. In another demo on an outdoor retail site, Guha asked for a “jacket warm enough to wear in Quebec,” and the NLWeb interface brought in external knowledge (e.g. understanding Quebec’s climate) to surface suitable jackets with images and product info theverge.com theverge.com. “It not only brings in knowledge about the weather,” Guha notes, “it allows them to render it in a UI that is conducive to things like sponsored links.” theverge.com theverge.com In other words, the answers can be presented in a user-friendly way on the site (with pictures, links, ads, etc.), not just raw text, preserving the website’s ability to engage and monetize users.

Crucially, NLWeb isn’t building a global web-wide index (the way Google Search or Bing does). Instead, each site runs its own NLWeb instance focused on that site’s data. “In order to have a web search index you need to crawl the web… that’s expensive,” Guha explains, both for search providers and websites that get scraped theverge.com theverge.com. By contrast, NLWeb’s approach is lightweight: “I just take an RSS feed, put it in a vector database, and it runs off that,” says Guha theverge.com. The site owner only needs to supply their content (which they already have) and choose an affordable model to power the Q&A. Guha’s demo, for instance, used a small, inexpensive model (OpenAI’s GPT-4o mini) and still produced useful results theverge.com theverge.com. Because the heavy lifting of understanding language is done by the model (which already has general world knowledge), the website doesn’t need gigantic bespoke training – it just needs to feed in its specific data. This makes deployment much cheaper and faster than building a traditional search engine or training a custom AI from scratch theverge.com. As Guha puts it, “the whole thing is fast and easy… It allows for the remixing and back-and-forth, at an incredibly low price.” theverge.com.
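
Guha’s “RSS feed into a vector database” description suggests a pipeline along the lines of the sketch below. Everything in it is illustrative: the feed URL is a placeholder, the embedding model is one arbitrary choice, and an in-memory NumPy array stands in for a real vector database such as Milvus or Qdrant:

```python
# Sketch of the ingest-and-query flow Guha describes, not NLWeb's code.
# Assumes the feedparser, numpy, and openai packages are installed and an
# OPENAI_API_KEY is set; the feed URL and model name are placeholders.
import feedparser
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts; any embedding model would work here."""
    resp = client.embeddings.create(model="text-embedding-3-small",
                                    input=texts)
    return np.array([d.embedding for d in resp.data])

# 1. Ingest: reuse the RSS feed the site already publishes.
feed = feedparser.parse("https://example.com/recipes.rss")  # placeholder
docs = [f"{e.title}. {e.get('summary', '')}" for e in feed.entries]
index = embed(docs)  # stand-in for a real vector DB

# 2. Query: embed the question and return the closest entries by cosine.
q = embed(["spicy and crunchy appetizer"])[0]
scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
for i in np.argsort(scores)[::-1][:3]:
    print(feed.entries[i].title)
```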

NLWeb and the Model Context Protocol (MCP)

Every NLWeb instance is more than just a chatbot – it also functions as a server for the Model Context Protocol (MCP), a rising standard in the AI world news.microsoft.com infoworld.com. MCP is an open protocol (initially introduced by AI startup Anthropic) designed to allow AI agents and applications to share context and state with each other in a standardized way theverge.com greyhoundresearch.com. Microsoft has been a big proponent of MCP, embedding support for it across Windows 11, Azure AI services, GitHub, and more, with Kevin Scott calling MCP “the equivalent of HTTP for interconnected AI applications” infoworld.com simform.com. In simpler terms, if the new generation of AI agents (autonomous bots that perform tasks for users) are going to work together and interact with various tools, they need a common language to maintain memory and context – MCP is emerging as that language.

By implementing MCP, NLWeb makes a website’s content and capabilities discoverable to AI agents in a controlled fashion news.microsoft.com. Instead of web crawlers blindly scraping data, an AI agent could directly query a site’s NLWeb interface to get information or even perform actions (with permission). For example, an AI travel assistant agent could query a hotel website via NLWeb to find available rooms and prices, or a personal shopping agent could ask a clothing site about winter jackets, all through natural language requests. Because NLWeb returns structured data, agents can reliably parse and use the answers theverge.com. NLWeb essentially “exposes intent and context naturally” on websites, making them “usable by agents” and not just by human eyeballs greyhoundresearch.com greyhoundresearch.com. Microsoft positions NLWeb as a key interface layer for the “open agentic web” – if MCP is the logic and memory for AI agents, NLWeb is the user interface that agents (and humans) use to interact with the web’s content and services greyhoundresearch.com greyhoundresearch.com. In the long run, this could enable a web where your AI assistant can seamlessly navigate and transact on websites on your behalf, because those sites speak a language the AI can understand (instead of today’s brittle approach of bots pretending to be browsers clicking buttons).
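
As a thought experiment, an agent could fan the same question out to several NLWeb-enabled sites and merge the structured answers. Everything in the sketch below – the site list, the /ask endpoint, and the response field names – is an illustrative assumption:

```python
# Sketch of an agent querying several NLWeb-enabled sites and merging the
# structured results. Sites, the /ask path, and the "results", "name", and
# "price" fields are all invented for illustration.
import requests

SITES = ["hotels.example.com", "travel.example.org"]  # placeholders

def ask(site: str, question: str) -> dict:
    resp = requests.get(f"https://{site}/ask",
                        params={"query": question}, timeout=30)
    resp.raise_for_status()
    return resp.json()

def find_offers(question: str) -> list[dict]:
    offers = []
    for site in SITES:
        try:
            answer = ask(site, question)
        except requests.RequestException:
            continue  # an unreachable site simply drops out of the merge
        for item in answer.get("results", []):
            item["source"] = site  # keep provenance so users can verify
            offers.append(item)
    # Structured answers make ranking trivial, e.g. by an assumed price field:
    return sorted(offers, key=lambda o: o.get("price", float("inf")))

for offer in find_offers("double room in Quebec City next weekend")[:5]:
    print(offer.get("name"), offer.get("price"), offer["source"])
```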

An “HTML Moment” for AI: NLWeb’s Significance

Tech leaders and analysts are framing NLWeb in historic terms. Just as HTML (HyperText Markup Language) in the 1990s provided a universal format to create and link documents on the web, NLWeb could become the de facto standard to make websites conversational and AI-friendly news.microsoft.com greyhoundresearch.com. “Ultimately, we believe NLWeb can play a similar role to HTML in the emerging agentic web,” Microsoft stated at launch news.microsoft.com. The idea is that NLWeb gives the web a “missing abstraction layer for AI” – a kind of grammar or interface that any AI can use to understand what a webpage is about and how to interact with it greyhoundresearch.com greyhoundresearch.com. In the words of one industry analysis, “NLWeb isn’t another tool. It’s a grammar for the agentic age.” greyhoundresearch.com greyhoundresearch.com

This represents a shift from today’s paradigm of bolting on chatbots or exposing rigid APIs. Instead of every company having to build their own chatbot interface or teach an AI about their website via custom APIs, NLWeb standardizes the process. Websites become “structured, discoverable, and secure conversational surfaces at the protocol level”, not just visually formatted pages greyhoundresearch.com greyhoundresearch.com. For users, this could mean a more seamless experience: rather than searching within a site via keywords or navigating menus, you can simply ask the site for what you need (in your own words). For businesses and developers, it lowers the barrier to adopting AI on their sites. “59% of digital leaders… are now prioritizing ‘agent-ready architecture’… making current [apps] discoverable, traversable, and responsive to AI-native experiences”, according to a Greyhound Research survey greyhoundresearch.com greyhoundresearch.com. There’s a growing recognition that future websites should be legible not just to humans but to AI agents. “We don’t want ten new copilots layered onto our site. We want our site to be legible to the agents our customers already use,” said one e-commerce executive, noting that NLWeb “gives us a way to speak their language – without rewriting the whole frontend.” greyhoundresearch.com greyhoundresearch.com In other words, NLWeb aims to let websites talk to AI in a native tongue, rather than forcing AI to figure out each website’s unique interface.

Another major selling point is interoperability and openness. By being an open protocol (free and not tied to a single vendor), NLWeb hopes to gain broad adoption across the industry. “The agentic web shouldn’t be vendor-controlled or locked behind opaque APIs. It should be open, inspectable, and future-compatible,” argues the Greyhound Research analysis, echoing Microsoft’s intent greyhoundresearch.com greyhoundresearch.com. Microsoft’s strategy here is interesting: by giving the technology away openly, it seeds a new ecosystem of conversational websites that, down the line, can still benefit Microsoft’s business (for example, many of those sites might choose Microsoft’s Azure services or AI models to power their NLWeb implementations theverge.com). But fundamentally, if NLWeb works as envisioned, it “is much bigger than Microsoft” and would belong to the community of web publishers and users theverge.com.

Who’s Behind NLWeb – and Who’s On Board

NLWeb’s creator R.V. Guha brings a formidable pedigree in web standards. He helped build some of the web’s most important open technologies: RSS, which made content syndication simple, and Schema.org, which gave structure to web data and helped search engines make sense of pages theverge.com theverge.com. Guha joined Microsoft as a Technical Fellow and Corporate VP specifically to pursue this project, underscoring how strategic NLWeb is for the company. His involvement lends NLWeb credibility as an open web initiative (despite coming from Microsoft), since his career has been about making the web more accessible and structured for everyone theverge.com theverge.com. “Over the course of his career, he… has done as much for the open web as just about anyone,” notes The Verge, referring to Guha’s role in shaping RSS and Schema.org into de facto standards theverge.com theverge.com.

Microsoft announced that a cohort of early adopters had been working with NLWeb ahead of its public debut news.microsoft.com. These include content publishers, e-commerce players, and software firms, indicating a wide range of use cases. Some names revealed were: Tripadvisor, Eventbrite, Shopify, Hearst (Delish), Chicago Public Media, O’Reilly Media, Allrecipes/Serious Eats (Dotdash Meredith), and tech companies like Snowflake, Milvus, Qdrant (vector database providers), among others news.microsoft.com simform.com. For example, Shopify – a major e-commerce platform – has integrated NLWeb to make its online stores’ content queryable by AI agents simform.com. Eventbrite can enable conversational queries over event listings, and Tripadvisor could let you ask complex travel questions and get answers sourced from its vast user-generated content. Notably, Snowflake (a cloud data company) and open-source vector DBs like Milvus and Qdrant are involved, showing that the data infrastructure side is also aligned with NLWeb news.microsoft.com news.microsoft.com. Microsoft has also open-sourced the NLWeb project, providing a GitHub repository with the core code, connectors for various AI models and databases, and tools to help publishers prepare their data (e.g. convert feeds or JSON to the needed format) news.microsoft.com. In short, it’s ready for developers to experiment with today.

Benefits for Websites and Users

If widely adopted, NLWeb could dramatically lower the cost and expertise needed for a website to offer an intelligent chatbot or advanced natural-language search. Traditionally, adding an AI assistant to a website meant either integrating a third-party bot (which might not be tuned to the site’s content) or developing a custom solution – both options can be expensive and complex. NLWeb’s approach, by contrast, is plug-and-play: grab your existing data, add NLWeb, choose a model, done. “We want NLWeb to make it easy for any web publisher to create an intelligent, natural language experience for their site – just like HTML made it easy for almost anyone to create a website,” Microsoft says news.microsoft.com. Some key advantages that NLWeb promises include:

  • Ease of Implementation: Only a few lines of markup or code are needed to enable NLWeb on a site, leveraging data the site already produces (like RSS feeds or JSON APIs) theverge.com theverge.com. This is far easier than building an NLP system from scratch.
  • Model Flexibility: Website owners can choose any AI model that suits their needs – from big cloud models like OpenAI’s, to smaller open-source models they can run cheaply news.microsoft.com greyhoundresearch.com. The protocol doesn’t lock users into a single AI provider (see the interface sketch after this list).
  • Cost Efficiency: Without needing to crawl or index the entire web, and by using relatively small models on focused data, running NLWeb can be very affordable. Early tests showed it works even with lightweight models, cutting the cost compared to traditional web search infrastructure theverge.com theverge.com.
  • Better User Experience: Users can ask precise, nuanced questions and get tailored answers from the site itself, rather than doing keyword searches or wading through generic chatbot answers. This conversational interface can remember context (like the user’s dietary preferences or previous queries) within a session, making interactions more personalized theverge.com theverge.com.
  • Maintaining Engagement: Crucially, NLWeb keeps the interaction on the site. Instead of a general AI search engine funneling answers (and traffic) away, the website’s own chatbot engages the user and can direct them to pages on that site – preserving page views, ad impressions, or potential sales that would otherwise be lost to an off-site answer theverge.com theverge.com. It allows sites to benefit from AI-driven Q&A, rather than being quietly mined by it.
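
As a rough illustration of the model-flexibility point above, the sketch below keeps site code behind one small interface so that a hosted model and a locally run open-source model are interchangeable. The class and method names are invented for this example, not NLWeb’s actual abstractions:

```python
# Sketch of swapping LLM backends behind one interface; names invented.
from typing import Protocol

class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...

class OpenAIChat:
    """A hosted model via the openai package (assumed installed)."""
    def __init__(self, model: str = "gpt-4o-mini"):
        from openai import OpenAI
        self.client, self.model = OpenAI(), model

    def complete(self, prompt: str) -> str:
        r = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return r.choices[0].message.content

class LocalModel:
    """Stand-in for an open-source model served locally (e.g. llama.cpp)."""
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("wire up llama-cpp-python or similar")

def answer(question: str, site_context: str, llm: LLM) -> str:
    # Retrieval (vector search over the site's data) is elided here.
    return llm.complete(
        f"Using only this site data:\n{site_context}\n\nQ: {question}")
```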

From the end-user perspective, this could make the web feel more integrated and intelligent. Rather than treating each website as an island you have to manually navigate, you could converse your way through the web’s information. Imagine asking a cooking site “I have mushrooms and eggs, what can I make?” or asking a government site “How do I renew my passport?” and getting direct, context-aware guidance. For accessibility, it could be a boon as well – users who have difficulty with traditional interfaces might find it easier to just ask for what they need. And for the growing world of AI personal assistants, NLWeb-enabled sites will be first-class citizens that these assistants can query and even transact with.

Challenges and the Road Ahead

As transformative as NLWeb sounds, its success is not guaranteed. One big challenge is adoption: A protocol only works if a critical mass of websites implement it, and if major industry players agree to support it. Guha acknowledges that companies like Google or Meta “have incentives to play along, but [also] they might not.” Each may prefer their own approach to conversational AI or fear ceding an advantage theverge.com. “You can’t brute-force a protocol; all you can do is hope everyone sees a reason to get on board,” Guha notes realistically theverge.com. There is also historical skepticism – the web has tended to centralize around dominant platforms despite open alternatives. We saw portals like Yahoo give way to a single search giant (Google), and now a few AI bots could become gatekeepers. “There’s not actually much evidence to suggest that this kind of massive decentralization can work at all,” writes The Verge, pointing out that the best tech usually wins even if it recentralizes power theverge.com theverge.com. NLWeb will need to prove that an open, distributed network of site-specific AI agents can compete with (or complement) the convenience of one-stop-shop AI assistants.

Another consideration is governance and standards. Who will oversee NLWeb’s evolution? Microsoft launched it but intends it as an open project. NLWeb might need a neutral governing body or consortium to truly gain industry trust (similar to how the W3C stewards web standards). Encouragingly, Microsoft has joined the steering committee for MCP (the context protocol) alongside other partners greyhoundresearch.com greyhoundresearch.com. If NLWeb follows a similar open governance model, it could alleviate fears that this is just “Microsoft’s protocol” – it needs to be seen as everyone’s protocol.

Privacy and security will also be critical. A conversational agent on a website could potentially access user-specific data or perform actions (like making a purchase) if not properly sandboxed. The NLWeb responses and the MCP context-sharing must be handled carefully to prevent leaks or abuse. Microsoft and others are likely to build in permissioning models – for instance, an agent should not get private info from a site unless authorized. Early versions seem focused on read-only search capabilities, which limits risk, but as NLWeb grows to enable transactions, robust security standards will be required.

Despite these hurdles, many in the industry see huge promise. The year 2025 has seen an explosion of interest in what Microsoft calls the “open agentic web,” where AI agents are first-class citizens alongside browsers and apps simform.com. NLWeb is a cornerstone of that vision, providing a bridge between today’s web content and tomorrow’s AI-driven interactions. “NLWeb offers a lifeline — a thin, declarative layer that brings AI interaction within reach, without requiring a ground-up rebuild [of websites],” as one analysis put it greyhoundresearch.com greyhoundresearch.com. It’s rare to hear a new web protocol described as “profound,” but some experts are genuinely excited: “At Greyhound Research, we believe NLWeb is one of the most quietly profound announcements to come out of Build 2025… it gives the web its missing abstraction layer for AI,” the firm stated greyhoundresearch.com greyhoundresearch.com.

In the coming months and years, watch for more toolkits, documentation, and success stories around NLWeb. Microsoft and its partners are likely to showcase how adding a conversational interface boosts user engagement or enables new services. If more browsers, content management systems, and AI platforms start baking in NLWeb support, it could quickly gain momentum. And if competitors like Google decide to embrace NLWeb (or a similar standard) for their own agent ecosystems, it would validate the concept of an open conversational web.

Conclusion: Towards an Open Conversational Internet

NLWeb represents a bold attempt to reshape the fabric of the web for the age of AI. It proposes that every website can become an intelligent assistant – not by outsourcing to a monolithic AI, but by adopting a common protocol that any AI can understand. In doing so, it hopes to keep the web open and decentralized, ensuring that no single company’s chatbot becomes the gatekeeper to all knowledge. Whether NLWeb becomes the “HTML of the AI era” or not will depend on how broadly the industry and developers rally behind it. But the genie is out of the bottle: the way we interact with software is shifting toward natural language, and the web will need to adapt. With NLWeb, Microsoft has put forward a compelling blueprint for that adaptation – one that just might preserve the web’s original spirit of openness even as it enters a conversational future. As Guha simply puts it, “the idea is… it’s an open protocol” theverge.com – and if it succeeds, chatting with websites could one day feel as ordinary as clicking links is today.

Sources: