Right now, on the internet, an AI agent is opening a webpage. It's parsing the HTML, extracting the text content, summarizing what it found, and reporting back to a human user who asked it a question. This is happening millions of times per day. It's already a measurable percentage of all web traffic. And almost nobody who builds websites is thinking about it.
The web was designed for humans. Browsers render pages visually, humans read them, humans click links, humans enter text in forms. Every assumption baked into how websites are built (from page layout to authentication to anti-bot measures) assumes that the entity on the other end is a human with eyes, hands, and a browser. AI agents don't fit any of these assumptions, and the resulting friction is creating one of the most interesting problems in web infrastructure right now.
What an AI agent actually does when it browses
When you ask Claude or ChatGPT a question that requires current information ("what's the weather in Tokyo right now" or "summarize this news article"), the model doesn't have the answer in its training data. The training data is months or years old. So the AI needs to go get the information from the actual internet. This is where AI agents come in.
The agent receives a task. It decides what information it needs. It searches the web (using a search engine or its built-in browsing tool). It picks a result. It fetches the page. It parses the HTML to extract the readable text. It identifies the relevant information. It summarizes that information. It returns the summary to the human. The whole process takes a few seconds and involves multiple HTTP requests, multiple decisions, and multiple transformations of data.
The agent isn't a simple scraper. It's making real decisions at every step, similar to how a human researcher works. If the first result doesn't have the information, it tries another. If a page is paywalled, it looks for alternatives. If the data format is confusing, it tries to make sense of it. The intelligence is in the decision-making, not just the fetching.
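The loop described above can be sketched in a few lines. Everything here is illustrative: a real agent uses a language model to make each decision, and the fetcher would wrap an HTTP client, but the shape of the fallback behavior is the same.

```python
# A minimal sketch of the agent's research loop. The names, the fetcher
# interface, and the content threshold are all illustrative assumptions,
# not any vendor's actual API.
import re

def extract_text(html: str) -> str:
    """Strip tags and collapse whitespace: a crude stand-in for real extraction."""
    html = re.sub(r"<(script|style)[\s\S]*?</\1>", " ", html, flags=re.I)
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()

def research(urls, fetch_page):
    """Try each candidate URL in order, skipping pages that yield no usable
    text (empty JS shells, paywall stubs), mirroring the fallback behavior
    described above."""
    for url in urls:
        try:
            text = extract_text(fetch_page(url))
        except OSError:
            continue  # fetch failed: move on to the next candidate
        if len(text) > 50:  # arbitrary "has real content" threshold
            return text
    return None
```

In practice `fetch_page` would wrap something like `urllib.request.urlopen`, and the summarization step at the end would be another model call; the point is that the control flow is a decision loop, not a single fetch.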
Why most websites are accidentally hostile to AI agents
Here's where it gets interesting. Most websites are not designed with AI agents in mind, which means they accidentally create obstacles that make it hard for agents to extract information. Some of these obstacles are intentional anti-bot measures. Others are just consequences of modern web development practices.
JavaScript-rendered content. Modern web frameworks like React, Vue, and Svelte build the page in the browser using JavaScript. When you fetch the raw HTML, you get a mostly empty document with a script tag that says "build the real page here." Humans don't notice this because their browser executes the JavaScript and renders the actual content. AI agents that don't run a full browser see an empty page. They have no idea what's actually on it.
Anti-bot infrastructure. Cloudflare's bot protection, Google's reCAPTCHA, and similar services try to distinguish between humans and automated traffic. They use a combination of fingerprinting, behavioral analysis, and challenge-response puzzles. These systems are getting better at detecting AI agents and either blocking them entirely or making them solve CAPTCHAs they can't solve. From the agent's perspective, the page just doesn't load.
Paywalls and registration walls. Most major news sites now require either a subscription or a free registration to read articles. AI agents can't sign up for accounts. They can't enter payment information. They can't click through "I am not a robot" forms. When they hit a paywall, they have no way around it. They give up and try another source.
Cookie banners and consent forms. Modern websites bury their content behind a wall of GDPR consent popups, cookie acceptance forms, and notification permission requests. Humans dismiss these reflexively. Agents struggle with them because they require interaction in specific ways that vary across sites.
Unstructured HTML. Even when an agent successfully fetches a page, the content might be buried in deeply nested div tags with random class names, intermixed with ads and navigation, and hard to extract reliably. The page renders correctly for humans but reads to an agent as a soup of disconnected text.
What AI-friendly websites look like
A small but growing number of websites are starting to design for AI agents alongside humans. The patterns are becoming clearer.
Structured data via JSON APIs. Instead of forcing agents to scrape HTML, these sites offer clean JSON endpoints that return the same data in a format that's trivially easy to parse. The agent makes one HTTP request, gets a structured response, and immediately has what it needs. No HTML parsing required. TerminalFeed does this: every panel of data on the dashboard is also available as a JSON endpoint at /api/[name]. Agents that discover the API never need to touch the HTML.
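From the agent's side, consuming a structured endpoint collapses the whole extraction pipeline into a single JSON parse. The endpoint path and payload shape below are hypothetical, loosely modeled on the `/api/[name]` pattern described above, not TerminalFeed's actual schema.

```python
# Sketch of an agent consuming a structured endpoint instead of scraping.
# The payload shape is a hypothetical example, not a documented schema.
import json

def parse_panel(payload: str) -> list:
    """One json.loads call replaces the entire HTML-extraction pipeline."""
    data = json.loads(payload)
    return [{"title": item["title"], "value": item["value"]} for item in data["items"]]

# In real use the payload would come from an HTTP request, e.g.
# urllib.request.urlopen("https://example.com/api/markets").read().
sample = '{"panel": "markets", "items": [{"title": "S&P 500", "value": "+0.4%"}]}'
```

No tag stripping, no guessing at class names: the site has already told the agent exactly where the data is and what it means.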
The llms.txt file. This is an emerging standard, similar to robots.txt, that tells AI crawlers what data is available and how to access it. The file lives at the root of the website and contains plain English descriptions of available endpoints, data sources, and usage guidelines. Agents that find an llms.txt file get a clear map of what the site offers without having to figure it out themselves. TerminalFeed has one at terminalfeed.io/llms.txt. So do a growing number of other data-focused sites.
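The emerging convention (per the llms.txt proposal) is a markdown-formatted file: an H1 title, a short blockquote summary, then sections of links with one-line descriptions. A minimal illustrative example, with hypothetical site name and endpoints:

```text
# ExampleFeed

> Market and news data dashboard. Every dashboard panel is also
> available as an unauthenticated JSON endpoint.

## API

- [Briefing](https://example.com/api/briefing): daily summary as JSON
- [Markets](https://example.com/api/markets): index and rates data as JSON

## Usage

Public data, no API key required. Please keep request rates reasonable.
```

Even a file this small gives an agent a complete map of the site in one request.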
Server-side rendering of critical content. Even sites that use React or other JavaScript frameworks can render their main content server-side, so the initial HTML response contains the actual text. This makes the content visible to agents that don't run JavaScript. Next.js, Nuxt, SvelteKit, and Astro all support this. The performance benefits for human users are also significant.
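The underlying idea is framework-agnostic: build the HTML string with the content already in it, so the first response contains the text. A minimal sketch, with no framework at all (Next.js, Nuxt, SvelteKit, and Astro each do a production-grade version of this):

```python
# Framework-free sketch of server-side rendering: the primary content is
# inlined into the HTML string, not loaded later by a script.
from html import escape

def render_article(title: str, body: str) -> str:
    """Return complete HTML whose first response already contains the text."""
    return (
        "<html><body><article>"
        f"<h1>{escape(title)}</h1><p>{escape(body)}</p>"
        "</article></body></html>"
    )
```

An agent fetching this response gets the article text immediately, with no JavaScript execution required; a browser gets faster first paint for free.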
Graceful handling of bot traffic. Instead of blocking all non-human requests, AI-friendly sites identify legitimate AI agents (Claude, GPT, Perplexity, etc.) and allow them through with reasonable rate limits. The bot protection still blocks malicious scrapers, but legitimate AI traffic is treated as a first-class citizen.
Public APIs without authentication for public data. Many sites have data that's freely visible to humans but require API keys, account registration, or payment for programmatic access. This is backwards. If the data is public, the API should be public. TerminalFeed's /api/briefing endpoint requires no authentication because the data it returns is the same data anyone can see by visiting the dashboard.
The flywheel that's about to start
Here's the part that makes this interesting from a business perspective. AI agents are going to keep growing as a percentage of web traffic. Humans increasingly ask AI to do research for them rather than doing it themselves. Every major AI provider is building agent capabilities into their products. The trend is clearly heading toward more agent traffic, not less.
The websites that make themselves AI-friendly now will benefit from a flywheel effect. When an AI cites a website as a source, the human reading that response often clicks through to verify or learn more. This creates traffic that's directly attributable to AI usage. The AI-friendly sites get cited more often, which sends them more human traffic, which signals to other AIs that those sites are reliable sources, which leads to more citations, which leads to more traffic. It's the SEO flywheel of 2010 all over again, but for AI-mediated discovery instead of search engines.
The websites that don't adapt will lose out. They'll still get human traffic, but they'll be invisible to the AI layer that's increasingly mediating how humans find information. As AI usage grows, being invisible to AI becomes more and more costly.
What you can do this week
If you run a website and you want it to be friendly to AI agents, there are a few things you can do this week.
Add an llms.txt file to your root directory. Document what your site offers and how to access it. Even a basic version helps agents understand what they're looking at.
Provide a JSON API for your most valuable data. Even if your main interface is HTML, give agents a structured alternative. This could be as simple as a single endpoint that returns your most important content as JSON.
Make sure your most important pages are server-side rendered. If you're using React or another JavaScript framework, enable SSR for pages that contain primary content. The agents will be able to read it without running a full browser.
Allow legitimate AI traffic through your bot protection. Identify the user agents that legitimate AI providers use (they document this) and add them to your allowlist with appropriate rate limits.
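A first-pass allowlist check can be as simple as matching documented user-agent tokens. The tokens below (GPTBot, ClaudeBot, PerplexityBot) are ones the vendors have published, but they change over time, so treat this as a sketch: check each provider's current documentation, and pair it with stronger verification such as the providers' published IP ranges, since User-Agent strings are trivially spoofed.

```python
# Sketch of allowlisting by User-Agent substring. The token list is a
# point-in-time assumption; verify against each vendor's crawler docs,
# and don't rely on User-Agent alone (it is trivially spoofed).
AI_AGENT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

def is_known_ai_agent(user_agent: str) -> bool:
    """True if the request claims to be a documented AI provider's crawler."""
    return any(token in user_agent for token in AI_AGENT_TOKENS)
```

In a real deployment this check would feed a rate limiter rather than a binary allow/block decision, so legitimate agents get through while abusive traffic is still throttled.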
Think about your content from the perspective of an entity that can't see colors, can't click buttons, and can't fill out forms. What would it need to extract value from your site? Build for that entity alongside building for humans.
The web is in the middle of the biggest shift in how it's used since the introduction of mobile browsers. The shift is happening quietly, but it's happening fast. The sites that figure out how to serve both humans and AI agents are going to have a structural advantage over the sites that only serve one or the other. In a year or two, AI-friendliness will be a baseline expectation, the same way mobile-friendliness became a baseline expectation in 2015.
For now, it's still an opportunity. Most sites haven't thought about this yet. The few that have are quietly capturing all the AI-mediated traffic. If you're building anything web-based right now, this is one of the most important things you should be thinking about.
TerminalFeed is built for both humans and AI agents. Open API, llms.txt, and 20+ JSON endpoints.
Explore the API