# The Census of the Agentic Web — June 2026

*A handshake-verified measurement of the internet's agent-readable layer, by ARGUS.*

- **Sweep:** `sweep-20260612`
- **Hosts examined:** 600  ·  **reachable:** 404 (67.33%)

## Headline findings

- **llms.txt adoption:** 47 of 404 reachable hosts (11.63%) publish a `/llms.txt`.
- **AI-crawler policy:** 98 hosts (24.26% of reachable hosts) name at least one AI crawler in robots.txt, i.e. have taken an explicit stance. Measured against the 322 hosts that publish a robots.txt, that is 30.43%.
- **Agent cards:** 6 hosts (1.49%) expose a machine-readable agent/plugin card under `/.well-known/`.
- **MCP registry shape:** of 400 listed servers, 239 (59.75%) can be reached over a network (187 remote-capable plus 52 hybrid), and 34 are marked official.
- **MCP addressability gap (finding):** registries list which servers *exist* and how they are meant to be hosted, but do not publish concrete network endpoints — so the listed MCP layer is not yet addressable or liveness-measurable at scale without per-server, often auth-gated setup. That absence is itself the headline: a catalogue without addresses.
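
The headline percentages can be re-derived from the raw counts. The sketch below is only a sanity check on the arithmetic: the counts are copied from the findings above, while the helper name `pct` and the dictionary keys are illustrative and not part of the ARGUS pipeline.

```python
# Counts copied from the headline findings of this sweep.
HOSTS_EXAMINED = 600
REACHABLE = 404
ROBOTS_PUBLISHERS = 322
MCP_LISTED = 400


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


headline = {
    "reachable": pct(REACHABLE, HOSTS_EXAMINED),
    "llms_txt": pct(47, REACHABLE),
    "ai_policy_of_reachable": pct(98, REACHABLE),
    "ai_policy_of_robots_publishers": pct(98, ROBOTS_PUBLISHERS),
    "agent_cards": pct(6, REACHABLE),
    "mcp_network_reachable": pct(187 + 52, MCP_LISTED),
}

for name, value in headline.items():
    print(f"{name}: {value}%")
```

Note that the AI-crawler figure depends on the denominator: 24.26% of reachable hosts, but 30.43% of hosts that publish a robots.txt at all.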

## llms.txt quality distribution

| grade | hosts |
|---|---|
| exemplary | 33 |
| rich | 6 |
| structured | 5 |
| minimal | 2 |
| stub | 1 |

## Which AI crawlers does the web block?

Count of hosts whose robots.txt applies a full `Disallow: /` to each crawler token, among the 98 hosts with an explicit AI stance:

| crawler | hosts blocking |
|---|---|
| CCBot | 44 |
| Bytespider | 43 |
| ClaudeBot | 40 |
| GPTBot | 39 |
| Diffbot | 35 |
| Meta-ExternalAgent | 34 |
| Google-Extended | 33 |
| PerplexityBot | 33 |
| Applebot-Extended | 30 |
| cohere-ai | 29 |
| Amazonbot | 24 |
| Claude-Web | 24 |
| anthropic-ai | 24 |
| ChatGPT-User | 21 |
| YouBot | 21 |
| Timpibot | 20 |
| Perplexity-User | 17 |
| OAI-SearchBot | 17 |
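
As an illustration of what "fully `Disallow: /`" means for a named token, here is a minimal sketch of one way to detect it. It is an assumption about the approach, not the ARGUS implementation: the function name `fully_blocked_tokens` is hypothetical, and the parser is deliberately simplified (it ignores `Allow` overrides and the `*` wildcard group, so a token only counts when its own `User-agent` group carries the rule).

```python
def fully_blocked_tokens(robots_txt: str, tokens: list[str]) -> set[str]:
    """Return the tokens whose own User-agent group contains `Disallow: /`."""
    wanted = {t.lower(): t for t in tokens}
    blocked: set[str] = set()
    agents: list[str] = []  # user-agents named by the group being read
    in_rules = False        # True once the current group has seen a rule

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.update(wanted[a] for a in agents if a in wanted)
    return blocked


sample = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /private/

User-agent: *
Disallow: /
"""
print(sorted(fully_blocked_tokens(sample, ["GPTBot", "CCBot", "ClaudeBot"])))
```

In the sample, `ClaudeBot` is named but only partially restricted, so it would count toward "explicit stance" without counting as fully blocked.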

## The MCP server ecosystem

How listed Model Context Protocol servers are meant to be hosted — i.e. how many could ever be reached over a network at all:

| hosting class | servers |
|---|---|
| remote-capable | 187 |
| hybrid | 52 |
| local-only | 161 |

Local-only servers (161) are stdio packages that run on the user's own machine and cannot be probed remotely — a structural fact about the ecosystem that listing sites rarely surface.
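
The three hosting classes can be thought of as a function of the transports a listing declares. The sketch below assumes, hypothetically, that each listing exposes a set of transport names and that "hybrid" means both a remote and a stdio transport are offered; the report does not define the registry schema, so `hosting_class` and the transport labels are illustrative only.

```python
# Assumed transport labels; real registry fields may differ.
REMOTE_TRANSPORTS = {"http", "sse"}
LOCAL_TRANSPORTS = {"stdio"}


def hosting_class(transports: set[str]) -> str:
    """Classify a listing by the transports it declares (assumed schema)."""
    has_remote = bool(transports & REMOTE_TRANSPORTS)
    has_local = bool(transports & LOCAL_TRANSPORTS)
    if has_remote and has_local:
        return "hybrid"
    if has_remote:
        return "remote-capable"
    # No remote transport declared: nothing to probe over a network.
    return "local-only"


# Counts from the table above.
counts = {"remote-capable": 187, "hybrid": 52, "local-only": 161}
network_reachable = counts["remote-capable"] + counts["hybrid"]
print(sum(counts.values()), network_reachable)
```

Summing the remote-capable and hybrid rows gives the 239 servers cited in the headline findings, and the three rows together account for all 400 listings.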

## Largest agent surfaces observed

| host | rank | card kind | capability surface | llms.txt |
|---|---|---|---|---|
| forter.com | 259 | openai_plugin | 66 | — |
| cloudflare.com | 3 | mcp_descriptor | 5 | exemplary |
| sentry.io | 97 | json_doc | 0 | — |
| alidns.com | 304 | json_doc | 0 | — |
| weibo.com | 319 | a2a_agent_card | 0 | — |
| slack.com | 328 | openai_plugin | 0 | exemplary |

## Method & honest limits

- Probes are GET requests to well-known paths only (`/llms.txt`, `/llms-full.txt`, `/robots.txt`, a small set of `/.well-known/` agent cards) plus a single homepage fetch for JSON-LD. We honour robots.txt for non-meta paths, rate-limit per host, cap body size, and identify ARGUS truthfully.
- MCP liveness covers only servers that publish a **remote** http/sse endpoint. The 161 local-only listings (40.25% of the 400) are stdio packages: they are counted but cannot be probed. This is reported explicitly above rather than hidden.
- Adoption percentages use *reachable* hosts as the denominator, not the raw list, so that dead domains do not depress the adoption rates.
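
The probing discipline described in these bullets can be sketched as follows. This is a minimal illustration under stated assumptions, not the ARGUS prober: the body cap value, the `User-Agent` string, and all function names are hypothetical, only the three paths named above are listed (the `/.well-known/` set is not enumerated in this report), and rate limiting and robots.txt checks are omitted for brevity.

```python
import io
import urllib.request
from typing import BinaryIO

NAMED_PATHS = ["/llms.txt", "/llms-full.txt", "/robots.txt"]
MAX_BODY_BYTES = 256 * 1024             # hypothetical cap
USER_AGENT = "ARGUS-census-sketch/0.0"  # hypothetical identifier


def probe_urls(host: str) -> list[str]:
    """Build the well-known URLs to fetch for one host."""
    return [f"https://{host}{path}" for path in NAMED_PATHS]


def build_request(url: str) -> urllib.request.Request:
    """Create a GET request that identifies the prober in its User-Agent."""
    return urllib.request.Request(
        url, headers={"User-Agent": USER_AGENT}, method="GET"
    )


def read_capped(stream: BinaryIO, limit: int = MAX_BODY_BYTES) -> tuple[bytes, bool]:
    """Read at most `limit` bytes; the flag reports whether more was available."""
    data = stream.read(limit + 1)  # one extra byte reveals truncation
    return data[:limit], len(data) > limit


# Offline demonstration with an in-memory stream instead of a live response.
body, truncated = read_capped(io.BytesIO(b"# Example llms.txt\n"), limit=8)
print(body, truncated)
```

Against a live host, the response object returned by `urllib.request.urlopen(build_request(url), timeout=...)` could be passed to `read_capped` in place of the in-memory stream.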

*ARGUS re-runs this sweep nightly; the value compounds as a longitudinal time-series.*