Network Working Group A. Jurkovikj
Internet-Draft 4 November 2025
Intended status: Standards Track
Expires: 8 May 2026
The Collaboration Tunnel Protocol
draft-jurkovikj-collab-tunnel-00
Abstract
This document specifies the Collaboration Tunnel Protocol, a method
for efficient, verifiable content delivery between web publishers and
automated agents. The protocol typically achieves up to 90%
bandwidth reduction (83% median measured) through bidirectional URL
discovery, template-invariant content fingerprinting, sitemap-first
verification, and strict conditional request discipline.
Test Vector 2: Entity and Unicode - Input String (literal): Test
& Unicode: café- - Normalized String: test & unicode: café- -
SHA-256 (hex):
f58639b586fac9cb70d4513c83a6b2954178a80f12f5c1069aad09d124ef7b24 -
contentHash: sha256-
f58639b586fac9cb70d4513c83a6b2954178a80f12f5c1069aad09d124ef7b24 -
ETag: "sha256-
f58639b586fac9cb70d4513c83a6b2954178a80f12f5c1069aad09d124ef7b24"
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 8 May 2026.
Copyright Notice
Copyright (c) 2025 IETF Trust and the persons identified as the
document authors. All rights reserved.
Jurkovikj Expires 8 May 2026 [Page 1]
Internet-Draft Collab-Tunnel November 2025
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Problem Statement . . . . . . . . . . . . . . . . . . . . 4
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5
2. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 5
2.1. Architecture . . . . . . . . . . . . . . . . . . . . . . 5
3. Protocol Requirements . . . . . . . . . . . . . . . . . . . . 6
3.1. MUST Requirements . . . . . . . . . . . . . . . . . . . . 6
3.2. SHOULD Recommendations . . . . . . . . . . . . . . . . . 9
3.3. MAY Extensions . . . . . . . . . . . . . . . . . . . . . 9
4. Bidirectional Discovery . . . . . . . . . . . . . . . . . . . 10
4.1. C-URL to M-URL Mapping . . . . . . . . . . . . . . . . . 10
4.2. M-URL to C-URL Canonicalization . . . . . . . . . . . . . 10
4.3. Deterministic Mapping . . . . . . . . . . . . . . . . . . 11
4.4. Content Pages Only . . . . . . . . . . . . . . . . . . . 11
5. Template-Invariant Fingerprinting . . . . . . . . . . . . . . 12
5.1. Publisher Source of Content (Informative) . . . . . . . . 13
5.2. Normalization Algorithm . . . . . . . . . . . . . . . . . 14
5.3. Deterministic JSON Serialization (Normative) . . . . . . 15
5.4. Strong ETag and Parity (Normative) . . . . . . . . . . . 16
5.4.1. Method A: Canonical JSON Strong-Byte (Recommended) . 16
5.4.2. Method B: Content-Locked Strong-Content (Allowed with
Restrictions) . . . . . . . . . . . . . . . . . . . . 18
5.5. Parity Rule (Normative) . . . . . . . . . . . . . . . . . 19
5.6. Template-Invariance . . . . . . . . . . . . . . . . . . . 19
6. Conditional Request Discipline . . . . . . . . . . . . . . . 20
6.1. If-None-Match Precedence . . . . . . . . . . . . . . . . 20
6.2. 304 Not Modified Response . . . . . . . . . . . . . . . . 20
6.3. Replay and Stale Intermediaries . . . . . . . . . . . . . 21
6.4. Cache-Control Directives . . . . . . . . . . . . . . . . 21
6.5. Vary Header . . . . . . . . . . . . . . . . . . . . . . . 21
6.6. HEAD Request Support . . . . . . . . . . . . . . . . . . 22
7. Sitemap-First Verification . . . . . . . . . . . . . . . . . 22
7.1. JSON Sitemap Format . . . . . . . . . . . . . . . . . . . 22
7.2. Sitemap Scalability . . . . . . . . . . . . . . . . . . . 24
7.3. Error Handling (Informative) . . . . . . . . . . . . . . 26
7.4. Zero-Fetch Skip Logic . . . . . . . . . . . . . . . . . . 26
8. Publisher Policy Descriptor . . . . . . . . . . . . . . . . . 27
Jurkovikj Expires 8 May 2026 [Page 2]
Internet-Draft Collab-Tunnel November 2025
8.1. Policy Endpoint . . . . . . . . . . . . . . . . . . . . . 27
8.2. JSON Schema . . . . . . . . . . . . . . . . . . . . . . . 27
8.3. Discovery . . . . . . . . . . . . . . . . . . . . . . . . 29
8.4. Alignment with IETF AIPREF . . . . . . . . . . . . . . . 30
9. M-URL Response Format . . . . . . . . . . . . . . . . . . . . 30
9.1. Content-Type . . . . . . . . . . . . . . . . . . . . . . 30
9.2. JSON Payload Schema . . . . . . . . . . . . . . . . . . . 30
9.3. Complete Response Example . . . . . . . . . . . . . . . . 31
10. Operational Considerations (Informative) . . . . . . . . . . 32
11. Security Considerations . . . . . . . . . . . . . . . . . . . 33
11.1. HTTPS and TLS . . . . . . . . . . . . . . . . . . . . . 33
11.2. Rate Limiting . . . . . . . . . . . . . . . . . . . . . 33
11.3. Content Integrity . . . . . . . . . . . . . . . . . . . 33
11.4. Privacy . . . . . . . . . . . . . . . . . . . . . . . . 33
11.5. Denial of Service . . . . . . . . . . . . . . . . . . . 34
11.5.1. Sitemap Abuse . . . . . . . . . . . . . . . . . . . 34
11.5.2. HEAD vs GET Bandwidth . . . . . . . . . . . . . . . 34
11.6. Injection Surface Reduction . . . . . . . . . . . . . . 34
11.7. Cache Poisoning . . . . . . . . . . . . . . . . . . . . 34
11.8. Fingerprint Collision and Normalization Variance . . . . 34
11.9. Content Provenance and Origin Authentication
(Optional) . . . . . . . . . . . . . . . . . . . . . . 34
11.10. Privacy and PII . . . . . . . . . . . . . . . . . . . . 35
11.11. Access Control . . . . . . . . . . . . . . . . . . . . . 35
12. Energy Efficiency Considerations . . . . . . . . . . . . . . 36
12.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 36
12.2. Network Energy Consumption . . . . . . . . . . . . . . . 36
12.3. AI Inference Impact (Informative Summary) . . . . . . . 36
12.4. Sitemap-First Zero-Fetch Optimization . . . . . . . . . 37
12.5. Comparison to Existing Approaches . . . . . . . . . . . 37
12.6. Cumulative Environmental Impact . . . . . . . . . . . . 37
12.7. Relationship to IETF GREEN Working Group . . . . . . . . 38
12.8. Recommendations for Implementers . . . . . . . . . . . . 38
12.9. Future Work . . . . . . . . . . . . . . . . . . . . . . 39
13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 39
14. Comparison to Prior Art . . . . . . . . . . . . . . . . . . . 39
14.1. ResourceSync . . . . . . . . . . . . . . . . . . . . . . 40
14.2. AMP . . . . . . . . . . . . . . . . . . . . . . . . . . 40
14.3. XML Sitemaps . . . . . . . . . . . . . . . . . . . . . . 40
15. Implementation Status . . . . . . . . . . . . . . . . . . . . 41
16. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 41
17. References . . . . . . . . . . . . . . . . . . . . . . . . . 42
17.1. Normative References . . . . . . . . . . . . . . . . . . 42
17.2. Informative References . . . . . . . . . . . . . . . . . 42
Appendix A. Example Implementation (WordPress) . . . . . . . . . 44
Appendix B. Example Sitemap . . . . . . . . . . . . . . . . . . 45
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 46
Jurkovikj Expires 8 May 2026 [Page 3]
Internet-Draft Collab-Tunnel November 2025
1. Introduction
Automated agents (AI crawlers, search engines, content aggregators)
increasingly consume web content at scale. Traditional HTML delivery
designed for human browsers imposes unnecessary overhead:
presentational boilerplate (navigation, footers, advertisements),
large CSS/JavaScript bundles, and redundant fetches of unchanged
content.
Existing approaches address portions of this problem:
* XML Sitemaps [XMLSitemaps] provide discovery but lack content
fingerprints
* AMP [AMP] reduces HTML overhead but lacks synchronized hashing
* ResourceSync [ResourceSync] provides digest-based synchronization
but lacks endpoint-level validator discipline
The Collaboration Tunnel Protocol (TCT, also referred to as "collab-
tunnel") integrates these concepts into a cohesive system optimized
for machine consumption while preserving human-readable canonical
URLs for SEO and web compatibility.
1.1. Problem Statement
Current AI crawler behavior (2025) demonstrates:
1. *Bandwidth Waste*: Fetching full HTML documents when only core
content is needed
2. *Token Overhead*: Processing boilerplate (navigation, footers)
consumes 86% of tokens
3. *Redundant Fetches*: No efficient skip mechanism when content
unchanged
4. *Lack of Verification*: No cryptographic proof of content
delivery
Measured impact (from live deployments):
* HTML-only retrieval: 103 KB average (13,900 tokens)
* TCT JSON delivery: 17.7 KB average (1,960 tokens)
* *Savings: 83% bandwidth, 86% tokens*
Jurkovikj Expires 8 May 2026 [Page 4]
Internet-Draft Collab-Tunnel November 2025
1.2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
*C-URL (Canonical URL)*: The human-readable URL of a web resource,
typically serving HTML.
*M-URL (Machine URL)*: A deterministically mapped endpoint serving
machine-readable structured content for the same resource.
*Template-Invariant Fingerprint*: A cryptographic hash computed from
normalized core content, stable across presentation changes.
*Core Content*: The primary informational content of a resource,
excluding presentational boilerplate.
2. Protocol Overview
The Collaboration Tunnel Protocol consists of four coordinated
mechanisms:
1. *Bidirectional Discovery*: Explicit C-URL <-> M-URL handshake
preventing SEO conflicts
2. *Template-Invariant Fingerprinting*: Normalized content hashing
for stable cache validation
3. *Conditional Request Discipline*: Strict If-None-Match precedence
and 304 responses
4. *Sitemap-First Verification*: Zero-fetch skip logic when content
unchanged
2.1. Architecture
Jurkovikj Expires 8 May 2026 [Page 5]
Internet-Draft Collab-Tunnel November 2025
+------------------------+ +----------------------+
| Publisher (Origin) | | Automated Agent |
+------------------------+ +----------------------+
| | | |
| C-URL (HTML) |<---------| 1. Fetch Sitemap |
| +- | | 3. If Changed: |
| | | GET /llm/ |
| M-URL (/llm/) |<---------| If-None-Match |
| +- Link: canonical | | |
| +- ETag: sha256-... | | |
| +- Content-Type: JSON |--------->| 4. 304 or 200+JSON |
| | | |
| /llm-sitemap.json | | 5. Cache ETag |
| +- {cUrl, mUrl, |<---------| |
| | contentHash} | | |
+------------------------+ +----------------------+
3. Protocol Requirements
This section defines the normative requirements for TCT compliance.
3.1. MUST Requirements
Implementations MUST:
1. *Bidirectional Discovery*
* C-URL HTML MUST include pointing to M-URL
* M-URL response MUST include Link: ; rel="canonical"
HTTP header
2. *Validators*
* M-URL response MUST include ETag header
* M-URL response SHOULD include Last-Modified header when
available
* Sitemap MUST include contentHash field for each URL
3. *Conditional Requests*
Jurkovikj Expires 8 May 2026 [Page 6]
Internet-Draft Collab-Tunnel November 2025
* M-URL responses MUST use strong ETags for template-invariant
fingerprints: "sha256-<64 lowercase ASCII hex chars>"
* Server MUST honor If-None-Match header
* Server MUST return 304 Not Modified when ETag matches
* Server MUST give If-None-Match precedence over If-Modified-
Since (see [RFC9110], Section 13.1.2)
* If-Range requires a strong validator. If the If-Range
validator is weak or does not match, the server MUST ignore
the Range header and send 200 OK with the full representation
(per [RFC9110], Section 13.1.5)
* Servers MUST NOT assemble ranges across different encodings;
ranges apply only to the selected representation variant.
4. *304 Response*
* Response MUST NOT include message body
* Servers MUST include ETag if the corresponding 200 would;
otherwise SHOULD include ETag
* Servers SHOULD include Cache-Control (consistent with
[RFC9110], Section 15.4.5)
5. *HEAD Support*
* Servers SHOULD support HEAD requests for all M-URLs and
sitemaps
* HEAD responses MUST return the same validators and cache
headers as GET and MUST NOT include a message body
* Body-dependent headers (e.g., Content-Length) MAY reflect the
size of the corresponding GET response
* This behavior applies across HTTP/1.1, HTTP/2, and HTTP/3
* Implementations MUST avoid hop-by-hop headers on M-URLs and
sitemaps and SHOULD NOT use transfer-codings or trailers on
these responses
6. *Sitemap Parity*
Jurkovikj Expires 8 May 2026 [Page 7]
Internet-Draft Collab-Tunnel November 2025
* Sitemap contentHash value MUST equal M-URL ETag value
(excluding quotes)
* Example: M-URL ETag "sha256-abc" -> Sitemap contentHash
sha256-abc
* See "Template-Invariant Fingerprinting" and "ETag Generation"
for computation and format details
7. *Canonical Verification (Agents)*
* Agents MUST verify that the M-URL response includes Link:
; rel="canonical" and that the canonical URL matches
the expected C-URL before processing
* If the canonical link is missing or mismatched, agents SHOULD
treat the endpoint as non-compliant and skip ingestion
* If HTML discovery and the M-URL's
self-declared canonical conflict, agents SHOULD prefer the
M-URL's self-declared canonical and flag for operator review
8. *Protected Sitemaps*
* Sitemaps that list protected M-URLs MUST NOT be served
without access controls equivalent to those applied to the
corresponding M-URLs
9. *Client Parity Behavior (Agents)*
* Agents MUST NOT attempt to recompute the fingerprint from
C-URL HTML to verify parity
* Agents MUST verify parity by comparing the sitemap
contentHash value to the M-URL ETag value (excluding quotes)
* The comparison is a simple string equality check: contentHash
=== cleanETag(ETag)
* Agents SHOULD NOT implement the normalization algorithm for
compliance purposes; parity verification is sufficient
Jurkovikj Expires 8 May 2026 [Page 8]
Internet-Draft Collab-Tunnel November 2025
10. *Content Pages Only* - Publishers SHOULD NOT provide M-URLs for
archive, category, tag, search, or date-based listing pages;
where requested, servers SHOULD return 404 - Deployments with
strong rationale MAY include such endpoints but MUST maintain
parity semantics - Sitemaps SHOULD include only content pages
(posts, articles, pages) - A homepage MAY be included only if it
represents stable content
3.2. SHOULD Recommendations
Implementations SHOULD:
1. *Cache-Control*
* Use: max-age=0, must-revalidate, stale-while-revalidate=60,
stale-if-error=86400
* Rationale: Enables revalidation with graceful stale serving
2. *Vary Header*
* Include: Vary: Accept-Encoding
* Rationale: Content varies by compression (gzip, br)
3. *Strong ETags*
* Use strong ETags ("sha256-...") for TCT semantic fingerprints
(per MUST requirement #3)
* Rationale: Normalized content produces byte-identical JSON
responses; strong validators ensure reliable cache
compatibility
* Note: Strong ETags MAY be used for representation-specific
caching outside TCT parity (e.g., CDN byte-level caching)
4. *Content Pages Only*
* Return 404 for M-URL requests to archive pages
* Include only content pages in sitemap
3.3. MAY Extensions
Implementations MAY:
1. *Policy Descriptor*
Jurkovikj Expires 8 May 2026 [Page 9]
Internet-Draft Collab-Tunnel November 2025
* Servers MAY advertise a machine-readable policy via Link:
; rel="describedby"; type="application/json"
* The policy document format is informative and out of scope for
the core protocol (see Publisher Policy Descriptor section)
2. *Additional Integrity*
* Use Content-Digest header ([RFC9530])
* Use HTTP Message Signatures ([RFC9421])
3. *Additional Formats*
* Provide PDF JSON alternates
* Provide receipt/proof systems
4. Bidirectional Discovery
Note: Path names in examples are non-normative. This specification
does not require any specific URL paths. Example path "/llm/" is
illustrative. Servers MAY choose alternative slugs or publish
mappings. Agents MUST NOT assume a fixed path and SHOULD discover
M-URLs via HTML rel="alternate", HTTP Link headers, or the JSON
sitemap.
4.1. C-URL to M-URL Mapping
The C-URL MUST include an HTML element in the document :
*Attributes:*
* rel="alternate": Indicates alternate representation ([RFC8288])
* type="application/json": Machine-readable format
* href: Absolute or relative URL to M-URL
4.2. M-URL to C-URL Canonicalization
The M-URL response MUST include an HTTP Link header with
rel="canonical":
Jurkovikj Expires 8 May 2026 [Page 10]
Internet-Draft Collab-Tunnel November 2025
Link: ; rel="canonical"
This establishes bidirectional verification and prevents SEO
duplication. The canonical link relation is registered by [RFC6596].
4.3. Deterministic Mapping
M-URLs SHOULD follow a deterministic pattern from C-URLs.
*Non-normative examples*: Append a slug to the C-URL path, such as
/llm/.
Example: - C-URL: https://example.com/post/ - M-URL:
https://example.com/post/llm/
*Guidance*: - These path patterns are examples, not protocol
requirements. If a preferred slug collides with existing site
routes, publishers MAY choose an alternate (e.g., /api/llm/,
/content/llm/) or publish a mapping in a site-level manifest. -
Agents MUST NOT assume a fixed path. Agents SHOULD discover M-URLs
via HTML , HTTP Link
headers, or the JSON sitemap.
*Migration Note*: Publishers MAY use HTTP 308 Permanent Redirect to
migrate from legacy paths to preferred endpoints and SHOULD list only
the primary M-URL in the sitemap.
*Sitemap Discovery*:
Publishers SHOULD advertise the sitemap via one or both of:
1. *Link header* (RECOMMENDED) on the homepage or C-URL responses:
Link: ; rel="index"; type="application/json"
1. *Well-known URI* (OPTIONAL): Agents MAY check /.well-known/llm-
sitemap.json. The .well-known URI convention follows [RFC8615].
Note: The well-known URI is not registered in IANA for -00; future
versions may formalize this. Agents SHOULD try the Link header
first, then fall back to well-known URI if needed.
4.4. Content Pages Only
Publishers SHOULD NOT provide M-URLs for archive, category, tag,
search, or date-based listing pages; where requested, servers SHOULD
return 404. Deployments with strong rationale MAY include such
endpoints but MUST maintain parity semantics.
Jurkovikj Expires 8 May 2026 [Page 11]
Internet-Draft Collab-Tunnel November 2025
*Rationale:*
* Archive pages contain navigation and lists, not primary content
* Template-invariant fingerprinting is designed for stable content,
not dynamic lists
* Archive pages change frequently as new content is published
*Implementation:*
* Publishers SHOULD return HTTP 404 for M-URL requests to archive
pages
* Sitemaps SHOULD include only content page URLs, not archives
* Homepage MAY be included if it represents stable content
*Dynamic Homepage Guidance:*
If the homepage displays a dynamic content roll-up (e.g., recent
posts, latest articles), publishers SHOULD prefer one of:
* Provide a synthesized stable overview representing the site (name,
description, purpose)
* Include a stable "About" page as the first sitemap item instead of
the homepage
Rationale: Dynamic homepages change frequently and may not provide
valuable semantic content for automated agents. - CMS guidance (non-
normative): For platforms like WordPress, exclude archive-like routes
(e.g., category, tag, search, date, author) from M-URL handling and
return 404 rather than 200 with empty payload. Ensure only singular
content types (posts, pages, articles) emit M-URLs and sitemap
entries.
*Content page examples:* - Blog posts: /blog/understanding-tct/ -
Articles: /news/2025/protocol-launch/ - Static pages: /about/,
/contact/
*Archive page examples (should NOT have M-URLs):* - Category
archives: /category/technology/ - Tag archives: /tag/web-protocols/ -
Date archives: /2025/10/ - Search results: /search/?q=protocol -
Author archives: /author/john/
5. Template-Invariant Fingerprinting
Jurkovikj Expires 8 May 2026 [Page 12]
Internet-Draft Collab-Tunnel November 2025
5.1. Publisher Source of Content (Informative)
The M-URL JSON payload SHOULD be produced from the platform's core
content body (e.g., the WordPress post content field, a CMS article
body), independent of the theme or template layer. This ensures the
fingerprint is template-invariant.
Publishers MAY include additional semantic text in the content field
to make the fingerprint sensitive to changes in those elements. For
example:
* Title: Including the resource title ensures title changes produce
new fingerprints
* Media descriptions: Including image alt text or
content ensures accessibility metadata is tracked
* Deterministic order: If multiple elements are included, use a
consistent order (e.g., title, blank line, main content)
Example reconstructed content:
Understanding the Collaboration Tunnel Protocol
The Collaboration Tunnel Protocol enables efficient content delivery.
Diagram showing protocol flow. The protocol achieves 80-90% bandwidth
reduction through conditional requests.
This approach balances template-invariance (content independent of
presentation) with semantic completeness (title and media
descriptions included). Publishers using this approach should ensure
all included text goes through the same normalization pipeline
defined in the next section.
*Example of content Field Construction:*
A publisher implementation might construct the content field by
combining several data sources in a deterministic order.
Input Data: - Title: TCT Protocol Guide - Body Paragraph 1: The
protocol is simple. - Image Alt: A flow diagram - Body Paragraph 2:
It saves bandwidth.
Resulting content string in the JSON Payload:
Jurkovikj Expires 8 May 2026 [Page 13]
Internet-Draft Collab-Tunnel November 2025
TCT Protocol Guide
The protocol is simple.
[Image: A flow diagram]
It saves bandwidth.
This creates a readable representation that is also a stable and
reliable input for the fingerprinting algorithm.
5.2. Normalization Algorithm
To generate the template-invariant fingerprint, the server MUST
operate on the JSON payload's content field (not on the C-URL HTML),
applying the following steps in order:
1. Decode HTML Entities: Decode any HTML entities present in the
content string (e.g., & -> &, — -> -)
2. Apply Unicode Normalization: Apply Unicode Normalization Form KC
(NFKC) as defined in Unicode Standard Annex #15
3. Apply Unicode Case Folding: Convert to lowercase using the
standard, locale-independent Unicode case-folding algorithm as
defined in the Unicode Standard
4. Remove Control Characters: Remove all characters in the Unicode
general category "Control" (Cc), which includes characters U+0000
through U+001F and U+007F through U+009F. Preserve only U+0009
(TAB), U+000A (LINE FEED), and U+000D (CARRIAGE RETURN) for
subsequent whitespace collapsing.
5. Collapse Whitespace: Replace any sequence of one or more ASCII
whitespace characters (U+0020 SPACE, U+0009 TAB, U+000A LINE
FEED, U+000D CARRIAGE RETURN) with a single ASCII SPACE (U+0020)
6. Trim Whitespace: Remove any leading or trailing ASCII SPACE
characters
7. Compute Hash: Compute the SHA-256 hash over the resulting string
(encoded as UTF-8)
The strong ETag MUST be "sha256-<64 lowercase ASCII hex chars>" from
this hash. The sitemap contentHash MUST be sha256-<64 lowercase
ASCII hex chars> from the same hash (without the W/ prefix and
quotes).
Jurkovikj Expires 8 May 2026 [Page 14]
Internet-Draft Collab-Tunnel November 2025
Example (pseudocode):
function generateFingerprint(contentString):
normalized = contentString
.decodeEntities()
.unicodeNormalize('NFKC')
.casefold()
.removeControlChars()
.collapseWhitespace()
.trim()
return "sha256-" + sha256(normalized)
Note: This normalization operates on the plain-text content field in
the JSON payload, not on HTML from the C-URL.
Normalization uses Unicode NFKC [UAX15] and Unicode case folding
[Unicode-CaseFolding]; named character references are decoded per the
HTML Living Standard [WHATWG-HTML].
5.3. Deterministic JSON Serialization (Normative)
Implementations MUST use deterministic JSON serialization when
generating M-URL responses to ensure that identical inputs yield
byte-identical JSON.
*Required Properties:*
* *Stable object key order:* Lexicographic ordering by Unicode
codepoint at every depth
* *Preserve array order:* Arrays MUST maintain element order as
defined
* *UTF-8 encoding:* Without BOM; emit exactly one JSON document; no
trailing newline
* *Compact serialization:* No pretty-print; no extraneous whitespace
* *Consistent escaping per [RFC8259]:*
- Escape quotation mark (U+0022) and backslash (U+005C)
- Do not escape solidus (U+002F)
- Emit Unicode as UTF-8; use \uXXXX only where required by
[RFC8259]
Jurkovikj Expires 8 May 2026 [Page 15]
Internet-Draft Collab-Tunnel November 2025
* *Numbers:* MUST be in minimal canonical form (no leading zeros, no
"+", lowercase "e")
* *Non-finite numbers:* JSON numbers MUST NOT represent NaN or
infinite values. Producers MUST NOT emit non-finite values;
consumers encountering them MUST treat the payload as invalid.
Values that cannot be portably represented SHOULD be encoded as
strings with schema guidance
*Deterministic field inclusion:*
The set of fields included and their values for a given resource MUST
be deterministic. Fields that vary independently of the normalized
content MUST NOT be included unless they are a deterministic function
of that content.
*Recommendation:*
Implementations SHOULD use [RFC8785] (JSON Canonicalization Scheme)
or document an equivalent deterministic serialization profile to
ensure cross-platform consistency.
5.4. Strong ETag and Parity (Normative)
Servers emitting strong validators MUST ensure that the ETag value
changes whenever the final serialized JSON payload bytes change;
equality of strong ETag values MUST imply byte-identical
representations.
The 64 hexadecimal digits in the hash value MUST be lowercase ASCII.
ETag values MUST be sent as a quoted-string per [RFC9110].
5.4.1. Method A: Canonical JSON Strong-Byte (Recommended)
*Computation:*
1. Build the JSON response object WITHOUT the hash field
2. Canonicalize the JSON per the deterministic serialization
requirements above
3. Compute F = SHA-256(canonical_json_bytes) as 64 hexadecimal
characters
4. Set the hash value: hash_value = "sha256-" + F
5. Add the hash field to the JSON payload: payload.hash = hash_value
Jurkovikj Expires 8 May 2026 [Page 16]
Internet-Draft Collab-Tunnel November 2025
6. Set HTTP headers:
* ETag: "sha256-" + F
* Sitemap: contentHash: "sha256-" + F
7. Servers MUST canonicalize the final payload (now including the
hash field) before sending, using the same deterministic
serialization profile
*Rationale:*
This method guarantees strong ETag semantics even as the protocol
evolves to add new fields. Any change to the JSON representation
correctly changes the ETag, ensuring byte-identical validation per
[RFC9110].
Computing the ETag over the canonical form of the payload without the
hash field still satisfies strong validator semantics because the
final payload bytes are a deterministic function of that canonical
pre-hash payload and the ETag value.
Servers SHOULD compute F over identity-coded (uncompressed) canonical
bytes and MAY reuse the same ETag across compressed variants; servers
MUST set Vary: Accept-Encoding.
*Example:*
json_without_hash = {
"profile": "tct-1",
"canonical_url": "https://example.com/post/",
"title": "Article Title",
"content": "Normalized content text..."
}
canonical_bytes = canonicalize_json(json_without_hash)
F = sha256(canonical_bytes).hexdigest() // 64 hex chars
hash_value = "sha256-" + F
json_with_hash = json_without_hash
json_with_hash["hash"] = hash_value
response.setHeader("ETag", '"' + hash_value + '"')
response.send(json_with_hash)
Jurkovikj Expires 8 May 2026 [Page 17]
Internet-Draft Collab-Tunnel November 2025
5.4.2. Method B: Content-Locked Strong-Content (Allowed with
Restrictions)
*Computation:*
1. Extract and normalize content per the 6-step normalization
algorithm
2. Compute F = SHA-256(normalized_content_utf8_bytes) as 64
hexadecimal characters
3. Set the hash value: hash_value = "sha256-" + F
4. Build the JSON payload deterministically from the normalized
content
5. Set HTTP headers:
* ETag: "sha256-" + F
* Sitemap: contentHash: "sha256-" + F
* JSON payload: hash: "sha256-" + F
*Restrictions:*
This method is ONLY valid if:
* The ENTIRE JSON representation is a deterministic function of the
normalized content and fixed protocol constants
* NO field may vary independently of the normalized content
* Adding or changing any field REQUIRES recomputing the hash from
updated content
*Rationale:*
If the final JSON bytes are strictly determined by content, then
content-hashing produces the same result as JSON-hashing. This
preserves template-invariance: same content text produces the same
hash regardless of HTML/theme presentation.
*Caution:*
Jurkovikj Expires 8 May 2026 [Page 18]
Internet-Draft Collab-Tunnel November 2025
Future protocol versions that add metadata fields (e.g., language,
author, published_date) independent of content text would violate
strong semantics with this method. Such deployments MUST migrate to
Method A.
5.5. Parity Rule (Normative)
Sitemap contentHash, JSON payload hash field, and M-URL ETag value
MUST satisfy:
contentHash == clean(ETag) == payload.hash
Where clean(ETag) removes surrounding quotes from the ETag header
value.
*Example:*
* HTTP Header: ETag: "sha256-2c26b46b68ffc68f..."
* Sitemap: contentHash: sha256-2c26b46b68ffc68f...
* JSON Payload: "hash": "sha256-2c26b46b68ffc68f..."
*Verification:*
Clients MUST verify parity through string equality and MUST NOT
recompute hashes from HTML. Clients that detect parity violations
SHOULD log a warning and MAY reject the response.
5.6. Template-Invariance
TCT's template-invariance property means that HTML presentation
changes (theme updates, CSS/JavaScript modifications, navigation
restructuring) do not affect the protocol's hash values, provided the
core content (title + body text) remains unchanged.
*With Method A (Canonical JSON Strong-Byte):*
* HTML changes -> No effect on normalized content -> No effect on
JSON -> ETag unchanged OK
* JSON field addition/change -> JSON bytes change -> ETag changes OK
* Result: Template-invariance preserved AND strong ETag semantics
correct
*With Method B (Content-Locked Strong-Content):*
Jurkovikj Expires 8 May 2026 [Page 19]
Internet-Draft Collab-Tunnel November 2025
* HTML changes -> No effect on normalized content -> ETag unchanged
OK
* Content changes -> Hash changes -> ETag changes OK
* Independent JSON field changes -> FAIL Would violate strong
semantics (not allowed)
*Summary:*
Template-invariance addresses HTML/theme independence. Strong ETag
semantics address JSON byte-identity. Both properties are compatible
when JSON is deterministic.
6. Conditional Request Discipline
6.1. If-None-Match Precedence
When both If-None-Match and If-Modified-Since headers are present,
servers MUST give If-None-Match precedence per [RFC9110],
Section 13.1.2. This means:
1. Evaluate If-None-Match first
2. If ETag matches, return 304 Not Modified (ignore If-Modified-
Since)
3. If ETag doesn't match, process If-Modified-Since (if present)
*Rationale:* ETags provide stronger validation than modification
dates, especially for semantic fingerprints.
6.2. 304 Not Modified Response
When the ETag matches the If-None-Match value:
1. Server MUST respond with 304 Not Modified
2. Response MUST NOT include a message body. Servers MUST include
ETag if the corresponding 200 OK would include it; otherwise
SHOULD include ETag. Servers SHOULD include Cache-Control (per
[RFC9111]) and MAY include Last-Modified
*Example:*
Jurkovikj Expires 8 May 2026 [Page 20]
Internet-Draft Collab-Tunnel November 2025
HTTP/1.1 304 Not Modified
ETag: "sha256-2c26b46b68ffc68f..."
Last-Modified: Wed, 21 Oct 2025 07:28:00 GMT
Cache-Control: max-age=0, must-revalidate
Cache-Control: stale-while-revalidate=60
Cache-Control: stale-if-error=86400
Vary: Accept-Encoding
6.3. Replay and Stale Intermediaries
If an agent receives a 200 whose ETag is older than its cached parity
while the sitemap contentHash matches the cached parity, it SHOULD
treat the cached/sitemap state as authoritative unless the server
signals otherwise (e.g., newer Last-Modified or sitemap value). The
agent SHOULD revalidate end-to-end (e.g., Cache-Control: no-cache +
If-None-Match) before adopting a regression.
6.4. Cache-Control Directives
M-URL responses SHOULD use:
Cache-Control: max-age=0, must-revalidate
Cache-Control: stale-while-revalidate=60
Cache-Control: stale-if-error=86400
*Directives:* - max-age=0: Require revalidation before serving from
cache - must-revalidate: Do not serve stale without successful
revalidation - stale-while-revalidate=60: Serve stale while
revalidating in background (60s window) - stale-if-error=86400: Serve
stale if origin unavailable (24h window)
Note: Avoid private on M-URLs and sitemaps (they are cacheable by
shared caches).
6.5. Vary Header
Responses SHOULD include Vary: Accept-Encoding to indicate
compression variance:
Vary: Accept-Encoding
M-URL responses primarily vary by compression (gzip, brotli), not by
content type (always application/json).
Servers SHOULD NOT vary on User-Agent for M-URLs and sitemaps to
preserve cacheability and reduce fragmentation.
Jurkovikj Expires 8 May 2026 [Page 21]
Internet-Draft Collab-Tunnel November 2025
6.6. HEAD Request Support
Servers SHOULD support HEAD requests for all M-URLs and sitemaps.
HEAD responses MUST: - Return same HTTP headers as equivalent GET
request - NOT include a message body - Include all validators (ETag,
Last-Modified, Cache-Control)
*Example:*
HEAD /post/llm/ HTTP/1.1
Host: example.com
HTTP/1.1 200 OK
ETag: "sha256-abc123..."
Last-Modified: Wed, 21 Oct 2025 07:28:00 GMT
Cache-Control: max-age=0, must-revalidate, stale-while-revalidate=60
Vary: Accept-Encoding
Content-Type: application/json
Content-Length: 1234
This enables efficient validation without transferring the full
response body.
7. Sitemap-First Verification
7.1. JSON Sitemap Format
Publishers SHOULD provide a machine-readable sitemap at a well-known
location (e.g., /llm-sitemap.json).
Publishers MAY also include a Sitemap: directive in robots.txt
pointing to the JSON sitemap; agents MAY use it as a discovery hint.
If both XML and JSON sitemaps are present, agents that implement this
protocol SHOULD prefer the JSON sitemap for TCT.
*Schema:*
Jurkovikj Expires 8 May 2026 [Page 22]
Internet-Draft Collab-Tunnel November 2025
{
"version": 1,
"profile": "tct-1",
"items": [
{
"cUrl": "https://example.com/post/",
"mUrl": "https://example.com/post/llm/",
"modified": "2025-10-01T12:34:56Z",
"contentHash": "sha256-2c26b46b68ffc68f..."
}
]
}
*Fields:*
* version (integer): Sitemap format version (currently 1)
* profile (string, RECOMMENDED): Protocol version identifier (e.g.,
"tct-1"). Enables clients to detect protocol capabilities and
maintain forward compatibility as the specification evolves.
* items (array): List of URL pairs
- cUrl (string, required): Canonical URL
- mUrl (string, required): Machine URL
- modified (string, [RFC3339]): Last modification timestamp
- contentHash (string, required): Template-invariant fingerprint
(same as M-URL ETag)
*Parity Rule:*
The sitemap contentHash value MUST match the M-URL ETag header value,
excluding quotes. This enables zero-fetch skip optimization: clients
compare sitemap hash to cached ETag without fetching the M-URL.
*Forward Compatibility:*
Clients MUST ignore unknown fields in the sitemap JSON. Servers MAY
add additional fields to support future protocol versions. Agents
SHOULD read the profile field when present; unknown profile values
SHOULD NOT cause ingestion failure but MAY be logged for analysis.
Jurkovikj Expires 8 May 2026 [Page 23]
Internet-Draft Collab-Tunnel November 2025
7.2. Sitemap Scalability
Publishers and agents SHOULD consider scalability for large sites:
* Publishers MAY split sitemaps into an index (a sitemap-of-
sitemaps) to segment large URL sets (analogous to XML Sitemaps
sitemapindex)
* Servers SHOULD support compression (e.g., Content-Encoding: gzip
or br) for sitemap responses; agents SHOULD accept compressed
responses
* Agents SHOULD use streaming JSON parsers for large sitemaps and
enforce a maximum sitemap size (e.g., 100 MB)
* Servers SHOULD keep per-item objects compact (RECOMMENDED <= 2
KB). Servers MUST bound total sitemap size by an operator-
configured budget. When budgets would be exceeded (RECOMMENDED <=
50,000 items per sitemap), servers SHOULD publish a sitemap index
and split content
*Sitemap Index (Large Sites):*
For sites with thousands of URLs, publishers SHOULD segment sitemaps
using a sitemap index (analogous to XML Sitemaps sitemapindex). A
formal JSON sitemap index schema MAY be specified in future protocol
versions. Agents SHOULD support consuming multiple sitemap files.
*Homepage Handling:*
Publishers SHOULD include the site homepage as the *first item* in
the sitemap array. This provides automated agents with immediate
access to site-level context (site name, description, purpose) before
processing individual content pages.
For homepages that display dynamic content listings (blog roll,
latest posts), publishers MAY synthesize stable content representing
the site overview rather than the dynamic list.
*Example with homepage first:*
Jurkovikj Expires 8 May 2026 [Page 24]
Internet-Draft Collab-Tunnel November 2025
{
"version": 1,
"profile": "tct-1",
"items": [
{
"cUrl": "https://example.com/",
"mUrl": "https://example.com/llm/",
"modified": "2025-10-15T08:00:00Z",
"contentHash": "sha256-abc123..."
},
{
"cUrl": "https://example.com/about/",
"mUrl": "https://example.com/about/llm/",
"modified": "2025-10-01T10:00:00Z",
"contentHash": "sha256-def456..."
}
]
}
*Sitemap HTTP Response:*
GET /llm-sitemap.json HTTP/1.1
Host: example.com
HTTP/1.1 200 OK
Content-Type: application/json
ETag: "sha256-sitemap-fingerprint"
Last-Modified: Wed, 21 Oct 2025 12:00:00 GMT
Cache-Control: max-age=0, must-revalidate, stale-while-revalidate=60
Vary: Accept-Encoding
Content-Length: 4567
{
"version": 1,
"profile": "tct-1",
"items": [...]
}
*Conditional Sitemap Fetch:*
GET /llm-sitemap.json HTTP/1.1
Host: example.com
If-None-Match: "sha256-sitemap-fingerprint"
HTTP/1.1 304 Not Modified
ETag: "sha256-sitemap-fingerprint"
Cache-Control: max-age=0, must-revalidate, stale-while-revalidate=60
Jurkovikj Expires 8 May 2026 [Page 25]
Internet-Draft Collab-Tunnel November 2025
Clients SHOULD use conditional requests for sitemap to avoid
unnecessary bandwidth when sitemap unchanged.
7.3. Error Handling (Informative)
*Malformed sitemap:* Agents SHOULD treat this as non-fatal; skip
entries that fail structural validation; log and continue.
*ETag/contentHash mismatch:* Agents SHOULD treat the endpoint ETag as
authoritative, process the change, and update caches. Publishers
SHOULD publish sitemaps atomically or include Last-Modified.
*M-URL unavailable:* Agents MAY defer the fetch or fall back to the
C-URL HTML as a last resort; publishers SHOULD return a 4xx/5xx
rather than an empty 200.
*HTTP Status Code Handling:*
Agents SHOULD respect common HTTP status codes for retries and
backoff. If a server responds with 410 Gone, the agent SHOULD treat
the resource as permanently deleted. If a server responds with 429
Too Many Requests or 503 Service Unavailable, the agent SHOULD honor
the Retry-After header if present.
7.4. Zero-Fetch Skip Logic
Automated agents SHOULD:
1. Fetch /llm-sitemap.json periodically
2. Compare contentHash values to locally cached hashes
3. *If hash unchanged*: Skip fetching both C-URL and M-URL (zero-
fetch optimization)
4. *If hash changed*: Issue conditional GET to M-URL with If-None-
Match
This enables 90%+ skip rate for unchanged content.
*Example workflow:*
Jurkovikj Expires 8 May 2026 [Page 26]
Internet-Draft Collab-Tunnel November 2025
Agent: Fetch /llm-sitemap.json
Agent: item.contentHash = "sha256-abc123..."
Agent: cachedHash = lookup(item.mUrl)
if (item.contentHash === cachedHash):
// Zero-fetch: Content unchanged, skip all requests
skip()
else:
// Hash changed, fetch with conditional request
GET item.mUrl
Headers: If-None-Match: "sha256-abc123..."
if (response.status == 304):
// Still matched at endpoint, update cache
cache(item.mUrl, item.contentHash)
else:
// Content changed, process new data
process(response.body)
cache(item.mUrl, response.headers['ETag'])
8. Publisher Policy Descriptor
This section is informative.
Publishers MAY provide a machine-readable policy descriptor at a
well-known location (e.g., /llm-policy.json or /.well-known/llm-
policy.json) to communicate usage terms, rate limits, and content
licensing preferences to automated agents. Example paths are non-
normative.
8.1. Policy Endpoint
The policy descriptor SHOULD be available at a stable URL. Example
paths (non-normative):
https://example.com/llm-policy.json
https://example.com/.well-known/llm-policy.json
8.2. JSON Schema
Jurkovikj Expires 8 May 2026 [Page 27]
Internet-Draft Collab-Tunnel November 2025
{
"profile": "tct-policy-1",
"version": 1,
"effective": "2025-10-01T00:00:00Z",
"updated": "2025-10-15T12:00:00Z",
"policy_urls": {
"terms_of_service": "https://example.com/terms/",
"payment_info": "https://example.com/pricing/",
"contact": "https://example.com/contact/"
},
"purposes": {
"allow_ai_input": true,
"allow_ai_train": false,
"allow_search_indexing": true
},
"requirements": {
"attribution_required": true,
"link_back_required": false,
"notice_required": true
},
"rate_hints": {
"max_requests_per_second": null,
"max_requests_per_day": 10000,
"note": "Advisory limits, honor system"
}
}
*Fields:*
* profile (string): Policy schema version identifier (e.g., "tct-
policy-1")
* version (integer): Policy revision number
* effective (string, RFC 3339): When policy took effect
* updated (string, RFC 3339): Last policy modification
* policy_urls (object): URLs to human-readable policy documents
- terms_of_service: Legal terms URL
- payment_info: Pricing/billing information URL (for paid access)
Jurkovikj Expires 8 May 2026 [Page 28]
Internet-Draft Collab-Tunnel November 2025
- contact: Publisher contact for licensing inquiries
* purposes (object): Usage permissions
- allow_ai_input: Content may be used as AI input (RAG, context)
- allow_ai_train: Content may be used for model training
- allow_search_indexing: Content may be indexed for search
* requirements (object): Usage conditions
- attribution_required: Must credit publisher when using content
- link_back_required: Must link to canonical URL when
republishing
- notice_required: Must notify publisher of commercial use
* rate_hints (object): Advisory crawl rate limits (non-binding)
- max_requests_per_second: Requests per second limit (null = no
limit)
- max_requests_per_day: Requests per day limit
- note: Additional guidance
Rate hints are advisory only; enforcement, payment, and economic
arrangements are out of scope for this specification. Vocabulary
alignment with IETF AIPREF is expected as that work matures.
8.3. Discovery
The policy descriptor SHOULD be linked from the sitemap with a
describedby Link header (example paths only):
Link: ; rel="describedby"; type="application/json"
Automated agents SHOULD:
1. Fetch /llm-policy.json before crawling
2. Honor stated usage restrictions
3. Respect rate hints to avoid overwhelming origin
4. Review terms before commercial use
Jurkovikj Expires 8 May 2026 [Page 29]
Internet-Draft Collab-Tunnel November 2025
This specification uses registered relations (alternate/index/
describedby). A dedicated relation for TCT sitemaps might be
registered in the future; this document does not create new link
relations.
8.4. Alignment with IETF AIPREF
This policy format is designed to complement the IETF AIPREF (AI
Preferences) proposal, providing machine-readable expressions of
publisher preferences for automated agent behavior.
9. M-URL Response Format
9.1. Content-Type
M-URL responses MUST set Content-Type: application/json:
Content-Type: application/json; charset=utf-8
9.2. JSON Payload Schema
*Minimal required fields:*
{
"profile": "tct-1",
"canonical_url": "https://example.com/post/",
"title": "Article Title",
"content": "Core article content...",
"hash": "sha256-2c26b46b68ffc68f..."
}
*Fields:*
* profile (string, RECOMMENDED): Protocol version identifier (e.g.,
"tct-1"). Future versions (e.g., "tct-2") can introduce new
fields while maintaining backward compatibility.
* canonical_url (string, required): The C-URL for this resource
* title (string, required): Resource title
* content (string, required): Plain-text core content (UTF-8). This
field is the input to the normalization algorithm. The content
field MUST NOT contain HTML tags or markup. Publishers MUST strip
or escape any embedded HTML before inclusion. To ensure
fingerprint stability, publishers SHOULD avoid including volatile,
non-semantic elements (e.g., dynamic view counts, timestamps) in
this string. Publishers SHOULD provide content independent of
Jurkovikj Expires 8 May 2026 [Page 30]
Internet-Draft Collab-Tunnel November 2025
template/theme presentation. Publishers MAY include semantic
metadata (such as title or media captions) to create a more
complete fingerprint.
* hash (string, required): Template-invariant fingerprint. MUST
equal the M-URL ETag value excluding quotes. This is the same
value as the sitemap contentHash field. Format: sha256-<64 hex>.
Example: if M-URL ETag is "sha256-abc123", then hash is
sha256-abc123 and sitemap contentHash is sha256-abc123.
*Forward Compatibility:*
Clients MUST ignore unknown fields in the JSON payload. Servers MAY
add additional fields to support future protocol versions or domain-
specific metadata. Agents SHOULD read the profile field when
present; unknown values SHOULD NOT cause ingestion failure but MAY be
logged.
*Extended fields example:*
{
"profile": "tct-1",
"canonical_url": "https://example.com/post/",
"title": "Article Title",
"language": "en-US",
"published": "2025-10-01T10:00:00Z",
"modified": "2025-10-15T14:30:00Z",
"content": "Core article content...",
"hash": "sha256-2c26b46b68ffc68f...",
"structured_data": {
"@context": "https://schema.org",
"@type": "Article",
"headline": "Article Title"
}
}
9.3. Complete Response Example
Jurkovikj Expires 8 May 2026 [Page 31]
Internet-Draft Collab-Tunnel November 2025
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
ETag: "sha256-2c26b46b68ffc68f..."
Link: ; rel="canonical"
Last-Modified: Wed, 15 Oct 2025 14:30:00 GMT
Cache-Control: max-age=0, must-revalidate
Cache-Control: stale-while-revalidate=60
Cache-Control: stale-if-error=86400
Vary: Accept-Encoding
Content-Length: 1234
{
"profile": "tct-1",
"canonical_url": "https://example.com/post/",
"title": "Understanding the Collaboration Tunnel Protocol",
"language": "en",
"published": "2025-10-01T10:00:00Z",
"modified": "2025-10-15T14:30:00Z",
"content": "The Collaboration Tunnel Protocol enables...",
"hash": "sha256-2c26b46b68ffc68f..."
}
10. Operational Considerations (Informative)
This section provides non-normative operational guidance for
deployers and automated agents.
*Migration and Path Collisions* - If a preferred slug (e.g., /llm/)
collides with existing routes, choose an alternate (e.g., /api/llm/,
/content/llm/) and publish a 308 Permanent Redirect from legacy to
primary endpoints - List only the primary M-URL in the sitemap to
avoid duplication; agents SHOULD follow redirects but rely on sitemap
entries for canonical endpoint discovery - Ensure robots.txt permits
the chosen machine paths as appropriate for your policy
*Caching and CDNs* - Configure intermediaries to honor validators and
serve 304 responses; avoid private on M-URLs and sitemaps to enable
shared caching - Enable compression (gzip or br) for sitemaps and
JSON payloads
*Discovery and Verification* - Prefer discovery via HTML , HTTP Link headers, or
sitemap entries; agents MUST NOT assume a fixed path - Agents SHOULD
verify the M-URL's canonical link header before processing and skip
ingestion if missing or mismatched
Jurkovikj Expires 8 May 2026 [Page 32]
Internet-Draft Collab-Tunnel November 2025
*Network and WAFs* - Allow the HEAD method (some WAFs block by
default) to enable efficient validation - If using an edge worker/
proxy, ensure required headers (Link, ETag, Cache-Control, Vary) are
preserved or injected consistently
11. Security Considerations
11.1. HTTPS and TLS
M-URLs and sitemaps MUST be served over HTTPS. Agents MUST validate
TLS certificates using platform trust stores and MUST reject invalid
certificates.
11.2. Rate Limiting
Servers MAY expose RateLimit fields ([RFC9448]); agents SHOULD
respect them.
11.3. Content Integrity
The template-invariant fingerprint (SHA-256) provides: - *Tamper
detection*: Agents can verify content integrity - *Cache validation*:
Ensures served content matches expected hash
However, SHA-256 alone does NOT provide: - *Authentication*: Does not
prove publisher identity - *Non-repudiation*: Publisher can deny
serving content
For authenticated content delivery, publishers MAY implement digital
signatures (outside scope of this specification).
11.4. Privacy
Sitemap exposure MAY reveal: - *Content inventory*: All published
URLs - *Modification patterns*: Publishing/update frequency
Publishers SHOULD: - Apply access controls if sitemaps contain
sensitive URLs - Use robots.txt to restrict crawler access if needed
The content field may contain sensitive or personally identifiable
information (PII). Publishers MUST apply access controls to M-URLs
equivalent to those applied to the corresponding C-URLs, and to any
sitemaps listing protected resources (see "Protected Sitemaps" in
MUST Requirements).
For general privacy considerations related to Internet protocols, see
[RFC6973].
Jurkovikj Expires 8 May 2026 [Page 33]
Internet-Draft Collab-Tunnel November 2025
11.5. Denial of Service
11.5.1. Sitemap Abuse
Large sitemaps MAY be used for DoS attacks. Agents SHOULD: -
Implement request rate limiting - Set maximum sitemap size limits
(e.g., 100 MB) - Use streaming JSON parsers for large sitemaps
11.5.2. HEAD vs GET Bandwidth
HEAD requests provide DoS mitigation by enabling validation without
body transfer: - HEAD request: ~500 bytes (headers only) - GET
request: 1-100 KB (full JSON body)
Agents SHOULD use HEAD for validation before GET.
11.6. Injection Surface Reduction
M-URL JSON responses contain only structured text, reducing injection
attack surface compared to HTML: - No