Internet-Draft | Web bot auth Glossary | April 2025 |
Meunier | Expires 30 October 2025 | [Page] |
Automated traffic authentication presents unique security challenges, constraints, and opportunities that impact all Internet users. This document seeks to collect terminology and examples within the space, with a specific focus on AI-related technologies.¶
This note is to be removed before publishing as an RFC.¶
The latest revision of this draft can be found at https://thibmeu.github.io/draft-meunier-glossary/draft-meunier-web-bot-auth-glossary.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-meunier-web-bot-auth-glossary/.¶
Source for this draft and an issue tracker can be found at https://github.com/thibmeu/draft-meunier-glossary-somehow.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 30 October 2025.¶
Copyright (c) 2025 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Agents are increasingly used in business and user workflows, including AI assistants, search indexing, content aggregation, and automated testing. These agents need to reliably identify themselves to origins for several reasons:¶
Regulatory compliance requiring transparency of automated systems¶
Origin resource management and access control¶
Protection against impersonation and reputation management¶
Service level differentiation between human and automated traffic¶
Current identification methods such as IP allow-listing, User-Agent strings, or shared API keys have significant limitations in security, scalability, manageability, and fairness. This document presents these examples, as well as possible paths to address them.¶
There is an increase in agent traffic on the Internet. Many agents choose to identify their traffic today via lists of IP Addresses and/or unique User-Agents. This is often done to demonstrate trust and safety claims, support allow-listing/deny-listing the traffic in a granular manner, and enable sites to monitor and rate limit per agent operator. However, these mechanisms have drawbacks:¶
User-Agent, when used alone, can be spoofed, meaning anyone may attempt to act as that agent. It is also overloaded - an agent may be using Chromium and wish to present itself as such to ensure rendering works, yet it still wants to differentiate its traffic to the site.¶
IP blocks alone can present a confusing story. IPs on cloud platforms have layers of ownership - the platform owns the IP and registers it in its published IP blocks, only for the agent to re-publish it with little to bind the publication to the actual service provider that may be renting the infrastructure. Purchasing dedicated IP blocks is expensive, time consuming, and requires significant specialist knowledge to set up. These IP blocks may have prior reputation history that needs to be carefully inspected and managed before purchase and use.¶
An agent may go to every website on the Internet and share a secret with them, such as a Bearer token from [OAUTH-BEARER-RFC]. This is impractical to scale for any agent beyond select partnerships, and insecure, as key rotation is challenging and becomes harder as the number of consumers grows.¶
Using well-established cryptography, we can instead define a simple and secure mechanism that empowers small and large agents to share their identity.¶
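As an illustration of the kind of mechanism meant here, the sketch below builds a signature base over derived request components, in the style of HTTP Message Signatures (RFC 9421), and attaches authentication headers to a request. An HMAC with a shared key is used purely to keep the example dependency-free; a deployable design would use an asymmetric signature (e.g. Ed25519) so that the origin only needs the agent's published public key. The key identifier and component list are illustrative, not normative.¶

```python
import base64
import hashlib
import hmac

def signature_base(method, authority, path, params):
    """Build a signature base over derived components, in the style of
    HTTP Message Signatures (RFC 9421)."""
    return (
        f'"@method": {method}\n'
        f'"@authority": {authority}\n'
        f'"@path": {path}\n'
        f'"@signature-params": {params}'
    )

def sign_request(key, keyid, method, authority, path):
    """Return the headers an agent attaches so the origin can verify who
    sent the request. HMAC stands in for a real asymmetric signature."""
    params = f'("@method" "@authority" "@path");keyid="{keyid}"'
    base = signature_base(method, authority, path, params)
    tag = hmac.new(key, base.encode(), hashlib.sha256).digest()
    return {
        "Signature-Input": f"sig1={params}",
        "Signature": f"sig1=:{base64.b64encode(tag).decode()}:",
    }

headers = sign_request(b"demo-key", "agent-2025", "GET", "example.com", "/robots.txt")
```

Because the signature covers the method, authority, and path, a replayed or modified request to a different resource no longer verifies, which a spoofed User-Agent string cannot guarantee.¶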
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
An autonomous entity that perceives the environment and can take actions on behalf of users.¶
A type of agent that operates automatically, often performing repetitive tasks. Bots may identify themselves or attempt to mimic human behavior.¶
The primary server hosting the web content or service that an agent intends to access.¶
Controls incoming traffic to an origin based on a set of rules. This may include but is not limited to IP filtering, User-Agent matching, or cryptographic signature verification.¶
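A toy filter can make this definition concrete by combining an IP-prefix rule with a User-Agent rule; the prefix and agent name below are invented for the example. It also shows why the two signals are checked together rather than separately.¶

```python
import ipaddress

# Illustrative rule set: a crawler's published prefix (documentation range)
# and its advertised User-Agent string.
ALLOWED_PREFIXES = [ipaddress.ip_network("192.0.2.0/24")]
KNOWN_AGENT = "ExampleBot/1.0"

def admit(client_ip, user_agent):
    """Admit traffic claiming to be the known crawler only when it also
    originates from the crawler's published prefix."""
    ip = ipaddress.ip_address(client_ip)
    in_prefix = any(ip in net for net in ALLOWED_PREFIXES)
    if user_agent == KNOWN_AGENT:
        # A User-Agent match alone is not trusted: it must coincide with
        # the published prefix, since the string is trivially spoofable.
        return in_prefix
    return True  # other traffic falls through to further checks
```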
An intermediary server that forwards client requests to the origin server, often performing functions like load balancing, authentication, or caching.¶
A client application used to access web content. Browsers may also be orchestrated.¶
A physical person, like you and me.¶
A control mechanism that restricts the access of an Agent to a resource provided by an Origin Server. An Origin can decide to rate limit all connections from an individual Client, from a specific Provider, or to a specific resource. This may be a fixed number of requests, a budget, a time, a location, or legal requirements.¶
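A minimal model of the per-client, per-resource case is a fixed-window counter; real deployments would also account for budgets, time, location, or legal requirements, and would typically use sliding windows or token buckets. All names below are illustrative.¶

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Toy fixed-window limiter: at most `limit` requests per `window`
    seconds for each (client, resource) pair."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        # key -> [window_start, count]
        self.counters = defaultdict(lambda: [0.0, 0])

    def allow(self, client, resource, now=None):
        now = time.monotonic() if now is None else now
        slot = self.counters[(client, resource)]
        if now - slot[0] >= self.window:
            slot[0], slot[1] = now, 0  # start a new window
        if slot[1] < self.limit:
            slot[1] += 1
            return True
        return False
```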
A property ensuring that multiple interactions or credentials from the same agent cannot be correlated by the verifier.¶
Persistent identifier of an entity to an origin. This requires a registration.¶
The creation of an identity. It can involve a one-time payment, a subscription, an account with a username and password, a proof of age, a legal jurisdiction, or other attributes.¶
An entity that generates and provides credentials to agents after the Attester has verified certain attributes.¶
An entity that evaluates an agent's characteristics or behavior and provides evidence to an Issuer to support credential issuance.¶
An entity that validates the authenticity and integrity of a credential presented by an agent.¶
We divide web bot authentication into three categories.¶
Organizations operating bots may need to authenticate their agents to access certain web resources. Authentication mechanisms can help distinguish legitimate bots from malicious ones.¶
Examples:¶
Bots acting on behalf of registered users may require authentication to access user-specific data or services.¶
Examples:¶
In scenarios where full identification is unnecessary or undesirable, agents may present credentials that attest to specific attributes without revealing their identity.¶
Examples:¶
Add a signal to limit visual CAPTCHA challenges such as [PRIVATE-ACCESS-TOKEN],¶
Gating access to a resource for longstanding users such as [LOX],¶
Using a search engine with a fixed number of requests such as [PRIVACY-PASS-KAGI],¶
Selective disclosure of a credential attribute (location, age) such as [PRIVATE-PROOF-API].¶
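A minimal model of the fixed-request-count pattern above: an issuer mints N single-use tokens, and the verifier only checks membership and double-spending, learning nothing that links one redemption to another. Real schemes such as Privacy Pass use blind signatures so even the issuer cannot link issuance to redemption; plain random nonces here keep the sketch self-contained.¶

```python
import secrets

def issue(n):
    """Issuer mints n unlinkable single-use tokens for one client.
    The client keeps the list; the issuer/verifier keeps the set."""
    tokens = [secrets.token_hex(16) for _ in range(n)]
    return tokens, set(tokens)

def redeem(token, valid, spent):
    """Verifier accepts each token at most once: a membership check
    plus a double-spend set."""
    if token in valid and token not in spent:
        spent.add(token)
        return True
    return False
```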
The ecosystem involves multiple actors: a credential issuer that requires certain criteria to be met via an attester, the client, which can be a bot or human-mediated agent whose IP is unknown, and the web origin placed behind a reverse proxy that may be fronting its infrastructure. The issuer provides cryptographic credentials to the client, which are then linked to requests and optionally verified by proxies before reaching the origin. This chain allows for authentication without necessarily revealing identifying details to each intermediary.¶
Humans and bots often interact with origins indirectly via clients such as browsers, agents, or CLI tools. These clients handle requests, potentially traversing reverse proxies that manage TLS termination, DDoS protection, and caching.¶
The rise of advanced browser orchestration blurs the line between human-driven and automated requests, making it increasingly ambiguous whether a given request is automated.¶
The attester/issuer roles could be filled by the AI company, reverse proxy, origin, or a third party. Origins need mechanisms to identify organizations, rate-limit individuals, and authenticate users without relying solely on client IP or heuristics presented in Section 2.¶
The security model includes several actors: credential issuers, attesters, clients (bots or agents), reverse proxies, and origin servers. The primary goals are to prevent impersonation, allow for credential revocation, support delegation and rotation, and maintain trust boundaries.¶
If the Issuer is also the Origin or its reverse proxy, it is possible to use shared secrets for verification. In cases where the issuer and verifier are different entities, asymmetric cryptography becomes necessary, allowing the bot to prove its identity using a public key infrastructure.¶
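In the shared-secret case, verification reduces to an HMAC check that the origin or its reverse proxy performs locally; constant-time comparison matters to avoid timing side channels. The asymmetric case would replace the HMAC with a signature verification over the same bytes using the issuer's published public key. The secret below is an illustrative value.¶

```python
import hashlib
import hmac

SECRET = b"issuer-and-origin-shared-secret"  # illustrative value

def mac(message):
    """Tag computed by the issuer-side and recomputed by the verifier."""
    return hmac.new(SECRET, message, hashlib.sha256).digest()

def verify(message, tag):
    # compare_digest avoids leaking how many prefix bytes matched
    return hmac.compare_digest(mac(message), tag)
```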
Some credentials may be designed for one-time use only (for anti replay or privacy reasons), while others can support multiple presentations through the use of cryptographic derivation techniques. This distinction affects privacy, scalability, and implementation complexity.¶
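One way to support multiple presentations from a single issued credential is to derive an independent per-presentation value from a root secret and a counter: a verifier seeing only derived values cannot correlate them without the root. This HMAC-counter derivation is a stand-in for the algebraic techniques (e.g. blinded or randomized signatures) that real unlinkable schemes use.¶

```python
import hashlib
import hmac

def derive(root, index):
    """Derive the index-th presentation value from a root credential
    secret. Each derived value is independent without knowledge of root."""
    return hmac.new(root, index.to_bytes(8, "big"), hashlib.sha256).digest()
```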
Authentication tokens may be exchanged at different protocol layers and through different transports. Each may have different deployment, performance, and security guarantees.¶
For TLS, we have seen [REQ-MTLS] and [PRIVACYPASS-IN-TLS] respectively addressing Section 4.1 and Section 4.3.¶
For HTTP, we see [HTTP-MESSAGE-SIGNATURE-FOR-BOTS] or [DPOP-AUTH-RFC], and [PRIVACYPASS-HTTP-AUTH-RFC] respectively addressing Section 4.1 and Section 4.3. [OAUTH-BEARER-RFC] fits as well for Section 4.2.¶
Other methods have been seen such as leveraging a dedicated format on top of a JavaScript API. This is the case for W3C [PRIVATE-STATE-TOKEN] or the more recent [PRIVATE-PROOF-API].¶
Focusing on AI specifically, it's worth mentioning two prominent protocol definition efforts:¶
[A2A-AUTH] which follows [OPENAPI3-AUTH]. This means it allows for Basic, Bearer, API Keys, and [OAUTH2-RFC]. OpenAPI mentions using the [HTTP-AUTHSCHEME] registry, but there does not seem to be a definition for recent schemes such as [PRIVACYPASS-HTTP-AUTH-RFC], [CONCEALED-AUTH-RFC], or [DPOP-AUTH-RFC].¶
[MCP-AUTH] uses [OAUTH2-RFC] as a resource server.¶
Protocols should strive to minimise the number of round trips between a client and the issuer, and between clients and the origin.¶
Just as there are registries to resolve IP address metadata, there are going to be registries to identify the owner of public key material. These are mentioned by [A2A-DISCOVERY] and [MCP-DISCOVERY].¶
The primary goal of these catalogs is to associate metadata with a public key, and the discovery of the associated metadata. They SHOULD have some sort of tamper resistance, to prevent the provider of a catalog from serving incorrect information.¶
As an analogy, one can think of [CERTIFICATE-TRANSPARENCY-RFC], or the more recent effort in [KEY-TRANSPARENCY-ARCHITECTURE].¶
Submission is also going to happen out-of-band. This is both for practical reasons, as it is simpler than setting up a catalog, and for privacy reasons, as it avoids exposing information through a catalog.¶
Discovery may happen on-path, that is when a request arrives from a client to an origin. This could be considered a form of trust-on-first-use. While the level of trust is low, it could be viable for certain use cases.¶
Such discovery could be via an HTTP header containing a domain name with a well-known, a URL, a certificate, etc.¶
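For instance, a request might carry a header naming the agent's key directory, which the origin maps to a well-known URL and fetches out of band. The header semantics and well-known path below follow the web-bot-auth drafts but should be treated as illustrative assumptions, not a fixed interface.¶

```python
from urllib.parse import urlunparse

def directory_url(signature_agent):
    """Map an on-path discovery header value (a domain, possibly quoted)
    to the well-known key-directory URL the origin would fetch.
    The well-known path is an assumption for illustration."""
    host = signature_agent.strip().strip('"')
    return urlunparse(
        ("https", host, "/.well-known/http-message-signatures-directory", "", "", "")
    )
```

Because this is trust-on-first-use, the fetched keys would still need to be checked against a registry or pinned on subsequent requests.¶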
This glossary provides terminology for web bot authentication. While this document does not define or recommend specific protocols, terminology choices have direct security implications:¶
Clearly defined roles are essential for preventing entities from falsely claiming identities.¶
Definitions such as Section 6.2 help describe key mechanisms that mitigate the misuse of credentials if stolen.¶
Section 7 is key to protocol security and must be considered.¶
In addition, protocols should consider decentralization [RFC9518] and end-user impact [RFC8890].¶
Authentication mechanisms should minimize the collection and exposure of personal data. Techniques like selective disclosure and unlinkability help protect user privacy. Protocols should refer to [RFC6973].¶
Multiple protocols are also likely to be used in coordination: to identify an organization, then to identify the User-Agent, and possibly to rate limit. It is important to consider the privacy of these layers together as well.¶
This document has no IANA actions.¶
TODO acknowledge.¶