Commit f8a6998

Start draft on declarative explainer
Fill more things out
1 parent ee1a39b commit f8a6998

3 files changed

+132
-14
lines changed

README.md

Lines changed: 12 additions & 11 deletions
@@ -2,18 +2,9 @@
22

33
_Enabling web apps to provide JavaScript-based tools that can be accessed by AI agents and assistive technologies to create collaborative, human-in-the-loop workflows._
44

5-
> First published August 13, 2025
6-
>
7-
> Brandon Walderman <code>&lt;brwalder@microsoft.com&gt;</code><br>
8-
> Leo Lee <code>&lt;leo.lee@microsoft.com&gt;</code><br>
9-
> Andrew Nolan <code>&lt;annolan@microsoft.com&gt;</code><br>
10-
> David Bokan <code>&lt;bokan@google.com&gt;</code><br>
11-
> Khushal Sagar <code>&lt;khushalsagar@google.com&gt;</code><br>
12-
> Hannah Van Opstal <code>&lt;hvanopstal@google.com&gt;</code>
13-
145
## TL;DR
156

16-
We propose a new JavaScript interface that allows web developers to expose their web application functionality as "tools" - JavaScript functions with natural language descriptions and structured schemas that can be invoked by AI agents, browser assistants, and assistive technologies. Web pages that use WebMCP can be thought of as [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) servers that implement tools in client-side script instead of on the backend. WebMCP enables collaborative workflows where users and agents work together within the same web interface, leveraging existing application logic while maintaining shared context and user control.
7+
We propose a new JavaScript API that allows web developers to expose their web application functionality as "tools" - JavaScript functions with natural language descriptions and structured schemas that can be invoked by AI agents, browser assistants, and assistive technologies. Web pages that use WebMCP can be thought of as [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) servers that implement tools in client-side script instead of on the backend. WebMCP enables collaborative workflows where users and agents work together within the same web interface, leveraging existing application logic while maintaining shared context and user control.
178

189
For the technical details of the proposal, code examples, API shape, etc. see [proposal.md](./docs/proposal.md).
1910

@@ -613,6 +604,16 @@ Some tools that a web app may want to provide for agents and assistive technolog
613604
614605
For scenarios like this, it may be helpful to combine tool call handling with something like the ['launch'](https://github.com/WICG/web-app-launch/blob/main/sw_launch_event.md) event. A client application might attach a tool call to a "launch" request which is handled entirely in a service worker without spawning a browser window.
615606
616-
## Acknowledgments
607+
## Authors & acknowledgments
608+
609+
This explainer was first published on August 13, 2025, by
610+
>
611+
> Brandon Walderman <code>&lt;brwalder@microsoft.com&gt;</code><br>
612+
> Leo Lee <code>&lt;leo.lee@microsoft.com&gt;</code><br>
613+
> Andrew Nolan <code>&lt;annolan@microsoft.com&gt;</code><br>
614+
> David Bokan <code>&lt;bokan@google.com&gt;</code><br>
615+
> Khushal Sagar <code>&lt;khushalsagar@google.com&gt;</code><br>
616+
> Hannah Van Opstal <code>&lt;hvanopstal@google.com&gt;</code>
617+
617618
618619
Many thanks to [Alex Nahas](https://github.com/MiguelsPizza) and [Jason McGhee](https://github.com/jasonjmcghee/) for sharing related [implementation](https://github.com/MiguelsPizza/WebMCP) [experience](https://github.com/jasonjmcghee/WebMCP).

declarative-api-explainer.md

Lines changed: 120 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,120 @@
1+
# WebMCP declarative API
2+
3+
See discussion in https://github.com/webmachinelearning/webmcp/issues/22 that led to the creation of
4+
this proposal.
5+
6+
## Motivation
7+
8+
WebMCP lets developers expose intricate functionality backed by a website's JavaScript functions to
9+
an agent as "tools", effectively turning the site into an "MCP server". Agents can see the list of
10+
tools a site offers paired with natural language descriptions of what the tools do, and invoke them
11+
with structured data.
12+
13+
With WebMCP, agents can perform complex actions like booking a flight or reserving a table by
14+
hooking into a site's own code designed to perform those actions, instead of the agent having to
15+
figure it out manually through a brittle series of screenshots, scrolls, and out-of-date screen
16+
reads.
17+
18+
However, not all site functionality is exposed via JavaScript functions, and features that *are*
19+
take some effort to rewrite with an agent invoker in mind. Much of a site's functionality is
20+
provided via semantic HTML elements like `<form>`, and its various inputs. To **make it easier** for
21+
developers to expose this kind of site functionality while still using the semantic web, we
22+
propose:
23+
24+
1. New attributes that augment `<form>`s and [form-associated
25+
elements](https://html.spec.whatwg.org/#form-associated-element) that expose these as WebMCP
26+
tools to agents.
27+
2. Algorithms that deterministically "compile" a form and its associated inputs down to a WebMCP
28+
"input schema", so that the agent knows how to fill out the form and submit it.
29+
3. Two ways of getting a form response back to the agent that invoked the form tool:
30+
1. `SubmitEvent#respondWith()`, which lets JavaScript on the page override the default form
31+
action, and pipe a response back to the agent without navigating the page.
32+
2. Extracting `<script type="application/ld+json">` tags on the page that the form navigated to,
33+
and using that structured data as a response to the form.
34+
35+
## Form attributes
36+
37+
```html
38+
<form
39+
toolname="Search flights"
40+
tooldescription="This form searches flights and displays [...]"
41+
toolautosubmit>
42+
```
43+
44+
The `toolname` attribute is analogous to the imperative API's
45+
[`ModelContextTool#name`](https://webmachinelearning.github.io/webmcp/#dom-modelcontexttool-name),
46+
while `tooldescription` is analogous to
47+
[`ModelContextTool#description`](https://webmachinelearning.github.io/webmcp/#dom-modelcontexttool-description).
48+
49+
The `toolautosubmit` [boolean attribute](https://html.spec.whatwg.org/C#boolean-attribute) lets the
50+
agent submit the form on the user's behalf after filling it out, without requiring the user to check
51+
it manually before submitting. If this attribute is missing when the agent finishes filling out the
52+
form, the browser brings the submit button into focus, and the agent should then tell the user to
53+
check the form contents, and submit it manually.
54+
55+
When forms with these attributes are inserted or removed, or when these attributes are updated, the form
56+
creates a new declarative WebMCP tool whose input schema is generated according to
57+
[Input schema synthesis](#input-schema-synthesis).
58+
59+
TODO(domfarolino): Describe the `toolparamname` and `toolparamdescription` attributes, and how they
60+
are processed on form-associated elements.
61+
62+
## Processing model
63+
64+
### Changes to form reset
65+
66+
When a form is [reset](https://html.spec.whatwg.org/C#concept-form-reset) **OR** its tool
67+
declaration changes (as a result of `toolname` attribute modifications, for example), then any
68+
in-flight invocation of the tool will be cancelled, and the agent will be notified of this
69+
cancellation.
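As a rough mental model of the rule above, the bookkeeping might look like the sketch below. It is not part of the proposal: the class `FormToolInvocations`, its method names, and the `notifyAgent` callback are all hypothetical, and a real browser would implement this internally rather than in page script.

```js
// Hypothetical model of browser-side bookkeeping; none of these names
// come from the explainer. AbortController is used only as a convenient
// cancellation primitive.
class FormToolInvocations {
  constructor(notifyAgent) {
    this.notifyAgent = notifyAgent; // called with a reason string on cancellation
    this.inFlight = null;           // AbortController for the current invocation, if any
  }

  // An agent starts invoking the form's tool.
  start() {
    this.inFlight = new AbortController();
    return this.inFlight.signal;
  }

  // The invocation completed normally, so there is nothing left to cancel.
  finish() {
    this.inFlight = null;
  }

  // Shared cancellation path: abort the invocation and tell the agent.
  cancel(reason) {
    if (!this.inFlight) return;
    this.inFlight.abort();
    this.inFlight = null;
    this.notifyAgent(reason);
  }

  // The two triggers named in this section.
  onFormReset() {
    this.cancel("form was reset");
  }

  onToolDeclarationChanged() {
    this.cancel("tool declaration changed");
  }
}
```

In this model a reset with no invocation in flight is a no-op, so the agent is only notified when there is actually something to cancel.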
70+
71+
### Input schema synthesis
72+
73+
TODO: The exact algorithms reducing a form, its form-associated elements, and *their* attributes
74+
like [`step`](https://html.spec.whatwg.org/C#the-step-attribute) and
75+
[`min`](https://html.spec.whatwg.org/C#attr-input-min) are TBD. We need to concretely specify how
76+
various form-associated elements like `<input>` and `<select>` reduce to a JSON Schema that includes
77+
`anyOf`, `oneOf`, and `maximum`/`minimum` declarations.
78+
79+
Chromium is implementing a loose version of this and will conduct testing/trials to see if what
80+
we've come up with should be supported by the community as a general approach.
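Since the real algorithm is still TBD, the following is only an illustrative sketch of what such a reduction could look like. Every name in it is an assumption: `synthesizeInputSchema` is hypothetical, and the field descriptors are plain objects standing in for form-associated elements so the example runs without a DOM. The mappings chosen here (`<select>` to `oneOf`, `min`/`max` to `minimum`/`maximum`, `step` to `multipleOf`) are guesses, not the behavior Chromium is implementing.

```js
// Hypothetical sketch; the explainer does not yet define this algorithm.
function synthesizeInputSchema(fields) {
  const schema = { type: "object", properties: {}, required: [] };
  for (const field of fields) {
    const prop = {};
    if (field.description) prop.description = field.description;

    if (field.kind === "select") {
      // Assumption: a <select> reduces to a choice among its option values.
      prop.oneOf = field.options.map((o) => ({ const: o.value, title: o.label }));
    } else if (field.kind === "number") {
      prop.type = "number";
      if (field.min !== undefined) prop.minimum = Number(field.min);
      if (field.max !== undefined) prop.maximum = Number(field.max);
      // Approximation: HTML counts steps from the step base (usually `min`),
      // whereas JSON Schema's `multipleOf` counts from zero.
      if (field.step !== undefined) prop.multipleOf = Number(field.step);
    } else {
      prop.type = "string";
    }

    schema.properties[field.name] = prop;
    if (field.required) schema.required.push(field.name);
  }
  return schema;
}

// Descriptors for an imagined flight-search form.
const schema = synthesizeInputSchema([
  { name: "destination", kind: "text", required: true, description: "Arrival airport code" },
  { name: "passengers", kind: "number", min: "1", max: "9" },
  {
    name: "cabin",
    kind: "select",
    options: [
      { value: "economy", label: "Economy" },
      { value: "business", label: "Business" },
    ],
  },
]);
console.log(JSON.stringify(schema, null, 2));
```

A real specification would also have to settle details this sketch ignores, such as the step base, disabled controls, and multi-value inputs.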
81+
82+
### Getting the form response to the agent
83+
84+
TODO: Mention application/ld+json responses, and so on.
85+
86+
### Events
87+
88+
**Additions to `SubmitEvent`**
89+
90+
The `SubmitEvent` interface gets two new members, `agentInvoked` to let `submit` event handlers react
91+
to agent-invoked form submissions, and the `respondWith()` method.
92+
93+
This method takes a `Promise<any>` that resolves to the response that the agent will consume. This
94+
method is used to override the default behavior of the form submission; the form's `action` will NOT
95+
navigate, and `preventDefault()` must be called before this method is called.
96+
97+
```webidl
98+
[Exposed=Window]
99+
interface SubmitEvent : Event {
100+
// ...
101+
readonly attribute boolean agentInvoked;
102+
undefined respondWith(Promise<any> agentResponse);
103+
};
104+
```
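A page might use these members roughly as sketched below. This is an illustration under assumptions, not a definitive usage pattern: `makeSubmitHandler` and `searchFlights` are hypothetical names, and because `agentInvoked` and `respondWith()` are only proposed, the handler is written as a plain function that can be exercised with a stand-in event object.

```js
// Sketch of a `submit` handler for the proposed SubmitEvent additions.
// `searchFlights` is a hypothetical application function that takes the
// submitted form and returns a Promise for the result.
function makeSubmitHandler(searchFlights) {
  return function onSubmit(event) {
    // Human-initiated submissions keep the form's default behavior.
    if (!event.agentInvoked) return;

    // As described above, preventDefault() must be called before respondWith().
    event.preventDefault();

    // The Promise's resolved value is what the agent consumes.
    event.respondWith(searchFlights(event.target));
  };
}

// In a page, assuming a browser that implements the proposal:
//   form.addEventListener("submit", makeSubmitHandler(searchFlights));
```

Checking `agentInvoked` first means the same form keeps working for users exactly as before, with the agent path layered on top.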
105+
106+
**`toolactivated` and `toolcanceled` events**
107+
108+
TODO: Fill this out.
109+
110+
## Integration with other imperative API bits
111+
112+
It's an open question as to whether [an
113+
`outputSchema`](https://github.com/webmachinelearning/webmcp/issues/9) makes sense for declarative
114+
WebMCP tools, and therefore if the `agentResponse` Promise passed to `SubmitEvent#respondWith()`
115+
must resolve to an object conforming to such schema.
116+
117+
It is TBD how *declarative* WebMCP tools will be exposed to any interface that exposes a site's
118+
tools to JavaScript. See https://github.com/webmachinelearning/webmcp/issues/51 for context. Should
119+
a declarative WebMCP tool be able to be invoked from such an interface, should it exist in the
120+
future?

docs/explainer.md

Lines changed: 0 additions & 3 deletions
This file was deleted.
