Commit ebfccbe

Declarative API Explainer (#76)
1 parent fe84c67 commit ebfccbe

1 file changed

+212
-0
lines changed

declarative-api-explainer.md

Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
1+
# WebMCP declarative API
2+
3+
See discussion in https://github.com/webmachinelearning/webmcp/issues/22 that led to the creation of
4+
this proposal.
5+
6+
## Motivation
7+
8+
WebMCP lets developers expose intricate functionality backed by a website's JavaScript functions to
9+
an agent as "tools", effectively turning the site into an "MCP server". Agents can see the list of
10+
tools a site offers paired with natural language descriptions of what the tools do, and invoke them
11+
with structured data.
12+
13+
With WebMCP, agents can perform complex actions like booking a flight or reserving a table by
14+
hooking into a site's own code designed to perform those actions, instead of the agent having to
15+
figure it out manually through a brittle series of screenshots, scrolls, and out-of-date screen
16+
reads.
17+
18+
However, not all site functionality is exposed via JavaScript functions, and features that *are*
19+
take some effort to rewrite with an agent invoker in mind. Much of a site's functionality is
20+
provided via semantic HTML elements like `<form>`, and its various inputs. To **make it easier** for
21+
developers to expose this kind of site functionality while still using the semantic web, we
22+
propose:
23+
24+
1. New attributes that augment `<form>`s and [form-associated
25+
elements](https://html.spec.whatwg.org/#form-associated-element), that expose these as WebMCP
26+
tools to agents.
27+
2. Algorithms that deterministically "compile" a form and its associated inputs down to a WebMCP
28+
"input schema", so that the agent knows how to fill out the form and submit it.
29+
3. Two ways of getting a form response back to the agent that invoked the form tool:
30+
1. `SubmitEvent#respondWith()`, which lets JavaScript on the page override the default form
31+
action, and pipe a response back to the agent without navigating the page.
32+
2. Extracting `<script type="application/ld+json">` tags on the page that the form navigated to,
33+
and using that structured data as a response to the form.
34+
35+
## Form attributes
36+
37+
```html
38+
<form
39+
toolname="Search flights"
40+
tooldescription="This form searches flights and displays [...]"
41+
toolautosubmit>
42+
```
43+
44+
The `toolname` attribute is analogous to the imperative API's
45+
[`ModelContextTool#name`](https://webmachinelearning.github.io/webmcp/#dom-modelcontexttool-name),
46+
while `tooldescription` is analogous to
47+
[`ModelContextTool#description`](https://webmachinelearning.github.io/webmcp/#dom-modelcontexttool-description).
48+
49+
The `toolautosubmit` [boolean attribute](https://html.spec.whatwg.org/C#boolean-attribute) lets the
50+
agent submit the form on the user's behalf after filling it out, without requiring the user to check
51+
it manually before submitting. If this attribute is missing when the agent finishes filling out the
52+
form, the browser brings the submit button into focus, and the agent should then tell the user to
53+
check the form contents, and submit it manually.
54+
55+
When forms with these attributes are inserted or removed, or when these attributes are updated, the form
56+
creates a new declarative WebMCP tool whose input schema is generated according to
57+
[Input schema synthesis](#input-schema-synthesis).
58+
59+
### Name and description
60+
61+
The [`name`](https://html.spec.whatwg.org/C#attr-fe-name) attribute on form control elements
62+
supplies the name of each "property" in the input schema generated for a declarative tool.
63+
64+
Since there's no pre-existing description attribute we can use, we introduce the
65+
`toolparamdescription` attribute for form control elements, which contributes the
66+
[description](https://json-schema.org/draft/2020-12/json-schema-validation#name-title-and-description)
67+
of each "property" in the input schema generated for a declarative tool.
68+
69+
With this, the following imperative structure:
70+
71+
```js
72+
window.navigator.modelContext.registerTool({
73+
name: "search-cars",
74+
description: "Perform a car make/model search",
75+
inputSchema: {
76+
type: "object",
77+
properties: {
78+
make: { type: "string", description: "The vehicle's make (e.g., BMW, Ford)" },
79+
model: { type: "string", description: "The vehicle's model (e.g., 330i, F-150)" },
80+
},
81+
required: ["make", "model"]
82+
},
83+
execute({make, model}, agent) { ... }
84+
});
85+
```
86+
87+
... is equivalent to the following declarative form:
88+
89+
```html
90+
<form toolname="search-cars" tooldescription="Perform a car make/model search" [...]>
91+
<input type=text name="make" toolparamdescription="The vehicle's make (e.g., BMW, Ford)" required>
92+
<input type=text name="model" toolparamdescription="The vehicle's model (e.g., 330i, F-150)" required>
93+
<button type=submit>Search</button>
94+
</form>
95+
```
96+
97+
## Processing model
98+
99+
### Changes to form reset
100+
101+
When a form is [reset](https://html.spec.whatwg.org/C#concept-form-reset) **OR** its tool
102+
declaration changes (as a result of `toolname` attribute modifications, for example), then any
103+
in-flight invocation of the tool will be cancelled, and the agent will be notified of this
104+
cancellation.
105+
106+
### Input schema synthesis
107+
108+
TODO: The exact algorithms reducing a form, its form-associated elements, and *their* attributes
109+
like [`step`](https://html.spec.whatwg.org/C#the-step-attribute) and
110+
[`min`](https://html.spec.whatwg.org/C#attr-input-min) are TBD. We need to concretely specify how
111+
various form-associated elements like `<input>` and `<select>` reduce to a JSON Schema that includes
112+
`anyOf`, `oneOf`, and `maximum`/`minimum` declarations.
113+
114+
Chromium is implementing a loose version of this and will conduct testing/trials to see if what
115+
we've come up with should be supported by the community as a general approach.
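
As a rough illustration of what such a reduction could look like, here is a minimal sketch. It is not the algorithm Chromium implements and not a proposal for the specification: the `synthesizeInputSchema` function, the plain-object shape of each control, and the specific mapping choices (`oneOf` with `const` for select options, `minimum`/`maximum` from `min`/`max`) are all assumptions made for this example. Attributes such as `step` are deliberately left out.

```js
// Hypothetical sketch only: the real synthesis algorithm is still TBD.
// Each "control" is a plain object mirroring a few form-control attributes.
function synthesizeInputSchema(controls) {
  const properties = {};
  const required = [];
  for (const control of controls) {
    // Assumption: controls without a name contribute no property.
    if (!control.name) continue;
    const prop = {};
    if (control.type === "number" || control.type === "range") {
      prop.type = "number";
      if (control.min !== undefined) prop.minimum = Number(control.min);
      if (control.max !== undefined) prop.maximum = Number(control.max);
    } else if (control.type === "select") {
      // Assumption: each <option> value becomes one `const` alternative.
      prop.oneOf = control.options.map((value) => ({ const: value }));
    } else if (control.type === "checkbox") {
      prop.type = "boolean";
    } else {
      prop.type = "string";
    }
    if (control.toolparamdescription) {
      prop.description = control.toolparamdescription;
    }
    properties[control.name] = prop;
    if (control.required) required.push(control.name);
  }
  return { type: "object", properties, required };
}

// Example: the "search-cars" form from earlier in this explainer.
const carSchema = synthesizeInputSchema([
  { type: "text", name: "make", required: true,
    toolparamdescription: "The vehicle's make (e.g., BMW, Ford)" },
  { type: "text", name: "model", required: true,
    toolparamdescription: "The vehicle's model (e.g., 330i, F-150)" },
]);
console.log(JSON.stringify(carSchema, null, 2));
```

Under these assumptions, the two required text inputs yield the same `properties` and `required` entries as the imperative `registerTool()` example.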
116+
117+
### Getting the form response to the agent
118+
119+
This topic is currently under debate; see https://github.com/webmachinelearning/webmcp/issues/135.
120+
121+
<details>
122+
<summary>Click to read the `application/ld+json` proposal before the above issue was filed</summary>
123+
124+
When a form element performs a navigation, the first `<script type=application/ld+json>` tag on the
125+
target page is used as the cross-document tool's "response" that gets sent to the model.
126+
127+
When no such tag is present, we'll probably decide that the page's entire contents are sent to the
128+
model as the response, since that's an accurate semantic representation of the result of the tool.
129+
However, this is technically TBD at the moment.
130+
131+
When the form element does *NOT* perform a navigation, JavaScript can hand-craft the response to the
132+
agent via the `SubmitEvent#respondWith()` method described below.
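
To make the first rule concrete, here is a minimal sketch of the extraction step as it might be modeled in script. In practice this would happen inside the browser, not in page code; the `extractLdJsonResponse` name and the choice to return `null` for a missing or unparsable tag are assumptions for this example, since the fallback behavior is TBD.

```js
// Hypothetical sketch: models "use the first ld+json script as the response".
// `doc` is anything with a `querySelector` method, such as `document`.
function extractLdJsonResponse(doc) {
  // querySelector returns the first match in document order.
  const script = doc.querySelector('script[type="application/ld+json"]');
  if (!script) return null; // fallback behavior is TBD in the proposal
  try {
    return JSON.parse(script.textContent);
  } catch {
    // Assumption: malformed JSON is treated the same as a missing tag.
    return null;
  }
}
```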
133+
</details>
134+
135+
### Pseudo-classes
136+
137+
Authors might want a way to bring to the user's attention or otherwise highlight a declarative
138+
WebMCP form that was filled out by the agent, and is waiting for the user to check the form and
139+
submit it. (This is essentially only relevant for forms without the `toolautosubmit` attribute.) To
140+
support this, we introduce the CSS pseudo-classes `:tool-form-active` and `:tool-submit-active`.
141+
142+
The `:tool-form-active` pseudo-class matches `<form>` elements whose declarative tool is "running".
143+
The exact definition of this will be clarified in the specification, but in short, a declarative
144+
tool is considered "running" starting when the form is being filled out with agent output, until one
145+
of the following:
146+
147+
- The form is [reset](https://html.spec.whatwg.org/C#concept-form-reset) or removed from the DOM
148+
- The Promise returned from `SubmitEvent#respondWith()` resolves with a tool output
149+
- The form's `toolname` or `tooldescription` attributes are modified, added, or removed
150+
- The form is automatically submitted with the agent output, due to the `toolautosubmit` attribute
151+
152+
The `:tool-submit-active` pseudo-class matches the submit button of a `:tool-form-active` form
153+
element.
154+
155+
### Events
156+
157+
**Additions to `SubmitEvent`**
158+
159+
The `SubmitEvent` interface gets two new members: `agentInvoked`, which lets `submit` event handlers react
160+
to agent-invoked form submissions, and the `respondWith()` method.
161+
162+
The `respondWith()` method takes a `Promise<any>` that resolves to the response that the agent will consume. This
163+
method is used to override the default behavior of the form submission; the form's `action` will NOT
164+
navigate, and `preventDefault()` must be called before this method is called.
165+
166+
```webidl
167+
[Exposed=Window]
168+
interface SubmitEvent : Event {
169+
// ...
170+
readonly attribute boolean agentInvoked;
171+
undefined respondWith(Promise<any> agentResponse);
172+
};
173+
```
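
A `submit` handler using these members might look like the following minimal sketch. The `makeAgentSubmitHandler` helper and its `readParams` and `runTool` callbacks are hypothetical names introduced for this example; only `agentInvoked`, `preventDefault()`, and `respondWith()` come from the proposal.

```js
// Hypothetical helper: builds a `submit` listener for a declarative tool form.
// `readParams(form)` extracts the submitted values and `runTool(params)`
// produces the response (or a Promise for it); both are supplied by the page.
function makeAgentSubmitHandler(readParams, runTool) {
  return function onSubmit(event) {
    // Submissions made by the user keep the form's default behavior.
    if (!event.agentInvoked) return;
    // Per this explainer, preventDefault() must be called before respondWith().
    event.preventDefault();
    const params = readParams(event.target);
    // Promise.resolve() accepts both plain values and Promises from runTool.
    event.respondWith(Promise.resolve(runTool(params)));
  };
}

// Possible usage in a page (assumes a `searchCars` function exists):
//   form.addEventListener("submit", makeAgentSubmitHandler(
//     (f) => Object.fromEntries(new FormData(f)),
//     searchCars));
```

Keeping the handler independent of `FormData` and of the search implementation makes it straightforward to exercise with a mock event.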
174+
175+
**`toolactivated` and `toolcanceled` events**
176+
177+
We introduce these events that get fired at the `ModelContext` object when a WebMCP tool is run, and when
178+
its invocation is canceled.
179+
180+
The `toolactivated` event gives the developer a hook to perform any actions, such as bringing the
181+
form to the user's attention, once a declarative tool is filled out but before it is submitted.
182+
(This presumes the absence of the `toolautosubmit` attribute.) This event can be seen as the
183+
JavaScript equivalent of the [`:tool-form-active` pseudo-class](#pseudo-classes).
184+
185+
When the agent cancels a tool call (perhaps because a user has instigated another turn of the
186+
conversation, obviating the need for the pending tool), the `toolcanceled` event is fired. Note that
187+
this event does not fire when the site itself cancels the tool, due to removing the form element or
188+
changing its name or description.
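
Listening for both events could be sketched as follows. The `watchToolLifecycle` helper is a hypothetical name, and because this explainer does not define any properties on these events, the sketch relies only on the event type. The `target` parameter stands in for the `ModelContext` object (for example `navigator.modelContext`), an assumption that may change depending on the open questions below.

```js
// Hypothetical helper: reports tool lifecycle changes to a callback.
// `target` is any EventTarget; in a page it would be the ModelContext object.
function watchToolLifecycle(target, onChange) {
  const onActivated = () => onChange("activated");
  const onCanceled = () => onChange("canceled");
  target.addEventListener("toolactivated", onActivated);
  target.addEventListener("toolcanceled", onCanceled);
  // Return a cleanup function so callers can stop listening.
  return () => {
    target.removeEventListener("toolactivated", onActivated);
    target.removeEventListener("toolcanceled", onCanceled);
  };
}
```

A page might use the `"activated"` callback to scroll the filled-out form into view, and the `"canceled"` callback to clear any highlight it added.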
189+
190+
Some open questions:
191+
192+
> [!WARNING]
193+
> Should these events fire for imperative tool call invocations as well? Chromium
194+
> [seems to do
195+
> that](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/script_tools/model_context.cc;l=265-274;drc=2af6413cf36d701fdaffb09188f2ab2a5be37f4f).
196+
197+
> [!WARNING]
198+
> For declarative, should they be fired at `Window` or at the `<form>` that registered the tool in
199+
> the first place, and bubble up to the document that way? See
200+
> https://github.com/webmachinelearning/webmcp/issues/126.
201+
202+
## Integration with other imperative API bits
203+
204+
It's an open question as to whether [an
205+
`outputSchema`](https://github.com/webmachinelearning/webmcp/issues/9) makes sense for declarative
206+
WebMCP tools, and therefore whether the `agentResponse` Promise passed to `SubmitEvent#respondWith()`
207+
must resolve to an object conforming to such a schema.
208+
209+
It is TBD how *declarative* WebMCP tools will be exposed to any interface that exposes a site's
210+
tools to JavaScript. See https://github.com/webmachinelearning/webmcp/issues/51 for context. Should
211+
a declarative WebMCP tool be able to be invoked from such an interface, should it exist in the
212+
future? Almost certainly, yes. But details are TBD.
