| 1 | +# WebMCP declarative API |
| 2 | + |
| 3 | +See discussion in https://github.com/webmachinelearning/webmcp/issues/22 that led to the creation of |
| 4 | +this proposal. |
| 5 | + |
| 6 | +## Motivation |
| 7 | + |
| 8 | +WebMCP lets developers expose intricate functionality backed by a website's JavaScript functions to |
| 9 | +an agent as "tools", effectively turning the site into an "MCP server". Agents can see the list of |
| 10 | +tools a site offers paired with natural language descriptions of what the tools do, and invoke them |
| 11 | +with structured data. |
| 12 | + |
| 13 | +With WebMCP, agents can perform complex actions like booking a flight or reserving a table by |
| 14 | +hooking into a site's own code designed to perform those actions, instead of the agent having to |
| 15 | +figure it out manually through a brittle series of screenshots, scrolls, and out-of-date screen |
| 16 | +reads. |
| 17 | + |
| 18 | +However, not all site functionality is exposed via JavaScript functions, and features that *are* |
| 19 | +take some effort to rewrite with an agent invoker in mind. Much of a site's functionality is |
| 20 | +provided via semantic HTML elements like `<form>` and its various inputs. To **make it easier** for |
| 21 | +developers to expose this kind of site functionality while still using the semantic web, we |
| 22 | +propose: |
| 23 | + |
| 24 | +1. New attributes that augment `<form>`s and [form-associated |
| 25 | + elements](https://html.spec.whatwg.org/#form-associated-element), that expose these as WebMCP |
| 26 | + tools to agents. |
| 27 | +2. Algorithms that deterministically "compile" a form and its associated inputs down to a WebMCP |
| 28 | + "input schema", so that the agent knows how to fill out the form and submit it. |
| 29 | +3. Two ways of getting a form response back to the agent that invoked the form tool: |
| 30 | + 1. `SubmitEvent#respondWith()`, which lets JavaScript on the page override the default form |
| 31 | + action, and pipe a response back to the agent without navigating the page. |
| 32 | + 2. Extracting `<script type="application/ld+json">` tags on the page that the form navigated to, |
| 33 | + and using that structured data as a response to the form. |
| 34 | + |
| 35 | +## Form attributes |
| 36 | + |
| 37 | +```html |
| 38 | +<form |
| 39 | + toolname="Search flights" |
| 40 | + tooldescription="This form searches flights and displays [...]" |
| 41 | + toolautosubmit> |
| 42 | +``` |
| 43 | + |
| 44 | +The `toolname` attribute is analogous to the imperative API's |
| 45 | +[`ModelContextTool#name`](https://webmachinelearning.github.io/webmcp/#dom-modelcontexttool-name), |
| 46 | +while `tooldescription` is analogous to |
| 47 | +[`ModelContextTool#description`](https://webmachinelearning.github.io/webmcp/#dom-modelcontexttool-description). |
| 48 | + |
| 49 | +The `toolautosubmit` [boolean attribute](https://html.spec.whatwg.org/C#boolean-attribute) lets the |
| 50 | +agent submit the form on the user's behalf after filling it out, without requiring the user to check |
| 51 | +it manually before submitting. If this attribute is missing when the agent finishes filling out the |
| 52 | +form, the browser brings the submit button into focus, and the agent should then tell the user to |
| 53 | +check the form contents, and submit it manually. |
| 54 | + |
| 55 | +When a form with these attributes is inserted or removed, or when the attributes themselves change, |
| 56 | +the form's declarative WebMCP tool is created, removed, or re-created to match. The tool's input |
| 57 | +schema is generated according to [Input schema synthesis](#input-schema-synthesis). |
| 58 | + |
| 59 | +### Name and description |
| 60 | + |
| 61 | +The [`name`](https://html.spec.whatwg.org/C#attr-fe-name) attribute on form control elements |
| 62 | +supplies the name of each "property" in the input schema generated for a declarative tool. |
| 63 | + |
| 64 | +Since there's no pre-existing description attribute we can use, we introduce the |
| 65 | +`toolparamdescription` attribute for form control elements, which contributes the |
| 66 | +[description](https://json-schema.org/draft/2020-12/json-schema-validation#name-title-and-description) |
| 67 | +of each "property" in the input schema generated for a declarative tool. |
| 68 | + |
| 69 | +With this, the following imperative structure: |
| 70 | + |
| 71 | +```js |
| 72 | +window.navigator.modelContext.registerTool({ |
| 73 | + name: "search-cars", |
| 74 | + description: "Perform a car make/model search", |
| 75 | + inputSchema: { |
| 76 | + type: "object", |
| 77 | + properties: { |
| 78 | + make: { type: "string", description: "The vehicle's make (e.g., BMW, Ford)" }, |
| 79 | + model: { type: "string", description: "The vehicle's model (e.g., 330i, F-150)" }, |
| 80 | + }, |
| 81 | + required: ["make", "model"] |
| 82 | + }, |
| 83 | + execute({make, model}, agent) { ... } |
| 84 | +}); |
| 85 | +``` |
| 86 | + |
| 87 | +... is equivalent to the following declarative form: |
| 88 | + |
| 89 | +```html |
| 90 | +<form toolname="search-cars" tooldescription="Perform a car make/model search" [...]> |
| 91 | + <input type=text name="make" toolparamdescription="The vehicle's make (e.g., BMW, Ford)" required> |
| 92 | + <input type=text name="model" toolparamdescription="The vehicle's model (e.g., 330i, F-150)" required> |
| 93 | + <button type=submit>Search</button> |
| 94 | +</form> |
| 95 | +``` |
| 96 | + |
| 97 | +## Processing model |
| 98 | + |
| 99 | +### Changes to form reset |
| 100 | + |
| 101 | +When a form is [reset](https://html.spec.whatwg.org/C#concept-form-reset) **OR** its tool |
| 102 | +declaration changes (as a result of `toolname` attribute modifications, for example), then any |
| 103 | +in-flight invocation of the tool will be cancelled, and the agent will be notified of this |
| 104 | +cancellation. |
| 105 | + |
| 106 | +### Input schema synthesis |
| 107 | + |
| 108 | +TODO: The exact algorithms for reducing a form, its form-associated elements, and *their* attributes |
| 109 | +like [`step`](https://html.spec.whatwg.org/C#the-step-attribute) and |
| 110 | +[`min`](https://html.spec.whatwg.org/C#attr-input-min) to an input schema are TBD. We need to concretely specify how |
| 111 | +various form-associated elements like `<input>` and `<select>` reduce to a JSON Schema that includes |
| 112 | +`anyOf`, `oneOf`, and `maximum`/`minimum` declarations. |
| 113 | + |
| 114 | +Chromium is implementing a loose version of this and will conduct testing/trials to see if what |
| 115 | +we've come up with should be supported by the community as a general approach. |
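
To make the open question more concrete, here is a minimal sketch of one possible reduction. It is not the algorithm this proposal will specify (that remains TBD) and it is not Chromium's implementation. The function name `synthesizeInputSchema`, the plain-object control descriptors, and the choice to map `<select>` options to `enum` (rather than `oneOf`) are all assumptions made for illustration. The `step` attribute is deliberately left out, since its mapping to JSON Schema is one of the unresolved details.

```javascript
// Hypothetical sketch only: the real synthesis algorithm is TBD.
// Controls are described as plain objects so the example runs outside a browser.
function synthesizeInputSchema(controls) {
  const properties = {};
  const required = [];

  for (const control of controls) {
    // Assumption: controls without a name contribute no property.
    if (!control.name) continue;

    const property = {};
    if (control.type === "number" || control.type === "range") {
      property.type = "number";
      if (control.min !== undefined) property.minimum = Number(control.min);
      if (control.max !== undefined) property.maximum = Number(control.max);
    } else if (control.type === "checkbox") {
      property.type = "boolean";
    } else if (control.type === "select") {
      // Assumption: a single-choice <select> becomes a string enum.
      property.type = "string";
      property.enum = control.options.slice();
    } else {
      property.type = "string";
    }

    // Mirrors the `toolparamdescription` attribute.
    if (control.description) property.description = control.description;

    properties[control.name] = property;
    if (control.required) required.push(control.name);
  }

  const schema = { type: "object", properties };
  if (required.length > 0) schema.required = required;
  return schema;
}

// Usage: descriptors corresponding to the search-cars form shown earlier.
const carSchema = synthesizeInputSchema([
  { name: "make", type: "text", required: true, description: "The vehicle's make (e.g., BMW, Ford)" },
  { name: "model", type: "text", required: true, description: "The vehicle's model (e.g., 330i, F-150)" },
]);
console.log(JSON.stringify(carSchema, null, 2));
```

In a page, the descriptors would presumably be derived from the form's actual controls. They are kept as plain data here so that the mapping itself is easy to inspect.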
| 116 | + |
| 117 | +### Getting the form response to the agent |
| 118 | + |
| 119 | +This topic is currently under debate; see https://github.com/webmachinelearning/webmcp/issues/135. |
| 120 | + |
| 121 | +<details> |
| 122 | +<summary>Click to read the `application/ld+json` proposal before the above issue was filed</summary> |
| 123 | + |
| 124 | +When a form element performs a navigation, the first `<script type=application/ld+json>` tag on the |
| 125 | +target page is used as the cross-document tool's "response" that gets sent to the model. |
| 126 | + |
| 127 | +When no such tag is present, we will probably decide that the page's entire contents are sent to the |
| 128 | +model as the response, since that's an accurate semantic representation of the result of the tool. |
| 129 | +However, this is technically TBD at the moment. |
| 130 | + |
| 131 | +When the form element does *NOT* perform a navigation, JavaScript can hand-craft the response to the |
| 132 | +agent via the `SubmitEvent#respondWith()` method described below. |
| 133 | +</details> |
| 134 | + |
| 135 | +### Pseudo-classes |
| 136 | + |
| 137 | +Authors might want a way to bring to the user's attention or otherwise highlight a declarative |
| 138 | +WebMCP form that was filled out by the agent, and is waiting for the user to check the form and |
| 139 | +submit it. (This is essentially only relevant for forms without the `toolautosubmit` attribute.) To |
| 140 | +support this, we introduce the CSS pseudo-classes `:tool-form-active` and `:tool-submit-active`. |
| 141 | + |
| 142 | +The `:tool-form-active` pseudo-class matches `<form>` elements whose declarative tool is "running". |
| 143 | +The exact definition of this will be clarified in the specification, but in short, a declarative |
| 144 | +tool is considered "running" starting when the form is being filled out with agent output, until one |
| 145 | +of the following: |
| 146 | + |
| 147 | + - The form is [reset](https://html.spec.whatwg.org/C#concept-form-reset) or removed from the DOM |
| 148 | + - The Promise returned from `SubmitEvent#respondWith()` resolves with a tool output |
| 149 | + - The form's `toolname` or `tooldescription` attributes are modified, added, or removed |
| 150 | + - The form is automatically submitted with the agent output, due to the `toolautosubmit` attribute |
| 151 | + |
| 152 | +The `:tool-submit-active` pseudo-class matches the submit button of a `:tool-form-active` form |
| 153 | +element. |
| 154 | + |
| 155 | +### Events |
| 156 | + |
| 157 | +**Additions to `SubmitEvent`** |
| 158 | + |
| 159 | +The `SubmitEvent` interface gets two new members: `agentInvoked`, which lets `submit` event handlers |
| 160 | +react to agent-invoked form submissions, and the `respondWith()` method. |
| 161 | + |
| 162 | +The `respondWith()` method takes a `Promise<any>` that resolves to the response that the agent will |
| 163 | +consume. It is used to override the default behavior of the form submission; the form's `action` will NOT |
| 164 | +navigate, and `preventDefault()` must be called before this method is called. |
| 165 | + |
| 166 | +```webidl |
| 167 | +[Exposed=Window] |
| 168 | +interface SubmitEvent : Event { |
| 169 | + // ... |
| 170 | + readonly attribute boolean agentInvoked; |
| 171 | + undefined respondWith(Promise<any> agentResponse); |
| 172 | +}; |
| 173 | +``` |
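
As an illustration of how a page might use these members, the following sketch handles a submission of the `search-cars` form shown earlier. The `searchCars` function and the shape of its result are hypothetical application code, not part of WebMCP, and the handler only assumes the `agentInvoked` and `respondWith()` members as described above.

```javascript
// Hypothetical application logic; not part of WebMCP.
async function searchCars(make, model) {
  return { query: { make, model }, results: [] };
}

// Sketch of a `submit` handler for the search-cars form.
function handleSearchSubmit(event) {
  // Ordinary user submissions keep the form's default behavior.
  if (!event.agentInvoked) return;

  // Per the proposal, preventDefault() must be called before respondWith().
  event.preventDefault();

  // Named controls are reachable through the form's `elements` collection.
  const { make, model } = event.target.elements;

  // The promise's resolved value becomes the response the agent consumes.
  event.respondWith(searchCars(make.value, model.value));
}
```

In a page, this handler would be registered with something like `form.addEventListener("submit", handleSearchSubmit)`. Returning early for submissions that are not agent-invoked means human users still get the form's normal navigation.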
| 174 | + |
| 175 | +**`toolactivated` and `toolcanceled` events** |
| 176 | + |
| 177 | +We introduce these events that get fired at the `ModelContext` object when a WebMCP tool is run, and when |
| 178 | +its invocation is canceled. |
| 179 | + |
| 180 | +The `toolactivated` event gives the developer a hook to perform any actions, such as bringing the |
| 181 | +form to the user's attention, once a declarative tool is filled out but before it is submitted. |
| 182 | +(This presumes the absence of the `toolautosubmit` attribute.) This event can be seen as the |
| 183 | +JavaScript equivalent of the [`:tool-form-active` pseudo-class](#pseudo-classes). |
| 184 | + |
| 185 | +When the agent cancels a tool call (perhaps because a user has instigated another turn of the |
| 186 | +conversation, obviating the need for the pending tool), the `toolcanceled` event is fired. Note that |
| 187 | +this event does not fire when the site itself cancels the tool, due to removing the form element or |
| 188 | +changing its name or description. |
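
A page might listen for these events roughly as follows. This is a sketch under assumptions: the helper name `watchToolLifecycle` is invented for illustration, the event target is passed in as a parameter because where the events fire is still being discussed, and the handlers rely only on the event types, since no event properties are defined here.

```javascript
// Sketch: subscribe to tool lifecycle events on any EventTarget.
// In a page, the target would presumably be navigator.modelContext.
function watchToolLifecycle(target, { onActivated, onCanceled }) {
  target.addEventListener("toolactivated", onActivated);
  target.addEventListener("toolcanceled", onCanceled);

  // Return a cleanup function so callers can stop listening.
  return () => {
    target.removeEventListener("toolactivated", onActivated);
    target.removeEventListener("toolcanceled", onCanceled);
  };
}
```

For example, `onActivated` could scroll the filled-out form into view, and `onCanceled` could clear any highlight that was applied to it.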
| 189 | + |
| 190 | +Some open questions: |
| 191 | + |
| 192 | +> [!WARNING] |
| 193 | +> Should these events fire for imperative tool call invocations as well? Chromium |
| 194 | +> [seems to do |
| 195 | +> that](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/script_tools/model_context.cc;l=265-274;drc=2af6413cf36d701fdaffb09188f2ab2a5be37f4f). |
| 196 | + |
| 197 | +> [!WARNING] |
| 198 | +> For declarative, should they be fired at `Window` or at the `<form>` that registered the tool in |
| 199 | +> the first place, and bubble up to the document that way? See |
| 200 | +> https://github.com/webmachinelearning/webmcp/issues/126. |
| 201 | + |
| 202 | +## Integration with other imperative API bits |
| 203 | + |
| 204 | +It's an open question as to whether [an |
| 205 | +`outputSchema`](https://github.com/webmachinelearning/webmcp/issues/9) makes sense for declarative |
| 206 | +WebMCP tools, and therefore if the `agentResponse` Promise passed to `SubmitEvent#respondWith()` |
| 207 | +must resolve to an object conforming to such schema. |
| 208 | + |
| 209 | +It is TBD how *declarative* WebMCP tools will be exposed to any interface that exposes a site's |
| 210 | +tools to JavaScript. See https://github.com/webmachinelearning/webmcp/issues/51 for context. Should |
| 211 | +a declarative WebMCP tool be able to be invoked from such an interface, should it exist in the |
| 212 | +future? Almost certainly, yes. But details are TBD. |