|
| 1 | +# WebMCP declarative API |
| 2 | + |
| 3 | +See discussion in https://github.com/webmachinelearning/webmcp/issues/22 that led to the creation of |
| 4 | +this proposal. |
| 5 | + |
| 6 | +## Motivation |
| 7 | + |
| 8 | +WebMCP lets developers expose intricate functionality backed by a website's JavaScript functions to |
| 9 | +an agent as "tools", effectively turning the site into an "MCP server". Agents can see the list of |
| 10 | +tools a site offers paired with natural language descriptions of what the tools do, and invoke them |
| 11 | +with structured data. |
| 12 | + |
| 13 | +With WebMCP, agents can perform complex actions like booking a flight or reserving a table by |
| 14 | +hooking into a site's own code designed to perform those actions, instead of the agent having to |
| 15 | +figure it out manually through a brittle series of screenshots, scrolls, and out-of-date screen |
| 16 | +reads. |
| 17 | + |
| 18 | +However, not all site functionality is exposed via JavaScript functions, and features that *are* |
| 19 | +take some effort to rewrite with an agent invoker in mind. Much of a site's functionality is |
| 20 | +provided via semantic HTML elements like `<form>`, and its various inputs. To **make it easier** for |
| 21 | +developers to expose this kind of site functionality while still using the semantic web, we |
| 22 | +propose: |
| 23 | + |
| 24 | +1. New attributes that augment `<form>`s and [form-associated |
| 25 | + elements](https://html.spec.whatwg.org/#form-associated-element), that expose these as WebMCP |
| 26 | + tools to agents. |
| 27 | +2. Algorithms that deterministically "compile" a form and its associated inputs down to a WebMCP |
| 28 | + "input schema", so that the agent knows how to fill out the form and submit it. |
| 29 | +3. Two ways of getting a form response back to the agent that invoked the form tool: |
| 30 | + 1. `SubmitEvent#respondWith()`, which lets JavaScript on the page override the default form |
| 31 | + action, and pipe a response back to the agent without navigating the page. |
| 32 | + 2. Extracting `<script type="application/ld+json">` tags on the page that the form navigated to, |
| 33 | + and using that structured data as a response to the form. |
| 34 | + |
| 35 | +## Form attributes |
| 36 | + |
| 37 | +```html |
| 38 | +<form |
| 39 | + toolname="Search flights" |
| 40 | + tooldescription="This form searches flights and displays [...]" |
| 41 | + toolautosubmit> |
| 42 | +``` |
| 43 | + |
| 44 | +The `toolname` attribute is analogous to the imperative API's |
| 45 | +[`ModelContextTool#name`](https://webmachinelearning.github.io/webmcp/#dom-modelcontexttool-name), |
| 46 | +while `tooldescription` is analogous to |
| 47 | +[`ModelContextTool#description`](https://webmachinelearning.github.io/webmcp/#dom-modelcontexttool-description). |
| 48 | + |
| 49 | +The `toolautosubmit` [boolean attribute](https://html.spec.whatwg.org/C#boolean-attribute) lets the |
| 50 | +agent submit the form on the user's behalf after filling it out, without requiring the user to check |
| 51 | +it manually before submitting. If this attribute is missing when the agent finishes filling out the |
| 52 | +form, the browser brings the submit button into focus, and the agent should then tell the user to |
| 53 | +check the form contents and submit it manually. |
| 54 | + |
| 55 | +When forms with these attributes are inserted or removed, or when these attributes are updated, the |
| 56 | +form creates a new declarative WebMCP tool whose input schema is generated according to |
| 57 | +[Input schema synthesis](#input-schema-synthesis). |
| 58 | + |
| 59 | +TODO(domfarolino): Describe the `toolparamname` and `toolparamdescription` attributes, and how they |
| 60 | +are processed on form-associated elements. |
| 61 | + |
| 62 | +## Processing model |
| 63 | + |
| 64 | +### Changes to form reset |
| 65 | + |
| 66 | +When a form is [reset](https://html.spec.whatwg.org/C#concept-form-reset) **OR** its tool |
| 67 | +declaration changes (as a result of `toolname` attribute modifications, for example), then any |
| 68 | +in-flight invocation of the tool will be cancelled, and the agent will be notified of this |
| 69 | +cancellation. |
| 70 | + |
| 71 | +### Input schema synthesis |
| 72 | + |
| 73 | +TODO: The exact algorithms for reducing a form, its form-associated elements, and *their* attributes |
| 74 | +like [`step`](https://html.spec.whatwg.org/C#the-step-attribute) and |
| 75 | +[`min`](https://html.spec.whatwg.org/C#attr-input-min) are TBD. We need to concretely specify how |
| 76 | +various form-associated elements like `<input>` and `<select>` reduce to a JSON Schema that includes |
| 77 | +`anyOf`, `oneOf`, and `maximum`/`minimum` declarations. |
| 78 | + |
| 79 | +Chromium is implementing a loose version of this and will conduct testing/trials to see if what |
| 80 | +we've come up with should be supported by the community as a general approach. |
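
Since the synthesis algorithm is still TBD, the following is only a rough sketch of the kind of reduction described above, not the proposed or implemented algorithm. It assumes each form-associated element has already been summarized as a plain descriptor object; the function name `synthesizeInputSchema` and the descriptor fields (`kind`, `options`, `min`, `max`, `description`) are hypothetical and chosen for illustration. The `step` attribute is deliberately left out because its mapping is one of the open questions.

```javascript
// Illustrative only: the real synthesis algorithm is TBD. The descriptor
// objects below are hypothetical stand-ins for form-associated elements.
function synthesizeInputSchema(fields) {
  const properties = {};
  const required = [];
  for (const field of fields) {
    let schema;
    if (field.kind === "select") {
      // One `const` entry per <option>; exactly one of them must match.
      schema = { oneOf: field.options.map((value) => ({ const: value })) };
    } else if (field.kind === "number") {
      // `min` and `max` map onto JSON Schema's `minimum` and `maximum`.
      schema = { type: "number" };
      if (field.min !== undefined) schema.minimum = field.min;
      if (field.max !== undefined) schema.maximum = field.max;
    } else {
      // Anything else is treated as free text in this sketch.
      schema = { type: "string" };
    }
    if (field.description) schema.description = field.description;
    properties[field.name] = schema;
    if (field.required) required.push(field.name);
  }
  const result = { type: "object", properties };
  if (required.length > 0) result.required = required;
  return result;
}

// Example: a two-field flight search form reduced to an input schema.
const schema = synthesizeInputSchema([
  { name: "passengers", kind: "number", min: 1, max: 9, required: true },
  { name: "cabin", kind: "select", options: ["economy", "business"] },
]);
console.log(JSON.stringify(schema, null, 2));
```

Using `oneOf` with `const` entries for `<select>` is one possible choice; an `enum` would express the same constraint, and which form the final algorithm should emit is part of what still needs to be specified.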
| 81 | + |
| 82 | +### Getting the form response to the agent |
| 83 | + |
| 84 | +TODO: Mention application/ld+json responses, and so on. |
| 85 | + |
| 86 | +### Events |
| 87 | + |
| 88 | +**Additions to `SubmitEvent`** |
| 89 | + |
| 90 | +The `SubmitEvent` interface gets two new members: `agentInvoked`, which lets `submit` event handlers |
| 91 | +react to agent-invoked form submissions, and the `respondWith()` method. |
| 92 | + |
| 93 | +`respondWith()` takes a `Promise<any>` that resolves to the response that the agent will consume. It |
| 94 | +overrides the default behavior of the form submission: the form will NOT navigate to its `action`, |
| 95 | +and `preventDefault()` must be called before `respondWith()` is called. |
| 96 | + |
| 97 | +```webidl |
| 98 | +[Exposed=Window] |
| 99 | +interface SubmitEvent : Event { |
| 100 | + // ... |
| 101 | + readonly attribute boolean agentInvoked; |
| 102 | + undefined respondWith(Promise<any> agentResponse); |
| 103 | +}; |
| 104 | +``` |
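
As a rough illustration of how a page might use these members, here is a minimal sketch of a `submit` handler. It assumes the proposed `agentInvoked` and `respondWith()` members exist on the event, which is not the case in browsers that do not implement this proposal. The helper `searchFlights`, the handler name `handleSubmit`, and the `form[toolname]` selector are hypothetical and exist only for this example.

```javascript
// Hypothetical backend call; a real page would run its own search logic here.
async function searchFlights(query) {
  return { origin: query.origin, destination: query.destination, flights: [] };
}

// Sketch of a `submit` handler. `agentInvoked` and `respondWith()` are the
// members proposed above, so this only works where the proposal is implemented.
function handleSubmit(event, readFields) {
  if (!event.agentInvoked) {
    return; // Human submission: leave the default form navigation alone.
  }
  // Per the text above, preventDefault() must be called before respondWith().
  event.preventDefault();
  // Hand the agent a Promise that resolves to the structured response.
  event.respondWith(searchFlights(readFields()));
}

// Wire the handler up only when a DOM is available.
if (typeof document !== "undefined") {
  const form = document.querySelector("form[toolname]");
  if (form) {
    form.addEventListener("submit", (event) =>
      handleSubmit(event, () => Object.fromEntries(new FormData(form)))
    );
  }
}
```

Branching on `agentInvoked` keeps the form's normal navigation for human users while still giving the agent a structured result without leaving the page.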
| 105 | + |
| 106 | +**`toolactivated` and `toolcanceled` events** |
| 107 | + |
| 108 | +TODO: Fill this out. |
| 109 | + |
| 110 | +## Integration with other imperative API bits |
| 111 | + |
| 112 | +It's an open question as to whether [an |
| 113 | +`outputSchema`](https://github.com/webmachinelearning/webmcp/issues/9) makes sense for declarative |
| 114 | +WebMCP tools, and therefore whether the `agentResponse` Promise passed to `SubmitEvent#respondWith()` |
| 115 | +must resolve to an object conforming to such a schema. |
| 116 | + |
| 117 | +It is TBD how *declarative* WebMCP tools will be exposed to any interface that exposes a site's |
| 118 | +tools to JavaScript. See https://github.com/webmachinelearning/webmcp/issues/51 for context. Should |
| 119 | +a declarative WebMCP tool be able to be invoked from such an interface, should it exist in the |
| 120 | +future? |