<pre class='metadata'>
Title: WebMCP
Shortname: webmcp
Level: None
Status: CG-DRAFT
Group: webml
Repository: webmachinelearning/webmcp
URL: https://webmachinelearning.github.io/webmcp
Editor: Brandon Walderman, Microsoft https://www.microsoft.com, brwalder@microsoft.com, w3cid 115877
Editor: Khushal Sagar, Google https://www.google.com, khushalsagar@google.com, w3cid 122787
Editor: Dominic Farolino, Google https://www.google.com, domfarolino@google.com, w3cid 102763
Abstract: The WebMCP API enables web applications to provide JavaScript-based tools to AI agents.
Markup Shorthands: markdown yes, css no
Complain About: accidental-2119 yes, missing-example-ids yes
Assume Explicit For: yes
Default Biblio Status: current
Boilerplate: omit conformance
Indent: 2
Die On: warning
</pre>

<style>
/* domintro and XXX from https://resources.whatwg.org/standard.css */
.domintro {
  position: relative;
  color: green;
  background: #DDFFDD;
  margin: 2.5em 0 2em 0;
  padding: 1.5em 1em 0.5em 2em;
}

.domintro dt, .domintro dt * {
  color: black;
  font-size: inherit;
}
.domintro dd {
  margin: 0.5em 0 1em 2em; padding: 0;
}
.domintro dd p {
  margin: 0.5em 0;
}
.domintro::before {
  content: 'For web developers (non-normative)';
  background: green;
  color: white;
  padding: 0.15em 0.25em;
  font-style: normal;
  position: absolute;
  top: -0.8em;
  left: -0.8em;
}

.XXX {
  color: #D50606;
  /* The value #111 is what WHATWG uses in dark mode: `--xxx-bg: #111;`. */
  background: light-dark(white, #111);
  border: solid #D50606;
}

/* dl.props from https://resources.whatwg.org/standard.css */
dl.props { display: grid; grid-template-columns: max-content auto; row-gap: 0.25em; column-gap: 1em; }
dl.props > dt { grid-column-start: 1; margin: 0; }
dl.props > dd { grid-column-start: 2; margin: 0; }
p + dl.props { margin-top: -0.5em; }

/* Put nice boxes around each algorithm. */
[data-algorithm]:not(.heading) {
  padding: .5em;
  border: thin solid #ddd; border-radius: .5em;
  margin: .5em calc(-0.5em - 1px);
}
[data-algorithm]:not(.heading) > :first-child {
  margin-top: 0;
}
[data-algorithm]:not(.heading) > :last-child {
  margin-bottom: 0;
}
[data-algorithm] [data-algorithm] {
  margin: 1em 0;
}
</style>

<pre class="link-defaults">
spec:html; type:dfn;
  text:form-associated element
</pre>

<h2 id="intro">Introduction</h2>

The WebMCP API is a new JavaScript interface that allows web developers to expose their web application functionality as “tools”: JavaScript functions with natural language descriptions and structured schemas that can be invoked by [=agents=], [=browser's agents=], and [=assistive technologies=]. Web pages that use WebMCP can be thought of as Model Context Protocol [[!MCP]] servers that implement tools in client-side script instead of on the backend. WebMCP enables collaborative workflows where users and agents work together within the same web interface, leveraging existing application logic while maintaining shared context and user control.

<h2 id="terminology">Terminology</h2>

<!--
TODO: Define any reusable high-level terms here that are not specific to algorithms and not defined elsewhere (https://respec.org/xref/), consider exporting (https://speced.github.io/bikeshed/#dfn-export) if useful for other specs
-->

An <dfn>agent</dfn> is an autonomous assistant that can understand a user’s goals and take actions on the user’s behalf to achieve them. Today, these are typically implemented by large language model (LLM) based [=AI platforms=], interacting with users via text-based chat interfaces.

A <dfn>browser’s agent</dfn> is an [=agent=] provided by or through the browser that could be built directly into the browser or hosted by it, for example, via an extension or plug-in.

An <dfn>AI platform</dfn> is a provider of agentic assistants such as OpenAI’s ChatGPT, Anthropic’s Claude, or Google’s Gemini.

<h2 id="supporting-concepts">Supporting concepts</h2>

A <dfn>model context</dfn> is a [=struct=] with the following [=struct/items=]:

<dl dfn-for="model context">
  : <dfn>tool map</dfn>
  :: a [=map=] whose [=map/keys=] are [=strings=] and whose [=map/values=] are [=tool definition=]
     [=structs=].
</dl>

A <dfn>tool definition</dfn> is a [=struct=] with the following [=struct/items=]:

<dl dfn-for="tool definition">
  : <dfn>name</dfn>
  :: a [=string=] uniquely identifying a tool registered within a [=model context=]'s [=model
     context/tool map=]; it is the same as the [=map/key=] identifying this object.

  : <dfn>description</dfn>
  :: a [=string=].

  : <dfn>input schema</dfn>
  :: a [=string=].

     Note: For tools registered by the imperative form of this API (i.e.,
     {{ModelContext/registerTool()}}), this is the stringified representation of
     {{ModelContextTool/inputSchema}}. For tools registered
     [declaratively](https://github.com/webmachinelearning/webmcp/pull/76), this will be a
     stringified JSON Schema object created by the
     [=synthesize a declarative JSON Schema object algorithm=].
     [[!JSON-SCHEMA]]

  : <dfn>execute steps</dfn>
  :: a set of steps to invoke the tool.

     Note: For tools registered imperatively, these steps will simply invoke the supplied
     {{ToolExecuteCallback}} callback. For tools registered
     [declaratively](https://github.com/webmachinelearning/webmcp/pull/76), these will be a set of
     "internal" steps, not yet defined, that describe how to fill out a <{form}> and
     its [=form-associated elements=].

  : <dfn>read-only hint</dfn>
  :: a [=boolean=], initially false.
</dl>
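
<div class="example" id="example-supporting-structs">
This example is non-normative. It pictures the two structs above as plain JavaScript values, as a rough mental model only. The helper names `createModelContext` and `createToolDefinition`, the property names, and the sample tool are assumptions made for illustration and are not part of the API.

```javascript
// Non-normative sketch: spec-internal structs modeled as plain JavaScript values.
// createModelContext and createToolDefinition are hypothetical helper names.

function createModelContext() {
  // "tool map": keys are tool names (strings), values are tool definitions.
  return { toolMap: new Map() };
}

function createToolDefinition({
  name,
  description,
  inputSchema = "",
  executeSteps,
  readOnlyHint = false,
}) {
  return {
    name,         // same string as the key identifying this entry in the tool map
    description,  // string
    inputSchema,  // string: stringified JSON Schema, or "" when none was supplied
    executeSteps, // steps (modeled here as a function) that invoke the tool
    readOnlyHint, // boolean, initially false
  };
}

const context = createModelContext();
const definition = createToolDefinition({
  name: "get-cart-total",
  description: "Returns the total price of the items in the cart.",
  inputSchema: JSON.stringify({ type: "object", properties: {} }),
  executeSteps: async () => 42,
});
context.toolMap.set(definition.name, definition);
```
</div>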

<h2 id="api">API</h2>

<!--
TODO: Sketch initial algorithms, define attributes and methods etc.
https://github.com/webmachinelearning/webmcp/blob/main/docs/proposal.md#api
https://dlaliberte.github.io/bikeshed-intro/#a-strategy-for-incremental-development
-->

<h3 id="navigator-extension">Extensions to the {{Navigator}} Interface</h3>

The {{Navigator}} interface is extended to provide access to the {{ModelContext}}.

<xmp class="idl">
partial interface Navigator {
  [SecureContext, SameObject] readonly attribute ModelContext modelContext;
};
</xmp>

Each {{Navigator}} object has an associated <dfn for=Navigator>modelContext</dfn>, which is a
{{ModelContext}} instance created alongside the {{Navigator}}.

<div algorithm>
The <dfn attribute for=Navigator>modelContext</dfn> getter steps are to return [=this=]'s [=Navigator/modelContext=].
</div>
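
<div class="example" id="example-feature-detection">
This example is non-normative. It sketches one way a page could feature-detect the attribute before using it. The helper name `getModelContext` is an assumption made for illustration. Because the attribute is marked `[SecureContext]`, it is expected to be absent outside secure contexts, and because it is marked `[SameObject]`, repeated reads return the same object.

```javascript
// Non-normative sketch. getModelContext is a hypothetical helper name.
// Returns the ModelContext if the given navigator-like object exposes one, else null.
function getModelContext(nav = globalThis.navigator) {
  // The attribute is absent in insecure contexts and in any environment
  // that does not implement WebMCP.
  if (nav && "modelContext" in nav) {
    return nav.modelContext;
  }
  return null;
}

const modelContext = getModelContext();
if (modelContext === null) {
  console.log("WebMCP is not available in this environment.");
}
```
</div>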

<h3 id="model-context-container">ModelContext Interface</h3>

The {{ModelContext}} interface provides methods for web applications to register and manage tools that can be invoked by [=agents=].

<xmp class="idl">
[Exposed=Window, SecureContext]
interface ModelContext {
  undefined registerTool(ModelContextTool tool);
  undefined unregisterTool(DOMString name);
};
</xmp>

Each {{ModelContext}} object has an associated <dfn for=ModelContext>internal context</dfn>, which
is a [=model context=] [=struct=] created alongside the {{ModelContext}}.


<dl class="domintro">
  <dt><code><var ignore>navigator</var>.{{Navigator/modelContext}}.{{ModelContext/registerTool(tool)}}</code></dt>
  <dd>
    <p>Registers a single tool without clearing the existing set of tools. The method throws an exception if a tool with the same name already exists, if the tool's name or description is the empty string, or if the {{ModelContextTool/inputSchema}} cannot be serialized to JSON.
  </dd>

  <dt><code><var ignore>navigator</var>.{{Navigator/modelContext}}.{{ModelContext/unregisterTool(name)}}</code></dt>
  <dd>
    <p>Removes the tool with the specified name from the registered set.
  </dd>
</dl>
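
<div class="example" id="example-register-unregister">
This example is non-normative. It sketches how a page might register a tool and later remove it. The tool name, its description, and the in-memory `todos` list are assumptions made for illustration. The hypothetical helper `registerTodoTool` takes the {{ModelContext}} as a parameter so the same code can be exercised outside a browser.

```javascript
// Non-normative sketch. The tool and the in-memory todo list are illustrative.
const todos = [];

function registerTodoTool(modelContext) {
  modelContext.registerTool({
    name: "add-todo",
    description: "Adds a new item to the user's todo list.",
    inputSchema: {
      type: "object",
      properties: {
        text: { type: "string", description: "The text of the todo item." },
      },
      required: ["text"],
    },
    async execute({ text }) {
      todos.push(text);
      return `Added todo: ${text}`;
    },
  });

  // Return a cleanup function that removes the tool again.
  return () => modelContext.unregisterTool("add-todo");
}

// In a page: only register when the API is present.
if (globalThis.navigator?.modelContext) {
  registerTodoTool(globalThis.navigator.modelContext);
}
```

Returning a cleanup function is only one possible design. It keeps the tool name in a single place, which avoids a mismatched name when the tool is unregistered.
</div>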


<div algorithm>
The <dfn method for=ModelContext>registerTool(<var>tool</var>)</dfn> method steps are:

1. Let |tool map| be [=this=]'s [=ModelContext/internal context=]'s [=model context/tool map=].

1. Let |tool name| be |tool|'s {{ModelContextTool/name}}.

1. If |tool map|[|tool name|] [=map/exists=], then [=exception/throw=] an {{InvalidStateError}}
   {{DOMException}}.

1. If either |tool name| or |tool|'s {{ModelContextTool/description}} is the empty string, then
   [=exception/throw=] an {{InvalidStateError}} {{DOMException}}.

1. Let |stringified input schema| be the empty string.

1. If |tool|'s {{ModelContextTool/inputSchema}} [=map/exists=], then set |stringified input schema|
   to the result of [=serializing a JavaScript value to a JSON string=], given |tool|'s
   {{ModelContextTool/inputSchema}}.

   <div class="note">
     <p>The serialization algorithm above throws exceptions in the following cases:</p>

     <ol>
       <li><p><i>Throws a new {{TypeError}}</i> when the backing "<code>JSON.stringify()</code>"
       yields undefined, e.g.,
       "<code>inputSchema: { toJSON() {return HTMLDivElement;}}</code>", or
       "<code>inputSchema: { toJSON() {return undefined;}}</code>".</p></li>

       <li><p><i>Re-throws exceptions</i> thrown by "<code>JSON.stringify()</code>", e.g., when
       "<code>inputSchema</code>" is an object with a circular reference, etc.</p></li>
     </ol>
   </div>

1. Let |read-only hint| be true if |tool|'s {{ModelContextTool/annotations}} [=map/exists=] and
   its {{ToolAnnotations/readOnlyHint}} is true. Otherwise, let it be false.

1. Let |tool definition| be a new [=tool definition=], with the following [=struct/items=]:

   : [=tool definition/name=]
   :: |tool name|

   : [=tool definition/description=]
   :: |tool|'s {{ModelContextTool/description}}

   : [=tool definition/input schema=]
   :: |stringified input schema|

   : [=tool definition/execute steps=]
   :: steps that invoke |tool|'s {{ModelContextTool/execute}}

   : [=tool definition/read-only hint=]
   :: |read-only hint|

1. [=map/Set=] |tool map|[|tool name|] to |tool definition|.

</div>
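
<div class="example" id="example-input-schema-serialization">
This example is non-normative. It sketches the serialization step of the algorithm above and the two failure cases described in its note. The helper name `serializeInputSchema` and the error message are assumptions made for illustration.

```javascript
// Non-normative sketch of the serialization step.
// serializeInputSchema is a hypothetical helper name.
function serializeInputSchema(inputSchema) {
  // JSON.stringify() rethrows its own exceptions, e.g. for circular references.
  const result = JSON.stringify(inputSchema);
  if (result === undefined) {
    // JSON.stringify() yields undefined for values with no JSON representation.
    throw new TypeError("inputSchema cannot be serialized to JSON");
  }
  return result;
}

// Succeeds: a plain JSON Schema object.
const ok = serializeInputSchema({ type: "object", properties: {} });

// Fails with a new TypeError: toJSON() returns undefined.
let undefinedCase;
try {
  serializeInputSchema({ toJSON() { return undefined; } });
} catch (error) {
  undefinedCase = error;
}

// Fails with the exception rethrown from JSON.stringify(): circular reference.
const circular = {};
circular.self = circular;
let circularCase;
try {
  serializeInputSchema(circular);
} catch (error) {
  circularCase = error;
}
```
</div>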

<div algorithm>
The <dfn method for=ModelContext>unregisterTool(<var>name</var>)</dfn> method steps are:

1. Let |tool map| be [=this=]'s [=ModelContext/internal context=]'s [=model context/tool map=].

1. If |tool map|[|name|] does not [=map/exist=], then [=exception/throw=] an {{InvalidStateError}}
   {{DOMException}}.

1. [=map/Remove=] |tool map|[|name|].

</div>
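
<div class="example" id="example-method-steps-model">
This example is non-normative. It models the `registerTool()` and `unregisterTool()` method steps above as a small in-memory class, assuming a JavaScript environment that provides a global `DOMException`. The class name `ModelContextSketch`, the `has()` inspection helper, and the exception messages are assumptions made for illustration. Web IDL argument conversion, such as checking required dictionary members, is deliberately left out.

```javascript
// Non-normative model of the method steps; not the real interface.
class ModelContextSketch {
  #toolMap = new Map();

  registerTool(tool) {
    const name = tool.name;
    if (this.#toolMap.has(name)) {
      throw new DOMException(`Tool "${name}" is already registered.`, "InvalidStateError");
    }
    if (name === "" || tool.description === "") {
      throw new DOMException("Tool name and description must be non-empty.", "InvalidStateError");
    }
    let stringifiedInputSchema = "";
    if (tool.inputSchema !== undefined) {
      const json = JSON.stringify(tool.inputSchema); // rethrows, e.g. on cycles
      if (json === undefined) {
        throw new TypeError("inputSchema is not serializable.");
      }
      stringifiedInputSchema = json;
    }
    this.#toolMap.set(name, {
      name,
      description: tool.description,
      inputSchema: stringifiedInputSchema,
      executeSteps: (input, client) => tool.execute(input, client),
      readOnlyHint: tool.annotations?.readOnlyHint === true,
    });
  }

  unregisterTool(name) {
    if (!this.#toolMap.has(name)) {
      throw new DOMException(`Tool "${name}" is not registered.`, "InvalidStateError");
    }
    this.#toolMap.delete(name);
  }

  // Inspection helper for this sketch only.
  has(name) {
    return this.#toolMap.has(name);
  }
}
```
</div>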

<h4 id="model-context-tool">ModelContextTool Dictionary</h4>

The {{ModelContextTool}} dictionary describes a tool that can be invoked by [=agents=].

<xmp class="idl">
dictionary ModelContextTool {
  required DOMString name;
  required DOMString description;
  object inputSchema;
  required ToolExecuteCallback execute;
  ToolAnnotations annotations;
};

dictionary ToolAnnotations {
  boolean readOnlyHint = false;
};

callback ToolExecuteCallback = Promise<any> (object input, ModelContextClient client);
</xmp>

<dl class="domintro">
  <dt><code><var ignore>tool</var>["{{ModelContextTool/name}}"]</code></dt>
  <dd>
    <p>A unique identifier for the tool. This is used by [=agents=] to reference the tool when making tool calls.
  </dd>

  <dt><code><var ignore>tool</var>["{{ModelContextTool/description}}"]</code></dt>
  <dd>
    <p>A natural language description of the tool's functionality. This helps [=agents=] understand when and how to use the tool.
  </dd>

  <dt><code><var ignore>tool</var>["{{ModelContextTool/inputSchema}}"]</code></dt>
  <dd>
    <p>A JSON Schema [[!JSON-SCHEMA]] object describing the expected input parameters for the tool.
  </dd>

  <dt><code><var ignore>tool</var>["{{ModelContextTool/execute}}"]</code></dt>
  <dd>
    <p>A callback function that is invoked when an [=agent=] calls the tool. The function receives the input parameters and a {{ModelContextClient}} object.

    <p>The function can be asynchronous and return a promise, in which case the [=agent=] will receive the result once the promise is resolved.
  </dd>

  <dt><code><var ignore>tool</var>["{{ModelContextTool/annotations}}"]</code></dt>
  <dd>
    <p>Optional annotations providing additional metadata about the tool's behavior.
  </dd>
</dl>
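
<div class="example" id="example-model-context-tool">
This example is non-normative. It sketches a complete {{ModelContextTool}} dictionary value and the contract of its `execute` callback. The price data, the tool name, and the shape of the returned object are assumptions made for illustration. The callback is called directly here only to show that it receives an input object and returns a promise.

```javascript
// Non-normative sketch. The product data and tool name are illustrative.
const prices = { apple: 0.5, pear: 0.75 };

const priceLookupTool = {
  name: "lookup-price",
  description: "Looks up the unit price of a product by name.",
  inputSchema: {
    type: "object",
    properties: { product: { type: "string" } },
    required: ["product"],
  },
  // Receives the input object and a ModelContextClient; the client is unused here.
  async execute(input, client) {
    if (!Object.hasOwn(prices, input.product)) {
      throw new Error(`Unknown product: ${input.product}`);
    }
    return { product: input.product, price: prices[input.product] };
  },
  annotations: { readOnlyHint: true },
};

// Direct invocation, standing in for an agent's tool call.
const resultPromise = priceLookupTool.execute({ product: "pear" }, null);
```
</div>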

The {{ToolAnnotations}} dictionary provides optional metadata about a tool:

<dl class="domintro" dfn-type=dict-member dfn-for=ToolAnnotations>
  : <dfn>readOnlyHint</dfn>
  :: If true, indicates that the tool does not modify any state and only reads data. This hint can help [=agents=] make decisions about when it is safe to call the tool.
</dl>
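
<div class="example" id="example-read-only-hint">
This example is non-normative. It sketches one possible agent-side policy that consults the hint. Nothing in it is required by this specification, and the function name `requiresConfirmation` and both sample tools are assumptions made for illustration.

```javascript
// Non-normative sketch of one possible agent-side policy.
// requiresConfirmation is a hypothetical name.
function requiresConfirmation(tool) {
  // readOnlyHint defaults to false, so tools without annotations are treated
  // as potentially state-changing.
  const readOnly = tool.annotations?.readOnlyHint ?? false;
  return !readOnly;
}

const searchTool = { name: "search-products", annotations: { readOnlyHint: true } };
const checkoutTool = { name: "checkout" };
```

Under this policy, `requiresConfirmation(searchTool)` is false and `requiresConfirmation(checkoutTool)` is true.
</div>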

<h4 id="model-context-client">ModelContextClient Interface</h4>

The {{ModelContextClient}} interface represents an [=agent=] executing a tool provided by the site through the {{ModelContext}} API.

<xmp class="idl">
[Exposed=Window, SecureContext]
interface ModelContextClient {
  Promise<any> requestUserInteraction(UserInteractionCallback callback);
};

callback UserInteractionCallback = Promise<any> ();
</xmp>

<dl class="domintro">
  <dt><code><var ignore>client</var>.{{ModelContextClient/requestUserInteraction(callback)}}</code></dt>
  <dd>
    <p>Asynchronously requests user input during the execution of a tool.

    <p>The callback function is invoked to perform the user interaction (e.g., showing a confirmation dialog), and the promise resolves with the result of the callback.
  </dd>
</dl>
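
<div class="example" id="example-request-user-interaction">
This example is non-normative. It sketches how a tool's `execute` callback might ask the user for confirmation through the client before a destructive action. The tool, the factory `createDeleteAccountTool`, and the page-provided `confirmWithUser` function, which is assumed to show UI and return a promise for true or false, are all assumptions made for illustration. The method steps are not yet defined, so the exact behavior can differ.

```javascript
// Non-normative sketch. confirmWithUser is a hypothetical page-provided function.
function createDeleteAccountTool(confirmWithUser) {
  return {
    name: "delete-account",
    description: "Deletes the signed-in user's account after explicit confirmation.",
    async execute(input, client) {
      // The returned promise resolves with the result of the callback.
      const confirmed = await client.requestUserInteraction(() =>
        confirmWithUser("Delete your account permanently?")
      );
      if (!confirmed) {
        return { deleted: false, reason: "The user declined." };
      }
      // The actual deletion would happen here.
      return { deleted: true };
    },
  };
}
```
</div>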

<div algorithm>
The <dfn method for=ModelContextClient>requestUserInteraction(<var ignore>callback</var>)</dfn> method steps are:

1. TODO: fill this out.

</div>

<h3 id="declarative-api">Declarative WebMCP</h3>

This section is entirely a TODO. For now, refer to the [explainer draft](https://github.com/webmachinelearning/webmcp/pull/76).

<div algorithm>
The <dfn>synthesize a declarative JSON Schema object algorithm</dfn>, given a <{form}> element
|form|, runs the following steps. They return a [=map=] representing a JSON Schema object.
[[!JSON-SCHEMA]]

1. TODO: Derive a conformant JSON Schema object from |form| and its [=form-associated elements=].

</div>

<pre class="biblio">
{
  "mcp": {
    "href": "https://modelcontextprotocol.io/specification/latest",
    "title": "Model Context Protocol (MCP) Specification",
    "publisher": "The Linux Foundation"
  },
  "json-schema": {
    "href": "https://json-schema.org/draft/2020-12/json-schema-core.html",
    "title": "JSON Schema: A Media Type for Describing JSON Documents",
    "publisher": "JSON Schema"
  }
}
</pre>

<h2 id="security-privacy">Security and privacy considerations</h2>

<!--
TODO: Reuse as applicable
https://github.com/webmachinelearning/webmcp/blob/main/docs/security-privacy-considerations.md
-->

<h2 id="accessibility">Accessibility considerations</h2>


<h2 id="acknowledgements">Acknowledgements</h2>

Thanks to
Brandon Walderman,
Leo Lee,
Andrew Nolan,
David Bokan,
Khushal Sagar,
Hannah Van Opstal,
Sushanth Rajasankar
for the initial explainer, proposals and discussions that established the foundation for this specification.

Also many thanks to Alex Nahas and Jason McGhee for sharing early implementation experience.

Finally, thanks to the participants of the Web Machine Learning Community Group for feedback and suggestions.