.NET: Add checkpoint on super step started event: issue #4280 #4604

Open
elgold92 wants to merge 16 commits into microsoft:main from elgold92:ericgold/CheckpointOnSuperStepStarted

@elgold92

Motivation and Context

Address issue #4280, allowing workflows to resume from checkpoints saved from SuperStepStarted events.

Description

Adds a CheckpointInfo? field to the SuperStepStartInfo class and populates it in the InProcessRunner and InProcStepTracer. Also updates the associated unit tests to expect the additional checkpoints created on checkpointed workflows.

Eric Gold added 7 commits March 9, 2026 16:53
Copilot AI review requested due to automatic review settings March 10, 2026 20:11
@markwallace-microsoft markwallace-microsoft added .NET workflows Related to Workflows in agent-framework labels Mar 10, 2026
@github-actions github-actions bot changed the title Add checkpoint on super step started event: issue #4280 .NET: Add checkpoint on super step started event: issue #4280 Mar 10, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds support for creating/checkpointing workflow state at the SuperStepStarted boundary so runs can resume from “pre-delivery” checkpoints (addressing #4280), and updates tests accordingly.

Changes:

  • Add a CheckpointInfo? field to SuperStepStartInfo and populate it on SuperStepStartedEvent.
  • Create a checkpoint at the start of each superstep (capturing pre-delivery queued messages) by extending runner state export to accept an override StepContext.
  • Update and expand unit tests to account for the additional checkpoints and validate parent chaining/resume behavior.
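
The start-plus-end checkpointing and parent chaining described in the list above can be pictured with a small toy model. This is only a sketch under stated assumptions: `ToyCheckpoint` and `ToyRunner` are hypothetical names, not types from Microsoft.Agents.AI.Workflows, and the model assumes exactly one checkpoint at each superstep start and one at each end, with each checkpoint's parent being the one saved immediately before it.

```csharp
using System.Collections.Generic;

// Hypothetical record for illustration; the real CheckpointInfo type is not modeled here.
sealed record ToyCheckpoint(int Id, int? ParentId, string Boundary);

static class ToyRunner
{
    // Runs `superSteps` toy supersteps, saving one checkpoint at the start
    // and one at the end of each, and chaining every checkpoint to the
    // one saved immediately before it.
    public static List<ToyCheckpoint> Run(int superSteps)
    {
        var checkpoints = new List<ToyCheckpoint>();
        int? parent = null; // the very first checkpoint has no parent

        for (int step = 0; step < superSteps; step++)
        {
            foreach (string boundary in new[] { "start", "end" })
            {
                var checkpoint = new ToyCheckpoint(checkpoints.Count, parent, boundary);
                checkpoints.Add(checkpoint);
                parent = checkpoint.Id;
            }
        }

        return checkpoints;
    }
}
```

Under these assumptions, `ToyRunner.Run(3)` yields six checkpoints, which is the kind of count change the updated tests would need to absorb; the actual counts in the real runner depend on its checkpointing rules.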

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/InProcessStateTests.cs | Updates expected checkpoint count due to start+end checkpointing per superstep. |
| dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs | Extends tests to include checkpoints emitted on SuperStepStartedEvent and adds new resume/count assertions. |
| dotnet/src/Microsoft.Agents.AI.Workflows/SuperStepStartInfo.cs | Adds Checkpoint property to expose the checkpoint emitted at superstep start. |
| dotnet/src/Microsoft.Agents.AI.Workflows/InProc/InProcessRunnerContext.cs | Allows exporting runner state from an alternate StepContext (pre-delivery snapshot). |
| dotnet/src/Microsoft.Agents.AI.Workflows/InProc/InProcessRunner.cs | Saves a checkpoint before superstep execution and wires it into the started event. |
| dotnet/src/Microsoft.Agents.AI.Workflows/InProc/InProcStepTracer.cs | Plumbs the start-checkpoint into SuperStepStartedEvent payload. |

…tParentTests.cs


rename local variable

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 10, 2026 21:07
@elgold92
Author

@copilot open a new pull request to apply changes based on the comments in this thread. Changes look to be generally minor improvements.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Copilot AI review requested due to automatic review settings March 10, 2026 23:50
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment on lines +247 to 252
// Save a checkpoint before the superstep executes, capturing the pre-delivery state.
await this.CheckpointAsync(currentStep, cancellationToken).ConfigureAwait(false);
CheckpointInfo? startCheckpoint = this.StepTracer.Checkpoint;

await this.RaiseWorkflowEventAsync(this.StepTracer.Advance(currentStep, startCheckpoint)).ConfigureAwait(false);

Copilot AI Mar 10, 2026

The checkpoint created at the beginning of RunSuperstepAsync is stamped using StepTracer.StepNumber, but StepTracer.Advance(...) increments the step number afterward. This means the StartInfo.Checkpoint metadata will generally be associated with the previous step number (and the first start checkpoint is -1/IsInitial), which can be surprising when correlating checkpoints to SuperStepStartedEvent.StepNumber. Consider advancing the step counter before creating the start-of-step checkpoint, or allowing CheckpointAsync to accept an explicit step number to use for start checkpoints.
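
The ordering concern above can be reproduced with a toy model. This is a minimal sketch, not the real code: `ToyStepTracer` and `OrderingDemo` are hypothetical names, and the only behavior assumed from the thread is that the step counter starts at -1 and that advancing increments it.

```csharp
// Toy stand-in for the step tracer; this is NOT the real InProcStepTracer.
sealed class ToyStepTracer
{
    // Mirrors the -1 initial value mentioned in the review thread.
    public int StepNumber { get; private set; } = -1;

    // Hypothetical: the step number a checkpoint would be stamped with right now.
    public int StampCheckpoint() => this.StepNumber;

    // Hypothetical: moves to the next superstep and returns its number,
    // which is what the started event would report.
    public int Advance() => ++this.StepNumber;
}

static class OrderingDemo
{
    // Order used in the PR: save the checkpoint, then advance the counter.
    public static (int CheckpointStep, int EventStep) CheckpointThenAdvance()
    {
        var tracer = new ToyStepTracer();
        int checkpointStep = tracer.StampCheckpoint();
        int eventStep = tracer.Advance();
        return (checkpointStep, eventStep);
    }

    // Alternative order: advance the counter, then save the checkpoint.
    public static (int CheckpointStep, int EventStep) AdvanceThenCheckpoint()
    {
        var tracer = new ToyStepTracer();
        int eventStep = tracer.Advance();
        int checkpointStep = tracer.StampCheckpoint();
        return (checkpointStep, eventStep);
    }
}
```

In this model, `CheckpointThenAdvance()` returns (-1, 0) for the first superstep, so the checkpoint and the event disagree by one, while `AdvanceThenCheckpoint()` returns (0, 0). Whether the real runner should change its ordering depends on how StepNumber is meant to be used, which the thread leaves open.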

Suggested change
- // Save a checkpoint before the superstep executes, capturing the pre-delivery state.
- await this.CheckpointAsync(currentStep, cancellationToken).ConfigureAwait(false);
- CheckpointInfo? startCheckpoint = this.StepTracer.Checkpoint;
- await this.RaiseWorkflowEventAsync(this.StepTracer.Advance(currentStep, startCheckpoint)).ConfigureAwait(false);
+ // Capture the checkpoint from the previous step (if any) to correlate with the step-start event.
+ CheckpointInfo? previousCheckpoint = this.StepTracer.Checkpoint;
+ await this.RaiseWorkflowEventAsync(this.StepTracer.Advance(currentStep, previousCheckpoint)).ConfigureAwait(false);
+ // Save a checkpoint at the beginning of the superstep, capturing the pre-delivery state for this step.
+ await this.CheckpointAsync(currentStep, cancellationToken).ConfigureAwait(false);

Author

This is a good callout from Copilot. I'm not sure what the intended design and usage of the StepNumber field is. I think initializing the field to -1 is itself a bit questionable; it might have made more sense to add a separate member to the class that captures the concept that sentinel value represents. But I'm not familiar enough with the design to be sure.

Author

@elgold92 elgold92 Mar 11, 2026

As expected, merely applying the change suggested by Copilot causes all the CheckpointParentTests unit tests to fail. This concern is a bit nuanced, and I'd appreciate it if a project admin or someone more familiar with the code could review this.

Copilot AI review requested due to automatic review settings March 11, 2026 00:32
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

…tParentTests.cs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 11, 2026 04:00
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Labels

.NET workflows Related to Workflows in agent-framework

3 participants