.NET: Add checkpoint on super step started event: issue #4280#4604
.NET: Add checkpoint on super step started event: issue #4280#4604elgold92 wants to merge 16 commits intomicrosoft:mainfrom
Conversation
…ely b/c the test definitions are now wrong
…c definition. I think this is correct, but the test now hangs, likely because the first checkpoint (or it's related data structures) we're now creating aren't initialized correctly.
…ShouldPersist_CheckpointedAsync test definition
There was a problem hiding this comment.
Pull request overview
Adds support for creating/checkpointing workflow state at the SuperStepStarted boundary so runs can resume from “pre-delivery” checkpoints (addressing #4280), and updates tests accordingly.
Changes:
- Add a
CheckpointInfo?field toSuperStepStartInfoand populate it onSuperStepStartedEvent. - Create a checkpoint at the start of each superstep (capturing pre-delivery queued messages) by extending runner state export to accept an override
StepContext. - Update and expand unit tests to account for the additional checkpoints and validate parent chaining/resume behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/InProcessStateTests.cs | Updates expected checkpoint count due to start+end checkpointing per superstep. |
| dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs | Extends tests to include checkpoints emitted on SuperStepStartedEvent and adds new resume/count assertions. |
| dotnet/src/Microsoft.Agents.AI.Workflows/SuperStepStartInfo.cs | Adds Checkpoint property to expose the checkpoint emitted at superstep start. |
| dotnet/src/Microsoft.Agents.AI.Workflows/InProc/InProcessRunnerContext.cs | Allows exporting runner state from an alternate StepContext (pre-delivery snapshot). |
| dotnet/src/Microsoft.Agents.AI.Workflows/InProc/InProcessRunner.cs | Saves a checkpoint before superstep execution and wires it into the started event. |
| dotnet/src/Microsoft.Agents.AI.Workflows/InProc/InProcStepTracer.cs | Plumbs the start-checkpoint into SuperStepStartedEvent payload. |
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs
Outdated
Show resolved
Hide resolved
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs
Outdated
Show resolved
Hide resolved
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs
Show resolved
Hide resolved
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs
Show resolved
Hide resolved
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs
Show resolved
Hide resolved
…tParentTests.cs rename local variable Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
@copilot open a new pull request to apply changes based on the comments in this thread. Changes look to be generally minor improvements. |
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs
Outdated
Show resolved
Hide resolved
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs
Show resolved
Hide resolved
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs
Outdated
Show resolved
Hide resolved
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs
Show resolved
Hide resolved
| // Save a checkpoint before the superstep executes, capturing the pre-delivery state. | ||
| await this.CheckpointAsync(currentStep, cancellationToken).ConfigureAwait(false); | ||
| CheckpointInfo? startCheckpoint = this.StepTracer.Checkpoint; | ||
|
|
||
| await this.RaiseWorkflowEventAsync(this.StepTracer.Advance(currentStep, startCheckpoint)).ConfigureAwait(false); | ||
|
|
There was a problem hiding this comment.
The checkpoint created at the beginning of RunSuperstepAsync is stamped using StepTracer.StepNumber, but StepTracer.Advance(...) increments the step number afterward. This means the StartInfo.Checkpoint metadata will generally be associated with the previous step number (and the first start checkpoint is -1/IsInitial), which can be surprising when correlating checkpoints to SuperStepStartedEvent.StepNumber. Consider advancing the step counter before creating the start-of-step checkpoint, or allowing CheckpointAsync to accept an explicit step number to use for start checkpoints.
| // Save a checkpoint before the superstep executes, capturing the pre-delivery state. | |
| await this.CheckpointAsync(currentStep, cancellationToken).ConfigureAwait(false); | |
| CheckpointInfo? startCheckpoint = this.StepTracer.Checkpoint; | |
| await this.RaiseWorkflowEventAsync(this.StepTracer.Advance(currentStep, startCheckpoint)).ConfigureAwait(false); | |
| // Capture the checkpoint from the previous step (if any) to correlate with the step-start event. | |
| CheckpointInfo? previousCheckpoint = this.StepTracer.Checkpoint; | |
| await this.RaiseWorkflowEventAsync(this.StepTracer.Advance(currentStep, previousCheckpoint)).ConfigureAwait(false); | |
| // Save a checkpoint at the beginning of the superstep, capturing the pre-delivery state for this step. | |
| await this.CheckpointAsync(currentStep, cancellationToken).ConfigureAwait(false); |
There was a problem hiding this comment.
This is a good callout from Copilot. I'm not sure what the intended design and usage of the StepNumber field is. I think initilizing the field to -1 is itself a bit questionable, it might have made more sense to add an additional member in the class that captures the concept that flag is using. But I'm not familiar enough with the design to be sure.
There was a problem hiding this comment.
As expected, merely applying the change suggested by Copilot will cause all the CheckpointParentTests unit tests to fail. This concern is a bit nuanced and I'd appreciate if a project admin or someone more familiar with the code review this.
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs
Outdated
Show resolved
Hide resolved
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs
Show resolved
Hide resolved
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs
Show resolved
Hide resolved
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs
Show resolved
Hide resolved
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs
Outdated
Show resolved
Hide resolved
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointParentTests.cs
Show resolved
Hide resolved
…tParentTests.cs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Motivation and Context
Address issue #4280, allowing workflows to resume from checkpoints saved from SuperStepStarted events.
Description
Adds
CheckpointInfo?field to theSuperStepStartInfoclass, populating this information in the InProcessRunner and InProcStepTracer. Also updates associated unit tests to expect more checkpoints to be created on checkpointed workflows.