[Bug] Parallel iteration inflates timing metrics #33214

@BeautyyuYanli

Summary

Parallel iteration has two timing distortions in backend tracing:

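  1. Child node elapsed time can be inflated because node finish events are buffered and replayed later, so the measurement includes the time the event spent waiting in the buffer.
  2. iteration_duration_map can be inflated because each iteration's duration is computed after its buffered events are replayed, not when the worker actually finishes.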
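
To make the first distortion concrete, here is a minimal sketch, not the project's actual code. `FakeClock`, `NodeFinished`, and both helper functions are hypothetical names introduced for illustration. It assumes the finish event can carry a timestamp stamped by the worker at completion, and compares that against measuring elapsed time at replay.

```python
from dataclasses import dataclass


@dataclass
class FakeClock:
    """Deterministic clock so the sketch does not depend on real sleeps."""
    now: float = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


@dataclass
class NodeFinished:
    """Hypothetical finish event; not the project's real event class."""
    node_id: str
    started_at: float
    finished_at: float  # stamped by the worker when the node completes


def elapsed_at_replay(event: NodeFinished, clock: FakeClock) -> float:
    # Distorted: measures up to the moment the buffered event is replayed.
    return clock.now - event.started_at


def elapsed_at_completion(event: NodeFinished) -> float:
    # Accurate: uses the timestamp captured when the node finished.
    return event.finished_at - event.started_at


clock = FakeClock()
buffer: list[NodeFinished] = []

start = clock.now
clock.advance(2.0)  # the node runs for 2 s
buffer.append(NodeFinished("llm", start, clock.now))
clock.advance(5.0)  # the finish event waits 5 s in the buffer

event = buffer.pop(0)  # replay happens here
replay_elapsed = elapsed_at_replay(event, clock)
actual_elapsed = elapsed_at_completion(event)
print(replay_elapsed, actual_elapsed)
```

In this sketch the replay-time measurement reports the 2 s of work plus the 5 s of buffering, while the completion-time stamp reports only the work.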

Reproduction

Run an iteration in parallel mode with an LLM node inside each iteration. The distortion is most visible when the iteration body emits buffered events or the event consumer is slow.

Expected

  • node-level elapsed time reflects the node's actual completion time
  • iteration_duration_map reflects the iteration's actual worker runtime

Actual

  • node elapsed time can include delayed replay time
  • iteration_duration_map can converge toward similar values across parallel iterations because replay delay is included
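
The convergence effect can be illustrated with a small sketch. All numbers and function names below are hypothetical. It assumes three parallel iterations start together and that their buffered events are only replayed after the slowest worker has finished.

```python
# Hypothetical worker runtimes in seconds, keyed by iteration index.
worker_runtime = {0: 1.0, 1: 2.0, 2: 3.0}
start_time = 0.0  # assumption: all iterations start at the same moment

# Assumption: replay of each iteration's buffered events completes only
# after the slowest worker (3 s) is done, a short interval apart.
replay_done_at = {0: 3.25, 1: 3.5, 2: 3.75}


def durations_after_replay(start: float, replay_done: dict[int, float]) -> dict[int, float]:
    # Distorted: duration is taken once replay has finished.
    return {i: t - start for i, t in replay_done.items()}


def durations_at_worker_end(start: float, runtime: dict[int, float]) -> dict[int, float]:
    # Accurate: duration is taken when the worker itself finishes.
    return {i: (start + r) - start for i, r in runtime.items()}


def spread(durations: dict[int, float]) -> float:
    # Gap between the slowest and fastest iteration.
    return max(durations.values()) - min(durations.values())


inflated = durations_after_replay(start_time, replay_done_at)
actual = durations_at_worker_end(start_time, worker_runtime)
print(inflated, spread(inflated))
print(actual, spread(actual))
```

Under these assumptions the post-replay durations cluster within 0.5 s of each other, even though the real worker runtimes differ by 2 s.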
