
Add option to use latest directory and add run printing#1586

Open
c-hagem wants to merge 4 commits into awslabs:main from
c-hagem:autogroup-latest-option

@c-hagem (Contributor) commented Aug 28, 2025

Makes the following changes to the autogroup.py script:

  • adds the possibility to print the run numbers of a category: either all of them, in descending order of throughput (--runs=all), or just the max, median and min run numbers (--runs=rep, for "representative")
  • makes the default order of throughputs consistent with the sorting order used when runs are specified
  • adds the possibility to use the latest run directory, which is picked automatically when no directory is specified
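A minimal sketch of the third point, assuming the script takes the results directory as an optional positional argument. The argument name `base_dir` and the `find_latest` callback are hypothetical, not taken from the PR:

```python
import argparse
from pathlib import Path
from typing import Callable, Optional, Sequence


def resolve_base_dir(
    argv: Optional[Sequence[str]], find_latest: Callable[[], str]
) -> Path:
    """Return the directory given on the command line, or the latest run if omitted."""
    parser = argparse.ArgumentParser()
    # nargs="?" makes the positional optional; it stays None when not given.
    parser.add_argument("base_dir", nargs="?", default=None)
    args = parser.parse_args(argv)
    if args.base_dir is None:
        # Hypothetical fallback: delegate to whatever picks the latest run directory.
        return Path(find_latest())
    return Path(args.base_dir)
```

For example, `resolve_base_dir([], lambda: "multirun/run_a")` returns `Path("multirun/run_a")`, while an explicit directory on the command line always takes precedence.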

This only changes the behaviour of the benchmarking autogroup script, so no changelog entries or version bumps are needed.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@c-hagem c-hagem had a problem deploying to PR integration tests August 28, 2025 06:26 — with GitHub Actions Failure
@c-hagem c-hagem force-pushed the autogroup-latest-option branch from 12e26af to b9172e4 August 28, 2025 06:33
@c-hagem c-hagem had a problem deploying to PR integration tests August 28, 2025 06:33 — with GitHub Actions Failure
@c-hagem c-hagem had a problem deploying to PR integration tests August 28, 2025 11:08 — with GitHub Actions Failure
@c-hagem c-hagem requested a review from sahityadg August 28, 2025 11:10
@c-hagem c-hagem had a problem deploying to PR integration tests August 28, 2025 12:32 — with GitHub Actions Failure
sahityadg previously approved these changes Aug 28, 2025
def find_multirun_dir(index: int = 0) -> str:
    """Find the Nth latest directory in multirun (0=most recent, 1=previous, etc.)"""
    if not Path('multirun').exists():
        warnings.warn("multirun directory not found")
Contributor

Raise an exception here?
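One way to act on this, sketched under the assumption that a missing `multirun` directory should abort the script: raise an exception with a descriptive message instead of warning and carrying on. The helper name `require_multirun_dir` is hypothetical:

```python
from pathlib import Path


def require_multirun_dir(root: str = "multirun") -> Path:
    """Return `root` as a Path, failing fast if it does not exist."""
    path = Path(root)
    if not path.is_dir():
        # Unlike warnings.warn, an exception stops execution and the message
        # appears in the traceback shown to the user.
        raise FileNotFoundError(f"{root!r} directory not found")
    return path
```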

if not sorted_subdirs:
    warnings.warn("No experiment directories found in multirun")
    sys.exit(1)
return sorted_subdirs[index][1]
Contributor

Above there are warnings, but if the index is out of range, this raises an exception without a helpful description.

Contributor Author

This will surface as an IndexError, and I think we currently only use index 0 anyway.
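Relying on the bare `IndexError` works while only index 0 is used, but an explicit bounds check gives a readable message and also rejects negative indices, which Python would otherwise accept silently (`sorted_subdirs[-1]` returns the oldest entry). A sketch, assuming `sorted_subdirs` holds `(sort_key, path)` pairs with the newest first, as `sorted_subdirs[index][1]` suggests; the function name is hypothetical:

```python
from pathlib import Path
from typing import List, Tuple


def pick_nth_latest(sorted_subdirs: List[Tuple[float, Path]], index: int = 0) -> Path:
    """Return the path of the index-th latest entry (0 = most recent)."""
    count = len(sorted_subdirs)
    if count == 0:
        raise FileNotFoundError("No experiment directories found in multirun")
    # Reject negative indices too: Python would otherwise count from the end.
    if not 0 <= index < count:
        raise IndexError(
            f"Requested run index {index}, but only {count} run "
            f"director{'y' if count == 1 else 'ies'} found (valid: 0..{count - 1})"
        )
    return sorted_subdirs[index][1]
```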


parser.add_argument('--csv-output', help='Optional CSV file to write the results to')
parser.add_argument(
    '--runs', choices=['tri', 'all'], help='Show run numbers in results (tri=min/median/max, all=all runs)'
Contributor

I haven't heard of it either. I wonder if there's a slightly less technical term that also works here.


results_rows = []
for config_key, throughput_data in grouped_results.items():
    throughputs = [t for t, _ in throughput_data]
Contributor

Nit: throughputs, run_numbers = zip(*throughput_data)
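For reference, `zip(*throughput_data)` transposes the list of pairs in one step. Two details worth knowing: the results are tuples rather than lists, and the two-name unpacking fails with a `ValueError` when the input is empty. A sketch assuming `throughput_data` is a list of `(throughput, run_number)` pairs, as the comprehension above implies; `split_pairs` is a hypothetical name:

```python
from typing import List, Tuple


def split_pairs(throughput_data: List[Tuple[float, str]]) -> Tuple[tuple, tuple]:
    """Transpose [(throughput, run), ...] into (throughputs, run_numbers)."""
    if not throughput_data:
        # zip(*[]) yields nothing, so two-name unpacking would raise ValueError.
        return (), ()
    throughputs, run_numbers = zip(*throughput_data)
    return throughputs, run_numbers
```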

if args.runs == "tri":
    # Find min, max, and median run numbers based on throughput
    sorted_by_throughput = sorted(zip(throughputs, run_numbers))
    min_run = sorted_by_throughput[0][1]
Contributor

Unclear why we're zipping after just unzipping above

Contributor Author

Fixed


    row.append(",".join(unique_runs))
else:
    sorted_by_throughput = sorted(zip(throughputs, run_numbers), reverse=True)
Contributor

Why are we reverse sorting here but not above?

Contributor Author

Above we pick min, p50 and max; I guess we could use reverse sorting in both.

Contributor Author

Adjusted to reverse sort once
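Sorting once in descending order is enough to read off all three representatives. A sketch, assuming `throughput_data` holds `(throughput, run_number)` pairs; for an even number of runs it takes the lower of the two middle values as the median, which may differ from the script (the excerpt does not show how `median_run` is computed). The function name is hypothetical:

```python
from typing import List, Tuple


def representative_runs(throughput_data: List[Tuple[float, str]]) -> List[str]:
    """Return [max_run, median_run, min_run] using a single descending sort."""
    # Sort by throughput only; Python's sort is stable (also with reverse=True),
    # so runs with equal throughput keep their input order.
    ranked = sorted(throughput_data, key=lambda pair: pair[0], reverse=True)
    max_run = ranked[0][1]
    median_run = ranked[len(ranked) // 2][1]  # lower middle value for even counts
    min_run = ranked[-1][1]
    return [max_run, median_run, min_run]
```

With a single run, all three entries are the same run number, which is why the deduplication step that follows is still needed.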


selected_runs = [max_run, median_run, min_run]
# Remove duplicates while preserving order
unique_runs = []
Contributor

How large does this list realistically get? This approach is O(n^2).

Contributor Author

The list only has 3 elements (i.e. selected_runs).

Contributor Author

Fixed to faster method
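With only three elements the asymptotic cost hardly matters, but for comparison, both deduplication styles produce the same order. A list-based loop does an O(n) membership test per item (O(n^2) overall), while `dict.fromkeys` relies on dict keys being unique and, since Python 3.7, kept in insertion order. The function names are illustrative:

```python
def dedupe_loop(items):
    """List-based loop: linear scan of `unique` for every item, O(n^2) overall."""
    unique = []
    for item in items:
        if item not in unique:
            unique.append(item)
    return unique


def dedupe_dict(items):
    """dict keys are unique and preserve insertion order (Python >= 3.7), O(n)."""
    return list(dict.fromkeys(items))


# Example: max and median run coincide (run numbers here are made up).
selected_runs = ["7", "7", "2"]
print(dedupe_loop(selected_runs), dedupe_dict(selected_runs))  # ['7', '2'] ['7', '2']
```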

Adds the following features to the autogroup.py script:
 - adds the possibility to print either the numbers of all runs of a category (in descending order of throughput)
   when specifying `--runs=all`, or just the max, median and min run numbers with `--runs=tri`
 - makes the default order of throughputs consistent with the sorting order used when runs are specified
 - adds the possibility to use the latest run (by default, the latest run is inferred from the run number, but this can be switched to modification time). Additionally, `--latest=K` picks the K-th latest run according to the specified order.

Signed-off-by: Christian Hagemeier <chagem@amazon.com>
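The `--latest=K` behaviour can be sketched as sorting the run directories newest first and indexing into the result, with K=0 being the most recent (matching the `find_multirun_dir` docstring). This sketch assumes, hypothetically, that run directories are named with plain integers for the run-number order; the function name and the `order` values are illustrative, not the script's actual interface:

```python
from pathlib import Path
from typing import Callable, Dict


def kth_latest_dir(root: str, k: int = 0, order: str = "number") -> Path:
    """Return the k-th latest subdirectory of `root` (k=0 is the most recent)."""
    keys: Dict[str, Callable[[Path], float]] = {
        # Assumption: directory names are integers, e.g. "0", "1", "10".
        "number": lambda p: int(p.name),
        # Alternative: filesystem modification time.
        "mtime": lambda p: p.stat().st_mtime,
    }
    if order not in keys:
        raise ValueError(f"unknown order {order!r}; expected one of {sorted(keys)}")
    subdirs = [p for p in Path(root).iterdir() if p.is_dir()]
    ranked = sorted(subdirs, key=keys[order], reverse=True)  # newest first
    if not 0 <= k < len(ranked):
        raise IndexError(f"latest={k} requested, but only {len(ranked)} runs found")
    return ranked[k]
```

Converting names with `int()` keeps run "10" ahead of run "2", which plain string sorting would get wrong.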

# Add run numbers column if requested
if args.runs:
    sorted_by_throughput = sorted(zip(throughputs, run_numbers), reverse=True)
Contributor

Isn't zip(throughputs, run_numbers) equivalent to throughput_data?
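Assuming `throughputs` and `run_numbers` were both built from `throughput_data` in the same order, and that `throughput_data` holds tuples (lists would not compare equal to the tuples `zip` yields), re-zipping reproduces the original pairs, so the list can be sorted directly. The values below are illustrative:

```python
throughput_data = [(512.0, "3"), (498.5, "1"), (530.2, "2")]
throughputs = [t for t, _ in throughput_data]
run_numbers = [r for _, r in throughput_data]

# Re-zipping the two projections rebuilds exactly the original pairs...
rezipped = list(zip(throughputs, run_numbers))
assert rezipped == throughput_data

# ...so the round trip can be skipped and throughput_data sorted directly.
sorted_by_throughput = sorted(throughput_data, reverse=True)
print(sorted_by_throughput[0])  # (530.2, '2')
```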

selected_runs = [max_run, median_run, min_run]
# Remove duplicates while preserving order using dict.fromkeys()
# (insertion order is guaranteed in Python >= 3.7)
unique_runs = list(dict.fromkeys(selected_runs))
Contributor

I think your previous approach was more readable 😅
Perhaps just move it to a function instead?
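A sketch of this suggestion: keep the explicit loop for readability but move it into a small helper, using a `set` for the membership check so it stays O(n). The name `unique_in_order` is hypothetical:

```python
from typing import Hashable, Iterable, List, Set, TypeVar

T = TypeVar("T", bound=Hashable)


def unique_in_order(items: Iterable[T]) -> List[T]:
    """Drop duplicates, keeping the first occurrence of each item."""
    seen: Set[T] = set()
    unique: List[T] = []
    for item in items:
        if item not in seen:  # average O(1) lookup, unlike `in` on a list
            seen.add(item)
            unique.append(item)
    return unique
```

The call site would then read `row.append(",".join(unique_in_order(selected_runs)))`.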


    row.append(",".join(unique_runs))
else:
    all_runs = [r for _, r in sorted_by_throughput]
Contributor

Isn't this run_numbers?
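Whether `all_runs` equals `run_numbers` depends on the order of `throughput_data`, which the excerpt does not show: they coincide only if the input is already sorted by descending throughput. A small illustration with made-up values listed in run order:

```python
throughput_data = [(498.5, "1"), (530.2, "2"), (512.0, "3")]  # in run order
run_numbers = [r for _, r in throughput_data]

sorted_by_throughput = sorted(throughput_data, reverse=True)
all_runs = [r for _, r in sorted_by_throughput]

# run_numbers keeps input order; all_runs is ordered by throughput, highest first.
print(run_numbers)  # ['1', '2', '3']
print(all_runs)     # ['2', '3', '1']
```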


3 participants