feat(webhook): configurable queue selection for matching runners by guicaulada · Pull Request #5190 · github-aws-runners/terraform-aws-github-runner

guicaulada · 2026-06-29T16:56:39Z

Description

A workflow_job whose labels match several runner configurations is always dispatched to the first matching queue (after the exactMatch sort). When multiple pools intentionally share a generic label — e.g. an "any architecture" or "this size or larger" label spanning several runner configs — every cold scale-up funnels to a single queue, overloading one pool while equally-valid pools sit idle. There is currently no way to spread that load.

This adds a configurable queue selection strategy, applied to the equally-best matches (those sharing the top exactMatch priority tier):

- `first` (default): unchanged, deterministic first match.
- `random`: pick one uniformly, spreading jobs across the matching queues so a single pool's queue does not become a bottleneck.
- `all`: dispatch to every matching queue, scaling up one runner per matching pool and letting the first available runner take the job (speed over cost). GitHub assigns the queued job to exactly one runner; the losers are reaped by scale-down.

exactMatch priority is preserved: random/all only ever operate within the highest-priority matching tier, never a lower-priority match. The strategy applies to standard jobs; dynamic (ghr-) label jobs continue to use the first compliant queue.
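
The tier-then-strategy selection described above could be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `MatchedQueue` shape, the `Strategy` type name, the injectable `random` parameter, and the signature given to `selectQueues` here are all assumptions. The input is assumed to be the queues that already matched the job, in their post-sort order (exactMatch entries first).

```typescript
// Hypothetical types for illustration; the lambda's real types may differ.
type Strategy = 'first' | 'random' | 'all';

interface MatchedQueue {
  id: string; // hypothetical queue identifier
  exactMatch: boolean; // priority tier marker
}

// Select the queue(s) to dispatch to, restricted to the top-priority tier.
function selectQueues(
  matches: MatchedQueue[],
  strategy: Strategy,
  random: () => number = Math.random, // injectable so tests can be deterministic
): MatchedQueue[] {
  if (matches.length === 0) return [];

  // The top tier is every match sharing the exactMatch value of the first
  // (highest-priority) match. Lower-priority matches are never considered.
  const topTier = matches.filter((q) => q.exactMatch === matches[0].exactMatch);

  switch (strategy) {
    case 'random':
      // Uniform pick: random() is in [0, 1), so the index is always in range.
      return [topTier[Math.floor(random() * topTier.length)]];
    case 'all':
      return topTier;
    case 'first':
    default:
      return [topTier[0]];
  }
}
```

With two exactMatch queues and one non-exact queue, `first` would return only the first exactMatch queue, `all` would return both exactMatch queues, and `random` would return one of those two; the non-exact queue is never selected under any strategy.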

Caveats for `all` (deliberate opt-in)

- Multiplies instance launches per job: the losing runners sit idle until scale-down reaps them, which is bounded by minimum_running_time_in_minutes.
- Multiplies runner registrations per job, increasing GitHub API usage. This matters where API rate limits are already a concern.
- Only truly races when enable_job_queued_check = false. Otherwise, later scale-ups see that the job has already been taken and skip.

Changes

Lambda — new QUEUE_SELECTION_STRATEGY env var (validated; defaults to first), read by both the direct webhook and the EventBridge dispatcher; selectQueues() implements first/random/all within the top-priority matching tier.
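
For reviewers, the "validated; defaults to first" behaviour of the env var could look something like the sketch below. The function name `parseQueueSelectionStrategy` and the error wording are hypothetical; only the variable name `QUEUE_SELECTION_STRATEGY` and its three values come from this PR.

```typescript
// Allowed values, as listed in the PR description.
const STRATEGIES = ['first', 'random', 'all'] as const;
type QueueSelectionStrategy = (typeof STRATEGIES)[number];

// Hypothetical helper: turn the raw env var value into a validated strategy.
function parseQueueSelectionStrategy(raw: string | undefined): QueueSelectionStrategy {
  // Unset or empty falls back to the default, preserving existing behaviour.
  if (raw === undefined || raw === '') return 'first';

  if ((STRATEGIES as readonly string[]).includes(raw)) {
    return raw as QueueSelectionStrategy;
  }

  // Fail at config load rather than silently falling back.
  throw new Error(
    `Invalid QUEUE_SELECTION_STRATEGY '${raw}', expected one of: ${STRATEGIES.join(', ')}`,
  );
}

// In the lambda this would presumably be called once at config load, e.g.
// parseQueueSelectionStrategy(process.env.QUEUE_SELECTION_STRATEGY)
```

Rejecting unknown values instead of defaulting keeps a typo in the Terraform variable or env var from silently reverting to `first`.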

Terraform — a queue_selection_strategy variable (validated first/random/all, default first) on the root and multi-runner modules, threaded through the webhook module config into the direct/eventbridge lambda env var, plus regenerated terraform-docs.

RFC note: Per CONTRIBUTING (discuss major changes first), open questions for maintainers — happy to adjust:

global setting (as implemented) vs. per-runner-config option?

naming (queue_selection_strategy; values first/random/all)?

should all (and random) extend to the dynamic-label path?

Test Plan

- Added unit tests in dispatch.test.ts: default picks first; random spreads across equally-matching queues (Math.random mocked); random preserves exactMatch priority; all dispatches to every top-tier match but not lower-priority ones; invalid strategy rejected at config load.
- yarn test (webhook): 40/40 pass. yarn build (ncc typecheck) passes. ESLint: 0 errors. Prettier: clean.
- terraform fmt -check and terraform validate pass on the root and multi-runner modules; terraform-docs regenerated.
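
To illustrate why mocking Math.random makes the `random` tests deterministic, here is a framework-free sketch. It is not taken from dispatch.test.ts: `pickUniform`, `withFixedRandom`, and the pool names are hypothetical, and the real tests presumably use the test runner's own mocking utilities.

```typescript
// Pick one element uniformly; stands in for the `random` strategy in isolation.
function pickUniform<T>(items: T[]): T {
  return items[Math.floor(Math.random() * items.length)];
}

// Temporarily replace Math.random with a fixed value, restoring it afterwards
// even if the callback throws.
function withFixedRandom<T>(value: number, fn: () => T): T {
  const original = Math.random;
  Math.random = () => value;
  try {
    return fn();
  } finally {
    Math.random = original;
  }
}

// Hypothetical pools that all match the same generic label.
const queues = ['pool-arm64', 'pool-x64', 'pool-large'];

// 0 maps to the first index; a value just below 1 maps to the last index.
const low = withFixedRandom(0, () => pickUniform(queues));
const high = withFixedRandom(0.999, () => pickUniform(queues));
```

Pinning the random value at both ends of the range checks that every index, including the last, is reachable and that no out-of-range index is produced.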

Related Issues

Motivation is load distribution across pools that share generic labels (avoiding single-queue hotspots), and a speed-over-cost option for large-scale environments. No existing upstream issue — happy to open one to track the discussion if preferred.

github-actions · 2026-06-29T16:57:03Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

A workflow_job whose labels match several runner configs is always dispatched to the first matching queue (after the exactMatch sort). When multiple pools share a generic label (e.g. an "any architecture" label), every cold scale-up funnels to a single queue, overloading one pool while equally-valid pools sit idle.

Add a queue selection strategy applied to the equally-best matches (those sharing the top exactMatch priority tier):

- `first` (default): unchanged, deterministic first match.
- `random`: pick one uniformly, spreading jobs across the matching queues.
- `all`: dispatch to every matching queue, scaling up one runner per pool and letting the first available take the job (speed over cost). This multiplies instance launches and runner registrations per job.

exactMatch priority is preserved: random/all never select a lower-priority match. Configured via a new QUEUE_SELECTION_STRATEGY env var (validated; defaults to `first`), read by the direct webhook and EventBridge dispatcher. The strategy applies to standard jobs; dynamic (ghr-) label jobs continue to use the first compliant queue.

Expose the queue_selection_strategy lambda setting as a public Terraform variable on the root, multi-runner and webhook modules, validated to first/random/all. Thread it through to both the direct webhook and the eventbridge dispatcher lambdas via the QUEUE_SELECTION_STRATEGY env var so the dispatch behaviour added in the previous commit is configurable.

guicaulada requested a review from a team as a code owner June 29, 2026 16:56

guicaulada force-pushed the feat/webhook-queue-selection-strategy branch from b1a0731 to 95aaac3 Compare June 29, 2026 19:16

guicaulada requested a review from a team as a code owner June 29, 2026 20:27

docs: auto update terraform docs

68b354d

guicaulada changed the title ~~feat(webhook): optional random queue selection for matching runners~~ feat(webhook): configurable queue selection for matching runners Jun 29, 2026

guicaulada wants to merge 3 commits into main from feat/webhook-queue-selection-strategy
