fix(healthcheck): purge stale targets and release periodic lock (bump lua-resty-healthcheck-api7 to 3.2.2)#13627
Open
AlinsRan wants to merge 3 commits into
Bump lua-resty-healthcheck-api7 from 3.2.1-0 to 3.2.2-0, which carries the stale-target cleanup and periodic-lock fixes (api7/lua-resty-healthcheck#55).

Before the bump, the health-check library advanced a module-level cleanup timestamp inside its per-checker loop, so only the first checker was purged each window. With multiple health-checked upstreams the others kept their delayed_clear()-marked nodes forever -- still reported by the control API (apache#13385) and still actively probed (apache#13141).

Add t/node/healthcheck-stale-cleanup.t: two health-checked upstreams each drop a node; the test asserts both removed nodes are gone from /v1/healthcheck. It fails on 3.2.1-0 (one stale node remains) and passes on 3.2.2-0. A single-upstream setup cannot reproduce the bug because the sole checker is always the "first" one cleaned.
… independent of the node-only incremental-update path (apache#13629)
shreemaan-abhishek
approved these changes
Jun 30, 2026
nic-6443
approved these changes
Jun 30, 2026
Description
Fixes a cluster of active-health-check bugs whose root cause is in the health-check library, by bumping `lua-resty-healthcheck-api7` from `3.2.1-0` to `3.2.2-0` (which carries api7/lua-resty-healthcheck#55) and adding a regression test.

apisix already marks removed targets via `delayed_clear()` and re-creates checkers on enable; the library was the layer failing to act on that:

… `/v1/healthcheck`.

Both share one cause: the library advanced a module-level cleanup timestamp inside its per-checker loop, so only the first checker was purged each window. With multiple health-checked upstreams the rest kept their `delayed_clear()`-marked nodes forever. (api7/lua-resty-healthcheck#55 elects one cleanup worker per window and purges every checker; it also decouples cleanup from the periodic lock so passive-only deployments are cleaned too.)

This is a dependency upgrade only: no apisix production code changes are required, because the fix belongs in the library.
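The root cause described above can be illustrated with a small model. This is a minimal Python sketch, not the library's Lua code: `Checker`, `cleanup_buggy`, `cleanup_fixed`, `CLEANUP_INTERVAL`, and the target addresses are all hypothetical names and values chosen for illustration, and the sketch covers only the "purge every checker once per window" part of the fix, not worker election or the periodic lock.

```python
CLEANUP_INTERVAL = 10  # hypothetical cleanup window, in seconds


class Checker:
    """Toy stand-in for one health checker and its target list."""

    def __init__(self, name, targets):
        self.name = name
        # Maps target -> True when the target is marked for delayed removal.
        self.targets = dict(targets)

    def purge_marked(self):
        # Drop every target that was marked for delayed removal.
        self.targets = {t: m for t, m in self.targets.items() if not m}


def cleanup_buggy(checkers, state, now):
    # The shared timestamp is advanced inside the loop, so after the first
    # checker is purged the window looks "already handled" for the rest.
    for checker in checkers:
        if now - state["last_cleanup"] >= CLEANUP_INTERVAL:
            state["last_cleanup"] = now
            checker.purge_marked()


def cleanup_fixed(checkers, state, now):
    # The window is checked and advanced once, then every checker is purged.
    if now - state["last_cleanup"] < CLEANUP_INTERVAL:
        return
    state["last_cleanup"] = now
    for checker in checkers:
        checker.purge_marked()


def make_checkers():
    # Two upstreams, each with one live node and one node marked for removal.
    return [
        Checker("upstream-1", {"127.0.0.1:1980": False, "127.0.0.1:1981": True}),
        Checker("upstream-2", {"127.0.0.1:1982": False, "127.0.0.1:1983": True}),
    ]


def stale_count(checkers):
    # Number of marked targets that are still present in any checker.
    return sum(1 for c in checkers for marked in c.targets.values() if marked)
```

In this model, running `cleanup_buggy` over any number of windows leaves the second checker's marked target in place, while a single `cleanup_fixed` pass removes both.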
Tests
`t/node/healthcheck-stale-cleanup.t` sets up two health-checked upstreams (each with two nodes), lets both checkers register their targets, then drops one node and changes the `checks` config on each upstream so the manager rebuilds the checker and `delayed_clear()`s the old target. It asserts, via `/v1/healthcheck`:

- `probed_before: 2`: before removal, both soon-to-be-dropped nodes are in the checkers' target lists, i.e. actively probed (covers #13141).
- `stale_after: 0`: after the cleanup window, the dropped nodes are gone from every checker, so they can be neither queried via the control API (#13385) nor probed (#13141).

A node present in a checker's target list is being actively probed, so the single `/v1/healthcheck` observation covers #13385 and #13141 together. The bug reproduces only with multiple upstreams (a single-upstream setup is always the "first" checker and cleans), which is why earlier single-upstream repros failed.

Verified locally against both library versions:

| Library version | Result |
| --- | --- |
| `3.2.1-0` (before) | `probed_before: 2` / `stale_after: 1` (the second checker keeps its removed node) |
| `3.2.2-0` (this PR) | `probed_before: 2` / `stale_after: 0` |

#13235 (periodic-lock release) is intentionally not tested here: it is a multi-worker lock-contention scenario that cannot be reproduced deterministically in apisix's single-worker test harness without becoming flaky. It is covered at the correct layer by the library's own regression test `20-periodic-lock-release.t` in api7/lua-resty-healthcheck#55.

Scope / merge order

v3.2.2, and `3.2.2-0` is published on luarocks, so `make deps` resolves and CI is green.

Fixes #13385, #13141, #13235.
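For reviewers who want to see how the `probed_before` / `stale_after` counts in the Tests section are derived, the counting can be sketched as follows. This is an illustrative Python sketch, not the test file itself: the response shape (a list of checkers, each with a `nodes` list of `ip`/`port` entries) is an assumption about `/v1/healthcheck`, and `count_present`, the upstream names, and the sample addresses are hypothetical.

```python
def count_present(report, removed):
    """Count how many removed (ip, port) targets still appear in any checker.

    `report` is assumed to be the decoded /v1/healthcheck body: a list of
    checker objects, each carrying a "nodes" list (assumed shape).
    """
    present = {
        (node["ip"], node["port"])
        for checker in report
        for node in checker.get("nodes", [])
    }
    return sum(1 for target in removed if target in present)


# Hypothetical targets that the test drops, one per upstream.
removed = [("127.0.0.1", 1981), ("127.0.0.1", 1983)]

# Hypothetical report before the nodes are removed.
report_before = [
    {"name": "/apisix/upstreams/1",
     "nodes": [{"ip": "127.0.0.1", "port": 1980}, {"ip": "127.0.0.1", "port": 1981}]},
    {"name": "/apisix/upstreams/2",
     "nodes": [{"ip": "127.0.0.1", "port": 1982}, {"ip": "127.0.0.1", "port": 1983}]},
]

# Hypothetical report after the cleanup window when only the first checker
# was purged (the behaviour reported for 3.2.1-0).
report_after_stale = [
    {"name": "/apisix/upstreams/1", "nodes": [{"ip": "127.0.0.1", "port": 1980}]},
    {"name": "/apisix/upstreams/2",
     "nodes": [{"ip": "127.0.0.1", "port": 1982}, {"ip": "127.0.0.1", "port": 1983}]},
]

# Hypothetical report after the cleanup window when every checker was purged.
report_after_clean = [
    {"name": "/apisix/upstreams/1", "nodes": [{"ip": "127.0.0.1", "port": 1980}]},
    {"name": "/apisix/upstreams/2", "nodes": [{"ip": "127.0.0.1", "port": 1982}]},
]
```

Under these assumptions, `count_present(report_before, removed)` corresponds to `probed_before`, and the same call on a post-cleanup report corresponds to `stale_after`.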
Checklist