fix(healthcheck): purge stale targets and release periodic lock (bump lua-resty-healthcheck-api7 to 3.2.2) #13627

Open
AlinsRan wants to merge 3 commits into apache:master from AlinsRan:fix/healthcheck-stale-cleanup-test

@AlinsRan AlinsRan commented Jun 29, 2026

Description

Fixes a cluster of active-health-check bugs whose root cause is in the health-check library, by bumping lua-resty-healthcheck-api7 from 3.2.1-0 to 3.2.2-0 (which carries api7/lua-resty-healthcheck#55) and adding a regression test.

apisix already marks removed targets via delayed_clear() and re-creates checkers on enable; the library was the layer failing to act on that:

This is a dependency upgrade only — no apisix production code changes are required, because the fix belongs in the library.

Tests

t/node/healthcheck-stale-cleanup.t sets up two health-checked upstreams (each with two nodes), lets both checkers register their targets, then drops one node and changes the checks config on each upstream so the manager rebuilds the checker and delayed_clear()s the old target. It asserts, via /v1/healthcheck:

A node present in a checker's target list is being actively probed, so the single /v1/healthcheck observation covers #13385 and #13141 together. The bug reproduces only with multiple upstreams: in a single-upstream setup the sole checker is always the "first" one and is always cleaned, which is why earlier single-upstream repros failed.
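As a rough illustration of that observation, the sketch below counts how many removed nodes still show up in a /v1/healthcheck response. It is a Python model, not the test's actual code: the payload shape (a list of checker entries, each with a `nodes` list carrying `ip` and `port`) is an assumption, and `count_listed` and the sample addresses are hypothetical.

```python
def count_listed(payload, removed):
    """Count how many (ip, port) pairs in `removed` appear in any checker's node list.

    `payload` is the parsed JSON body of /v1/healthcheck; its shape here
    (list of entries with a "nodes" list) is an assumption for illustration.
    """
    listed = {
        (node["ip"], node["port"])
        for checker in payload
        for node in checker.get("nodes", [])
    }
    return sum(1 for target in removed if target in listed)


# Hypothetical nodes dropped from the two upstreams.
removed = [("127.0.0.1", 1981), ("127.0.0.1", 1983)]

# Before the update: both checkers list both of their nodes.
before = [
    {"name": "upstream-1", "nodes": [{"ip": "127.0.0.1", "port": 1980},
                                     {"ip": "127.0.0.1", "port": 1981}]},
    {"name": "upstream-2", "nodes": [{"ip": "127.0.0.1", "port": 1982},
                                     {"ip": "127.0.0.1", "port": 1983}]},
]

# After the update, with the stale-target bug: the second checker keeps its removed node.
after_buggy = [
    {"name": "upstream-1", "nodes": [{"ip": "127.0.0.1", "port": 1980}]},
    {"name": "upstream-2", "nodes": [{"ip": "127.0.0.1", "port": 1982},
                                     {"ip": "127.0.0.1", "port": 1983}]},
]

probed_before = count_listed(before, removed)
stale_after = count_listed(after_buggy, removed)
```

With these sample payloads, `probed_before` is 2 and `stale_after` is 1; a fully cleaned response would give a `stale_after` of 0.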

Verified locally against both library versions:

| lua-resty-healthcheck-api7 | result |
|---|---|
| 3.2.1-0 (before) | FAIL, probed_before: 2 / stale_after: 1 (the second checker keeps its removed node) |
| 3.2.2-0 (this PR) | PASS, probed_before: 2 / stale_after: 0 |

#13235 (periodic-lock release) is intentionally not tested here: it is a multi-worker lock-contention scenario that cannot be reproduced deterministically in apisix's single-worker test harness without becoming flaky. It is covered at the correct layer by the library's own regression test 20-periodic-lock-release.t in api7/lua-resty-healthcheck#55.

Scope / merge order

Fixes #13385, #13141, #13235.

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible

Bump lua-resty-healthcheck-api7 from 3.2.1-0 to 3.2.2-0, which carries the
stale-target cleanup and periodic-lock fixes (api7/lua-resty-healthcheck#55).

Before the bump, the health-check library advanced a module-level cleanup
timestamp inside its per-checker loop, so only the first checker was purged
each window. With multiple health-checked upstreams the others kept their
delayed_clear()-marked nodes forever -- still reported by the control API
(apache#13385) and still actively probed (apache#13141).
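
The failure mode described above can be modeled in a few lines. This is a simplified Python sketch, not the library's Lua code: `Checker`, `run_buggy`, `run_fixed`, and `CLEANUP_INTERVAL` are hypothetical names, and `run_fixed` shows only one plausible shape of a fix, which may differ from the actual patch.

```python
CLEANUP_INTERVAL = 10  # hypothetical purge window, in seconds


class Checker:
    def __init__(self, name, targets):
        self.name = name
        # target -> True once it has been marked for removal
        # (a stand-in for delayed_clear()).
        self.targets = dict.fromkeys(targets, False)

    def delayed_clear(self, target):
        self.targets[target] = True

    def purge(self):
        # Drop every target that was marked for removal.
        self.targets = {t: m for t, m in self.targets.items() if not m}


def run_buggy(checkers, now, last_cleanup):
    """Shared timestamp advanced inside the loop: only the first checker is purged."""
    for checker in checkers:
        if now - last_cleanup >= CLEANUP_INTERVAL:
            checker.purge()
            last_cleanup = now  # later checkers now see a fresh timestamp
    return last_cleanup


def run_fixed(checkers, now, last_cleanup):
    """Decide once per window, purge every checker, then advance the timestamp."""
    if now - last_cleanup >= CLEANUP_INTERVAL:
        for checker in checkers:
            checker.purge()
        last_cleanup = now
    return last_cleanup


def stale_after(run):
    """Return how many marked targets survive one cleanup pass."""
    a = Checker("upstream-1", ["10.0.0.1:80", "10.0.0.2:80"])
    b = Checker("upstream-2", ["10.0.0.3:80", "10.0.0.4:80"])
    a.delayed_clear("10.0.0.2:80")
    b.delayed_clear("10.0.0.4:80")
    run([a, b], now=100, last_cleanup=0)
    return sum(marked for c in (a, b) for marked in c.targets.values())
```

In this model `stale_after(run_buggy)` leaves one marked target behind (the second checker's), while `stale_after(run_fixed)` leaves none.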

Add t/node/healthcheck-stale-cleanup.t: two health-checked upstreams each
drop a node; the test asserts both removed nodes are gone from
/v1/healthcheck. It fails on 3.2.1-0 (one stale node remains) and passes on
3.2.2-0. A single-upstream setup cannot reproduce the bug because the sole
checker is always the "first" one cleaned.
@dosubot dosubot Bot added the size:L (This PR changes 100-499 lines, ignoring generated files), bug (Something isn't working), and dependencies (Pull requests that update a dependency file) labels Jun 29, 2026
@AlinsRan AlinsRan marked this pull request as draft June 29, 2026 23:58
@AlinsRan AlinsRan changed the title from "fix(healthcheck): purge stale targets for every checker (bump lua-resty-healthcheck-api7 to 3.2.2)" to "fix(healthcheck): purge stale targets and release periodic lock (bump lua-resty-healthcheck-api7 to 3.2.2)" Jun 30, 2026
@AlinsRan AlinsRan marked this pull request as ready for review June 30, 2026 05:50

@membphis membphis left a comment

LGTM.

Development

Successfully merging this pull request may close these issues.

bug: After a node is deleted from the upstream, its information can still be queried from the health check interface.

4 participants