Skip to content

fix(opentelemetry): recreate tracer object after plugin metadata changed#13618

Merged
shreemaan-abhishek merged 2 commits into
apache:masterfrom
shreemaan-abhishek:fix/opentelemetry-recreate-tracer-on-metadata-change
Jun 30, 2026
Merged

fix(opentelemetry): recreate tracer object after plugin metadata changed#13618
shreemaan-abhishek merged 2 commits into
apache:masterfrom
shreemaan-abhishek:fix/opentelemetry-recreate-tracer-on-metadata-change

Conversation

@shreemaan-abhishek

Copy link
Copy Markdown
Contributor

Description

The opentelemetry plugin caches the tracer object per route via core.lrucache.plugin_ctx, but passed nil as the cache version key in _M.rewrite. The tracer bakes in everything sourced from plugin_metadata (collector address, request_timeout/request_headers, trace_id_source, sampler, and resource attributes). With a nil version key the cache entry is never invalidated, so after an operator updates the opentelemetry plugin_metadata at runtime (e.g. points to a new OTLP collector or flips trace_id_source), the worker keeps serving the stale cached tracer. Spans keep going to the old endpoint until the per-conf cache entry is evicted or the worker recycles.

This passes metadata.modifiedIndex as the lrucache version key so the tracer is rebuilt whenever the plugin_metadata changes. api_ctx.conf_version (the existing version) only tracks the route/service, not plugin_metadata, so a metadata-only change does not invalidate the entry on its own.

Fixes the stale-tracer behavior on plugin_metadata change.

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change (N/A, behavior-internal)
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

@dosubot dosubot Bot added size:XS This PR changes 0-9 lines, ignoring generated files. bug Something isn't working labels Jun 26, 2026

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix stale OpenTelemetry tracer configuration when plugin_metadata is updated at runtime by ensuring cached tracer objects are recreated when the metadata changes.

Changes:

  • Updates the core.lrucache.plugin_ctx keying in opentelemetry.rewrite to incorporate plugin_metadata.modifiedIndex.
  • Adds an inline comment explaining the metadata-based cache keying rationale.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +344 to 346
-- key the cache on modifiedIndex so the tracer is rebuilt when metadata changes
local tracer, err = core.lrucache.plugin_ctx(lrucache, api_ctx, metadata.modifiedIndex,
create_tracer_obj, conf, plugin_info)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed in the latest commit: inject_core_spans now keys the same lrucache on metadata.modifiedIndex too, so the injected core spans are rebuilt on a metadata change instead of being left stale alongside the main span.

Comment on lines +344 to 346
-- key the cache on modifiedIndex so the tracer is rebuilt when metadata changes
local tracer, err = core.lrucache.plugin_ctx(lrucache, api_ctx, metadata.modifiedIndex,
create_tracer_obj, conf, plugin_info)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave this as-is. modifiedIndex is always populated for plugin_metadata across every config source: config_etcd carries it from etcd, and config_yaml falls back to the global conf_version (config_yaml.lua: local modifiedIndex = item.modifiedIndex or conf_version). So there is no real path where it is nil here. And even in the hypothetical case it were nil, the cache key would just degrade to the previous master behavior (the nil it already passes today), so it is strictly no worse than the status quo, not a new regression. Adding a validation + tostring() would imply a failure mode that cannot occur.

Comment thread apisix/plugins/opentelemetry.lua
@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Jun 29, 2026
@shreemaan-abhishek shreemaan-abhishek merged commit 26a3472 into apache:master Jun 30, 2026
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants