fix(opentelemetry): recreate tracer object after plugin metadata changed #13618
Pull request overview
This PR aims to fix stale OpenTelemetry tracer configuration when plugin_metadata is updated at runtime by ensuring cached tracer objects are recreated when the metadata changes.
Changes:
- Updates the `core.lrucache.plugin_ctx` keying in `opentelemetry.rewrite` to incorporate `plugin_metadata.modifiedIndex`.
- Adds an inline comment explaining the metadata-based cache keying rationale.
    -- key the cache on modifiedIndex so the tracer is rebuilt when metadata changes
    local tracer, err = core.lrucache.plugin_ctx(lrucache, api_ctx, metadata.modifiedIndex,
                                                 create_tracer_obj, conf, plugin_info)
Good catch. Fixed in the latest commit: inject_core_spans now keys the same lrucache on metadata.modifiedIndex too, so the injected core spans are rebuilt on a metadata change instead of being left stale alongside the main span.
    -- key the cache on modifiedIndex so the tracer is rebuilt when metadata changes
    local tracer, err = core.lrucache.plugin_ctx(lrucache, api_ctx, metadata.modifiedIndex,
                                                 create_tracer_obj, conf, plugin_info)
I'd leave this as-is. `modifiedIndex` is always populated for plugin_metadata across every config source: `config_etcd` carries it from etcd, and `config_yaml` falls back to the global `conf_version` (config_yaml.lua: `local modifiedIndex = item.modifiedIndex or conf_version`). So there is no real path where it is nil here. And even in the hypothetical case that it were nil, the cache key would just degrade to the previous master behavior (the nil it already passes today), so it is strictly no worse than the status quo, not a new regression. Adding a validation + `tostring()` would imply a failure mode that cannot occur.
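For illustration, the fallback quoted above can be sketched in isolation. This is a minimal sketch, not the `config_yaml.lua` code: `resolve_modified_index` is a hypothetical helper name, and the value `42` for `conf_version` is made up.

```lua
-- Hypothetical stand-in for the global YAML config version.
local conf_version = 42

-- Hypothetical helper wrapping the quoted expression
-- `item.modifiedIndex or conf_version`.
local function resolve_modified_index(item)
    return item.modifiedIndex or conf_version
end

-- An item that carries its own modifiedIndex keeps it.
local from_item = resolve_modified_index({ modifiedIndex = 7 })

-- An item without one falls back to the global version, so the result
-- is never nil as long as conf_version itself is set.
local from_fallback = resolve_modified_index({})

print(from_item, from_fallback)
```

Note that in Lua only `nil` and `false` are falsy, so an item with `modifiedIndex = 0` would keep `0` rather than falling back.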
Description
The `opentelemetry` plugin caches the tracer object per route via `core.lrucache.plugin_ctx`, but passed `nil` as the cache version key in `_M.rewrite`. The tracer bakes in everything sourced from `plugin_metadata` (collector address, `request_timeout`/`request_headers`, `trace_id_source`, sampler, and resource attributes). With a `nil` version key the cache entry is never invalidated, so after an operator updates the `opentelemetry` plugin_metadata at runtime (e.g. points to a new OTLP collector or flips `trace_id_source`), the worker keeps serving the stale cached tracer. Spans keep going to the old endpoint until the per-conf cache entry is evicted or the worker recycles.

This passes `metadata.modifiedIndex` as the lrucache version key so the tracer is rebuilt whenever the plugin_metadata changes. `api_ctx.conf_version` (the existing version) only tracks the route/service, not plugin_metadata, so a metadata-only change does not invalidate the entry on its own.

Fixes the stale-tracer behavior on plugin_metadata change.
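The stale-tracer behavior can be reproduced with a toy cache. This is a minimal sketch under stated assumptions, not the `core.lrucache` implementation: `new_cache`, `create_tracer`, the route keys, and the collector addresses are all hypothetical, and the toy cache simply lets an optional extra component take part in the lookup alongside the route version.

```lua
-- Toy model only: NOT the core.lrucache implementation.
local function new_cache()
    local entries = {}
    -- `extra` may be nil, which models the behavior before this change.
    return function(key, version, extra, create, ...)
        local full_key = key .. "#" .. tostring(extra)
        local entry = entries[full_key]
        if entry and entry.version == version then
            return entry.obj          -- cache hit: reuse the old object
        end
        local obj = create(...)       -- cache miss: rebuild
        entries[full_key] = { version = version, obj = obj }
        return obj
    end
end

local builds = 0
-- Hypothetical constructor: the tracer "bakes in" the collector endpoint.
local function create_tracer(endpoint)
    builds = builds + 1
    return { endpoint = endpoint }
end

local cache = new_cache()
local old_meta = { modifiedIndex = 5, collector = "old-collector:4318" }
local new_meta = { modifiedIndex = 6, collector = "new-collector:4318" }

-- Without a metadata component, a metadata-only change is invisible:
-- the route version (10) is unchanged, so the second lookup is a hit.
local a1 = cache("route#1", 10, nil, create_tracer, old_meta.collector)
local a2 = cache("route#1", 10, nil, create_tracer, new_meta.collector)
print(a2.endpoint)  -- old-collector:4318

-- With modifiedIndex in the lookup, the metadata change forces a rebuild.
local b1 = cache("route#2", 10, old_meta.modifiedIndex, create_tracer, old_meta.collector)
local b2 = cache("route#2", 10, new_meta.modifiedIndex, create_tracer, new_meta.collector)
print(b2.endpoint)  -- new-collector:4318
```

In the toy model the old entry for `modifiedIndex = 5` simply stays in the table; a real LRU cache would be expected to evict it eventually.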
Checklist