fix(auto-routing): reduce classifier errors and latency #3942
Merged
Code Review Summary
Status: No Issues Found | Recommendation: Merge
Executive Summary
The latest commit removes the ...
Files Reviewed (incremental, 1 new commit ea61939)
Previously reviewed files (no changes, findings carried forward)
Reviewed by claude-4.6-sonnet-20260217 · 287,717 tokens
Review guidance: REVIEW.md from base branch
pandemicsyn
approved these changes
Jun 10, 2026
Summary
Improves the auto-routing classifier by switching to the lower-latency `google/gemini-2.5-flash-lite` default, shrinking the classifier prompt, and capping classifier completions at 160 tokens.

Moves classifier output handling into `services/auto-routing/src/classifier-output/` with an isolated, tested parser. The parser accepts common model-output drift such as fenced JSON, wrappers, enum labels, snake_case keys, confidence strings, and subtype/task mismatches. If the model returns unusable output, the worker now emits a valid low-confidence fallback classification and logs `auto_routing_classifier_fallback` separately from classifier errors. Classified successes do not emit custom success logs.

Adds admin analytics for task/subtask pairs and records classifier failures as `classifier_error:<subtype>` statuses in Analytics Engine, so the admin status breakdown can show the distribution of classifier failure modes.

Verification
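For context on the fallback counts reported in this section, the tolerant parsing and fallback path described in the Summary could look roughly like the sketch below. It is not the code in `services/auto-routing/src/classifier-output/`: the field names (`taskType`, `subtype`, `confidence`), the `"unknown"` fallback labels, and the `reason` strings are assumptions made for illustration, and only a few of the listed drift cases (fenced JSON, a single-key wrapper, snake_case keys, confidence strings) are handled.

```typescript
// Hypothetical shapes; the real classifier schema may differ.
type Classification = { taskType: string; subtype: string; confidence: number };
type ParseResult =
  | { kind: "classified"; value: Classification }
  | { kind: "fallback"; value: Classification; reason: string };

// Assumed fallback labels; the PR only states that fallbacks carry confidence 0.
function fallback(reason: string): ParseResult {
  return {
    kind: "fallback",
    value: { taskType: "unknown", subtype: "unknown", confidence: 0 },
    reason,
  };
}

// Remove a Markdown code fence (optionally tagged json) around the payload.
function stripFence(raw: string): string {
  const match = raw.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/i);
  return (match?.[1] ?? raw).trim();
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Convert snake_case keys (task_type) to camelCase (taskType).
function camelizeKeys(obj: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    out[key.replace(/_([a-z])/g, (_m, c: string) => c.toUpperCase())] = value;
  }
  return out;
}

// Accept numbers or numeric strings in the range [0, 1].
function toConfidence(v: unknown): number | null {
  const n = typeof v === "string" ? Number.parseFloat(v) : v;
  return typeof n === "number" && Number.isFinite(n) && n >= 0 && n <= 1 ? n : null;
}

function parseClassifierOutput(raw: string): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stripFence(raw));
  } catch {
    return fallback("invalid_json");
  }
  if (!isRecord(parsed)) return fallback("not_an_object");

  let obj = camelizeKeys(parsed);
  // Unwrap a single-key wrapper such as {"classification": {...}}.
  const values = Object.values(obj);
  const only = values.length === 1 ? values[0] : undefined;
  if (isRecord(only)) obj = camelizeKeys(only);

  const confidence = toConfidence(obj.confidence);
  if (typeof obj.taskType !== "string" || typeof obj.subtype !== "string" || confidence === null) {
    return fallback("missing_fields");
  }
  return {
    kind: "classified",
    value: { taskType: obj.taskType, subtype: obj.subtype, confidence },
  };
}
```

Under this sketch, a caller would log `auto_routing_classifier_fallback` only when `kind` is `"fallback"`, which keeps unusable model output distinct from classifier errors such as a failed model call.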
Production Axiom check for deployed version `358ed06e-7b68-4415-aa2d-3324adc7cce0` over a 25-minute window before success logs were removed: 35,961 invocations, 0 classifier error logs, 235 fallback logs, and 332 sampled success latency logs. Classifier error rate was 0%; sampled success p95 latency was 1212.19ms, with p99 at 1697.55ms.

Fallbacks were mostly unusable model outputs with unrelated keys such as `subcategory`, `minecraft`, and `selectedTickers`; these are now visible through the separate fallback log event and produce confidence `0` classifications.

Visual Changes
Admin Auto Routing panel now includes a Task Subtypes breakdown table. Screenshot not captured because the admin panel requires live admin auth.
Reviewer Notes
The preferred p95 target was under 1000ms, but the production sample landed at 1212.19ms. This is below the acceptable 2000ms p95 target while meeting the stricter reliability target.
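As a quick cross-check of the numbers above, the following sketch recomputes the rates from the Verification counts and compares the sampled p95 against both targets. The constant and field names are illustrative, and the fallback rate is derived here rather than stated in the PR.

```typescript
// Figures copied from the Verification section of this PR.
const invocations = 35961;
const classifierErrorLogs = 0;
const fallbackLogs = 235;
const sampledP95Ms = 1212.19;

// Targets as described in the reviewer note above.
const PREFERRED_P95_MS = 1000;
const ACCEPTABLE_P95_MS = 2000;

// Format a ratio as a percentage with two decimals.
const pct = (part: number, whole: number): string =>
  ((part / whole) * 100).toFixed(2) + "%";

const report = {
  errorRate: pct(classifierErrorLogs, invocations), // "0.00%"
  fallbackRate: pct(fallbackLogs, invocations), // "0.65%"
  meetsPreferredP95: sampledP95Ms < PREFERRED_P95_MS, // false
  meetsAcceptableP95: sampledP95Ms < ACCEPTABLE_P95_MS, // true
};

console.log(report);
```

The p95 figure comes from 332 sampled success logs rather than all 35,961 invocations, so it is an estimate for the window, not an exact percentile.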
Remote production KV `classifier_model` is set to `google/gemini-2.5-flash-lite`, matching the code default.

The classifier error summary query counts both historical `classifier_error` rows and new `classifier_error:<subtype>` rows, while the status breakdown preserves the subtype for distribution analysis.
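The counting rule in the last note can be illustrated in memory as below. This is only a sketch of the logic, not the Analytics Engine query itself; the helper names, the `ok` status, and the subtype values `timeout` and `invalid_json` are made up for the example.

```typescript
// Split "classifier_error:<subtype>" into its base status and optional subtype.
function splitStatus(status: string): { base: string; subtype: string | null } {
  const idx = status.indexOf(":");
  return idx === -1
    ? { base: status, subtype: null }
    : { base: status.slice(0, idx), subtype: status.slice(idx + 1) };
}

function summarizeStatuses(statuses: string[]): {
  classifierErrors: number;
  breakdown: Map<string, number>;
} {
  let classifierErrors = 0;
  const breakdown = new Map<string, number>();
  for (const status of statuses) {
    // Summary: historical and subtyped rows both count as classifier errors.
    if (splitStatus(status).base === "classifier_error") classifierErrors += 1;
    // Breakdown: keep the full status string so subtypes stay distinct.
    breakdown.set(status, (breakdown.get(status) ?? 0) + 1);
  }
  return { classifierErrors, breakdown };
}

// Hypothetical sample rows.
const sample = [
  "ok",
  "classifier_error",
  "classifier_error:timeout",
  "classifier_error:timeout",
  "classifier_error:invalid_json",
];
const summary = summarizeStatuses(sample);
console.log(summary.classifierErrors, Object.fromEntries(summary.breakdown));
```

Matching on the base status rather than the exact string is what lets the summary stay continuous across the change in status format.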