Skip to content

fix: CRD update mutations return 400 on JSON-encoded gRPC path#1279

Open
craigmarker wants to merge 10 commits into
mainfrom
craig.marker/fix-trigger-run-unmarshal-json
Open

fix: CRD update mutations return 400 on JSON-encoded gRPC path#1279
craigmarker wants to merge 10 commits into
mainfrom
craig.marker/fix-trigger-run-unmarshal-json

Conversation

@craigmarker
Copy link
Copy Markdown
Contributor

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update

What changed?

Add UnmarshalJSONPB and MarshalJSON to the top-level CRD struct in crd.tmpl (affecting all generated CRD types) and to TriggerRun specifically. Add CollectInt64Fields and UnquoteInt64Fields to kubeproto/util for pre-processing proto3 canonical JSON before delegating to encoding/json.

Why?

All CRD update mutations (UpdateTriggerRun, UpdatePipeline, etc.) returned 400 Bad Request when called through the browser UI. The OSS UI uses connect-es with createConnectTransport, which routes requests through Envoy's connect_grpc_bridge filter as JSON-encoded gRPC. YARPC uses gogo's jsonpb to decode these requests.

Two distinct failures:

  1. Unmarshal 400: gogo's jsonpb v1.3.2 dispatches to JSONPBUnmarshaler (not json.Unmarshaler) on nested message fields. Without UnmarshalJSONPB on TriggerRun, gogo tried to decode metav1.Time (embedded in ObjectMeta) itself and failed — proto3 canonical JSON encodes timestamps as RFC3339 strings, but gogo expects {"seconds": N, "nanos": N}. Additionally, proto3 encodes int64/uint64 as quoted strings that encoding/json can't decode without pre-processing. UnmarshalJSONPB pre-processes both cases and then delegates to encoding/json via a type alias to avoid infinite recursion.

  2. Marshal / Kubernetes write failure: The pre-existing jsonpb-based MarshalJSON on CRD types did not respect json:",inline" on TypeMeta, producing {"typeMeta": {"kind": "TriggerRun", ...}}. The Kubernetes API server requires kind and apiVersion at the root level, so every update mutation that made it past deserialization would fail when controller-runtime serialized the object for the PUT request. Replacing MarshalJSON with an encoding/json + type-alias approach fixes this, while nested Spec/Status fields continue to use their own jsonpb-based MarshalJSON implementations.

How did you test it?

I ran a unit test (trigger_run_json_test.go) that calls jsonpb.Unmarshal directly on UpdateTriggerRunRequest with an RFC3339 creationTimestamp — the exact codepath YARPC takes. The test fails on main and passes with the fix.

I confirmed end-to-end against a local sandbox:

  • Sent UpdateTriggerRun via Connect+JSON through Envoy using a Python script; apiserver logs confirmed "encoding":"json" and the request reaching the handler without error. The action was persisted in Kubernetes.
  • Opened the local dev server (which uses createConnectTransport → Connect+JSON encoding), created a running cron trigger via ma sandbox demo pipeline, and clicked Kill in the UI. Devtools showed UpdateTriggerRun returning 200.

Potential risks

MarshalJSON on CRD types now produces kind/apiVersion inline at the root rather than nested under typeMeta. This is required by Kubernetes but changes the JSON shape in GET/LIST responses served over the JSON encoding path. The connect-es TypeScript client does not consume typeMeta from responses so UI behavior is unaffected. Any external consumer reading typeMeta from the JSON API would need to adapt.

Release notes

Fix: UpdateTriggerRun and all other CRD update mutations returned 400 when called through the browser UI. The OSS UI uses JSON-encoded gRPC via Envoy, which was incompatible with how gogo's jsonpb decoded metav1.Time timestamps and how it serialized TypeMeta for Kubernetes writes.

Documentation Changes

None.

craigmarker and others added 5 commits June 2, 2026 16:11
gogo jsonpb (used by YARPC's JSON codec) handles google.protobuf.Timestamp
as a well-known type but does not recognize metav1.Time, which also uses
RFC3339 string encoding. When the JSON body contains creationTimestamp or
managedFields[*].time, gogo falls through to its generic struct handler
which expects a JSON object — failing with "cannot unmarshal string into
Go value of type map[string]json.RawMessage".

TriggerRunSpec and TriggerRunStatus already had this fallback pattern:
try jsonpb first, and on the specific error fall back to encoding/json,
which correctly dispatches to metav1.Time.UnmarshalJSON. TriggerRun itself
was missing these methods, so the error propagated to YARPC, causing
UpdateTriggerRun to return 400 on any request that included ObjectMeta.

Add MarshalJSON/UnmarshalJSON to TriggerRun following the same pattern,
and fix the crd.tmpl generator template so all future CRD types get them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous commit added UnmarshalJSON, but gogo's jsonpb v1.3.2 does
not check for json.Unmarshaler on nested message fields — it only
dispatches to its own JSONPBUnmarshaler interface (UnmarshalJSONPB).
Since YARPC uses gogo's jsonpb to decode JSON-encoded gRPC requests,
the UnmarshalJSON method was never called for the TriggerRun field
inside UpdateTriggerRunRequest.

Add UnmarshalJSONPB to the top-level CRD type, which goes straight to
encoding/json (bypassing jsonpb to avoid infinite recursion). This is
the method gogo actually calls when processing nested messages.

The test sends an UpdateTriggerRunRequest through jsonpb.Unmarshal with
an RFC3339 creationTimestamp — the same codepath YARPC takes when the
Connect/gRPC-web bridge forwards a JSON-encoded request from the UI.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Add UnmarshalJSONPB to TriggerRun (and crd.tmpl for all generated CRD
types) so gogo's jsonpb dispatches to our handler instead of trying to
decode metav1.ObjectMeta fields itself.

gogo v1.3.2 does not know about metav1.Time and fails when it receives
an RFC3339 string for creationTimestamp — it expects a JSON object with
seconds/nanos. This error surfaces as a 400 on all CRD update mutations
called through the browser UI, which goes through Envoy's
connect_grpc_bridge filter and sends JSON-encoded gRPC.

The handler pre-processes int64/uint64 fields (proto3 canonical JSON
encodes these as quoted strings that encoding/json cannot decode) and
then delegates to encoding/json via a type alias to break the dispatch
cycle. metav1.Time is handled natively by metav1.Time.UnmarshalJSON when
encoding/json is the decoder — no conversion step needed.

Add CollectInt64Fields and UnquoteInt64Fields to kubeproto/util and wire
them into both the generated TriggerRun type and the crd.tmpl template.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Without this fix, UpdateTriggerRun succeeded JSON decoding but then
failed when controller-runtime serialized the object to PUT to the
Kubernetes API server. The Kubernetes runtime codec calls json.Marshal
which was delegated to our MarshalJSON using jsonpb. jsonpb does not
handle json:",inline" on TypeMeta, so it produced {"typeMeta":{"kind":
"TriggerRun",...}} — Kubernetes requires "kind" at the root level.

Switch MarshalJSON to use encoding/json with a type alias (same
pattern as UnmarshalJSONPB). encoding/json respects json:",inline",
flattening kind/apiVersion to the root. Nested Spec/Status still
delegate to their own MarshalJSON implementations (jsonpb), preserving
proto3 field names and enum strings for those fields.

Update crd.tmpl so all generated CRD types get the same fix.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Commit 51f759c was intended to swap the jsonpb-based MarshalJSON for an
encoding/json version, but it accidentally reverted the entire block
(UnmarshalJSONPB + MarshalJSON + UnmarshalJSON) in trigger_run.pb.go and
reset crd.tmpl to main, leaving the test without a fix and the util.go
helpers as dead code.

Restore both methods:
- UnmarshalJSONPB: gogo's jsonpb dispatches to this on nested messages.
  Pre-processes int64/uint64 quoted strings, then delegates to encoding/json
  (which handles metav1.Time.UnmarshalJSON correctly for RFC3339).
- MarshalJSON: uses encoding/json so json:",inline" on TypeMeta produces
  "kind"/"apiVersion" at the root level, which Kubernetes requires.

Also correct the test doc comment, which described UnmarshalJSON (an earlier
approach) rather than UnmarshalJSONPB (what actually landed).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 5, 2026

🔍 Go Lint & TODO Tracking Results

⚠️ golangci-lint: Issues detected

level=warning msg="[config_reader] The configuration option `run.skip-dirs` is deprecated, please use `issues.exclude-dirs`."
level=warning msg="[config_reader] The configuration option `output.format` is deprecated, please use `output.formats`"
Error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.3)
Failed executing command with error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.3)

✅ TODO Tracking: All TODOs properly linked to issues

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 5, 2026

Go Coverage Report (Bazel)

Total Coverage: 63.9%

Coverage Policy:

  • Baseline (existing code): ≥60% (current coverage)
  • New/changed code: ≥90% ✅ STRICTLY ENFORCED
  • Long-term goal: Improve baseline to 90%

View detailed HTML report in artifacts

Use strconv.ParseInt/ParseUint instead of a manual digit loop.
Rename CollectInt64Fields param to structType (it's the root struct
to walk, not an int64 type). Drop the redundant type annotation on
the sync.Map var. Trim comments that referenced the client transport
or narrated no-op behavior.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 5, 2026

🔍 Go Lint & TODO Tracking Results

⚠️ golangci-lint: Issues detected

level=warning msg="[config_reader] The configuration option `run.skip-dirs` is deprecated, please use `issues.exclude-dirs`."
level=warning msg="[config_reader] The configuration option `output.format` is deprecated, please use `output.formats`"
Error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.3)
Failed executing command with error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.3)

✅ TODO Tracking: All TODOs properly linked to issues

Without a custom MarshalJSON, encoding/json handles TriggerRun directly
and correctly inlines TypeMeta (kind/apiVersion at root) for Kubernetes
writes via json:",inline". gogo's jsonpb marshals GET/LIST responses
natively, producing the nested typeMeta form the proto3 client expects.

The encoding/json-based MarshalJSON was added to fix a Kubernetes write
failure, but that failure was only introduced by the preceding jsonpb-
based MarshalJSON. The correct fix is no custom MarshalJSON at all.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 5, 2026

🔍 Go Lint & TODO Tracking Results

⚠️ golangci-lint: Issues detected

level=warning msg="[config_reader] The configuration option `run.skip-dirs` is deprecated, please use `issues.exclude-dirs`."
level=warning msg="[config_reader] The configuration option `output.format` is deprecated, please use `output.formats`"
Error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.3)
Failed executing command with error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.3)

✅ TODO Tracking: All TODOs properly linked to issues

The generic walker with a match predicate was designed to be shared
between CollectMetav1TimeFields and CollectInt64Fields, but
CollectMetav1TimeFields was removed. With a single caller the
abstraction adds more reading cost than it saves — replace it with a
specific private function.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 5, 2026

🔍 Go Lint & TODO Tracking Results

⚠️ golangci-lint: Issues detected

level=warning msg="[config_reader] The configuration option `run.skip-dirs` is deprecated, please use `issues.exclude-dirs`."
level=warning msg="[config_reader] The configuration option `output.format` is deprecated, please use `output.formats`"
Error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.3)
Failed executing command with error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.3)

✅ TODO Tracking: All TODOs properly linked to issues

prefix and visited are recursion-internal details that the caller
had no business setting. Replace the separate helper with a closure
that captures both, leaving only the recursive arguments (typ, prefix)
visible inside the walk.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 5, 2026

🔍 Go Lint & TODO Tracking Results

⚠️ golangci-lint: Issues detected

level=warning msg="[config_reader] The configuration option `run.skip-dirs` is deprecated, please use `issues.exclude-dirs`."
level=warning msg="[config_reader] The configuration option `output.format` is deprecated, please use `output.formats`"
Error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.3)
Failed executing command with error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.3)

✅ TODO Tracking: All TODOs properly linked to issues

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 5, 2026

🔍 Go Lint & TODO Tracking Results

⚠️ golangci-lint: Issues detected

level=warning msg="[config_reader] The configuration option `run.skip-dirs` is deprecated, please use `issues.exclude-dirs`."
level=warning msg="[config_reader] The configuration option `output.format` is deprecated, please use `output.formats`"
Error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.3)
Failed executing command with error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.3)

✅ TODO Tracking: All TODOs properly linked to issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant