What happened:
conformance/utils/grpc.DefaultClient (the default Options.GRPCClient) has two robustness bugs, both reproduced locally:
- Not concurrency-safe.
ensureConnection/resetConnection read and write c.Conn with no lock, and SendRPC reads it (pb.NewGrpcEchoClient(c.Conn)). go test -race on N concurrent SendRPC calls against one shared client reports a data race:
WARNING: DATA RACE
Write at ... ensureConnection() grpc.go:164 (c.Conn, err = grpc.NewClient(...))
Previous read at ... grpc.go:155 (if c.Conn != nil)
Previous read at ... grpc.go:201 (pb.NewGrpcEchoClient(c.Conn))
- Not reusable after
Close(). Close() shuts the *grpc.ClientConn but does not nil c.Conn, so a later SendRPC short-circuits in ensureConnection and dials the closed connection. Reuse-after-Close() returns codes.Canceled; a fresh client returns OK against the same server.
What you expected to happen:
The documented default client should be safe to use concurrently, and either reusable after Close() or clearly single-use.
How to reproduce it (as minimally and precisely as possible):
(1) go test -race with N goroutines calling SendRPC on one DefaultClient. (2) SendRPC → Close() → SendRPC against an in-process echo server: the second call returns Canceled while a fresh client returns OK. I have both as Go tests, ready to include.
Anything else we need to know?:
Both are latent today (the suite constructs a fresh DefaultClient per request and rarely reuses one after Close()), but:
Suggested fix: guard c.Conn with a sync.Mutex (concurrency); set c.Conn = nil in Close() (reuse). I have a fix plus the two regression tests ready — would you prefer them as one PR or two separate PRs (one per bug)? Happy either way.
/kind bug
/area conformance-machinery
What happened:
conformance/utils/grpc.DefaultClient(the defaultOptions.GRPCClient) has two robustness bugs, both reproduced locally:ensureConnection/resetConnectionread and writec.Connwith no lock, andSendRPCreads it (pb.NewGrpcEchoClient(c.Conn)).go test -raceon N concurrentSendRPCcalls against one shared client reports a data race:Close().Close()shuts the*grpc.ClientConnbut does not nilc.Conn, so a laterSendRPCshort-circuits inensureConnectionand dials the closed connection. Reuse-after-Close()returnscodes.Canceled; a fresh client returnsOKagainst the same server.What you expected to happen:
The documented default client should be safe to use concurrently, and either reusable after
Close()or clearly single-use.How to reproduce it (as minimally and precisely as possible):
(1)
go test -racewith N goroutines callingSendRPCon oneDefaultClient. (2)SendRPC→Close()→SendRPCagainst an in-process echo server: the second call returnsCanceledwhile a fresh client returnsOK. I have both as Go tests, ready to include.Anything else we need to know?:
Both are latent today (the suite constructs a fresh
DefaultClientper request and rarely reuses one afterClose()), but:Close()bug is the root of the "helper closes a caller-supplied client it then reuses" class handled at the call site in Route the GRPCRouteWeight sampler through the injectable GRPCClient #4937 / conformance: MakeRequestAndExpectEventuallyConsistentResponse closes a caller-supplied gRPC client it does not own #4938 — nil-ingc.ConninClose()removes it at the source;DefaultClientcannot currently back theOptions.GRPCClientinjection point the wayRoundTripperis defaulted in the suite constructor.Suggested fix: guard
c.Connwith async.Mutex(concurrency); setc.Conn = nilinClose()(reuse). I have a fix plus the two regression tests ready — would you prefer them as one PR or two separate PRs (one per bug)? Happy either way./kind bug
/area conformance-machinery