Summary
With a UAC running over a single outbound connection-oriented transport (WS/WSS/TCP/TLS), receiving a 200 OK to an INVITE blocks Transaction::receive() for the full TCP connect timeout (~21s on Windows) when the dialog's Record-Route/Contact target is not directly reachable from the client.
The root cause: on a 2xx the client INVITE transaction auto-sends ACK, and that ACK resolves its target via TransportLayer::lookup(), which opens a brand-new connection to the ACK target instead of reusing the connection the 2xx arrived on. When that target is an address the client cannot reach (very common behind any record-routing proxy/B2BUA that advertises an internal address), the connect blocks for the OS connect timeout, and the 200 OK is only surfaced to the TU after send_ack returns.
Environment
- rsipstack 0.5.15
- Transport::Wss, a single outbound WebSocket connection to a record-routing proxy
- Client consumes responses via a while let Some(msg) = transaction.receive().await loop
Steps to reproduce
- UAC establishes one outbound WS/WSS connection to a proxy and REGISTERs.
- UAC sends an INVITE and receives 100/183 promptly over that same connection.
- The 200 OK carries Record-Route/Contact pointing at an address the client cannot reach directly (e.g. the server's internal address; 192.0.2.10:5060;transport=ws is used here as a placeholder).
- transaction.receive() does not return the 200 OK for ~21 seconds.
Observed diagnostic log:
rsipstack::transport::transaction: transition key=c.INVITE_... from=Proceeding to=Completed
rsipstack::transport::transport_layer: lookup target key=Some(...) src=WS 192.0.2.10:5060 target=WS 192.0.2.10:5060
<~21s gap, no frames received>
(200 OK finally delivered to the application)
The 200 OK frame itself arrives on the existing connection essentially immediately; the delay is entirely the new-connection attempt to the unreachable target. (Verified independently: a raw TCP connect to that target address fails after ~21.0s — the Windows SYN-retransmit connect timeout — matching the stall duration exactly.)
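The independent measurement can be repeated with a bounded connect, so the check does not have to sit through the full OS timeout. This is a minimal sketch using Rust's std::net::TcpStream::connect_timeout and is not part of rsipstack. The 2-second limit is an arbitrary choice, and 192.0.2.10:5060 is the placeholder address from above. An unbounded TcpStream::connect to the same unreachable address is the kind of call that stalls for ~21s on Windows.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

/// Attempts a TCP connect to `target`, giving up after `limit`,
/// and returns the result together with how long the attempt took.
fn timed_connect(target: SocketAddr, limit: Duration) -> (io::Result<TcpStream>, Duration) {
    let start = Instant::now();
    let result = TcpStream::connect_timeout(&target, limit);
    (result, start.elapsed())
}

fn main() {
    // Placeholder "internal" address from the report (TEST-NET-1, not routable).
    let target: SocketAddr = "192.0.2.10:5060".parse().expect("valid socket address");
    let (result, elapsed) = timed_connect(target, Duration::from_secs(2));
    match result {
        Ok(_) => println!("connected after {:?}", elapsed),
        Err(e) => println!("connect failed after {:?}: {}", elapsed, e),
    }
}
```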
Root cause (code references, v0.5.15)
src/transaction/transaction.rs — on_received_response: on a 2xx, the ClientInvite transaction transitions to Completed and calls send_ack(connection).
src/transaction/transaction.rs — send_ack (~448–520): for a 2xx it derives the ACK target from the response/Request-URI and calls transport_layer.lookup(target, ...) rather than reusing the connection the 2xx arrived on (which is passed in).
src/transport/transport_layer.rs — lookup (~266–332): if no entry in connections: HashMap<SipAddr, SipConnection> matches target, it opens a NEW connection via WebSocketConnection::connect / TcpConnection::connect / TlsConnection::connect. For an unreachable target this blocks for the OS connect timeout.
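The hit/miss behavior of lookup can be illustrated with a toy model. Everything in it is hypothetical (ToyTransportLayer, FlowKey, the proxy address). It only mirrors the shape described above: a map keyed by address, where a miss leads to a dial. For this report's scenario, the dial to the unreachable target is modeled as a recorded attempt that returns None.

```rust
use std::collections::HashMap;

/// Hypothetical flow key: (transport, "host:port"). Stands in for SipAddr.
type FlowKey = (&'static str, String);

/// Toy model of the lookup path (not rsipstack code): a hit returns the
/// cached connection id, a miss records a dial attempt.
struct ToyTransportLayer {
    connections: HashMap<FlowKey, u32>,
    dial_attempts: Vec<FlowKey>,
}

impl ToyTransportLayer {
    fn lookup(&mut self, target: FlowKey) -> Option<u32> {
        if let Some(&id) = self.connections.get(&target) {
            return Some(id); // existing flow reused, no network activity
        }
        // Miss: the real code would call WebSocketConnection::connect (or the
        // TCP/TLS equivalent) here. For an unreachable target that call blocks
        // until the OS connect timeout and then fails, modeled here as None.
        self.dial_attempts.push(target);
        None
    }
}

fn main() {
    let mut layer = ToyTransportLayer {
        connections: HashMap::new(),
        dial_attempts: Vec::new(),
    };
    // The single outbound WSS flow to the proxy (placeholder address).
    layer.connections.insert(("wss", "proxy.example.com:443".to_string()), 1);

    // A lookup for the proxy's own address hits the cache.
    println!("{:?}", layer.lookup(("wss", "proxy.example.com:443".to_string())));
    // The 2xx ACK target is the advertised internal address: miss, new dial.
    println!("{:?}", layer.lookup(("ws", "192.0.2.10:5060".to_string())));
    println!("dial attempts: {:?}", layer.dial_attempts);
}
```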
Provisional responses (183) and non-INVITE transactions (e.g. the BYE 200 OK) are delivered in well under a millisecond, because they do not trigger this target lookup/connect — only the 2xx INVITE auto-ACK does.
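This split between response types matches RFC 3261's ACK rules. For a non-2xx final response, the client transaction sends the ACK to the same address, port and transport as the INVITE. The 2xx ACK, by contrast, is a separate request routed by the dialog's route set, which is where Record-Route/Contact come in. The following sketch shows that classification. The names (AckRouting, ack_routing) are hypothetical and are not rsipstack API.

```rust
/// How the ACK for a response to a client request is routed.
/// Names are illustrative only; they are not rsipstack types.
#[derive(Debug, PartialEq, Eq)]
enum AckRouting {
    /// No ACK at all: provisional responses, and any response to a non-INVITE.
    NoAck,
    /// Non-2xx final response to INVITE: sent by the client transaction to the
    /// same address, port and transport as the INVITE (RFC 3261 §17.1.1).
    SameAsInvite,
    /// 2xx to INVITE: a new request routed via the dialog's route set
    /// (RFC 3261 §13.2.2.4); in the report, this is the case that calls lookup.
    RouteSet,
}

fn ack_routing(request_is_invite: bool, status: u16) -> AckRouting {
    match (request_is_invite, status) {
        (false, _) => AckRouting::NoAck,
        (true, 100..=199) => AckRouting::NoAck,
        (true, 200..=299) => AckRouting::RouteSet,
        (true, _) => AckRouting::SameAsInvite,
    }
}

fn main() {
    // 183 to INVITE, 200 to INVITE, 486 to INVITE, 200 to BYE
    for &(is_invite, status) in &[(true, 183), (true, 200), (true, 486), (false, 200)] {
        println!("invite={} status={} -> {:?}", is_invite, status, ack_routing(is_invite, status));
    }
}
```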
Expected behavior
Per RFC 7118 §5 (SIP over WebSocket) and general connection-oriented transport handling, in-dialog requests and the 2xx ACK must reuse the existing flow the dialog was established on. The client should never dial the Record-Route/Contact address on a connection-oriented transport.
Suggested fix
In send_ack, for a 2xx ACK prefer the connection the response arrived on (already passed in as connection) before falling back to lookup; and/or have lookup reuse the established connection for connection-oriented transports when one exists for the flow.
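Here is a minimal sketch of the first option. It assumes a hypothetical ConnId handle for the connection the 2xx arrived on and uses a closure to stand in for the existing lookup call. None of these names are rsipstack API. On a connection-oriented transport, the inbound flow is reused and lookup is never invoked, so no dial can happen. On UDP, or when there is no inbound connection, the code falls back to lookup.

```rust
/// Hypothetical handle for an established connection (e.g. the WSS flow the
/// 2xx arrived on). Not an rsipstack type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ConnId(u32);

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Transport {
    Udp,
    Tcp,
    Tls,
    Ws,
    Wss,
}

impl Transport {
    fn is_connection_oriented(self) -> bool {
        !matches!(self, Transport::Udp)
    }
}

/// Chooses the connection for a 2xx ACK. On a connection-oriented transport
/// the flow the response arrived on is reused and `lookup` is never called;
/// otherwise the existing lookup path runs as before.
fn ack_connection<F>(transport: Transport, arrived_on: Option<ConnId>, lookup: F) -> Option<ConnId>
where
    F: FnOnce() -> Option<ConnId>,
{
    match arrived_on {
        Some(conn) if transport.is_connection_oriented() => Some(conn),
        _ => lookup(),
    }
}

fn main() {
    let mut dialed = false;
    let chosen = ack_connection(Transport::Wss, Some(ConnId(7)), || {
        dialed = true; // stands in for the blocking connect to the Record-Route/Contact target
        None
    });
    println!("chosen={:?} dialed={}", chosen, dialed);
}
```

Passing lookup as a closure keeps the fallback lazy, so any connect it would trigger only runs when the inbound flow cannot be used.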
Workaround
Build the TransportLayer manually, set TransportLayer.outbound to the proxy/flow address, and pre-register the connection with add_connection so lookup resolves to the existing connection (since lookup uses outbound.unwrap_or(destination)). This avoids the new-connection attempt, but the underlying send_ack/lookup behavior still seems worth fixing for connection-oriented transports.
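The workaround depends on how the lookup key is chosen. The toy model below covers just that step. It assumes simplified string keys and uses hypothetical names (ToyLayer, and a lookup that returns Err(key) on a miss). Only the outbound.unwrap_or(destination) selection comes from the report.

```rust
use std::collections::HashMap;

/// Toy model of lookup's key selection (hypothetical types, not rsipstack).
struct ToyLayer {
    outbound: Option<String>,
    connections: HashMap<String, u32>,
}

impl ToyLayer {
    /// Ok(id) if a registered connection matches the chosen key, otherwise
    /// Err(key): the address a new connection would be dialed to.
    fn lookup(&self, destination: &str) -> Result<u32, String> {
        // Same selection as outbound.unwrap_or(destination) in the report.
        let key = self.outbound.as_deref().unwrap_or(destination);
        self.connections.get(key).copied().ok_or_else(|| key.to_string())
    }
}

fn main() {
    let ack_target = "ws:192.0.2.10:5060";
    let flow = "wss:proxy.example.com:443"; // placeholder proxy flow

    // Default layer: no outbound, nothing registered for the ACK target.
    let mut layer = ToyLayer { outbound: None, connections: HashMap::new() };
    println!("{:?}", layer.lookup(ack_target)); // Err: would dial the internal address

    // Workaround: outbound set to the flow address, existing connection registered.
    layer.outbound = Some(flow.to_string());
    layer.connections.insert(flow.to_string(), 1);
    println!("{:?}", layer.lookup(ack_target)); // Ok(1): existing flow reused
}
```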