Client INVITE 2xx auto-ACK opens a new connection to the Record-Route/Contact target instead of reusing the existing flow (connection-oriented transports / RFC 7118) → multi-second stall #129

Description

@yashsntl

Summary

With a UAC running over a single outbound connection-oriented transport (WS/WSS/TCP/TLS), receiving a 200 OK to an INVITE blocks Transaction::receive() for the full TCP connect timeout (~21s on Windows) when the dialog's Record-Route/Contact target is not directly reachable from the client.

The root cause: on a 2xx the client INVITE transaction auto-sends ACK, and that ACK resolves its target via TransportLayer::lookup(), which opens a brand-new connection to the ACK target instead of reusing the connection the 2xx arrived on. When that target is an address the client cannot reach (very common behind any record-routing proxy/B2BUA that advertises an internal address), the connect blocks for the OS connect timeout, and the 200 OK is only surfaced to the TU after send_ack returns.

Environment

  • rsipstack 0.5.15
  • Transport::Wss, a single outbound WebSocket connection to a record-routing proxy
  • Client consumes responses via a while let Some(msg) = transaction.receive().await loop

Steps to reproduce

  1. UAC establishes one outbound WS/WSS connection to a proxy and REGISTERs.
  2. UAC sends an INVITE and receives 100/183 promptly over that same connection.
  3. The 200 OK carries Record-Route/Contact pointing at an address the client cannot reach directly (e.g. the server's internal address — 192.0.2.10:5060;transport=ws as a placeholder).
  4. transaction.receive() does not return the 200 OK for ~21 seconds.

Observed diagnostic log:

rsipstack::transport::transaction: transition key=c.INVITE_... from=Proceeding to=Completed
rsipstack::transport::transport_layer: lookup target key=Some(...) src=WS 192.0.2.10:5060 target=WS 192.0.2.10:5060
   <~21s gap, no frames received>
   (200 OK finally delivered to the application)

The 200 OK frame itself arrives on the existing connection essentially immediately; the delay is entirely the new-connection attempt to the unreachable target. (Verified independently: a raw TCP connect to that target address fails after ~21.0s — the Windows SYN-retransmit connect timeout — matching the stall duration exactly.)
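The independent check described above can be reproduced with a small probe along these lines. This is only a sketch using the Rust standard library: the helper name time_connect and the 30-second cap are assumptions made for illustration, and 192.0.2.10:5060 is the placeholder address from this report, so substitute the real unreachable target. Depending on local routing, the connect may fail immediately instead of timing out.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

/// Hypothetical helper: attempts a plain TCP connect, capped at `limit`,
/// and reports whether it succeeded and how long the attempt took.
fn time_connect(addr: SocketAddr, limit: Duration) -> (bool, Duration) {
    let started = Instant::now();
    let ok = TcpStream::connect_timeout(&addr, limit).is_ok();
    (ok, started.elapsed())
}

fn main() {
    // Placeholder target from the report; replace with the address
    // advertised in Record-Route/Contact.
    let addr: SocketAddr = "192.0.2.10:5060".parse().expect("valid socket address");
    // The cap is deliberately longer than the ~21s stall, so the OS
    // connect timeout (if shorter) is what ends the attempt.
    let (ok, elapsed) = time_connect(addr, Duration::from_secs(30));
    println!("connected={ok} elapsed={elapsed:?}");
}
```

If the elapsed time matches the stall seen in transaction.receive(), the delay is attributable to the connect attempt rather than to frame delivery.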

Root cause (code references, v0.5.15)

  • src/transaction/transaction.rs, on_received_response: on a 2xx, the ClientInvite transaction transitions to Completed and calls send_ack(connection).
  • src/transaction/transaction.rs, send_ack (lines ~448–520): for a 2xx it derives the ACK target from the response/Request-URI and calls transport_layer.lookup(target, ...) rather than reusing the connection the 2xx arrived on (which is passed in).
  • src/transport/transport_layer.rs, lookup (lines ~266–332): if no entry in connections: HashMap<SipAddr, SipConnection> matches target, it opens a NEW connection via WebSocketConnection::connect / TcpConnection::connect / TlsConnection::connect. For an unreachable target this blocks for the OS connect timeout.

Provisional responses (183) and non-INVITE transactions (e.g. the BYE 200 OK) are delivered in well under a millisecond, because they do not trigger this target lookup/connect — only the 2xx INVITE auto-ACK does.

Expected behavior

Per RFC 7118 §5 (SIP over WebSocket) and general connection-oriented transport handling, in-dialog requests and the 2xx ACK must reuse the existing flow the dialog was established on. The client should never dial the Record-Route/Contact address on a connection-oriented transport.

Suggested fix

In send_ack, for a 2xx ACK prefer the connection the response arrived on (already passed in as connection) before falling back to lookup; and/or have lookup reuse the established connection for connection-oriented transports when one exists for the flow.
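The selection order proposed above could look roughly like the following. This is a self-contained model, not rsipstack code: Addr, Conn, AckRoute and select_ack_route are hypothetical stand-ins for SipAddr, SipConnection and the logic inside send_ack/lookup, written only to show the order of preference.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for SipAddr.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Addr(String);

// Hypothetical stand-in for SipConnection.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Conn {
    peer: Addr,
    reliable: bool, // true for connection-oriented transports (WS/WSS/TCP/TLS)
}

// Where the 2xx ACK would be sent.
#[derive(Debug, PartialEq, Eq)]
enum AckRoute {
    Reuse(Conn),  // the connection the 2xx arrived on
    Pooled(Conn), // an existing pooled connection that matches the target
    Dial(Addr),   // last resort: open a new connection
}

fn select_ack_route(
    arrival: Option<&Conn>,
    pool: &HashMap<Addr, Conn>,
    target: &Addr,
) -> AckRoute {
    // 1. Prefer the flow the response arrived on when it is connection-oriented.
    if let Some(conn) = arrival {
        if conn.reliable {
            return AckRoute::Reuse(conn.clone());
        }
    }
    // 2. Otherwise reuse a pooled connection keyed by the ACK target, if any.
    // 3. Only dial when nothing reusable exists.
    match pool.get(target) {
        Some(conn) => AckRoute::Pooled(conn.clone()),
        None => AckRoute::Dial(target.clone()),
    }
}

fn main() {
    let flow = Conn { peer: Addr("proxy.example:443".into()), reliable: true };
    let pool: HashMap<Addr, Conn> = HashMap::new();
    let target = Addr("192.0.2.10:5060".into());
    match select_ack_route(Some(&flow), &pool, &target) {
        AckRoute::Reuse(c) | AckRoute::Pooled(c) => println!("send ACK via {:?}", c.peer),
        AckRoute::Dial(a) => println!("would dial {:?}", a),
    }
}
```

With this ordering the unreachable Record-Route/Contact address is never dialed while the arrival flow is still available, which is the behavior requested under "Expected behavior".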

Workaround

Build the TransportLayer manually, set TransportLayer.outbound to the proxy/flow address, and pre-register the connection with add_connection so lookup resolves to the existing connection (since lookup uses outbound.unwrap_or(destination)). This avoids the new-connection attempt, but the underlying send_ack/lookup behavior still seems worth fixing for connection-oriented transports.
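Why the workaround helps can be seen in a toy model of the lookup rule quoted above (outbound.unwrap_or(destination)). The Layer struct, its String keys and the u32 connection id are hypothetical simplifications for illustration and do not mirror the real TransportLayer types or method signatures.

```rust
use std::collections::HashMap;

// Toy model of the transport layer: `outbound` overrides every destination,
// and `connections` maps an address to an already-open connection id.
struct Layer {
    outbound: Option<String>,
    connections: HashMap<String, u32>,
}

impl Layer {
    /// Ok(id): an existing connection is reused.
    /// Err(addr): no match, so a new connection to `addr` would be dialed.
    fn lookup(&self, destination: &str) -> Result<u32, String> {
        let target = self.outbound.as_deref().unwrap_or(destination);
        self.connections
            .get(target)
            .copied()
            .ok_or_else(|| target.to_string())
    }
}

fn main() {
    let mut layer = Layer { outbound: None, connections: HashMap::new() };
    // Pre-register the existing flow to the proxy (hypothetical address).
    layer.connections.insert("proxy.example:443".to_string(), 1);

    // Without `outbound`, the ACK target from Record-Route/Contact is dialed.
    println!("{:?}", layer.lookup("192.0.2.10:5060"));

    // With `outbound` set to the flow address, the existing connection is found.
    layer.outbound = Some("proxy.example:443".to_string());
    println!("{:?}", layer.lookup("192.0.2.10:5060"));
}
```

In this model the first lookup reports that it would dial 192.0.2.10:5060, while the second resolves to the pre-registered connection, which is the effect the workaround relies on.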
