[Relax][TensorRT] Update TensorRT runtime to 10 #19789

Merged
tqchen merged 1 commit into apache:main from tlopex:fix-tensorrt10-byoc-19609
Jun 16, 2026
Conversation

@tlopex tlopex commented Jun 16, 2026

Member

This PR fixes #19609. TensorRT 10 removed a large set of APIs that the Relax TensorRT BYOC integration relied on, so the integration failed to compile against TRT >= 10. This change ports the runtime and codegen to the TRT10 API and requires TensorRT >= 10:

  • Lifetime: obj->destroy() -> delete (destroy() removed in TRT10).
  • Builder: drop implicit-batch mode (networks are always explicit-batch via createNetworkV2(0); setMaxBatchSize removed); setMaxWorkspaceSize -> setMemoryPoolLimit(kWORKSPACE); buildEngineWithConfig -> buildSerializedNetwork + deserializeCudaEngine, keeping the IRuntime alive alongside the engine.
  • Execution: the binding-index model (getNbBindings / getBindingIndex / setBindingDimensions / execute / executeV2) -> the named-tensor model (getNbIOTensors / setInputShape / setTensorAddress / enqueueV3); deserializeCudaEngine drops the trailing IPluginFactory* argument.
  • Layers: addConvolution / addPooling / addDeconvolution / addPadding -> the *Nd variants; set{Stride,Dilation} -> *Nd; IFullyConnectedLayer / addFullyConnected removed -> dense rebuilt with addConstant + addMatrixMultiply.
  • Add a build-time guard that emits a clear error on TensorRT < 10.
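The "keeping the IRuntime alive alongside the engine" point can be illustrated without TensorRT at all, since it comes down to C++ destroying class members in reverse declaration order. The sketch below is only an illustration under that assumption: `FakeRuntime`, `FakeEngine`, `EngineAndRuntime`, and `DestructionOrder` are hypothetical stand-ins, not names from the PR or from TensorRT.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for nvinfer1::IRuntime; it only records its destruction.
struct FakeRuntime {
  explicit FakeRuntime(std::vector<std::string>* log) : log_(log) {}
  ~FakeRuntime() { log_->push_back("runtime"); }
  std::vector<std::string>* log_;
};

// Hypothetical stand-in for nvinfer1::ICudaEngine.
struct FakeEngine {
  explicit FakeEngine(std::vector<std::string>* log) : log_(log) {}
  ~FakeEngine() { log_->push_back("engine"); }
  std::vector<std::string>* log_;
};

// Members are destroyed in reverse declaration order, so declaring the
// runtime first means the engine is always released before the runtime.
// With destroy() gone, the default deleter (plain delete) is sufficient.
struct EngineAndRuntime {
  std::unique_ptr<FakeRuntime> runtime;
  std::unique_ptr<FakeEngine> engine;
};

// Builds a holder, lets it go out of scope, and returns the order in
// which the two objects were destroyed.
std::vector<std::string> DestructionOrder() {
  std::vector<std::string> log;
  {
    EngineAndRuntime holder;
    holder.runtime = std::make_unique<FakeRuntime>(&log);
    holder.engine = std::make_unique<FakeEngine>(&log);
  }
  return log;
}
```

Bundling both pointers in one struct keeps the ordering guarantee in a single place, rather than relying on every call site to free the two objects in the right sequence.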

Also fix pre-existing issues that prevented this path from running end-to-end: the runtime had drifted from the current tvm-ffi API (TVMTensorCopyToBytes / TVMGetLastError, VectorToTrtDims over ffi::Array, a stale override on the destructor), and the conv converters read a Relay-era "channels" attribute that Relax does not emit (output channels are now derived from the kernel shape).
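Deriving the output channel count from the kernel shape, instead of a "channels" attribute, amounts to reading one dimension of the weight tensor. A minimal sketch follows, assuming an OIHW-style kernel for regular convolutions and an IOHW-style kernel for transposed ones, and ignoring grouped convolution; `KernelLayout` and `OutputChannelsFromKernel` are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed kernel layouts: O-first (e.g. OIHW) and I-first (e.g. IOHW).
enum class KernelLayout { kOutputFirst, kInputFirst };

// Returns the output channel count read directly from the kernel shape.
// Assumption: groups == 1; grouped convolutions are not handled here.
int64_t OutputChannelsFromKernel(const std::vector<int64_t>& kernel_shape,
                                 KernelLayout layout) {
  if (kernel_shape.size() < 3) {
    throw std::invalid_argument("kernel must have at least 3 dimensions");
  }
  // O is dimension 0 for OIHW-style kernels and dimension 1 for IOHW-style.
  return layout == KernelLayout::kOutputFirst ? kernel_shape[0] : kernel_shape[1];
}
```

For example, a `{64, 3, 7, 7}` kernel yields 64 output channels when read as output-first, and 3 when read as input-first.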

All tests pass locally. This PR is essentially limited to API updates and adds no new functionality.

Correctness fixes from an old-vs-new parity review, plus tests:

- Conv1D assumed an implicit batch dimension and dropped the spatial
  dimension under explicit batch; the reshape now derives from the full
  input rank.
- INT8 calibration: the per-input element count no longer includes the
  batch dimension (the calibrator multiplies by batch size itself), which
  previously over-read the input, and the calibrator's device buffers are
  now sized for a full batch instead of a single sample, which previously
  over-wrote memory. Both crashed INT8 calibration for batch > 1.
- Single-engine reuse now requires an exact batch match, since an
  explicit-batch engine's optimization profile pins the built batch size.
- TRT_HAS_IMPLICIT_BATCH is unconditionally false and no longer calls the
  deprecated hasImplicitBatchDimension().
- Run on TVM's current CUDA stream instead of the default stream.
- Warn instead of silently ignoring use_implicit_batch=True, and default
  it to False in the codegen config.
- Null-check the engine build/deserialize paths and free the runtime on
  failure.
- conv2d_transpose / conv3d_transpose now use the IOHW / IODHW kernel
  layout (Relax's default, which also matches TensorRT's deconvolution
  weight layout) instead of the Relay-era OIHW assumption, so the weight
  is passed through directly and the output channel count comes from the
  second kernel dimension.
- Remove dead pre-5.1.5 padding blocks and unused builder members.
- Add offload tests for conv1d, max_pool2d, avg_pool2d, softmax, sigmoid,
  tanh, conv2d_transpose, conv3d_transpose, and INT8 calibration.
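The INT8 calibration item above describes two related sizes: the per-sample element count, which excludes the batch dimension, and the device buffer, which must hold a full batch. The arithmetic might be sketched as follows; `ElementsPerSample` and `CalibrationBufferBytes` are hypothetical helpers, and the sketch assumes static, non-negative dimensions with the batch as dimension 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Elements in one sample: the product of every dimension except the
// leading batch dimension (assumed to be dimension 0).
int64_t ElementsPerSample(const std::vector<int64_t>& shape) {
  int64_t count = 1;
  for (std::size_t i = 1; i < shape.size(); ++i) {
    count *= shape[i];
  }
  return count;
}

// Bytes needed for a device buffer that holds one full batch.
// The batch factor is applied exactly once, here, and not inside
// ElementsPerSample, which avoids counting the batch twice.
std::size_t CalibrationBufferBytes(const std::vector<int64_t>& shape,
                                   int64_t batch_size, std::size_t elem_bytes) {
  return static_cast<std::size_t>(batch_size) *
         static_cast<std::size_t>(ElementsPerSample(shape)) * elem_bytes;
}
```

For a `{4, 3, 224, 224}` float input, one sample has 3 * 224 * 224 = 150528 elements, and a buffer for a batch of 4 needs 4 * 150528 * 4 = 2408448 bytes.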

Verified: builds against TensorRT 10.16 with CUDA 12.8, and the added
tests pass on both an RTX 2070 (Turing) and an RTX 5090 (Blackwell).

Fixes apache#19609

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates the TVM TensorRT integration to target the TensorRT 10 API, which removes implicit-batch mode, binding indices, and several deprecated layer creation APIs (such as addFullyConnected). The changes transition the codebase to explicit-batch mode, update the operator converters to use the new Nd layer APIs, and manage the deserialization runtime lifetime alongside the engine. The review feedback highlights several critical safety improvements, specifically recommending null-pointer checks for createInferRuntime, addConstant, and addShuffle calls, guarding against integer overflow when handling dynamic dimensions during INT8 calibration, and preventing out-of-bounds access when resolving the device ID from input_var_eid_.
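The overflow concern raised in the review could be addressed along the following lines. This is a sketch of one possible guard, not the code that was merged: `TryElementCount` is a hypothetical helper that refuses dynamic (negative) dimensions and checks each multiplication before performing it.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Computes the element count of `dims` into `*out`.
// Returns false, leaving `*out` untouched, if any dimension is dynamic
// (negative, e.g. -1) or if the product would overflow int64_t.
bool TryElementCount(const std::vector<int64_t>& dims, int64_t* out) {
  // Reject dynamic dimensions up front so a zero elsewhere cannot hide them.
  for (int64_t d : dims) {
    if (d < 0) return false;
  }
  int64_t count = 1;
  for (int64_t d : dims) {
    if (d == 0) {
      count = 0;
      break;
    }
    // count * d overflows exactly when count > max / d (both positive here).
    if (count > std::numeric_limits<int64_t>::max() / d) return false;
    count *= d;
  }
  *out = count;
  return true;
}
```

A caller would size its buffers only when the function returns true, and report an error otherwise instead of multiplying a `-1` dimension into the size.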

Comment thread src/runtime/extra/contrib/tensorrt/tensorrt_builder.cc
Comment thread src/runtime/extra/contrib/tensorrt/tensorrt_runtime.cc
Comment thread src/runtime/extra/contrib/tensorrt/tensorrt_ops.cc
Comment thread src/runtime/extra/contrib/tensorrt/tensorrt_ops.cc
Comment thread src/runtime/extra/contrib/tensorrt/tensorrt_runtime.cc
Comment thread src/runtime/extra/contrib/tensorrt/tensorrt_runtime.cc
@tqchen tqchen merged commit d591cd4 into apache:main Jun 16, 2026
10 checks passed

Development

Successfully merging this pull request may close these issues.

[Bug] TensorRT 10 compatibility issues in Relax TensorRT BYOC (TVM 0.24, CUDA 12.4+)

2 participants