[Relax][TensorRT] Update TensorRT runtime to 10 #19789
TensorRT 10 removed a large set of APIs that the Relax TensorRT BYOC integration relied on, so it failed to compile against TRT >= 10 (apache#19609). Port the runtime and codegen to the TRT10 API and require TensorRT >= 10:

- Lifetime: `obj->destroy()` -> `delete` (`destroy()` removed in TRT10).
- Builder: drop implicit-batch mode (networks are always explicit-batch via `createNetworkV2(0)`; `setMaxBatchSize` removed); `setMaxWorkspaceSize` -> `setMemoryPoolLimit(kWORKSPACE)`; `buildEngineWithConfig` -> `buildSerializedNetwork` + `deserializeCudaEngine`, keeping the `IRuntime` alive alongside the engine.
- Execution: the binding-index model (`getNbBindings` / `getBindingIndex` / `setBindingDimensions` / `execute` / `executeV2`) -> the named-tensor model (`getNbIOTensors` / `setInputShape` / `setTensorAddress` / `enqueueV3`); `deserializeCudaEngine` drops the trailing `IPluginFactory*` argument.
- Layers: `addConvolution` / `addPooling` / `addDeconvolution` / `addPadding` -> the `*Nd` variants; `set{Stride,Dilation}` -> `*Nd`; `IFullyConnectedLayer` / `addFullyConnected` removed -> dense rebuilt with `addConstant` + `addMatrixMultiply`.
- Add a build-time guard that emits a clear error on TensorRT < 10.

Also fix pre-existing issues that prevented this path from running end-to-end: the runtime had drifted from the current tvm-ffi API (`TVMTensorCopyToBytes` / `TVMGetLastError`, `VectorToTrtDims` over `ffi::Array`, a stale `override` on the destructor), and the conv converters read a Relay-era `"channels"` attribute that Relax does not emit (output channels are now derived from the kernel shape).

Correctness fixes from an old-vs-new parity review, plus tests:

- Conv1D assumed an implicit batch dimension and dropped the spatial dimension under explicit batch; the reshape now derives from the full input rank.
- INT8 calibration: the per-input element count no longer includes the batch dimension (the calibrator multiplies by batch size itself), which previously over-read the input, and the calibrator's device buffers are now sized for a full batch instead of a single sample, which previously over-wrote memory. Both crashed INT8 calibration for batch > 1.
- Single-engine reuse now requires an exact batch match, since an explicit-batch engine's optimization profile pins the built batch size.
- `TRT_HAS_IMPLICIT_BATCH` is unconditionally false and no longer calls the deprecated `hasImplicitBatchDimension()`.
- Run on TVM's current CUDA stream instead of the default stream.
- Warn instead of silently ignoring `use_implicit_batch=True`, and default it to `False` in the codegen config.
- Null-check the engine build/deserialize paths and free the runtime on failure.
- `conv2d_transpose` / `conv3d_transpose` now use the IOHW / IODHW kernel layout (Relax's default, which also matches TensorRT's deconvolution weight layout) instead of the Relay-era OIHW assumption, so the weight is passed through directly and the output channel count comes from the second kernel dimension.
- Remove dead pre-5.1.5 padding blocks and unused builder members.
- Add offload tests for conv1d, max_pool2d, avg_pool2d, softmax, sigmoid, tanh, conv2d_transpose, conv3d_transpose, and INT8 calibration.

Verified: builds against TensorRT 10.16 with CUDA 12.8, and the added tests pass on both an RTX 2070 (Turing) and an RTX 5090 (Blackwell).

Fixes apache#19609
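The Conv1D fix above can be illustrated with a small shape helper. This is a minimal sketch, not the PR's code: `Conv1DAs2DShape` is a hypothetical name, and it assumes Conv1D is lowered to a 2-D convolution by appending a trailing unit axis, as the Relay-era converter did. That converter built the new shape from the first two input dims plus a 1, which was correct for an implicit-batch `(C, W)` input but turns an explicit-batch `(N, C, W)` input into `(N, C, 1)`, losing `W`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: reshape an explicit-batch Conv1D input (N, C, W) into
// the 4-D shape (N, C, W, 1) consumed by a 2-D convolution. The new shape is
// built from the full input rank, so the batch dim N and the spatial dim W
// are both preserved.
std::vector<int64_t> Conv1DAs2DShape(const std::vector<int64_t>& input_dims) {
  assert(input_dims.size() == 3 && "expected explicit-batch (N, C, W)");
  std::vector<int64_t> new_shape(input_dims.begin(), input_dims.end());
  new_shape.push_back(1);  // unit trailing spatial axis for the (K, 1) kernel
  return new_shape;
}
```

For an input of shape `(2, 16, 128)` this yields `(2, 16, 128, 1)`; the kernel would correspondingly be treated as `(K, 1)`.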
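The two INT8 calibration sizing rules described above (count elements per sample, excluding the batch dimension, but allocate device buffers for a full batch) can be sketched as follows. Both helper names are hypothetical and the code only models the arithmetic, not the calibrator itself. For example, with a `(4, 3, 32, 32)` input, counting the batch dimension gives 12288 elements "per sample"; multiplying that by a batch size of 4 reads four times more data than the input holds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: elements in ONE sample of an input whose explicit-batch
// shape is `shape` (shape[0] is the batch dimension). The calibrator multiplies
// by the batch size itself, so the batch dimension is skipped here.
int64_t PerSampleElementCount(const std::vector<int64_t>& shape) {
  assert(!shape.empty() && "explicit-batch shapes always carry a batch dim");
  int64_t count = 1;
  for (std::size_t i = 1; i < shape.size(); ++i) {  // skip shape[0] (batch)
    assert(shape[i] > 0 && "dynamic (-1) dims must be resolved first");
    count *= shape[i];
  }
  return count;
}

// Hypothetical helper: bytes for one calibrator device buffer. It must hold a
// full batch; sizing it for a single sample lets a batch copy write past it.
std::size_t CalibrationBufferBytes(const std::vector<int64_t>& shape,
                                   int64_t batch_size,
                                   std::size_t element_size) {
  assert(batch_size > 0);
  return static_cast<std::size_t>(batch_size * PerSampleElementCount(shape)) *
         element_size;
}
```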
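The change to single-engine reuse follows from the batch semantics of the two modes. The sketch below contrasts the two rules as plain predicates; both function names are hypothetical, and it assumes, as the PR states, that the engine's optimization profile pins the batch size it was built for.

```cpp
#include <cassert>
#include <cstdint>

// Implicit batch (pre-TRT10): an engine built with max batch size `built`
// could execute any batch up to that size, so a smaller request could reuse it.
bool CanReuseImplicitBatchEngine(int64_t built, int64_t requested) {
  assert(built > 0 && requested > 0);
  return requested <= built;
}

// Explicit batch (TRT10): the optimization profile pins the built batch, so
// only an exact match may reuse the engine; any other size needs a rebuild.
bool CanReuseExplicitBatchEngine(int64_t built, int64_t requested) {
  assert(built > 0 && requested > 0);
  return requested == built;
}
```

An engine built for batch 8 could previously serve a batch-4 request; under the explicit-batch rule that request triggers a new build instead of running against a mismatched profile.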
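Because Relax does not emit a `"channels"` attribute, the converters derive output channels from the kernel shape, and the transposed-convolution fix changes which kernel dimension that is. A minimal sketch of that rule, assuming `groups == 1` (the PR does not discuss grouped transposed convolutions); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Forward convolution (OIHW / OIDHW kernel): output channels are dim 0.
int64_t ConvOutputChannels(const std::vector<int64_t>& kernel_shape) {
  assert(kernel_shape.size() >= 3 && "expected O, I and >= 1 spatial dim");
  return kernel_shape[0];
}

// Transposed convolution (IOHW / IODHW kernel, Relax's default): output
// channels are dim 1. Per the PR, this layout already matches TensorRT's
// deconvolution weights, so the buffer can be passed through untransposed.
int64_t DeconvOutputChannels(const std::vector<int64_t>& kernel_shape) {
  assert(kernel_shape.size() >= 3 && "expected I, O and >= 1 spatial dim");
  return kernel_shape[1];
}

// Total weight elements (identical for either layout); handy for checking
// that a passed-through weight buffer has the expected size.
int64_t KernelElementCount(const std::vector<int64_t>& kernel_shape) {
  int64_t n = 1;
  for (int64_t d : kernel_shape) n *= d;
  return n;
}
```

Reading dim 0 of an IOHW kernel, as the Relay-era OIHW assumption effectively did, would report the input channel count instead.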
Code Review
This pull request updates the TVM TensorRT integration to target the TensorRT 10 API, which removes implicit-batch mode, binding indices, and several deprecated layer creation APIs (such as addFullyConnected). The changes transition the codebase to explicit-batch mode, update the operator converters to use the new Nd layer APIs, and manage the deserialization runtime lifetime alongside the engine. The review feedback highlights several critical safety improvements, specifically recommending null-pointer checks for createInferRuntime, addConstant, and addShuffle calls, guarding against integer overflow when handling dynamic dimensions during INT8 calibration, and preventing out-of-bounds access when resolving the device ID from input_var_eid_.
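Two of the review's suggestions, guarding against dynamic dimensions and overflow when computing element counts, and avoiding out-of-bounds access when resolving a device from `input_var_eid_`, can be sketched with checked helpers that return `std::nullopt` instead of misbehaving. This is an illustration only: `CheckedVolume`, `DeviceIdOfFirstInput`, and the `device_id_by_eid` table are hypothetical stand-ins for however the runtime actually maps entry ids to devices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Product of dims[first..], or std::nullopt if any dimension is dynamic
// (negative, e.g. -1) or the product would overflow int64_t.
std::optional<int64_t> CheckedVolume(const std::vector<int64_t>& dims,
                                     std::size_t first = 0) {
  int64_t volume = 1;
  for (std::size_t i = first; i < dims.size(); ++i) {
    const int64_t d = dims[i];
    if (d < 0) return std::nullopt;  // unresolved dynamic dimension
    if (d != 0 && volume > std::numeric_limits<int64_t>::max() / d) {
      return std::nullopt;  // volume * d would overflow
    }
    volume *= d;
  }
  return volume;
}

// Device id of the first graph input, or std::nullopt when there are no
// inputs or the entry id falls outside the (hypothetical) lookup table.
std::optional<int> DeviceIdOfFirstInput(
    const std::vector<uint32_t>& input_var_eid,
    const std::vector<int>& device_id_by_eid) {
  if (input_var_eid.empty()) return std::nullopt;
  const uint32_t eid = input_var_eid[0];
  if (eid >= device_id_by_eid.size()) return std::nullopt;
  return device_id_by_eid[eid];
}
```

Returning an empty optional lets the caller raise a descriptive error, which fits the PR's approach of failing clearly rather than crashing.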
This PR fixes #19609. All tests were verified locally. The PR consists only of API updates; no new features are added.