TensorRT
Developer	Nvidia
Initial release	2017
Written in	C++, Python, CUDA
Operating system	Linux, Windows
Platform	Nvidia GPUs
Type	Software development kit, Inference engine
License	Proprietary software; some companion components are open-source under the Apache License 2.0
Website	developer.nvidia.com/tensorrt
Repository	github.com/NVIDIA/TensorRT; github.com/NVIDIA/TensorRT-LLM

TensorRT is a software development kit (SDK) and inference optimization runtime developed by Nvidia for deploying trained deep learning and machine learning models on graphics processing units (GPUs).^[1]^[2] It can import models from frameworks such as PyTorch and TensorFlow, as well as from the ONNX interchange format, and compile them into optimized runtime engines for low-latency and high-throughput inference.^[1]^[2]

In current Nvidia documentation, the TensorRT name is also used for a broader product family that includes the core TensorRT SDK, TensorRT-LLM, and TensorRT-RTX.^[3] The core SDK is primarily a proprietary Nvidia product, although Nvidia also maintains Apache-licensed open-source TensorRT repositories and related companion projects.^[4]^[5]

History

TensorRT was available as part of Nvidia's deep learning software stack by 2017, when it was described as a high-performance inference engine for deploying trained neural networks on Nvidia GPUs.^[6] In 2018, Google announced integration of Nvidia TensorRT with TensorFlow 1.7, describing TensorRT as a library that optimizes deep learning models for inference and creates a runtime for deployment on GPUs in production environments.^[7]

Overview

The core of TensorRT is a C++ library that takes a trained network, consisting of a network definition and trained parameters, and produces a highly optimized runtime engine for inference on Nvidia GPUs.^[2] TensorRT provides both C++ and Python APIs, and models can either be expressed directly through its network definition API or imported through its ONNX parser.^[2]
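As a rough sketch of the ONNX import path described above, the following Python outlines how a serialized engine might be built with the TensorRT Python API. It assumes the `tensorrt` package is installed on a machine with a supported Nvidia GPU. The helper name `build_engine_from_onnx` and the file paths are placeholders chosen for illustration, and details differ between TensorRT releases.

```python
def build_engine_from_onnx(onnx_path, engine_path, use_fp16=False):
    """Parse an ONNX file and write a serialized TensorRT engine to disk."""
    # Imported lazily so this module can be loaded on machines without TensorRT.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)

    # The explicit-batch flag is needed on older releases and deprecated on
    # newer ones, so it is only set when the enum member is still present.
    flags = 0
    if hasattr(trt.NetworkDefinitionCreationFlag, "EXPLICIT_BATCH"):
        flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)

    # Populate the network definition from the ONNX model.
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    # The builder config controls which optimizations the builder may apply.
    config = builder.create_builder_config()
    if use_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)
```

A call such as `build_engine_from_onnx("model.onnx", "model.engine", use_fp16=True)` would then produce an engine file tied to the GPU and TensorRT version it was built with.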

According to Nvidia's documentation, TensorRT performs graph-level and kernel-level optimizations such as layer fusion and selection of efficient implementations for supported operations.^[2] Current documentation also describes support for dynamic shapes, mixed-precision execution modes including FP32, FP16, BF16, FP8, and INT8, and specialized optimizations for transformer and large language model workloads.^[1]
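Layer fusion can be illustrated with a generic textbook example: folding an inference-time batch normalization into the affine layer that precedes it, so that two passes over the data become one. The sketch below is plain Python with made-up parameter values. It is not TensorRT's implementation, and the function names are hypothetical.

```python
import math


def affine(x, weight, bias):
    """Per-channel affine layer: y[c] = weight[c] * x[c] + bias[c]."""
    return [w * v + b for v, w, b in zip(x, weight, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization using stored statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]


def fuse_affine_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the batch-norm parameters into the preceding weight and bias."""
    scale = [g / math.sqrt(s + eps) for g, s in zip(gamma, var)]
    fused_weight = [w * k for w, k in zip(weight, scale)]
    fused_bias = [(b - m) * k + bt
                  for b, m, k, bt in zip(bias, mean, scale, beta)]
    return fused_weight, fused_bias


# Illustrative values only.
x = [1.0, -2.0, 0.5]
weight, bias = [0.5, 1.5, -1.0], [0.1, 0.0, 0.3]
gamma, beta = [1.2, 0.8, 1.0], [0.0, 0.5, -0.2]
mean, var = [0.2, -0.1, 0.0], [1.0, 4.0, 0.25]

two_step = batch_norm(affine(x, weight, bias), gamma, beta, mean, var)
fused_weight, fused_bias = fuse_affine_bn(weight, bias, gamma, beta, mean, var)
one_step = affine(x, fused_weight, fused_bias)
```

Both paths compute the same values up to floating-point rounding, but the fused form needs a single kernel and no intermediate tensor, which is the general motivation for fusing layers at build time.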

TensorRT engines can be generated through the TensorRT APIs or with the trtexec command-line utility.^[8] Nvidia's quick-start documentation describes deployment workflows based on ONNX conversion, runtime APIs, and direct engine deserialization for C++ and Python applications.^[8]
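The two routes mentioned above, building with trtexec and loading an engine at run time, might look roughly as follows from Python. The `--onnx`, `--saveEngine`, and `--fp16` options are standard trtexec flags. The helper names and file paths are placeholders, and `load_engine` assumes the `tensorrt` package and an Nvidia GPU are available.

```python
import shlex


def trtexec_command(onnx_path, engine_path, fp16=False):
    """Return an argv list that asks trtexec to build and save an engine."""
    argv = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        argv.append("--fp16")
    return argv


def load_engine(engine_path):
    """Deserialize an engine file; returns (runtime, engine)."""
    import tensorrt as trt  # lazy import: requires TensorRT to be installed

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    if engine is None:
        raise RuntimeError("could not deserialize " + engine_path)
    # Returned together so the caller can keep the runtime alive
    # for as long as the engine is in use.
    return runtime, engine


cmd = trtexec_command("model.onnx", "model.engine", fp16=True)
print(shlex.join(cmd))
# trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
```

The argv list can be passed to `subprocess.run` on a system where trtexec is on the `PATH`. An engine file is generally only valid for the TensorRT version and GPU type it was built for.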

Licensing and open-source components

The licensing model around TensorRT is split between a proprietary core SDK and a set of open-source repositories and tools.^[4]^[5] The packaged TensorRT software distributed by Nvidia is governed by the Nvidia Software License Agreement.^[4] At the same time, Nvidia maintains a public TensorRT repository on GitHub under the Apache License 2.0.^[5]

Official TensorRT documentation also directs users to the TensorRT open-source software repository for quick-start code and samples.^[8] The architecture documentation describes related tooling such as Polygraphy for debugging and constant folding, as well as ONNX-GraphSurgeon for modifying ONNX graphs before deployment with TensorRT.^[9] TensorRT also supports a plugin mechanism for custom layers and unsupported operations.^[8]
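Constant folding, one of the tasks attributed to Polygraphy above, means precomputing any part of a graph whose inputs are all known ahead of time. The toy below shows the idea on a tiny expression tree in plain Python. The tuple-based graph representation is an assumption made for illustration and is unrelated to how Polygraphy or ONNX represent models.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def fold_constants(node):
    """Replace every subtree that has only constant inputs with its value.

    A node is a number (constant), a string (name of a runtime input),
    or a tuple of the form (op_name, left, right).
    """
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)  # both inputs known: compute now
    return (op, left, right)         # depends on a runtime input: keep the op


# (x * (2 + 3)) - (4 * 0.5): both constant subexpressions can be precomputed.
graph = ("sub", ("mul", "x", ("add", 2, 3)), ("mul", 4, 0.5))
folded = fold_constants(graph)
print(folded)  # ('sub', ('mul', 'x', 5), 2.0)
```

In practice, Polygraphy exposes this kind of simplification for ONNX models through its `surgeon sanitize` subcommand with a `--fold-constants` option; the tool's own documentation describes current usage.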

Product family

Nvidia's current documentation groups several inference products under the TensorRT name.^[3] In that documentation, the core SDK is distinguished as TensorRT (Enterprise), while related offerings include TensorRT-LLM for large language model inference and TensorRT-RTX for consumer RTX GPUs.^[3]

TensorRT-LLM

TensorRT-LLM is a related open-source toolkit for optimizing and serving large language models on Nvidia GPUs.^[3]^[10] Nvidia describes it as providing a Python API to define LLMs and build TensorRT engines optimized for LLM workloads.^[3]^[10]

According to Nvidia's product-family documentation, TensorRT-LLM supports multi-GPU and multi-node execution, in-flight batching, paged KV caching, and quantization methods such as FP8, INT8, and INT4 for higher-throughput model serving.^[3] The TensorRT-LLM codebase is published on GitHub under the Apache License 2.0.^[11]
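The general idea behind a paged KV cache can be sketched with a toy block allocator: cache memory is divided into fixed-size blocks, and each sequence holds a table of the blocks it occupies, so memory is claimed one block at a time and returned as soon as a sequence finishes. The class below is a conceptual illustration in plain Python. Its names and structure are assumptions and do not reflect TensorRT-LLM's actual implementation.

```python
class PagedKVCache:
    """Toy block allocator: each sequence owns a list of fixed-size blocks."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # sequence id -> list of physical block ids
        self.lengths = {}       # sequence id -> number of cached tokens

    def append_token(self, seq_id):
        """Reserve cache space for one more token; returns (block, offset)."""
        length = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % self.block_size == 0:  # last block is full, or none yet
            if not self.free_blocks:
                raise MemoryError("no free KV cache blocks")
            table.append(self.free_blocks.pop(0))
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.block_size

    def release(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    cache.append_token("a")  # three tokens: sequence "a" occupies two blocks
cache.append_token("b")      # sequence "b" takes a third block
cache.release("a")           # the blocks held by "a" become reusable
```

Because blocks are freed per sequence rather than per batch, a scheme like this pairs naturally with in-flight batching, where new requests join a running batch as earlier ones complete.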

Because Nvidia documents TensorRT-LLM as a separate member of the TensorRT product family, it is typically treated as a related but distinct software project rather than as a single feature of the base TensorRT SDK.^[3]

References

[1] "NVIDIA TensorRT Documentation". NVIDIA Docs. Retrieved April 23, 2026.
[2] "Overview". NVIDIA Docs. Retrieved April 23, 2026.
[3] "NVIDIA TensorRT Product Family". NVIDIA Docs. Retrieved April 23, 2026.
[4] "TensorRT/python/packaging/frontend_sdist/LICENSE.txt at main". GitHub. Retrieved April 23, 2026.
[5] "TensorRT/LICENSE at main". GitHub. Retrieved April 23, 2026.
[6] "NVIDIA in HPC and AI" (PDF). Ohio Supercomputer Center. Retrieved April 23, 2026.
[7] "Announcing TensorRT integration with TensorFlow 1.7". Google Developers Blog. March 27, 2018. Retrieved April 23, 2026.
[8] "Quick Start Guide". NVIDIA Docs. Retrieved April 23, 2026.
[9] "Architecture Overview". NVIDIA Docs. Retrieved April 23, 2026.
[10] "NVIDIA/TensorRT-LLM". GitHub. Retrieved April 23, 2026.
[11] "TensorRT-LLM/LICENSE at main". GitHub. Retrieved April 23, 2026.
