TensorRT
| TensorRT | |
|---|---|
| Developer | Nvidia |
| Initial release | 2017 |
| Written in | C++, Python, CUDA |
| Operating system | Linux, Windows |
| Platform | Nvidia GPUs |
| Type | Software development kit, Inference engine |
| License | Proprietary software; some companion components are open-source under the Apache License 2.0 |
| Website | developer |
| Repository | github github |
TensorRT is a software development kit (SDK) and inference optimization runtime developed by Nvidia for deploying trained deep learning and machine learning models on graphics processing units (GPUs).[1][2] It can import models trained in frameworks such as PyTorch and TensorFlow, or expressed in the ONNX format, and compile them into optimized runtime engines for low-latency and high-throughput inference.[1][2]
In current Nvidia documentation, the TensorRT name is also used for a broader product family that includes the core TensorRT SDK, TensorRT-LLM, and TensorRT-RTX.[3] The core SDK is primarily a proprietary Nvidia product, although Nvidia also maintains Apache-licensed open-source TensorRT repositories and related companion projects.[4][5]
History
TensorRT was available as part of Nvidia's deep learning software stack by 2017, when it was described as a high-performance inference engine for deploying trained neural networks on Nvidia GPUs.[6] In 2018, Google announced integration of Nvidia TensorRT with TensorFlow 1.7, describing TensorRT as a library that optimizes deep learning models for inference and creates a runtime for deployment on GPUs in production environments.[7]
Overview
The core of TensorRT is a C++ library that takes a trained network, consisting of a network definition and trained parameters, and produces a highly optimized runtime engine for inference on Nvidia GPUs.[2] TensorRT provides both C++ and Python APIs, and models can either be expressed directly through its network definition API or imported through its ONNX parser.[2]
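The ONNX import path described above can be sketched with the Python API. This is a minimal illustration rather than Nvidia's reference workflow: the function name `build_engine_from_onnx` is hypothetical, the `tensorrt` package is assumed to be installed alongside a supported GPU, and the handling of the explicit-batch flag is an assumption intended to cover both older and newer releases.

```python
from pathlib import Path


def build_engine_from_onnx(onnx_path: str) -> bytes:
    """Parse an ONNX file and return a serialized TensorRT engine."""
    # Read the model first so a bad path fails before any GPU work starts.
    model_bytes = Path(onnx_path).read_bytes()

    # Imported lazily: the tensorrt package is only available on machines
    # where the TensorRT SDK has been installed.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)

    # Assumption: older releases require the EXPLICIT_BATCH flag for ONNX
    # models, while newer releases treat explicit batch as the default.
    flag = getattr(trt.NetworkDefinitionCreationFlag, "EXPLICIT_BATCH", None)
    network = builder.create_network(0 if flag is None else 1 << int(flag))

    # Populate the network definition from the ONNX model.
    parser = trt.OnnxParser(network, logger)
    if not parser.parse(model_bytes):
        errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
        raise RuntimeError("ONNX parse failed: " + "; ".join(errors))

    # Build with a default configuration and return the serialized engine.
    config = builder.create_builder_config()
    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("engine build failed")
    return bytes(serialized)
```

The returned bytes can be written to disk and deserialized later, which separates the slow build step from application startup.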
According to Nvidia's documentation, TensorRT performs graph-level and kernel-level optimizations such as layer fusion and selection of efficient implementations for supported operations.[2] Current documentation also describes support for dynamic shapes, mixed-precision execution modes including FP32, FP16, BF16, FP8, and INT8, and specialized optimizations for transformer and large language model workloads.[1]
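The general idea behind layer fusion can be illustrated with a toy example in plain Python. The sketch below folds a batch-normalization step into the preceding linear layer so that two operations become one. It shows the concept only and is not a description of TensorRT's internal fusion passes; all function names and values are made up for illustration.

```python
import math


def linear(w, b, x):
    """Compute y = W x + b for a small dense layer."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Apply inference-time batch normalization per output channel."""
    return [g * (v - m) / math.sqrt(s + eps) + bt
            for v, g, bt, m, s in zip(y, gamma, beta, mean, var)]


def fold_batch_norm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Return weights and biases of a single layer equal to linear + batch norm."""
    scale = [g / math.sqrt(s + eps) for g, s in zip(gamma, var)]
    w_fused = [[wi * sc for wi in row] for row, sc in zip(w, scale)]
    b_fused = [(bi - m) * sc + bt for bi, m, sc, bt in zip(b, mean, scale, beta)]
    return w_fused, b_fused


# Illustrative parameters for a layer with two inputs and two outputs.
w = [[1.0, 2.0], [0.5, -1.0]]
b = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [4.0, 0.25]
x = [3.0, -2.0]

unfused = batch_norm(linear(w, b, x), gamma, beta, mean, var)
w_fused, b_fused = fold_batch_norm(w, b, gamma, beta, mean, var)
fused = linear(w_fused, b_fused, x)
```

Both paths produce the same outputs up to floating-point rounding, but the fused layer needs one pass over the data instead of two, which is the kind of saving that fusion aims for.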
TensorRT engines can be generated through the TensorRT APIs or with the trtexec command-line utility.[8] Nvidia's quick-start documentation describes deployment workflows based on ONNX conversion, runtime APIs, and direct engine deserialization for C++ and Python applications.[8]
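A possible shape for this workflow is sketched below: one helper assembles a trtexec invocation, and another deserializes the resulting engine file through the Python runtime API. The helper names and file names are hypothetical, and the sketch assumes that the `trtexec` binary and the `tensorrt` Python package are installed; only the `--onnx`, `--saveEngine`, and `--fp16` options are used.

```python
import shlex


def trtexec_command(onnx_path, engine_path, fp16=False):
    """Return the argument list for building an engine with trtexec."""
    args = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        # Allow the builder to choose FP16 kernels where it sees fit.
        args.append("--fp16")
    return args


def load_engine(engine_path):
    """Deserialize a previously built engine file (requires TensorRT)."""
    # Imported lazily so the command helper works without TensorRT installed.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(engine_path, "rb") as f:
        return runtime.deserialize_cuda_engine(f.read())


cmd = trtexec_command("model.onnx", "model.engine", fp16=True)
print(shlex.join(cmd))
```

The argument list can be passed to `subprocess.run` on a machine with TensorRT installed. Keeping the build step on the command line and the load step in application code mirrors the split between engine generation and deployment.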
Licensing and open-source components
The licensing model around TensorRT is split between a proprietary core SDK and a set of open-source repositories and tools.[4][5] The packaged TensorRT software distributed by Nvidia is governed by the Nvidia Software License Agreement.[4] At the same time, Nvidia maintains a public TensorRT repository on GitHub under the Apache License 2.0.[5]
Official TensorRT documentation also directs users to the TensorRT open-source software repository for quick-start code and samples.[8] The architecture documentation describes related tooling such as Polygraphy for debugging and constant folding, as well as ONNX-GraphSurgeon for modifying ONNX graphs before deployment with TensorRT.[9] TensorRT also supports a plugin mechanism for custom layers and unsupported operations.[8]
Product family
Nvidia's current documentation groups several inference products under the TensorRT name.[3] In that documentation, the core SDK is distinguished as TensorRT (Enterprise), while related offerings include TensorRT-LLM for large language model inference and TensorRT-RTX for consumer RTX GPUs.[3]
TensorRT-LLM
TensorRT-LLM is a related open-source toolkit for optimizing and serving large language models on Nvidia GPUs.[3][10] Nvidia describes it as providing a Python API to define LLMs and build TensorRT engines optimized for LLM workloads.[3][10]
According to Nvidia's product-family documentation, TensorRT-LLM supports multi-GPU and multi-node execution, in-flight batching, paged KV caching, and quantization methods such as FP8, INT8, and INT4 for higher-throughput model serving.[3] The TensorRT-LLM codebase is published on GitHub under the Apache License 2.0.[11]
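As a rough illustration of what INT8 quantization involves, the sketch below applies symmetric per-tensor quantization to a short list of weights in plain Python. It is a generic textbook scheme chosen for illustration, not necessarily the method TensorRT-LLM uses, and the function names and sample values are assumptions.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]


# Illustrative weights; real tensors would hold millions of values.
weights = [0.82, -1.27, 0.03, 0.5]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
```

Each restored value differs from the original by at most about half of the scale, while storage drops from 32 bits to 8 bits per weight. That trade of a small accuracy loss for reduced memory traffic is what makes low-precision formats attractive for serving.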
Because Nvidia documents TensorRT-LLM as a separate member of the TensorRT product family, it is typically treated as a related but distinct software project rather than as a single feature of the base TensorRT SDK.[3]
See also
- llama.cpp
- SGLang
- vLLM
- Lists of open-source artificial intelligence software
- Comparison of deep learning software
- Comparison of machine learning software
References
1. "NVIDIA TensorRT Documentation". NVIDIA Docs. Retrieved April 23, 2026.
2. "Overview". NVIDIA Docs. Retrieved April 23, 2026.
3. "NVIDIA TensorRT Product Family". NVIDIA Docs. Retrieved April 23, 2026.
4. "TensorRT/python/packaging/frontend_sdist/LICENSE.txt at main". GitHub. Retrieved April 23, 2026.
5. "TensorRT/LICENSE at main". GitHub. Retrieved April 23, 2026.
6. "NVIDIA in HPC and AI" (PDF). Ohio Supercomputer Center. Retrieved April 23, 2026.
7. "Announcing TensorRT integration with TensorFlow 1.7". Google Developers Blog. March 27, 2018. Retrieved April 23, 2026.
8. "Quick Start Guide". NVIDIA Docs. Retrieved April 23, 2026.
9. "Architecture Overview". NVIDIA Docs. Retrieved April 23, 2026.
10. "NVIDIA/TensorRT-LLM". GitHub. Retrieved April 23, 2026.
11. "TensorRT-LLM/LICENSE at main". GitHub. Retrieved April 23, 2026.