docs/source/developer-guide/perf-benchmarking.md (10 additions, 12 deletions)
@@ -8,14 +8,14 @@ Expect breaking API changes.
 ```
 
 TensorRT LLM provides the `trtllm-bench` CLI, a packaged benchmarking utility that aims to make it
-easier for users to reproduce our officially published [performance overview](./perf-overview.md#throughput-measurements). `trtllm-bench` provides the following:
+easier for users to reproduce our officially published [performance overview](../performance/perf-overview.md#throughput-measurements). `trtllm-bench` provides the following:
 
 - A streamlined way to build tuned engines for benchmarking for a variety of models and platforms.
 - An entirely Python workflow for benchmarking.
 - Ability to benchmark various flows and features within TensorRT LLM.
 
 `trtllm-bench` executes all benchmarks using [in-flight batching] -- for more information see
-the [in-flight batching section](../advanced/gpt-attention.md#in-flight-batching) that describes the concept
+the [in-flight batching section](../features/attention.md#inflight-batching) that describes the concept
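
For orientation, a minimal sketch of the CLI shape this documentation describes. The model name and dataset path are placeholders, and exact subcommands and flags should be verified against `trtllm-bench --help` for your installed version:

```shell
# Minimal sketch of a trtllm-bench invocation (placeholders, not a verified
# command for any particular release; confirm flags with `trtllm-bench --help`).
trtllm-bench --model meta-llama/Llama-3.1-8B-Instruct \
  throughput \
  --dataset /tmp/synthetic_dataset.txt
```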
@@ -158,8 +158,6 @@
-To benchmark the PyTorch backend (`tensorrt_llm._torch`), use the following command with [dataset](#preparing-a-dataset) generated from previous steps. The `throughput` benchmark initializes the backend by tuning against the
-dataset provided via `--dataset` (or the other build mode settings described [above](#other-build-modes)).
-Note that CUDA graph is enabled by default. You can add additional pytorch config with
-`--extra_llm_api_options` followed by the path to a YAML file. For more details, please refer to the
-help text by running the command with `--help`.
+To benchmark the PyTorch backend (`tensorrt_llm._torch`), use the following command with [dataset](#preparing-a-dataset) generated from previous steps. The `throughput` benchmark initializes the backend by tuning against the dataset provided via `--dataset` (or the other build mode settings described above).
+
+Note that CUDA graph is enabled by default. You can add additional pytorch config with `--extra_llm_api_options` followed by the path to a YAML file. For more details, please refer to the help text by running the command with `--help`.
 
 ```{tip}
 The command below specifies the `--model_path` option. The model path is optional and used only when you want to run a locally
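
To illustrate the `--extra_llm_api_options` flow the new text describes: write a small YAML file and point the benchmark at it. This is a sketch only; the `cuda_graph_config`/`enable_padding` keys are assumptions for example purposes, since the accepted keys mirror the LLM API arguments and vary by release. Check the `--help` text for your version.

```shell
# Sketch only: pass extra PyTorch-backend settings via a YAML file.
# `cuda_graph_config` and `enable_padding` are assumed keys for illustration;
# check `trtllm-bench throughput --help` for the keys your release accepts.
cat > extra_llm_options.yaml <<'EOF'
cuda_graph_config:
  enable_padding: true
EOF

trtllm-bench --model meta-llama/Llama-3.1-8B-Instruct throughput \
  --dataset /tmp/synthetic_dataset.txt \
  --backend pytorch \
  --extra_llm_api_options extra_llm_options.yaml
```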
@@ -310,7 +308,7 @@ Each subdirectory should contain the LoRA adapter files for that specific task.
 To benchmark multi-modal models with PyTorch workflow, you can follow the similar approach as above.
 
 First, prepare the dataset:
-```
+```python
 python ./benchmarks/cpp/prepare_dataset.py \
     --tokenizer Qwen/Qwen2-VL-2B-Instruct \
     --stdout \
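
Once the prepared multimodal dataset has been written out (for example, by redirecting `--stdout` to a file), the throughput run follows the same pattern as the text-only flow. A sketch, assuming the text-flow flags also apply to multimodal models:

```shell
# Sketch: benchmark against the prepared multimodal dataset. Assumes the
# dataset was redirected to mm_dataset.jsonl and that the text-only flags
# apply here as well; verify with `trtllm-bench throughput --help`.
trtllm-bench --model Qwen/Qwen2-VL-2B-Instruct throughput \
  --dataset mm_dataset.jsonl \
  --backend pytorch
```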
@@ -334,7 +332,7 @@ Sample dataset for multimodal: