
Commit ae8270b

[None][doc] fix invalid links in perf benchmarking. (#7933)

Signed-off-by: nv-guomingz <[email protected]>

1 parent 57079ce · commit ae8270b

File tree: 1 file changed, +10 -12 lines changed


docs/source/developer-guide/perf-benchmarking.md

Lines changed: 10 additions & 12 deletions
@@ -8,14 +8,14 @@ Expect breaking API changes.
 ```

 TensorRT LLM provides the `trtllm-bench` CLI, a packaged benchmarking utility that aims to make it
-easier for users to reproduce our officially published [performance overview](./perf-overview.md#throughput-measurements). `trtllm-bench` provides the follows:
+easier for users to reproduce our officially published [performance overview](../performance/perf-overview.md#throughput-measurements). `trtllm-bench` provides the follows:

 - A streamlined way to build tuned engines for benchmarking for a variety of models and platforms.
 - An entirely Python workflow for benchmarking.
 - Ability to benchmark various flows and features within TensorRT LLM.

 `trtllm-bench` executes all benchmarks using [in-flight batching] -- for more information see
-the [in-flight batching section](../advanced/gpt-attention.md#in-flight-batching) that describes the concept
+the [in-flight batching section](../features/attention.md#inflight-batching) that describes the concept
 in further detail.

 ## Before Benchmarking
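
For orientation, the hunk above documents the `trtllm-bench` CLI. A minimal sketch of how to explore it, assuming a working TensorRT LLM installation with `trtllm-bench` on `PATH`:

```shell
# List the utility's global options and subcommands.
trtllm-bench --help

# Each subcommand documents its own flags; the top-level --model option
# precedes the subcommand, as in the commands shown later in this diff.
trtllm-bench --model meta-llama/Llama-2-7b-hf throughput --help
```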
@@ -67,7 +67,7 @@ sudo nvidia-smi boost-slider --vboost <max_boost_slider>

 While `trtllm-bench` should be able to run any network that TensorRT LLM supports, the following are the list
 that have been validated extensively and is the same listing as seen on the
-[Performance Overview](./perf-overview.md) page.
+[Performance Overview](../performance/perf-overview.md) page.

 - [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
 - [meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf)
@@ -98,8 +98,8 @@ Export your token in the `HF_TOKEN` environment variable.
 - `FP8`
 - `NVFP4`

-For more information about quantization, refer to [](../reference/precision.md) and
-the [support matrix](../reference/precision.md#support-matrix) of the supported quantization methods for each network.
+For more information about quantization, refer to [](../features/quantization.md) and
+the [support matrix](../features/quantization.md#model-supported-matrix) of the supported quantization methods for each network.

 ```{tip}
 Although TensorRT LLM supports more quantization modes than listed above, `trtllm-bench` currently only configures for
@@ -155,11 +155,9 @@ python benchmarks/cpp/prepare_dataset.py --stdout --tokenizer meta-llama/Llama-3

 ### Running with the PyTorch Workflow

-To benchmark the PyTorch backend (`tensorrt_llm._torch`), use the following command with [dataset](#preparing-a-dataset) generated from previous steps. The `throughput` benchmark initializes the backend by tuning against the
-dataset provided via `--dataset` (or the other build mode settings described [above](#other-build-modes)).
-Note that CUDA graph is enabled by default. You can add additional pytorch config with
-`--extra_llm_api_options` followed by the path to a YAML file. For more details, please refer to the
-help text by running the command with `--help`.
+To benchmark the PyTorch backend (`tensorrt_llm._torch`), use the following command with [dataset](#preparing-a-dataset) generated from previous steps. The `throughput` benchmark initializes the backend by tuning against the dataset provided via `--dataset` (or the other build mode settings described above).
+
+Note that CUDA graph is enabled by default. You can add additional pytorch config with `--extra_llm_api_options` followed by the path to a YAML file. For more details, please refer to the help text by running the command with `--help`.

 ```{tip}
 The command below specifies the `--model_path` option. The model path is optional and used only when you want to run a locally
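
The hunk above mentions passing extra LLM API settings to the PyTorch workflow through `--extra_llm_api_options`. A minimal sketch of that flow follows; the YAML key is an illustrative assumption (the accepted schema varies by version; check `trtllm-bench throughput --help` and the LLM API reference), and `synthetic_data.jsonl` is a hypothetical dataset file produced by `prepare_dataset.py`:

```shell
# Write a small LLM API options file. The key below is an assumption for
# illustration; your TensorRT LLM version may accept a different schema.
cat > extra-llm-api-options.yaml << 'EOF'
kv_cache_config:
  free_gpu_memory_fraction: 0.9
EOF

# Pass the options file to the throughput benchmark (flag name taken from
# the documentation text in the hunk above).
trtllm-bench --model meta-llama/Llama-2-7b-hf \
    throughput \
    --dataset synthetic_data.jsonl \
    --extra_llm_api_options extra-llm-api-options.yaml
```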
@@ -310,7 +308,7 @@ Each subdirectory should contain the LoRA adapter files for that specific task.
 To benchmark multi-modal models with PyTorch workflow, you can follow the similar approach as above.

 First, prepare the dataset:
-```
+```python
 python ./benchmarks/cpp/prepare_dataset.py \
     --tokenizer Qwen/Qwen2-VL-2B-Instruct \
     --stdout \
@@ -334,7 +332,7 @@ Sample dataset for multimodal:
 ```

 Run the benchmark:
-```
+```python
 trtllm-bench --model Qwen/Qwen2-VL-2B-Instruct \
     throughput \
     --dataset mm_data.jsonl \
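
Both multimodal commands in the last two hunks are cut off because the diff shows only three lines of context. As a hypothetical completion of the benchmark invocation: everything past `--dataset` below is an assumption and should be verified against `trtllm-bench throughput --help` for your version.

```shell
# Hypothetical completion of the truncated command above; --backend pytorch
# is assumed from the doc's statement that this section uses the PyTorch
# workflow, and mm_data.jsonl is the dataset prepared in the previous step.
trtllm-bench --model Qwen/Qwen2-VL-2B-Instruct \
    throughput \
    --dataset mm_data.jsonl \
    --backend pytorch
```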
