KeyError: 6 when getting nvlink_bandwidth #1467

@choyuansu

Description

System Info

GPU: NVIDIA RTX A6000

Who can help?

@Tracin

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Run git clone https://github.com/NVIDIA/TensorRT-LLM.git

  2. Create Dockerfile and docker-compose.yaml in TensorRT-LLM/

    Dockerfile
    # Obtain and start the basic docker image environment.
    FROM nvidia/cuda:12.1.0-devel-ubuntu22.04
    
    # Install dependencies, TensorRT-LLM requires Python 3.10
    RUN apt-get update && apt-get -y install \
        python3.10 \
        python3-pip \
        openmpi-bin \
        libopenmpi-dev
    
    # Install the latest preview version (corresponding to the main branch) of TensorRT-LLM.
    # If you want to install the stable version (corresponding to the release branch), please
    # remove the `--pre` option.
    RUN --mount=type=cache,target=/root/.cache/pip pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com
    
    COPY ./examples/qwen/requirements.txt .
    RUN --mount=type=cache,target=/root/.cache/pip pip3 install -r requirements.txt
    
    WORKDIR /workdir
    
    docker-compose.yaml
    services:
      tensorrt:
        image: tensorrt-llm
        volumes:
          - .:/workdir
          - /mnt/models:/mnt/models
        command:
        - bash
        - -ec
        - |
          cd examples/qwen
          pip install -r requirements.txt
          python3 convert_checkpoint.py --model_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/ \
                    --dtype float32 \
                    --output_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/trt_ckpt/fp32/1-gpu/
          trtllm-build --checkpoint_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/trt_ckpt/fp32/1-gpu/ \
                    --gemm_plugin float32 \
                    --output_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/trt_engines/fp32/1-gpu/
        deploy:
            resources:
              reservations:
                devices:
                  - driver: nvidia
                    count: 1
                    capabilities: [gpu]
    
  3. Run git clone https://huggingface.co/Qwen/Qwen-7B-Chat in /mnt/models/Large_Language_Model

  4. Run docker compose up

Expected behavior

The checkpoint conversion and trtllm-build steps complete without errors.

Actual behavior

[04/16/2024-22:50:23] [TRT-LLM] [I] NVLink is active: True
[04/16/2024-22:50:23] [TRT-LLM] [I] NVLink version: 6
Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 411, in main
    cluster_config = infer_cluster_config()
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 523, in infer_cluster_config
    cluster_info=infer_cluster_info(),
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 487, in infer_cluster_info
    nvl_bw = nvlink_bandwidth(nvl_version)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 433, in nvlink_bandwidth
    return nvl_bw_table[nvlink_version]
KeyError: 6
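
For reference, here is a minimal sketch of how the NVLink version the driver reports can be checked directly. This assumes the pynvml API family that the cluster probe appears to build on; the link index 0 is just an example, and the real code presumably iterates over all links:

    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    # Query link 0 as an example; check the link is active before asking
    # for its version, since inactive links may not report one.
    if pynvml.nvmlDeviceGetNvLinkState(handle, 0):
        print(pynvml.nvmlDeviceGetNvLinkVersion(handle, 0))  # prints 6 in this setup
    pynvml.nvmlShutdown()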

Additional notes

Relevant code: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/auto_parallel/cluster_info.py#L427-L433

I can't find any published information about NVLink version 6's bandwidth online, so it is unclear what value the table should contain for it.
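
Until the table is extended, one possible local workaround is to make the lookup fall back to the highest known version instead of indexing directly. This is only a hypothetical sketch of such a guard against the pattern in cluster_info.py; the table entries below are placeholders, not the real values:

    # Hypothetical fallback sketch; the bandwidth numbers are placeholders,
    # not the actual entries from cluster_info.py.
    nvl_bw_table = {
        2: 25,
        3: 50,
        4: 100,
    }

    def nvlink_bandwidth(nvlink_version: int) -> int:
        # Fall back to the largest known version instead of raising KeyError
        # when the driver reports a version the table does not know about.
        if nvlink_version in nvl_bw_table:
            return nvl_bw_table[nvlink_version]
        return nvl_bw_table[max(nvl_bw_table)]

With a guard like this, a reported version 6 would map to the highest known entry rather than aborting the build; whether that bandwidth estimate is actually appropriate for this GPU's links is a separate question for the maintainers.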

Labels

Inference runtime<NV> (general operational aspects of TRT-LLM execution not in other categories), Investigating, bug (something isn't working), triaged (issue has been triaged by maintainers)
