
Commit d99c3a4

[Doc]: fix typos in .md files (including those of #23751) (#23825)
Signed-off-by: Didier Durand <[email protected]>
1 parent 3462c1c commit d99c3a4

16 files changed (+19 −19 lines)


docs/contributing/ci/update_pytorch_version.md (1 addition, 1 deletion)

@@ -90,7 +90,7 @@ address the long build time at its source, the current workaround is to set `VLL
 to a custom branch provided by @khluu (`VLLM_CI_BRANCH=khluu/use_postmerge_q`)
 when manually triggering a build on Buildkite. This branch accomplishes two things:

-1. Increase the timeout limit to 10 hours so that the build doesn't timeout.
+1. Increase the timeout limit to 10 hours so that the build doesn't time out.
 2. Allow the compiled artifacts to be written to the vLLM sccache S3 bucket
    to warm it up so that future builds are faster.

docs/contributing/model/multimodal.md (1 addition, 1 deletion)

@@ -855,7 +855,7 @@ Examples:

 ### Custom HF processor

-Some models don't define a HF processor class on HF Hub. In that case, you can define a custom HF processor that has the same call signature as HF processors and pass it to [_call_hf_processor][vllm.multimodal.processing.BaseMultiModalProcessor._call_hf_processor].
+Some models don't define an HF processor class on HF Hub. In that case, you can define a custom HF processor that has the same call signature as HF processors and pass it to [_call_hf_processor][vllm.multimodal.processing.BaseMultiModalProcessor._call_hf_processor].

 Examples:
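To make the "same call signature" idea in the hunk above concrete, here is a minimal stdlib-only sketch of such a callable. The argument names follow the usual HF processor convention (`text`, `images`, returning a dict of features), but the whitespace tokenizer and field contents are purely illustrative, not vLLM's or HF's actual implementation:

```python
# Hypothetical sketch of a custom processor callable. Real HF processors
# accept text/images keyword arguments and return a dict-like BatchFeature
# of tensors; this toy version just splits text on whitespace.
def custom_processor(text=None, images=None, **kwargs):
    outputs = {"input_ids": (text or "").split()}
    if images is not None:
        outputs["pixel_values"] = images  # passed through for illustration
    return outputs

features = custom_processor(text="a photo of a cat")
print(len(features["input_ids"]))  # 5
```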

docs/deployment/frameworks/lobe-chat.md (1 addition, 1 deletion)

@@ -6,6 +6,6 @@ Supports speech-synthesis, multi-modal, and extensible (function call) plugin sy

 One-click FREE deployment of your private OpenAI ChatGPT/Claude/Gemini/Groq/Ollama chat application.

-It supports vLLM as a AI model provider to efficiently serve large language models.
+It supports vLLM as an AI model provider to efficiently serve large language models.

 For details, see the tutorial [Using vLLM in LobeChat](https://lobehub.com/docs/usage/providers/vllm).

docs/deployment/k8s.md (1 addition, 1 deletion)

@@ -380,7 +380,7 @@ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

 ### Startup Probe or Readiness Probe Failure, container log contains "KeyboardInterrupt: terminated"

-If the startup or readiness probe failureThreshold is too low for the time needed to startup the server, Kubernetes scheduler will kill the container. A couple of indications that this has happened:
+If the startup or readiness probe failureThreshold is too low for the time needed to start up the server, Kubernetes scheduler will kill the container. A couple of indications that this has happened:

 1. container log contains "KeyboardInterrupt: terminated"
 2. `kubectl get events` shows message `Container $NAME failed startup probe, will be restarted`
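The failure mode described in this hunk comes down to simple arithmetic: a startup probe allows the container roughly `failureThreshold * periodSeconds` seconds to come up before Kubernetes restarts it. A sketch of that rule, with illustrative numbers:

```python
# Sketch of the probe timing budget: failureThreshold * periodSeconds gives
# the approximate number of seconds a container has to start before it is
# killed. The values below are illustrative, not Kubernetes defaults.
def probe_budget_seconds(failure_threshold: int, period_seconds: int) -> int:
    return failure_threshold * period_seconds

server_startup_seconds = 240  # loading large model weights can take minutes
budget = probe_budget_seconds(failure_threshold=3, period_seconds=10)
print(budget, budget >= server_startup_seconds)  # 30 False -> container is killed
```

Raising `failureThreshold` (or `periodSeconds`) until the budget exceeds the observed startup time avoids the restart loop.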

docs/design/fused_moe_modular_kernel.md (1 addition, 1 deletion)

@@ -138,7 +138,7 @@ Typically a FusedMoEPrepareAndFinalize type is backed by an All2All Dispatch & C

 #### Step 1: Add an All2All manager

-The purpose of the All2All Manager is to setup the All2All kernel implementations. The `FusedMoEPrepareAndFinalize` implementations typically fetch a kernel-implementation "handle" from the All2All Manager to invoke the Dispatch and Combine functions. Please look at the All2All Manager implementations [here](gh-file:vllm/distributed/device_communicators/all2all.py).
+The purpose of the All2All Manager is to set up the All2All kernel implementations. The `FusedMoEPrepareAndFinalize` implementations typically fetch a kernel-implementation "handle" from the All2All Manager to invoke the Dispatch and Combine functions. Please look at the All2All Manager implementations [here](gh-file:vllm/distributed/device_communicators/all2all.py).

 #### Step 2: Add a FusedMoEPrepareAndFinalize Type
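The manager/handle relationship described in this hunk can be sketched in a few lines. The class and method names below are illustrative stand-ins, not vLLM's actual API; the point is only the shape of the interaction (the manager owns kernel handles, and the prepare/finalize type fetches one to invoke dispatch and combine):

```python
# Stdlib sketch of the described structure: an All2All manager owns kernel
# "handles"; a prepare/finalize implementation fetches a handle and calls
# its dispatch/combine functions. Names and bodies are illustrative only.
class ToyAll2AllManager:
    def get_handle(self):
        # a real manager returns a kernel-implementation handle
        return {"dispatch": lambda xs: sorted(xs), "combine": lambda xs: sum(xs)}

class ToyPrepareAndFinalize:
    def __init__(self, manager):
        self.handle = manager.get_handle()  # fetched once from the manager

    def prepare(self, tokens):
        return self.handle["dispatch"](tokens)

    def finalize(self, partials):
        return self.handle["combine"](partials)

pf = ToyPrepareAndFinalize(ToyAll2AllManager())
print(pf.finalize(pf.prepare([3, 1, 2])))  # 6
```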

docs/design/metrics.md (2 additions, 2 deletions)

@@ -99,11 +99,11 @@ http_request_duration_seconds_count{handler="/v1/completions",method="POST"} 201

 ### Multi-process Mode

-In v0, metrics are collected in the engine core process and we use multi-process mode to make them available in the API server process. See <gh-pr:7279>.
+In v0, metrics are collected in the engine core process and we use multiprocess mode to make them available in the API server process. See <gh-pr:7279>.

 ### Built in Python/Process Metrics

-The following metrics are supported by default by `prometheus_client`, but they are not exposed when multi-process mode is used:
+The following metrics are supported by default by `prometheus_client`, but they are not exposed when multiprocess mode is used:

 - `python_gc_objects_collected_total`
 - `python_gc_objects_uncollectable_total`
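For context on what "multiprocess mode" means here: each process persists its own metric values, and the exporter aggregates them across processes at scrape time (`prometheus_client` does this with per-process files under `PROMETHEUS_MULTIPROC_DIR`). A stdlib-only sketch of that mechanism, with illustrative file and metric names:

```python
# Stdlib-only sketch of multiprocess metric collection: each process writes
# its counters to its own file; the exporter sums across files at scrape
# time. This mimics prometheus_client's multiprocess backend conceptually.
import json
import os
import tempfile

def write_counter(metrics_dir, pid, name, value):
    # one file per process, keyed by pid, as in the real multiprocess backend
    path = os.path.join(metrics_dir, f"counter_{pid}.json")
    with open(path, "w") as f:
        json.dump({name: value}, f)

def collect(metrics_dir, name):
    total = 0.0
    for fname in os.listdir(metrics_dir):
        with open(os.path.join(metrics_dir, fname)) as f:
            total += json.load(f).get(name, 0.0)
    return total

metrics_dir = tempfile.mkdtemp()
write_counter(metrics_dir, pid=1, name="requests_total", value=3)  # e.g. engine core
write_counter(metrics_dir, pid=2, name="requests_total", value=2)  # e.g. API server
print(collect(metrics_dir, "requests_total"))  # 5.0
```

Because values live outside any single process, per-process defaults like the `python_gc_*` metrics listed above are not exposed in this mode.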

docs/features/lora.md (1 addition, 1 deletion)

@@ -52,7 +52,7 @@ Check out <gh-file:examples/offline_inference/multilora_inference.py> for an exa
 ## Serving LoRA Adapters

 LoRA adapted models can also be served with the Open-AI compatible vLLM server. To do so, we use
-`--lora-modules {name}={path} {name}={path}` to specify each LoRA module when we kickoff the server:
+`--lora-modules {name}={path} {name}={path}` to specify each LoRA module when we kick off the server:

 ```bash
 vllm serve meta-llama/Llama-2-7b-hf \
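On the client side of the flag shown in this hunk: a served LoRA adapter is addressed by its registered `{name}` in the `model` field of an OpenAI-style request body. A minimal sketch, where the adapter name `sql-lora` and its path are hypothetical:

```python
# Sketch: if the server was started with
#   --lora-modules sql-lora=/path/to/adapter   (name/path hypothetical)
# clients select the adapter by its registered name in the `model` field.
import json

payload = {
    "model": "sql-lora",        # LoRA module name rather than the base model
    "prompt": "Write a query:",
    "max_tokens": 32,
}
body = json.dumps(payload)
print(json.loads(body)["model"])  # sql-lora
```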

docs/features/reasoning_outputs.md (1 addition, 1 deletion)

@@ -143,7 +143,7 @@ OpenAI Python client library does not officially support `reasoning_content` att
 print(content, end="", flush=True)
 ```

-Remember to check whether the `reasoning_content` exists in the response before accessing it. You could checkout the [example](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_with_reasoning_streaming.py).
+Remember to check whether the `reasoning_content` exists in the response before accessing it. You could check out the [example](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_with_reasoning_streaming.py).

 ## Tool Calling
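The existence check this hunk recommends is typically a `getattr` guard, since the OpenAI client does not officially expose `reasoning_content`. A self-contained sketch; the `Delta` class below is a stand-in for the streamed chunk delta, not the real client type:

```python
# Sketch of safely reading the optional reasoning_content field. `Delta` is
# a stand-in for the chunk delta object returned by the OpenAI client.
class Delta:
    def __init__(self, content=None, reasoning_content=None):
        self.content = content
        self.reasoning_content = reasoning_content

def extract(delta):
    # getattr guards against clients that do not expose reasoning_content
    reasoning = getattr(delta, "reasoning_content", None)
    return reasoning if reasoning is not None else delta.content

print(extract(Delta(reasoning_content="thinking...")))  # thinking...
print(extract(Delta(content="answer")))                 # answer
```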

docs/features/structured_outputs.md (1 addition, 1 deletion)

@@ -205,7 +205,7 @@ This section covers the OpenAI beta wrapper over the `client.chat.completions.cr

 At the time of writing (`openai==1.54.4`), this is a "beta" feature in the OpenAI client library. Code reference can be found [here](https://github.com/openai/openai-python/blob/52357cff50bee57ef442e94d78a0de38b4173fc2/src/openai/resources/beta/chat/completions.py#L100-L104).

-For the following examples, vLLM was setup using `vllm serve meta-llama/Llama-3.1-8B-Instruct`
+For the following examples, vLLM was set up using `vllm serve meta-llama/Llama-3.1-8B-Instruct`

 Here is a simple example demonstrating how to get structured output using Pydantic models:
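The core idea behind the structured-output flow this hunk refers to: the server is constrained to emit JSON matching a schema, and the client parses it into a typed object. A stdlib-only sketch (using a dataclass in place of a Pydantic model; the `CarDescription` fields and simulated output are illustrative):

```python
# Stdlib-only sketch of structured outputs: schema-shaped JSON from the
# model is parsed into a typed object. A dataclass stands in for the
# Pydantic model; the raw string simulates constrained model output.
import json
from dataclasses import dataclass

@dataclass
class CarDescription:
    brand: str
    model: str
    year: int

raw = '{"brand": "Toyota", "model": "Corolla", "year": 2020}'  # simulated output
car = CarDescription(**json.loads(raw))
print(car.year)  # 2020
```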

docs/getting_started/installation/aws_neuron.md (2 additions, 2 deletions)

@@ -140,8 +140,8 @@ Alternatively, users can directly call the NxDI library to trace and compile you

 - `NEURON_COMPILED_ARTIFACTS`: set this environment variable to point to your pre-compiled model artifacts directory to avoid
   compilation time upon server initialization. If this variable is not set, the Neuron module will perform compilation and save the
-  artifacts under `neuron-compiled-artifacts/{unique_hash}/` sub-directory in the model path. If this environment variable is set,
-  but the directory does not exist, or the contents are invalid, Neuron will also fallback to a new compilation and store the artifacts
+  artifacts under `neuron-compiled-artifacts/{unique_hash}/` subdirectory in the model path. If this environment variable is set,
+  but the directory does not exist, or the contents are invalid, Neuron will also fall back to a new compilation and store the artifacts
   under this specified path.
 - `NEURON_CONTEXT_LENGTH_BUCKETS`: Bucket sizes for context encoding. (Only applicable to `transformers-neuronx` backend).
 - `NEURON_TOKEN_GEN_BUCKETS`: Bucket sizes for token generation. (Only applicable to `transformers-neuronx` backend).
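The fallback behavior described in this hunk can be sketched as a small lookup rule: use the pre-compiled artifacts directory if `NEURON_COMPILED_ARTIFACTS` points at a valid directory, otherwise compile into `neuron-compiled-artifacts/{unique_hash}/` under the model path. The function and example paths below are illustrative, not the Neuron module's actual code:

```python
# Illustrative sketch of the documented fallback: prefer a valid
# NEURON_COMPILED_ARTIFACTS directory; otherwise fall back to a fresh
# compilation stored under the model path.
import os

def resolve_artifacts_dir(model_path: str, unique_hash: str) -> str:
    configured = os.environ.get("NEURON_COMPILED_ARTIFACTS")
    if configured and os.path.isdir(configured):
        return configured  # reuse pre-compiled artifacts, skip compilation
    # variable unset, missing, or invalid: compile under the model path
    return os.path.join(model_path, "neuron-compiled-artifacts", unique_hash)

os.environ.pop("NEURON_COMPILED_ARTIFACTS", None)  # clean slate for the demo
print(resolve_artifacts_dir("/models/llama", "abc123"))
```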
