Configuration
LSP-AI is configured by passing initializationOptions to the language server at startup. There are three configurable keys (a skeleton combining them is shown below):
- memory
- models
- completion
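A minimal sketch showing how the three keys fit together (the model entry here is illustrative; each key is covered in detail below):
{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "stabilityai/stable-code-3b",
      "name": "stable-code-3b-Q5_K_M.gguf",
      "n_ctx": 2048
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {}
  }
}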
LSP-AI keeps track of the text in all opened files during the editor session and builds prompts using this text. The memory key configures the method LSP-AI uses to track text and build prompts. Currently, an empty file_store object is the only valid option for memory:
{
"memory": {
"file_store": {}
}
}
More options are planned, including vector storage backends such as PostgresML. These will enable more powerful context building for prompts, and even future features like semantic search over the codebase.
At server initialization, LSP-AI configures models per the models key specification. These models are then used during textDocument/completion and textDocument/generation requests.
There are currently five different types of configurable models:
- llama.cpp models
- Ollama models
- OpenAI API compatible models
- Anthropic API compatible models
- Mistral AI API compatible models
The type of model is specified by setting the type parameter.
LSP-AI binds directly to the llama.cpp library and runs LLMs locally.
{
"models": {
"model1": {
"type": "llama_cpp",
"repository": "stabilityai/stable-code-3b",
"name": "stable-code-3b-Q5_K_M.gguf",
"n_ctx": 2048,
"n_gpu_layers": 1000
}
}
}
Parameters:
- repository the HuggingFace repository the model is located in
- name the name of the model file
- file_path the path to a gguf file to use (provide either file_path or repository and name)
- n_ctx the maximum number of tokens the model can process at once
- n_gpu_layers the number of layers to offload onto the GPU
- max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)
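For example, a llama.cpp model loaded from a local gguf file rather than a HuggingFace repository might be configured like this (the path below is a hypothetical placeholder):
{
  "models": {
    "model1": {
      "type": "llama_cpp",
      "file_path": "/path/to/stable-code-3b-Q5_K_M.gguf",
      "n_ctx": 2048,
      "n_gpu_layers": 1000
    }
  }
}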
LSP-AI uses the Ollama API over localhost.
{
"models": {
"model1": {
"type": "ollama",
"model": "deepseek-coder"
}
}
}
Parameters:
- model the model to use
- max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)
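As a sketch, rate limiting an Ollama model might look like the following (assuming the value is interpreted as requests per second):
{
  "models": {
    "model1": {
      "type": "ollama",
      "model": "deepseek-coder",
      "max_requests_per_second": 1
    }
  }
}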
LSP-AI works with any OpenAI compatible API. This means LSP-AI will work with OpenAI and any model hosted behind a compatible API. We recommend considering Groq, OpenRouter, or Fireworks AI for hosted model inference, though we are sure there are other good providers out there.
Using an API provider means parts of your code may be sent to the provider in the form of a prompt. If you do not want to potentially expose your code to 3rd parties, we recommend using the llama.cpp backend.
{
"models": {
"model1": {
"type": "open_ai",
"chat_endpoint": "https://api.groq.com/openai/v1/chat/completions",
"model": "llama3-70b-8192",
"auth_token_env_var_name": "GROQ_API_KEY"
}
}
}
Parameters:
- completions_endpoint is the endpoint for text completion
- chat_endpoint is the endpoint for chat completion
- model specifies which model to use
- auth_token_env_var_name is the environment variable name to get the authentication token from. See auth_token for more authentication options
- auth_token is the authentication token to use. This can be used in place of auth_token_env_var_name
- max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)
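For example, supplying the token directly instead of via an environment variable (the token below is a placeholder, not a real key):
{
  "models": {
    "model1": {
      "type": "open_ai",
      "chat_endpoint": "https://api.groq.com/openai/v1/chat/completions",
      "model": "llama3-70b-8192",
      "auth_token": "<your-api-token>"
    }
  }
}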
LSP-AI works with any Anthropic compatible API. This means LSP-AI will work with Anthropic and any model hosted behind a compatible API.
Using an API provider means parts of your code may be sent to the provider in the form of an LLM prompt. If you do not want to potentially expose your code to 3rd parties, we recommend using the llama.cpp backend.
{
"models": {
"model1": {
"type": "anthropic",
"chat_endpoint": "https://api.anthropic.com/v1/messages",
"model": "claude-3-haiku-20240307",
"auth_token_env_var_name": "ANTHROPIC_API_KEY"
}
}
}
Parameters:
- chat_endpoint is the endpoint for chat completion
- model specifies which model to use
- auth_token_env_var_name is the environment variable name to get the authentication token from. See auth_token for more authentication options
- auth_token is the authentication token to use. This can be used in place of auth_token_env_var_name
- max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)
LSP-AI works with any Mistral AI FIM compatible API. This means LSP-AI will work with Mistral FIM API models and any other models that use the same FIM API.
Using an API provider means parts of your code may be sent to the provider in the form of an LLM prompt. If you do not want to potentially expose your code to 3rd parties, we recommend using the llama.cpp backend.
{
"models": {
"model1": {
"type": "mistral_fim",
"fim_endpoint": "https://api.mistral.ai/v1/fim/completion",
"model": "codestral-latest",
"auth_token_env_var_name": "MISTRAL_API_KEY"
}
}
}
Parameters:
- fim_endpoint is the endpoint for FIM
- model specifies which model to use
- auth_token_env_var_name is the environment variable name to get the authentication token from. See auth_token for more authentication options
- auth_token is the authentication token to use. This can be used in place of auth_token_env_var_name
- max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)
LSP-AI is a language server that provides support for completions. To use this feature, provide a completion key in the initializationOptions. You can disable completions by leaving out the completion key.
{
"completion": {
"model": "model1",
"parameters": {}
}
}
The model key specifies which model to use during a completion request. The value of model must be a key specified in the models key. Notice we specify the model as model1, which is the key we used in all the examples above. The name model1 is arbitrary; any valid string can be used.
The available keys in the parameters object depend on the type of model specified and which features you want enabled for the model.
Chat is enabled by supplying the messages key in the parameters object.
{
"completion": {
"model": "model1",
"parameters": {
"messages": [
{
"role": "system",
"content": "Test"
},
{
"role": "user",
"content": "Test {CONTEXT} - {CODE}"
}
],
"max_context_size": 1024
}
}
}
Note that {CONTEXT} and {CODE} are replaced by LSP-AI with the context and the code. These values are supplied by the memory backend. The file_store backend leaves {CONTEXT} blank and fills {CODE} with the code around the cursor, limited by max_context_size. To see the prompts being generated by LSP-AI, enable debugging.
If the messages key is provided and you are using an OpenAI compatible API, be sure to provide the chat_endpoint and make sure the model is instruction tuned.
FIM is enabled by supplying the fim key in the parameters object.
{
"completion": {
"model": "model1",
"parameters": {
"fim": {
"start": "<fim_prefix>",
"middle": "<fim_suffix>",
"end": "<fim_middle>"
},
"max_context_size": 1024
}
}
}
With the file_store backend, this will prepend start, insert middle at the cursor, and append end to the code around the cursor.
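For example, with the fim tokens above and the cursor placed after a + in a short JavaScript function, the prompt sent to the model would look roughly like this (an illustrative sketch; the exact amount of surrounding code depends on max_context_size):
<fim_prefix>function sum(a, b) {
  return a + <fim_suffix>;
}<fim_middle>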
Note that Mistral AI FIM API compatible models do not require the fim parameters. When using mistral_fim models, FIM is enabled automatically, and LSP-AI does not augment the code with special FIM tokens, as the API itself presumably handles this.
If both messages and fim are omitted, the model performs text completion. Be sure to provide the completions_endpoint if using an OpenAI compatible API. In this case, the file_store backend will take max_context_size tokens before the cursor as the prompt.
The other available parameters depend on the backend being used.
llama.cpp:
- max_tokens restricts the number of tokens the model generates
- chat_template the Jinja template to use. Currently we use MiniJinja as the templating backend.
- chat_format the chat template format to use. This is directly forwarded to llama.cpp's apply_chat_template function.
Ollama:
- options passes additional options to the model. See Ollama docs for more info.
- template - see Ollama docs
- system - see Ollama docs
- keep_alive - see Ollama docs
OpenAI:
- max_tokens restricts the number of tokens to generate
- top_p - see OpenAI docs
- presence_penalty - see OpenAI docs
- frequency_penalty - see OpenAI docs
- temperature - see OpenAI docs
Anthropic:
- system - see Anthropic system prompts
- max_tokens restricts the number of tokens to generate
- top_p - see Anthropic docs
- temperature - see Anthropic docs
Mistral FIM:
- max_tokens restricts the number of tokens to generate
- min_tokens - the minimum number of tokens to generate
- temperature - see Mistral AI docs
- top_p - see Mistral AI docs
- stop - see Mistral AI docs
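Below are complete example configurations for the different backends and completion styles.

llama.cpp performing FIM completion with stable-code-3b: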
{
"memory": {
"file_store": {}
},
"models": {
"model1": {
"type": "llama_cpp",
"repository": "stabilityai/stable-code-3b",
"name": "stable-code-3b-Q5_K_M.gguf",
"n_ctx": 2048
}
},
"completion": {
"model": "model1",
"parameters": {
"fim": {
"start": "<fim_prefix>",
"middle": "<fim_suffix>",
"end": "<fim_middle>"
},
"max_context": 2000,
"max_new_tokens": 32
}
}
}
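llama.cpp performing chat-based completion with Meta-Llama-3-70B-Instruct: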
{
"memory": {
"file_store": {}
},
"models": {
"model1": {
"type": "llama_cpp",
"repository": "QuantFactory/Meta-Llama-3-70B-Instruct-GGUF-v2",
"name": "Meta-Llama-3-70B-Instruct-v2.Q5_K_M.gguf",
"n_ctx": 2048
}
},
"completion": {
"model": "model1",
"parameters": {
"max_context": 1800,
"max_tokens": 32,
"messages": [
{
"role": "system",
"content": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses."
},
{
"role": "user",
"content": "def greet(name):\n print(f\"Hello, {<CURSOR>}\")"
},
{
"role": "assistant",
"content": "name"
},
{
"role": "user",
"content": "function sum(a, b) {\n return a + <CURSOR>;\n}"
},
{
"role": "assistant",
"content": "b"
},
{
"role": "user",
"content": "fn multiply(a: i32, b: i32) -> i32 {\n a * <CURSOR>\n}"
},
{
"role": "assistant",
"content": "b"
},
{
"role": "user",
"content": "# <CURSOR>\ndef add(a, b):\n return a + b"
},
{
"role": "assistant",
"content": "Adds two numbers"
},
{
"role": "user",
"content": "# This function checks if a number is even\n<CURSOR>"
},
{
"role": "assistant",
"content": "def is_even(n):\n return n % 2 == 0"
},
{
"role": "user",
"content": "{CODE}"
}
]
}
}
}
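llama.cpp performing plain text completion with deepseek-coder-6.7B-instruct: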
{
"memory": {
"file_store": {}
},
"models": {
"model1": {
"type": "llama_cpp",
"repository": "TheBloke/deepseek-coder-6.7B-instruct-GGUF",
"name": "deepseek-coder-6.7b-instruct.Q5_K_S.gguf",
"n_ctx": 2048
}
},
"completion": {
"model": "model1",
"parameters": {
"max_context": 2000,
"max_new_tokens": 32
}
}
}
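Ollama performing FIM completion with deepseek-coder: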
{
"memory": {
"file_store": {}
},
"models": {
"model1": {
"type": "ollama",
"model": "deepseek-coder"
}
},
"completion": {
"model": "model1",
"parameters": {
"fim": {
"start": "<|fim▁begin|>",
"middle": "<|fim▁hole|>",
"end": "<|fim▁end|>"
},
"max_context": 2000,
"options": {
"num_predict": 32
}
}
}
}
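Ollama performing chat-based completion with llama3: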
{
"memory": {
"file_store": {}
},
"models": {
"model1": {
"type": "ollama",
"model": "llama3"
}
},
"completion": {
"model": "model1",
"parameters": {
"max_context": 1800,
"options": {
"num_predict": 32
},
"system": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses.",
"messages": [
{
"role": "user",
"content": "def greet(name):\n print(f\"Hello, {<CURSOR>}\")"
},
{
"role": "assistant",
"content": "name"
},
{
"role": "user",
"content": "function sum(a, b) {\n return a + <CURSOR>;\n}"
},
{
"role": "assistant",
"content": "b"
},
{
"role": "user",
"content": "fn multiply(a: i32, b: i32) -> i32 {\n a * <CURSOR>\n}"
},
{
"role": "assistant",
"content": "b"
},
{
"role": "user",
"content": "# <CURSOR>\ndef add(a, b):\n return a + b"
},
{
"role": "assistant",
"content": "Adds two numbers"
},
{
"role": "user",
"content": "# This function checks if a number is even\n<CURSOR>"
},
{
"role": "assistant",
"content": "def is_even(n):\n return n % 2 == 0"
},
{
"role": "user",
"content": "{CODE}"
}
]
}
}
}
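Ollama performing plain text completion with deepseek-coder: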
{
"memory": {
"file_store": {}
},
"models": {
"model1": {
"type": "ollama",
"model": "deepseek-coder"
}
},
"completion": {
"model": "model1",
"parameters": {
"max_context": 2000,
"options": {
"num_predict": 32
}
}
}
}
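An OpenAI compatible API (OpenAI) performing chat-based completion with gpt-4o: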
{
"memory": {
"file_store": {}
},
"models": {
"model1": {
"type": "open_ai",
"chat_endpoint": "https://api.openai.com/v1/chat/completions",
"model": "gpt-4o",
"auth_token_env_var_name": "OPENAI_API_KEY"
}
},
"completion": {
"model": "model1",
"parameters": {
"max_context": 2048,
"max_new_tokens": 128,
"messages": [
{
"role": "system",
"content": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses."
},
{
"role": "user",
"content": "def greet(name):\n print(f\"Hello, {<CURSOR>}\")"
},
{
"role": "assistant",
"content": "name"
},
{
"role": "user",
"content": "function sum(a, b) {\n return a + <CURSOR>;\n}"
},
{
"role": "assistant",
"content": "b"
},
{
"role": "user",
"content": "fn multiply(a: i32, b: i32) -> i32 {\n a * <CURSOR>\n}"
},
{
"role": "assistant",
"content": "b"
},
{
"role": "user",
"content": "# <CURSOR>\ndef add(a, b):\n return a + b"
},
{
"role": "assistant",
"content": "Adds two numbers"
},
{
"role": "user",
"content": "# This function checks if a number is even\n<CURSOR>"
},
{
"role": "assistant",
"content": "def is_even(n):\n return n % 2 == 0"
},
{
"role": "user",
"content": "{CODE}"
}
]
}
}
}
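An OpenAI compatible API (Fireworks AI) performing FIM completion with StarCoder 16B: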
{
"memory": {
"file_store": {}
},
"models": {
"model1": {
"type": "open_ai",
"completions_endpoint": "https://api.fireworks.ai/inference/v1/completions",
"model": "accounts/fireworks/models/starcoder-16b",
"auth_token_env_var_name": "FIREWORKS_API_KEY"
}
},
"completion": {
"model": "model1",
"parameters": {
"max_context": 2048,
"max_new_tokens": 128,
"fim": {
"start": "<fim_prefix>",
"middle": "<fim_middle>",
"end": "<fim_suffix>"
}
}
}
}
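An OpenAI compatible API (Fireworks AI) performing plain text completion with StarCoder 16B: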
{
"memory": {
"file_store": {}
},
"models": {
"model1": {
"type": "open_ai",
"completions_endpoint": "https://api.fireworks.ai/inference/v1/completions",
"model": "accounts/fireworks/models/starcoder-16b",
"auth_token_env_var_name": "FIREWORKS_API_KEY"
}
},
"completion": {
"model": "model1",
"parameters": {
"max_context": 2048,
"max_new_tokens": 128
}
}
}
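An Anthropic compatible API performing chat-based completion with claude-3-haiku: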
{
"memory": {
"file_store": {}
},
"models": {
"model1": {
"type": "anthropic",
"chat_endpoint": "https://api.anthropic.com/v1/messages",
"model": "claude-3-haiku-20240307",
"auth_token_env_var_name": "ANTHROPIC_API_KEY"
}
},
"completion": {
"model": "model1",
"parameters": {
"max_context": 2048,
"max_new_tokens": 128,
"system": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses.",
"messages": [
{
"role": "user",
"content": "def greet(name):\n print(f\"Hello, {<CURSOR>}\")"
},
{
"role": "assistant",
"content": "name"
},
{
"role": "user",
"content": "function sum(a, b) {\n return a + <CURSOR>;\n}"
},
{
"role": "assistant",
"content": "b"
},
{
"role": "user",
"content": "fn multiply(a: i32, b: i32) -> i32 {\n a * <CURSOR>\n}"
},
{
"role": "assistant",
"content": "b"
},
{
"role": "user",
"content": "# <CURSOR>\ndef add(a, b):\n return a + b"
},
{
"role": "assistant",
"content": "Adds two numbers"
},
{
"role": "user",
"content": "# This function checks if a number is even\n<CURSOR>"
},
{
"role": "assistant",
"content": "def is_even(n):\n return n % 2 == 0"
},
{
"role": "user",
"content": "{CODE}"
}
]
}
}
}
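The Mistral AI FIM API performing FIM completion with codestral-latest: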
{
"memory": {
"file_store": {}
},
"models": {
"model1": {
"type": "mistral_fim",
"fim_endpoint": "https://api.mistral.ai/v1/fim/completion",
"model": "codestral-latest",
"auth_token_env_var_name": "MISTRAL_API_KEY"
}
},
"completion": {
"model": "model1",
"parameters": {
"max_tokens": 64
}
}
}