
Configuration

Overview

LSP-AI is configured by passing initializationOptions to the language server at startup. There are three configurable keys:

  • memory
  • models
  • completion
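
A minimal configuration combining all three keys (using the file_store memory backend and a llama.cpp model, both described in detail below) looks like this; how initializationOptions are passed depends on your editor's LSP client:

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "stabilityai/stable-code-3b",
      "name": "stable-code-3b-Q5_K_M.gguf",
      "n_ctx": 2048
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {}
  }
}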

Memory

LSP-AI keeps track of the text in all opened files during the editor session and builds prompts using this text. The memory key configures the method LSP-AI uses to track text and build prompts. Currently, an empty file_store object is the only valid option for memory:

{
  "memory": {
    "file_store": {}
  }
}

There will soon be more options that allow the use of vector storage backends such as PostgresML. This will enable more powerful context building for prompts, and future features like semantic search over the codebase.

Models

At server initialization, LSP-AI configures the models specified under the models key. These models are then used to serve textDocument/completion and textDocument/generation requests.

There are currently five different types of configurable models:

  • llama.cpp models
  • Ollama models
  • OpenAI API compatible models
  • Anthropic API compatible models
  • Mistral AI API compatible models

The type of model is specified by setting the type parameter.

llama.cpp

LSP-AI binds directly to the llama.cpp library and runs LLMs locally.

{
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "stabilityai/stable-code-3b",
      "name": "stable-code-3b-Q5_K_M.gguf",
      "n_ctx": 2048,
      "n_gpu_layers": 1000
    }
  }
}

Parameters:

  • repository the HuggingFace repository the model is located in
  • name the name of the model file
  • file_path the path to a gguf file to use (provide either file_path or both repository and name; see the example after this list)
  • n_ctx the maximum number of tokens the model can process at once
  • n_gpu_layers the number of layers to offload onto the GPU
  • max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)
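
For example, a configuration that loads a model from a local gguf file via file_path instead of pulling from HuggingFace might look like this (the path below is illustrative):

{
  "models": {
    "model1": {
      "type": "llama_cpp",
      "file_path": "/path/to/stable-code-3b-Q5_K_M.gguf",
      "n_ctx": 2048,
      "n_gpu_layers": 1000
    }
  }
}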

Ollama Models

LSP-AI uses the Ollama API over localhost.

{
  "models": {
    "model1": {
      "type": "ollama",
      "model": "deepseek-coder"
    }
  }
}

Parameters:

  • model the model to use
  • max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)
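
None of the examples on this page set max_requests_per_second, so as an illustrative sketch it is shown here alongside the other model fields; the value of 1 is an assumption, not a recommendation:

{
  "models": {
    "model1": {
      "type": "ollama",
      "model": "deepseek-coder",
      "max_requests_per_second": 1
    }
  }
}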

OpenAI Compatible APIs

LSP-AI works with any OpenAI compatible API. This means LSP-AI will work with OpenAI and any model hosted behind a compatible API. We recommend considering Groq, OpenRouter, or Fireworks AI for hosted model inference, though we are sure there are other good providers out there.

Using an API provider means parts of your code may be sent to the provider in the form of a prompt. If you do not want to potentially expose your code to third parties, we recommend using the llama.cpp backend.

{
  "models": {
    "model1": {
      "type": "open_ai",
      "chat_endpoint": "https://api.groq.com/openai/v1/chat/completions",
      "model": "llama3-70b-8192",
      "auth_token_env_var_name": "GROQ_API_KEY"
    }
  }
}

Parameters:

  • completions_endpoint is the endpoint for text completion
  • chat_endpoint is the endpoint for chat completion
  • model specifies which model to use
  • auth_token_env_var_name is the environment variable name to get the authentication token from. See auth_token for more authentication options
  • auth_token is the authentication token to use. This can be used in place of auth_token_env_var_name
  • max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)
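
As a sketch of the alternatives, here is a text-completion setup that uses completions_endpoint and supplies the key directly via auth_token; the token value is a placeholder:

{
  "models": {
    "model1": {
      "type": "open_ai",
      "completions_endpoint": "https://api.fireworks.ai/inference/v1/completions",
      "model": "accounts/fireworks/models/starcoder-16b",
      "auth_token": "YOUR_API_KEY"
    }
  }
}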

Anthropic Compatible APIs

LSP-AI works with any Anthropic compatible API. This means LSP-AI will work with Anthropic and any model hosted behind a compatible API.

Using an API provider means parts of your code may be sent to the provider in the form of an LLM prompt. If you do not want to potentially expose your code to third parties, we recommend using the llama.cpp backend.

{
  "models": {
    "model1": {
      "type": "anthropic",
      "chat_endpoint": "https://api.anthropic.com/v1/messages",
      "model": "claude-3-haiku-20240307",
      "auth_token_env_var_name": "ANTHROPIC_API_KEY"
    }
  }
}

Parameters:

  • chat_endpoint is the endpoint for chat completion
  • model specifies which model to use
  • auth_token_env_var_name is the environment variable name to get the authentication token from. See auth_token for more authentication options
  • auth_token is the authentication token to use. This can be used in place of auth_token_env_var_name
  • max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)

Mistral FIM Compatible APIs

LSP-AI works with any Mistral AI FIM compatible API. This means LSP-AI will work with Mistral FIM API models and any other models that use the same FIM API.

Using an API provider means parts of your code may be sent to the provider in the form of an LLM prompt. If you do not want to potentially expose your code to third parties, we recommend using the llama.cpp backend.

{
  "models": {
    "model1": {
      "type": "mistral_fim",
      "fim_endpoint": "https://api.mistral.ai/v1/fim/completions",
      "model": "codestral-latest",
      "auth_token_env_var_name": "MISTRAL_API_KEY"
    }
  }
}

Parameters:

  • fim_endpoint is the endpoint for FIM
  • model specifies which model to use
  • auth_token_env_var_name is the environment variable name to get the authentication token from. See auth_token for more authentication options
  • auth_token is the authentication token to use. This can be used in place of auth_token_env_var_name
  • max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)

Completion

LSP-AI is a language server that provides completions. To enable them, provide a completion key in the initializationOptions; leaving the key out disables completions.

{
  "completion": {
    "model": "model1",
    "parameters": {}
  }
}

The model key specifies which model to use during a completion request. Its value must be one of the keys defined under models. Here we use model1, matching the key used in all of the examples above; the name model1 is arbitrary and can be any valid string.

The available keys in the parameters object depend on the type of model specified and which features you want enabled for the model.

Chat

Chat is enabled by supplying the messages key in the parameters object.

{
  "completion": {
    "model": "model1",
    "parameters": {
      "messages": [
        {
          "role": "system",
          "content": "Test"
        },
        {
          "role": "user",
          "content": "Test {CONTEXT} - {CODE}"
        }
      ],
      "max_context_size": 1024
    }
  }
}

Note that {CONTEXT} and {CODE} are replaced by LSP-AI with the context and code supplied by the memory backend. The file_store backend leaves {CONTEXT} blank and fills {CODE} with the text around the cursor, limited by max_context_size. To see the prompts LSP-AI generates, enable debugging.
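
For instance, if the open buffer contains only def add(a, b): with the cursor at the end of the line, the user message above would be rendered roughly as follows (a sketch; enable debugging to see the exact prompt your backend builds):

Test  - def add(a, b):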

If the messages key is provided and you are using an OpenAI compatible API, be sure to provide the chat_endpoint and to use an instruction-tuned model.

FIM

FIM is enabled by supplying the fim key in the parameters object.

{
  "completion": {
    "model": "model1",
    "parameters": {
      "fim": {
        "start": "<fim_prefix>",
        "middle": "<fim_suffix>",
        "end": "<fim_middle>"
      },
      "max_context_size": 1024
    }
  }
}

With the file_store backend, this will prepend start, insert middle at the cursor, and append end to the code around the cursor.
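
For example, with the tokens configured above and a buffer whose text before the cursor is an unfinished function, the assembled prompt would look roughly like this (here there is no code after the cursor, so nothing sits between the middle and end tokens):

<fim_prefix>def add(a, b):
    return <fim_suffix><fim_middle>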

Note that Mistral AI FIM API compatible models don't require the fim parameters. When using mistral_fim models, FIM is enabled automatically and the code is not augmented with special FIM tokens, as the API itself presumably handles this.

Text Completion

If both messages and fim are omitted, the model performs text completion. Be sure to provide the completions_endpoint if using an OpenAI compatible API. In this case, the file_store backend will take max_context_size tokens before the cursor as the prompt.
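
Continuing the example from the FIM section, with plain text completion the prompt is simply the text before the cursor (up to max_context_size tokens):

def add(a, b):
    return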

Other Parameters

The remaining parameters depend on the backend being used.

llama.cpp:

  • max_tokens restricts the number of tokens the model generates
  • chat_template the Jinja template to use. We currently use MiniJinja as the templating backend.
  • chat_format the chat template format to use. This is directly forwarded to llama.cpp's apply_chat_template function.
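
As a rough sketch of where these keys go, and assuming chat_format accepts a llama.cpp built-in template name such as "chatml" (an assumption, not something documented here), the completion parameters for a llama.cpp chat setup might look like:

{
  "completion": {
    "model": "model1",
    "parameters": {
      "max_tokens": 64,
      "chat_format": "chatml",
      "messages": [
        {
          "role": "user",
          "content": "{CODE}"
        }
      ]
    }
  }
}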

Ollama, OpenAI, Anthropic, and Mistral FIM:

  • generation parameters in the parameters object generally follow the respective provider's API (for example, max_new_tokens in the OpenAI and Anthropic examples, max_tokens in the Mistral FIM example, and Ollama's options object); see the example configurations below for combinations known to work.

Example Configurations

llama.cpp

FIM

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "stabilityai/stable-code-3b",
      "name": "stable-code-3b-Q5_K_M.gguf",
      "n_ctx": 2048
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "fim": {
        "start": "<fim_prefix>",
        "middle": "<fim_suffix>",
        "end": "<fim_middle>"
      },
      "max_context": 2000,
      "max_new_tokens": 32
    }
  }
}

Chat

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "QuantFactory/Meta-Llama-3-70B-Instruct-GGUF-v2",
      "name": "Meta-Llama-3-70B-Instruct-v2.Q5_K_M.gguf",
      "n_ctx": 2048
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 1800,
      "max_tokens": 32,
      "messages": [
        {
          "role": "system",
          "content": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses."
        },
        {
          "role": "user",
          "content": "def greet(name):\n    print(f\"Hello, {<CURSOR>}\")"
        },
        {
          "role": "assistant",
          "content": "name"
        },
        {
          "role": "user",
          "content": "function sum(a, b) {\n    return a + <CURSOR>;\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "fn multiply(a: i32, b: i32) -> i32 {\n    a * <CURSOR>\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "# <CURSOR>\ndef add(a, b):\n    return a + b"
        },
        {
          "role": "assistant",
          "content": "Adds two numbers"
        },
        {
          "role": "user",
          "content": "# This function checks if a number is even\n<CURSOR>"
        },
        {
          "role": "assistant",
          "content": "def is_even(n):\n    return n % 2 == 0"
        },
        {
          "role": "user",
          "content": "{CODE}"
        }
      ]
    }
  }
}

Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "TheBloke/deepseek-coder-6.7B-instruct-GGUF",
      "name": "deepseek-coder-6.7b-instruct.Q5_K_S.gguf",
      "n_ctx": 2048
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2000,
      "max_new_tokens": 32
    }
  }
}

Ollama

FIM

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "ollama",
      "model": "deepseek-coder"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "fim": {
        "start": "<|fim▁begin|>",
        "middle": "<|fim▁hole|>",
        "end": "<|fim▁end|>"
      },
      "max_context": 2000,
      "options": {
        "num_predict": 32
      }
    }
  }
}

Chat

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "ollama",
      "model": "llama3"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 1800,
      "options": {
        "num_predict": 32
      },
      "system": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses.",
      "messages": [
        {
          "role": "user",
          "content": "def greet(name):\n    print(f\"Hello, {<CURSOR>}\")"
        },
        {
          "role": "assistant",
          "content": "name"
        },
        {
          "role": "user",
          "content": "function sum(a, b) {\n    return a + <CURSOR>;\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "fn multiply(a: i32, b: i32) -> i32 {\n    a * <CURSOR>\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "# <CURSOR>\ndef add(a, b):\n    return a + b"
        },
        {
          "role": "assistant",
          "content": "Adds two numbers"
        },
        {
          "role": "user",
          "content": "# This function checks if a number is even\n<CURSOR>"
        },
        {
          "role": "assistant",
          "content": "def is_even(n):\n    return n % 2 == 0"
        },
        {
          "role": "user",
          "content": "{CODE}"
        }
      ]
    }
  }
}

Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "ollama",
      "model": "deepseek-coder"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2000,
      "options": {
        "num_predict": 32
      }
    }
  }
}

OpenAI Compatible APIs

Chat

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "open_ai",
      "chat_endpoint": "https://api.openai.com/v1/chat/completions",
      "model": "gpt-4o",
      "auth_token_env_var_name": "OPENAI_API_KEY"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2048,
      "max_new_tokens": 128,
      "messages": [
        {
          "role": "system",
          "content": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses."
        },
        {
          "role": "user",
          "content": "def greet(name):\n    print(f\"Hello, {<CURSOR>}\")"
        },
        {
          "role": "assistant",
          "content": "name"
        },
        {
          "role": "user",
          "content": "function sum(a, b) {\n    return a + <CURSOR>;\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "fn multiply(a: i32, b: i32) -> i32 {\n    a * <CURSOR>\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "# <CURSOR>\ndef add(a, b):\n    return a + b"
        },
        {
          "role": "assistant",
          "content": "Adds two numbers"
        },
        {
          "role": "user",
          "content": "# This function checks if a number is even\n<CURSOR>"
        },
        {
          "role": "assistant",
          "content": "def is_even(n):\n    return n % 2 == 0"
        },
        {
          "role": "user",
          "content": "{CODE}"
        }
      ]
    }
  }
}

FIM

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "open_ai",
      "completions_endpoint": "https://api.fireworks.ai/inference/v1/completions",
      "model": "accounts/fireworks/models/starcoder-16b",
      "auth_token_env_var_name": "FIREWORKS_API_KEY"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2048,
      "max_new_tokens": 128,
      "fim": {
        "start": "<fim_prefix>",
        "middle": "<fim_suffix>",
        "end": "<fim_middle>"
      }
    }
  }
}

Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "open_ai",
      "completions_endpoint": "https://api.fireworks.ai/inference/v1/completions",
      "model": "accounts/fireworks/models/starcoder-16b",
      "auth_token_env_var_name": "FIREWORKS_API_KEY"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2048,
      "max_new_tokens": 128
    }
  }
}

Anthropic Compatible APIs

Chat

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "anthropic",
      "chat_endpoint": "https://api.anthropic.com/v1/messages",
      "model": "claude-3-haiku-20240307",
      "auth_token_env_var_name": "ANTHROPIC_API_KEY"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2048,
      "max_new_tokens": 128,
      "system": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses.",
      "messages": [
        {
          "role": "user",
          "content": "def greet(name):\n    print(f\"Hello, {<CURSOR>}\")"
        },
        {
          "role": "assistant",
          "content": "name"
        },
        {
          "role": "user",
          "content": "function sum(a, b) {\n    return a + <CURSOR>;\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "fn multiply(a: i32, b: i32) -> i32 {\n    a * <CURSOR>\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "# <CURSOR>\ndef add(a, b):\n    return a + b"
        },
        {
          "role": "assistant",
          "content": "Adds two numbers"
        },
        {
          "role": "user",
          "content": "# This function checks if a number is even\n<CURSOR>"
        },
        {
          "role": "assistant",
          "content": "def is_even(n):\n    return n % 2 == 0"
        },
        {
          "role": "user",
          "content": "{CODE}"
        }
      ]
    }
  }
}

Mistral FIM Compatible APIs

FIM

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "mistral_fim",
      "fim_endpoint": "https://api.mistral.ai/v1/fim/completions",
      "model": "codestral-latest",
      "auth_token_env_var_name": "MISTRAL_API_KEY"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_tokens": 64
    }
  }
}