
Configuration

Overview

LSP-AI is configured by passing initializationOptions to the language server at startup. There are three configurable keys:

  • memory
  • models
  • completion
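
A minimal configuration combining all three keys (using the file_store memory backend and a llama.cpp model, both described in detail below) looks like this; how initializationOptions are passed depends on your editor's LSP client:

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "stabilityai/stable-code-3b",
      "name": "stable-code-3b-Q5_K_M.gguf",
      "n_ctx": 2048
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {}
  }
}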

Memory

LSP-AI keeps track of the text in all opened files during the editor session and builds prompts using this text. The memory key configures the method LSP-AI uses to track text and build prompts. Currently, an empty file_store object is the only valid option for memory:

{
  "memory": {
    "file_store": {}
  }
}

There will soon be more options that allow the use of vector storage backends such as PostgresML. This will enable more powerful context building for prompts, and future features like semantic search over the codebase.

Models

At server initialization, LSP-AI configures the models specified under the models key. These models are then used to serve textDocument/completion and textDocument/generation requests.

There are currently five different types of configurable models:

  • llama.cpp models
  • Ollama models
  • OpenAI API compatible models
  • Anthropic API compatible models
  • Mistral AI API compatible models

The type of model is specified by setting the type parameter.

llama.cpp

LSP-AI binds directly to the llama.cpp library and runs LLMs locally.

{
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "stabilityai/stable-code-3b",
      "name": "stable-code-3b-Q5_K_M.gguf",
      "n_ctx": 2048,
      "n_gpu_layers": 1000
    }
  }
}

Parameters:

  • repository the HuggingFace repository the model is located in
  • name the name of the model file
  • file_path the path to a gguf file to use (provide either file_path or both repository and name; see the example after this list)
  • n_ctx the maximum number of tokens the model can process at once
  • n_gpu_layers the number of layers to offload onto the GPU
  • max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)
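
For example, a configuration that loads a model from a local gguf file via file_path instead of pulling from HuggingFace might look like this (the path below is illustrative):

{
  "models": {
    "model1": {
      "type": "llama_cpp",
      "file_path": "/path/to/stable-code-3b-Q5_K_M.gguf",
      "n_ctx": 2048,
      "n_gpu_layers": 1000
    }
  }
}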

Ollama Models

LSP-AI uses the Ollama API over localhost.

{
  "models": {
    "model1": {
      "type": "ollama",
      "model": "deepseek-coder"
    }
  }
}

Parameters:

  • model the model to use
  • max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)
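
None of the examples on this page set max_requests_per_second, so as an illustrative sketch it is shown here alongside the other model fields; the value of 1 is an assumption, not a recommendation:

{
  "models": {
    "model1": {
      "type": "ollama",
      "model": "deepseek-coder",
      "max_requests_per_second": 1
    }
  }
}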

OpenAI Compatible APIs

LSP-AI works with any OpenAI compatible API. This means LSP-AI will work with OpenAI and any model hosted behind a compatible API. We recommend considering Groq, OpenRouter, or Fireworks AI for hosted model inference, though we are sure there are other good providers out there.

Using an API provider means parts of your code may be sent to the provider in the form of a prompt. If you do not want to potentially expose your code to third parties, we recommend using the llama.cpp backend.

{
  "models": {
    "model1": {
      "type": "open_ai",
      "chat_endpoint": "https://api.groq.com/openai/v1/chat/completions",
      "model": "llama3-70b-8192",
      "auth_token_env_var_name": "GROQ_API_KEY"
    }
  }
}

Parameters:

  • completions_endpoint is the endpoint for text completion
  • chat_endpoint is the endpoint for chat completion
  • model specifies which model to use
  • auth_token_env_var_name is the environment variable name to get the authentication token from. See auth_token for more authentication options
  • auth_token is the authentication token to use. This can be used in place of auth_token_env_var_name
  • max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)
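
As a sketch of the alternatives, here is a text-completion setup that uses completions_endpoint and supplies the key directly via auth_token; the token value is a placeholder:

{
  "models": {
    "model1": {
      "type": "open_ai",
      "completions_endpoint": "https://api.fireworks.ai/inference/v1/completions",
      "model": "accounts/fireworks/models/starcoder-16b",
      "auth_token": "YOUR_API_KEY"
    }
  }
}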

Anthropic Compatible APIs

LSP-AI works with any Anthropic compatible API. This means LSP-AI will work with Anthropic and any model hosted behind a compatible API.

Using an API provider means parts of your code may be sent to the provider in the form of an LLM prompt. If you do not want to potentially expose your code to third parties, we recommend using the llama.cpp backend.

{
  "models": {
    "model1": {
      "type": "anthropic",
      "chat_endpoint": "https://api.anthropic.com/v1/messages",
      "model": "claude-3-haiku-20240307",
      "auth_token_env_var_name": "ANTHROPIC_API_KEY"
    }
  }
}

Parameters:

  • chat_endpoint is the endpoint for chat completion
  • model specifies which model to use
  • auth_token_env_var_name is the environment variable name to get the authentication token from. See auth_token for more authentication options
  • auth_token is the authentication token to use. This can be used in place of auth_token_env_var_name
  • max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)

Mistral FIM Compatible APIs

LSP-AI works with any Mistral AI FIM compatible API. This means LSP-AI will work with Mistral FIM API models and any other models that use the same FIM API.

Using an API provider means parts of your code may be sent to the provider in the form of an LLM prompt. If you do not want to potentially expose your code to third parties, we recommend using the llama.cpp backend.

{
  "models": {
    "model1": {
      "type": "mistral_fim",
      "fim_endpoint": "https://api.mistral.ai/v1/fim/completions",
      "model": "codestral-latest",
      "auth_token_env_var_name": "MISTRAL_API_KEY"
    }
  }
}

Parameters:

  • fim_endpoint is the endpoint for FIM
  • model specifies which model to use
  • auth_token_env_var_name is the environment variable name to get the authentication token from. See auth_token for more authentication options
  • auth_token is the authentication token to use. This can be used in place of auth_token_env_var_name
  • max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)

Completion

LSP-AI is a language server that provides completions. To enable them, provide a completion key in the initializationOptions; leaving the key out disables completions.

{
  "completion": {
    "model": "model1",
    "parameters": {}
  }
}

The model key specifies which model to use during a completion request. Its value must be one of the keys defined under models. Here we use model1, matching the key used in all of the examples above; the name model1 is arbitrary and can be any valid string.

The available keys in the parameters object depend on the type of model specified and which features you want enabled for the model.

Chat

Chat is enabled by supplying the messages key in the parameters object.

{
  "completion": {
    "model": "model1",
    "parameters": {
      "messages": [
        {
          "role": "system",
          "content": "Test"
        },
        {
          "role": "user",
          "content": "Test {CONTEXT} - {CODE}"
        }
      ],
      "max_context_size": 1024
    }
  }
}

Note that {CONTEXT} and {CODE} are replaced by LSP-AI with the context and code supplied by the memory backend. The file_store backend leaves {CONTEXT} blank and fills {CODE} with the text around the cursor, limited by max_context_size. To see the prompts LSP-AI generates, enable debugging.
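
For instance, if the open buffer contains only def add(a, b): with the cursor at the end of the line, the user message above would be rendered roughly as follows (a sketch; enable debugging to see the exact prompt your backend builds):

Test  - def add(a, b):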

If the messages key is provided and you are using an OpenAI compatible API, be sure to provide the chat_endpoint and to use an instruction-tuned model.

FIM

FIM is enabled by supplying the fim key in the parameters object.

{
  "completion": {
    "model": "model1",
    "parameters": {
      "fim": {
        "start": "<fim_prefix>",
        "middle": "<fim_suffix>",
        "end": "<fim_middle>"
      },
      "max_context_size": 1024
    }
  }
}

With the file_store backend, this will prepend start, insert middle at the cursor, and append end to the code around the cursor.
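
For example, with the tokens configured above and a buffer whose text before the cursor is an unfinished function, the assembled prompt would look roughly like this (here there is no code after the cursor, so nothing sits between the middle and end tokens):

<fim_prefix>def add(a, b):
    return <fim_suffix><fim_middle>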

Note that Mistral AI FIM API compatible models don't require the fim parameters. When using mistral_fim models, FIM is enabled automatically and the code is not augmented with special FIM tokens, as the API itself presumably handles this.

Text Completion

If both messages and fim are omitted, the model performs text completion. Be sure to provide the completions_endpoint if using an OpenAI compatible API. In this case, the file_store backend will take max_context_size tokens before the cursor as the prompt.
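
Continuing the example from the FIM section, with plain text completion the prompt is simply the text before the cursor (up to max_context_size tokens):

def add(a, b):
    return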

Other Parameters

The remaining parameters depend on the backend being used.

llama.cpp:

  • max_tokens restricts the number of tokens the model generates
  • chat_template the Jinja template to use. We currently use MiniJinja as the templating backend.
  • chat_format the chat template format to use. This is directly forwarded to llama.cpp's apply_chat_template function.
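
As a rough sketch of where these keys go, and assuming chat_format accepts a llama.cpp built-in template name such as "chatml" (an assumption, not something documented here), the completion parameters for a llama.cpp chat setup might look like:

{
  "completion": {
    "model": "model1",
    "parameters": {
      "max_tokens": 64,
      "chat_format": "chatml",
      "messages": [
        {
          "role": "user",
          "content": "{CODE}"
        }
      ]
    }
  }
}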

Ollama, OpenAI, Anthropic, and Mistral FIM:

  • generation parameters in the parameters object generally follow the respective provider's API (for example, max_new_tokens in the OpenAI and Anthropic examples, max_tokens in the Mistral FIM example, and Ollama's options object); see the example configurations below for combinations known to work.

Example Configurations

llama.cpp

FIM

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "stabilityai/stable-code-3b",
      "name": "stable-code-3b-Q5_K_M.gguf",
      "n_ctx": 2048
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "fim": {
        "start": "<fim_prefix>",
        "middle": "<fim_suffix>",
        "end": "<fim_middle>"
      },
      "max_context": 2000,
      "max_new_tokens": 32
    }
  }
}

Chat

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "QuantFactory/Meta-Llama-3-70B-Instruct-GGUF-v2",
      "name": "Meta-Llama-3-70B-Instruct-v2.Q5_K_M.gguf",
      "n_ctx": 2048
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 1800,
      "max_tokens": 32,
      "messages": [
        {
          "role": "system",
          "content": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses."
        },
        {
          "role": "user",
          "content": "def greet(name):\n    print(f\"Hello, {<CURSOR>}\")"
        },
        {
          "role": "assistant",
          "content": "name"
        },
        {
          "role": "user",
          "content": "function sum(a, b) {\n    return a + <CURSOR>;\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "fn multiply(a: i32, b: i32) -> i32 {\n    a * <CURSOR>\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "# <CURSOR>\ndef add(a, b):\n    return a + b"
        },
        {
          "role": "assistant",
          "content": "Adds two numbers"
        },
        {
          "role": "user",
          "content": "# This function checks if a number is even\n<CURSOR>"
        },
        {
          "role": "assistant",
          "content": "def is_even(n):\n    return n % 2 == 0"
        },
        {
          "role": "user",
          "content": "{CODE}"
        }
      ]
    }
  }
}

Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "TheBloke/deepseek-coder-6.7B-instruct-GGUF",
      "name": "deepseek-coder-6.7b-instruct.Q5_K_S.gguf",
      "n_ctx": 2048
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2000,
      "max_new_tokens": 32
    }
  }
}

Ollama

FIM

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "ollama",
      "model": "deepseek-coder"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "fim": {
        "start": "<|fim▁begin|>",
        "middle": "<|fim▁hole|>",
        "end": "<|fim▁end|>"
      },
      "max_context": 2000,
      "options": {
        "num_predict": 32
      }
    }
  }
}

Chat

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "ollama",
      "model": "llama3"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 1800,
      "options": {
        "num_predict": 32
      },
      "system": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses.",
      "messages": [
        {
          "role": "user",
          "content": "def greet(name):\n    print(f\"Hello, {<CURSOR>}\")"
        },
        {
          "role": "assistant",
          "content": "name"
        },
        {
          "role": "user",
          "content": "function sum(a, b) {\n    return a + <CURSOR>;\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "fn multiply(a: i32, b: i32) -> i32 {\n    a * <CURSOR>\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "# <CURSOR>\ndef add(a, b):\n    return a + b"
        },
        {
          "role": "assistant",
          "content": "Adds two numbers"
        },
        {
          "role": "user",
          "content": "# This function checks if a number is even\n<CURSOR>"
        },
        {
          "role": "assistant",
          "content": "def is_even(n):\n    return n % 2 == 0"
        },
        {
          "role": "user",
          "content": "{CODE}"
        }
      ]
    }
  }
}

Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "ollama",
      "model": "deepseek-coder"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2000,
      "options": {
        "num_predict": 32
      }
    }
  }
}

OpenAI Compatible APIs

Chat

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "open_ai",
      "chat_endpoint": "https://api.openai.com/v1/chat/completions",
      "model": "gpt-4o",
      "auth_token_env_var_name": "OPENAI_API_KEY"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2048,
      "max_new_tokens": 128,
      "messages": [
        {
          "role": "system",
          "content": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses."
        },
        {
          "role": "user",
          "content": "def greet(name):\n    print(f\"Hello, {<CURSOR>}\")"
        },
        {
          "role": "assistant",
          "content": "name"
        },
        {
          "role": "user",
          "content": "function sum(a, b) {\n    return a + <CURSOR>;\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "fn multiply(a: i32, b: i32) -> i32 {\n    a * <CURSOR>\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "# <CURSOR>\ndef add(a, b):\n    return a + b"
        },
        {
          "role": "assistant",
          "content": "Adds two numbers"
        },
        {
          "role": "user",
          "content": "# This function checks if a number is even\n<CURSOR>"
        },
        {
          "role": "assistant",
          "content": "def is_even(n):\n    return n % 2 == 0"
        },
        {
          "role": "user",
          "content": "{CODE}"
        }
      ]
    }
  }
}

FIM

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "open_ai",
      "completions_endpoint": "https://api.fireworks.ai/inference/v1/completions",
      "model": "accounts/fireworks/models/starcoder-16b",
      "auth_token_env_var_name": "FIREWORKS_API_KEY"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2048,
      "max_new_tokens": 128,
      "fim": {
        "start": "<fim_prefix>",
        "middle": "<fim_suffix>",
        "end": "<fim_middle>"
      }
    }
  }
}

Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "open_ai",
      "completions_endpoint": "https://api.fireworks.ai/inference/v1/completions",
      "model": "accounts/fireworks/models/starcoder-16b",
      "auth_token_env_var_name": "FIREWORKS_API_KEY"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2048,
      "max_new_tokens": 128
    }
  }
}

Anthropic Compatible APIs

Chat

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "anthropic",
      "chat_endpoint": "https://api.anthropic.com/v1/messages",
      "model": "claude-3-haiku-20240307",
      "auth_token_env_var_name": "ANTHROPIC_API_KEY"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2048,
      "max_new_tokens": 128,
      "system": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses.",
      "messages": [
        {
          "role": "user",
          "content": "def greet(name):\n    print(f\"Hello, {<CURSOR>}\")"
        },
        {
          "role": "assistant",
          "content": "name"
        },
        {
          "role": "user",
          "content": "function sum(a, b) {\n    return a + <CURSOR>;\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "fn multiply(a: i32, b: i32) -> i32 {\n    a * <CURSOR>\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "# <CURSOR>\ndef add(a, b):\n    return a + b"
        },
        {
          "role": "assistant",
          "content": "Adds two numbers"
        },
        {
          "role": "user",
          "content": "# This function checks if a number is even\n<CURSOR>"
        },
        {
          "role": "assistant",
          "content": "def is_even(n):\n    return n % 2 == 0"
        },
        {
          "role": "user",
          "content": "{CODE}"
        }
      ]
    }
  }
}

Mistral FIM Compatible APIs

FIM

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "mistral_fim",
      "fim_endpoint": "https://api.mistral.ai/v1/fim/completions",
      "model": "codestral-latest",
      "auth_token_env_var_name": "MISTRAL_API_KEY"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_tokens": 64
    }
  }
}