server : accept extra_context for the infill endpoint #9874
Pass additional (extra) context to the `/infill` endpoint:

    curl \
        --silent --no-buffer --request POST \
        --url http://127.0.0.1:8012/infill \
        --header "Content-Type: application/json" \
        --data '{"extra_context": [{"filename": "llama.h", "text": "LLAMA_API int32_t llama_n_threads(struct llama_context * ctx);\n"}], "input_suffix": "}\n", "input_prefix": "#include <cstdio>\n#include \"llama.h\"\n\nint main() {\n int n_threads = ", "prompt": ""}' | jq

    ...
    {
      ...
      "content": "llama_n_threads(nullptr);\n printf(\"Number of threads: %d\\n\", n_threads);\n return 0;\n",
      ...
    }

The `"extra_context"` field is an array of `{"filename": string, "text": string}` objects.

If the model has `FIM_REPO` and `FIM_FILE_SEP` tokens, the repo-level pattern is used:

    <FIM_REP>myproject
    <FIM_SEP>{chunk 0 filename}
    {chunk 0 text}
    <FIM_SEP>{chunk 1 filename}
    {chunk 1 text}
    ...
    <FIM_SEP>filename
    <FIM_PRE>[input_prefix]<FIM_SUF>[input_suffix]<FIM_MID>[prompt]

If the tokens are missing, then the extra context is simply prefixed at the start:

In this case, the elements of the `"extra_context"` array are concatenated by separating them with the string:

The extra context can be used to implement a ring-buffered context for FIM completion that can be efficiently reused via #9866.