Description
I did not discover this myself. A user of KoboldCPP posted that the auto-rope for Code Llama was incorrect. In case this also applies to LlamaCPP, I wanted to draw attention to the issue. Here is a quote of their findings:
Nexesenex
CodeLlama 2 models are loaded with an automatic rope base frequency similar to Llama 2 when the rope is not specified on the command line at launch.
But the initial base rope frequency for CL2 is 1000000, not 10000. I couldn't find or figure out the formula to calculate a proper rope base frequency for CL2 according to context length (if you have some ideas...), I'm lame in algebra, but from empirical perplexity tests, the best base rope frequency seems to be around 100000 if the rope scale is left at 1, up to a context of 12288.
I observed that the variance between 10000, 100000 and 1000000 is a curve with a perplexity amplitude of about 0.2 at 512 ctx and about 0.02 around 12288 ctx, with 100000 having the lowest perplexity.
I could run more tests on a 7b model with a proper command/script that logs the perplexities llama.cpp reports for different rope base frequency/scale configs up to 32768 or even higher, as some developers on the ggerganov reddit seem to use, but I didn't find the script (and I'm on Windows).
Once Johannes Gaessler's PR for the q8_0-quantized KV cache is accepted, we can probably test up to 100,000 ctx on 7b with a single 24GB graphics card.
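For anyone who wants to reproduce the sweep described above, here is a minimal sketch (in Python, driving llama.cpp's `perplexity` example) of how the three rope base frequencies could be compared at a fixed context length. The binary path, model file, evaluation text, and the regex used to pull the final perplexity out of the output are assumptions and will need adjusting for your build; check `./perplexity --help` to confirm the `--rope-freq-base` and `--rope-freq-scale` flags in your version.

```python
#!/usr/bin/env python3
# Sketch: sweep rope base frequencies with llama.cpp's perplexity tool.
# Paths, model, evaluation text, and the output-parsing regex are assumptions.
import re
import subprocess

PERPLEXITY_BIN = "./perplexity"                 # assumed path to the perplexity binary
MODEL = "models/codellama-7b.Q4_K_M.gguf"       # illustrative model path
TEXT = "wiki.test.raw"                          # illustrative evaluation text
CTX = 12288                                     # context length from the report above

for base in (10000, 100000, 1000000):
    cmd = [
        PERPLEXITY_BIN,
        "-m", MODEL,
        "-f", TEXT,
        "-c", str(CTX),
        "--rope-freq-base", str(base),
        "--rope-freq-scale", "1.0",
    ]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    out = proc.stdout + proc.stderr
    # The final perplexity line format varies between llama.cpp versions; this regex is a guess.
    match = re.search(r"Final estimate: PPL = ([0-9.]+)", out)
    print(f"rope-freq-base={base}: PPL={match.group(1) if match else 'not found'}")
```

The same loop could be extended to larger contexts (e.g. 32768) or finer-grained base values to map out the curve the quote describes; this is only meant as a starting point, not a definitive benchmarking setup.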