Open
Labels: bug (Something isn't working), stat:awaiting response (Status - Awaiting response from author)
Description
In the recent commit, I noticed that the query_pre_attn_scalar parameter is configured inconsistently between the 9B and 27B models in this repository.
Specifically:
In the 9B model, query_pre_attn_scalar is not explicitly set, so it appears to fall back to the default value derived from head_dim (256), rather than 224, which is what hidden size / number of attention heads would give.
In the 27B model, query_pre_attn_scalar is explicitly set to 144 (hidden size / number of attention heads).
Could you please explain the reasoning behind this difference? Is there a specific rationale for leaving query_pre_attn_scalar unset in the 9B model while setting it explicitly in the 27B model?
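To make the discrepancy concrete, here is a minimal sketch of the arithmetic. It is not the repository's code: the AttnConfig class and both helper functions are hypothetical, the hidden size and head counts are assumed values chosen only because they reproduce the 224 and 144 quoted above, the 27B head_dim of 128 is likewise an assumption, and the sketch assumes queries are scaled by query_pre_attn_scalar ** -0.5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttnConfig:
    """Hypothetical stand-in for a model config; not the repository's class."""
    hidden_size: int
    num_attention_heads: int
    head_dim: int
    # None models "not explicitly set", in which case head_dim is used instead.
    query_pre_attn_scalar: Optional[int] = None


def effective_scalar(cfg: AttnConfig) -> int:
    """Return the explicit scalar if set, otherwise fall back to head_dim."""
    if cfg.query_pre_attn_scalar is not None:
        return cfg.query_pre_attn_scalar
    return cfg.head_dim


def query_scaling(cfg: AttnConfig) -> float:
    """Assumed convention: queries are multiplied by scalar ** -0.5."""
    return effective_scalar(cfg) ** -0.5


# Assumed sizes, picked so that hidden_size / heads matches the issue's numbers.
cfg_9b = AttnConfig(hidden_size=3584, num_attention_heads=16, head_dim=256)
cfg_27b = AttnConfig(hidden_size=4608, num_attention_heads=32, head_dim=128,
                     query_pre_attn_scalar=144)

for name, cfg in (("9B", cfg_9b), ("27B", cfg_27b)):
    per_head = cfg.hidden_size // cfg.num_attention_heads
    print(name, "hidden/heads =", per_head,
          "| effective scalar =", effective_scalar(cfg),
          "| query scaling =", round(query_scaling(cfg), 4))
```

Under these assumptions the 9B model scales queries by 256 ** -0.5 rather than 224 ** -0.5, while the 27B model uses 144 ** -0.5 rather than its head_dim-based default, so the two configs follow different conventions.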