
Inconsistent 'query_pre_attn_scalar' Setting Between 9B and 27B Models #71

@kiddj


In a recent commit, I noticed an inconsistency in how the query_pre_attn_scalar parameter is configured for the 9B and 27B models in this repository.

Specifically:

In the 9B model, query_pre_attn_scalar is not explicitly set, so it falls back to the default derived from head_dim (256), rather than 224, which is what hidden_size / num_attention_heads (3584 / 16) would give.
In the 27B model, query_pre_attn_scalar is explicitly set to 144, i.e. hidden_size / num_attention_heads (4608 / 32).
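To make the difference concrete, here is a minimal sketch of the resulting softmax scaling under each setting. It assumes the usual convention that queries are multiplied by query_pre_attn_scalar ** -0.5 before the QK^T product; the helper name attn_scale and the config dicts are illustrative, not taken from the repository.

```python
def attn_scale(query_pre_attn_scalar: int) -> float:
    # Queries are scaled by 1 / sqrt(query_pre_attn_scalar) before
    # the QK^T product (the standard attention scaling convention).
    return query_pre_attn_scalar ** -0.5

# Illustrative configs based on the published model dimensions.
cfg_9b = {"hidden_size": 3584, "num_attention_heads": 16, "head_dim": 256}
cfg_27b = {"hidden_size": 4608, "num_attention_heads": 32, "head_dim": 128}

# 9B: query_pre_attn_scalar is unset, so it defaults to head_dim = 256,
# not hidden_size / num_attention_heads = 3584 / 16 = 224.
scale_9b = attn_scale(cfg_9b["head_dim"])

# 27B: query_pre_attn_scalar = 144 = hidden_size / num_attention_heads,
# which differs from head_dim = 128.
scale_27b = attn_scale(cfg_27b["hidden_size"] // cfg_27b["num_attention_heads"])

print(scale_9b)   # 1/16 = 0.0625
print(scale_27b)  # 1/12 ≈ 0.0833
```

So the two models diverge not just in the numeric value but in which quantity the scalar is derived from, which is the crux of the question below.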

Could you please provide some insight into the reasoning behind this difference? Is there a specific rationale for not setting query_pre_attn_scalar in the 9B model while explicitly setting it in the 27B model?

Metadata

Labels: bug (Something isn't working), stat:awaiting response (Status - Awaiting response from author)
