fix #291 -- support heterogenous quant config #293
Conversation
davidkoski
commented
Apr 29, 2025
- support per layer quant config
///
/// This is used by ``ModelFactory/load(hub:configuration:progressHandler:)``
/// to determine the type of model to load.
public struct BaseConfiguration: Codable, Sendable {
The embedders code has the same quant config. Described below.
I wonder if we need an MLXCommon (above MLXLMCommon)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly. Though I don't know of any packages that produce mixed quant embedding models, so it's also fine to leave this out. Or we can support it, since you have it already, in case it does come up.
///
/// This is used by ``ModelFactory/load(hub:configuration:progressHandler:)``
/// to determine the type of model to load.
public struct BaseConfiguration: Codable, Sendable {
Extracted this from LanguageModel.swift as it is big enough to be its own thing.
/// "model.layers.0.self_attn.q_norm": false,
/// ```
///
/// This mixed type structure requires manual decoding.
This turned out to be a bit tricky. We have:
- quantization ints
- dictionary keys -> quantization structs
- dictionary keys -> bools
all mixed in the one dictionary. We could treat it as an unstructured bag (like Python), but that isn't how we usually do it in Swift, so we handle it with a custom Codable implementation.
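A sketch of the kind of custom decoding this takes (type and key names here are illustrative, not this PR's exact implementation): the fixed `group_size`/`bits` keys decode as `Int`, and every other key is tried first as a `Bool` and then as a nested quantization object.

```swift
import Foundation

// Illustrative nested quantization value -- names assumed, not the PR's exact types.
struct LayerQuant: Codable, Equatable {
    let groupSize: Int
    let bits: Int

    enum CodingKeys: String, CodingKey {
        case groupSize = "group_size"
        case bits
    }
}

struct QuantConfig: Decodable {
    let defaultQuant: LayerQuant
    // key present with nil value -> `false` in the JSON (do not quantize this layer)
    let perLayer: [String: LayerQuant?]

    // A CodingKey that accepts any string, so we can walk the mixed dictionary.
    private struct AnyKey: CodingKey {
        var stringValue: String
        init?(stringValue: String) { self.stringValue = stringValue }
        var intValue: Int? { nil }
        init?(intValue: Int) { return nil }
    }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: AnyKey.self)

        // Fixed top-level ints first, so the default is known.
        let groupSize = try c.decode(Int.self, forKey: AnyKey(stringValue: "group_size")!)
        let bits = try c.decode(Int.self, forKey: AnyKey(stringValue: "bits")!)
        let defaultQuant = LayerQuant(groupSize: groupSize, bits: bits)

        // Every other key is either a Bool or a nested quantization object.
        var perLayer = [String: LayerQuant?]()
        for key in c.allKeys where key.stringValue != "group_size" && key.stringValue != "bits" {
            if let flag = try? c.decode(Bool.self, forKey: key) {
                // `false` -> stored nil (skip), `true` -> the default parameters
                perLayer.updateValue(flag ? defaultQuant : nil, forKey: key.stringValue)
            } else {
                perLayer.updateValue(
                    try c.decode(LayerQuant.self, forKey: key), forKey: key.stringValue)
            }
        }
        self.defaultQuant = defaultQuant
        self.perLayer = perLayer
    }
}
```

Decoding a config like the doc comment's example then yields the default parameters plus a per-layer map where an explicit `false` is recorded as a stored nil.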
    }
}

var quanitzationContainer: QuantizationContainer?
internal bits -- this is the combined dictionary
var quanitzationContainer: QuantizationContainer?

@available(*, deprecated, message: "Please use perLayerQuantization instead")
public var quantization: Quantization? {
the single quantization value -- shouldn't use this
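A minimal sketch of how a deprecated bridge like this can look (type names assumed, not the PR's exact declarations): the old single-value property just reads through to the combined container, so existing callers keep working while the compiler steers them to the new API.

```swift
// Illustrative only -- names assumed, not the PR's exact types.
struct Quantization: Equatable {
    var groupSize: Int
    var bits: Int
}

struct QuantizationContainer {
    var quantization: Quantization  // the single, model-wide value
}

struct BaseConfigurationSketch {
    // internal combined dictionary (identifier spelling follows the PR)
    var quanitzationContainer: QuantizationContainer?

    // Old API kept alive, but flagged at every call site.
    @available(*, deprecated, message: "Please use perLayerQuantization instead")
    var quantization: Quantization? { quanitzationContainer?.quantization }
}
```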
quantize(model: model) { path, module in
    if weights["\(path).scales"] != nil {
        if let perLayerQuantization {
            return perLayerQuantization.quantization(layer: path)?.asTuple
Use the per layer values (this allows `false` to mean no quant, and falls back to the default for an unknown key).
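A sketch of that lookup rule (names assumed, not the PR's exact types): an explicit `false` entry returns nil (skip the layer), a known entry returns its own parameters, and an unknown key falls back to the model-wide default.

```swift
// Illustrative only -- names assumed, not the PR's exact types.
struct PerLayerQuantizationSketch {
    enum Entry {
        case quantize(groupSize: Int, bits: Int)  // per-layer override
        case skip                                 // explicit `false` in the config
    }

    var defaultGroupSize: Int
    var defaultBits: Int
    var perLayer: [String: Entry]

    /// Parameters to quantize the layer with, or nil to leave it alone.
    func quantization(layer path: String) -> (groupSize: Int, bits: Int)? {
        switch perLayer[path] {
        case .quantize(let groupSize, let bits)?:
            return (groupSize, bits)
        case .skip?:
            return nil  // `false` means: do not quantize this layer
        case nil:
            return (defaultGroupSize, defaultBits)  // unknown key: use the default
        }
    }
}
```

This keeps the three outcomes (override, skip, default) in one total switch, so adding a fourth entry kind later would be a compile error until every caller handles it.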
@@ -91,3 +100,26 @@ public func loadWeights(
    eval(model)
}

// TODO remove once mlx-swift update is adopted
func quantize(
@@ -40,7 +40,8 @@ struct ContentView: View {
     .frame(maxWidth: 350, alignment: .leading)
     Toggle(isOn: $llm.enableThinking) {
         Text("Thinking")
-            .help("Switches between thinking and non-thinking modes. Support: Qwen3")
+            .help(
+                "Switches between thinking and non-thinking modes. Support: Qwen3")
I am a little confused here -- #290 fixed this (though it seems not to have run CI either). swift-format complained about this code and I applied the fix again, but main actually has this code already: the `-` line doesn't match main.
Haha, I should have realized. Branched off the wrong point in main.
Force-pushed from 0322839 to f8850ae
- support per layer quant config
Looks great!! Thanks for adding that!