Conversation

davidkoski (Collaborator):

  • support per layer quant config

@davidkoski requested a review from awni on April 29, 2025 at 21:32
///
/// This is used by ``ModelFactory/load(hub:configuration:progressHandler:)``
/// to determine the type of model to load.
public struct BaseConfiguration: Codable, Sendable {
davidkoski (Collaborator, Author) commented:

The embedders code has the same quant config. Described below.

I wonder if we need an MLXCommon (above MLXLMCommon)?

awni (Member) replied:

Possibly. Though I don't know of any packages which produce mixed-quant embedding models, so it's also fine to leave this out. Or we can support it, since you have it already, in case it does come up.

///
/// This is used by ``ModelFactory/load(hub:configuration:progressHandler:)``
/// to determine the type of model to load.
public struct BaseConfiguration: Codable, Sendable {
davidkoski (Collaborator, Author) commented:

Extracted this from LanguageModel.swift as it is big enough to be its own thing.

/// "model.layers.0.self_attn.q_norm": false,
/// ```
///
/// This mixed type structure requires manual decoding.
davidkoski (Collaborator, Author) commented:

This turned out to be a bit tricky. We have:

  • quantization ints
  • dictionary keys -> quantization structs
  • dictionary keys -> bools

all mixed in one dictionary. We could treat it as an unstructured bag (like Python), but that isn't how we usually play it in Swift. We can handle it with a custom Codable implementation (a sketch follows below).
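
A minimal, decode-only sketch of that custom Codable approach. The type names here (Quantization, PerLayerQuantization, QuantizationContainer) and the defaults are stand-ins, not the PR's actual declarations: a dynamic CodingKey walks every entry of the mixed dictionary, collecting the top-level ints, the per-layer structs, and the bare `false` flags.

```swift
import Foundation

/// Stand-in for the quantization parameters (key names follow the JSON above).
struct Quantization: Codable, Sendable {
    var groupSize: Int
    var bits: Int

    enum CodingKeys: String, CodingKey {
        case groupSize = "group_size"
        case bits
    }
}

/// What a per-layer entry can be: a bare `false` (skip) or a nested struct.
enum PerLayerQuantization: Sendable {
    case skip
    case quantize(Quantization)
}

/// Decodes the mixed dictionary: top-level ints plus per-layer structs/bools.
struct QuantizationContainer: Decodable, Sendable {
    var quantization: Quantization
    var perLayer: [String: PerLayerQuantization]

    /// Dynamic keys so we can iterate every entry in the dictionary.
    private struct DynamicKey: CodingKey {
        var stringValue: String
        var intValue: Int? { nil }
        init?(stringValue: String) { self.stringValue = stringValue }
        init?(intValue: Int) { return nil }
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: DynamicKey.self)
        var groupSize = 64  // assumed default for the sketch
        var bits = 4        // assumed default for the sketch
        var perLayer = [String: PerLayerQuantization]()

        for key in container.allKeys {
            switch key.stringValue {
            case "group_size":
                groupSize = try container.decode(Int.self, forKey: key)
            case "bits":
                bits = try container.decode(Int.self, forKey: key)
            default:
                if let flag = try? container.decode(Bool.self, forKey: key) {
                    // a bare `false` marks a layer that should not be quantized
                    if !flag { perLayer[key.stringValue] = .skip }
                } else {
                    // otherwise expect a nested quantization struct
                    let q = try container.decode(Quantization.self, forKey: key)
                    perLayer[key.stringValue] = .quantize(q)
                }
            }
        }

        self.quantization = Quantization(groupSize: groupSize, bits: bits)
        self.perLayer = perLayer
    }
}
```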

}
}

var quanitzationContainer: QuantizationContainer?
davidkoski (Collaborator, Author) commented:

internal bits -- this is the combined dictionary

var quanitzationContainer: QuantizationContainer?

@available(*, deprecated, message: "Please use perLayerQuantization instead")
public var quantization: Quantization? {
davidkoski (Collaborator, Author) commented:

the single quantization value -- shouldn't use this
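
A plausible shape for that deprecated accessor, reusing the hypothetical types from the Codable sketch above (the surrounding struct and property body here are assumptions, not the PR's code): it surfaces only the top-level value and drops the per-layer entries, which is exactly why it is deprecated.

```swift
/// Sketch only: a stand-in for the real BaseConfiguration.
public struct BaseConfigurationSketch {
    // the combined dictionary decoded in the Codable sketch above
    // (spelling matches the quoted declaration)
    var quanitzationContainer: QuantizationContainer?

    @available(*, deprecated, message: "Please use perLayerQuantization instead")
    public var quantization: Quantization? {
        // only the top-level value; per-layer entries are ignored here
        quanitzationContainer?.quantization
    }
}
```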

quantize(model: model) { path, module in
    if weights["\(path).scales"] != nil {
        if let perLayerQuantization {
            return perLayerQuantization.quantization(layer: path)?.asTuple
davidkoski (Collaborator, Author) commented:

Use the per-layer values (this allows false to mean no quantization, and falls back to the default for an unknown key). A sketch of this lookup follows.
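
A sketch of that lookup, again with the hypothetical types from the earlier Codable sketch (asTuple and the exact fallback behavior are assumptions based on the comment above): an explicit false yields nil, a known layer uses its own parameters, and an unknown key falls back to the default.

```swift
extension QuantizationContainer {
    /// Per-layer lookup: nil means "leave this layer unquantized".
    func quantization(layer: String) -> Quantization? {
        switch perLayer[layer] {
        case .skip?:
            return nil              // explicit `false`: skip quantization
        case .quantize(let q)?:
            return q                // per-layer override
        case nil:
            return quantization     // unknown key: fall back to the default
        }
    }
}

extension Quantization {
    /// Tuple form consumed by the quantize(model:) closure quoted above.
    var asTuple: (Int, Int) { (groupSize, bits) }
}
```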

@@ -91,3 +100,26 @@ public func loadWeights(

eval(model)
}

// TODO remove once mlx-swift update is adopted
func quantize(

@@ -40,7 +40,8 @@ struct ContentView: View {
         .frame(maxWidth: 350, alignment: .leading)
     Toggle(isOn: $llm.enableThinking) {
         Text("Thinking")
-            .help("Switches between thinking and non-thinking modes. Support: Qwen3")
+            .help(
+                "Switches between thinking and non-thinking modes. Support: Qwen3")
davidkoski (Collaborator, Author) commented:

I am a little confused here -- #290 fixed this (though it seems not to have run CI either). swift-format complained about this code and I applied it again, but main actually has this code:

https://github.com/ml-explore/mlx-swift-examples/blob/main/Applications/LLMEval/ContentView.swift#L43

The - line doesn't match main.

davidkoski (Collaborator, Author) replied:

Haha, I should have realized. Branched off the wrong point in main.

- support per layer quant config
@awni (Member) left a comment:

Looks great!! Thanks for adding that!

@davidkoski merged commit 1db9d3a into main on May 12, 2025
1 check passed
@davidkoski deleted the quant-config branch on May 12, 2025 at 19:33