fix #291 -- support heterogenous quant config #293
Conversation
davidkoski
commented
Apr 29, 2025
- support per layer quant config
///
/// This is used by ``ModelFactory/load(hub:configuration:progressHandler:)``
/// to determine the type of model to load.
public struct BaseConfiguration: Codable, Sendable {
The embedders code has the same quant config. Described below.
I wonder if we need an MLXCommon (above MLXLMCommon)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly. Though I don't know of any packages that produce mixed quant embedding models, so it's also fine to leave this out. Or we can support it, since you have it already, in case it does come up.
///
/// This is used by ``ModelFactory/load(hub:configuration:progressHandler:)``
/// to determine the type of model to load.
public struct BaseConfiguration: Codable, Sendable {
Extracted this from LanguageModel.swift as it is big enough to be its own thing.
/// "model.layers.0.self_attn.q_norm": false,
/// ```
///
/// This mixed type structure requires manual decoding.
This turned out to be a bit tricky. We have:
- quantization ints
- dictionary keys -> quantization structs
- dictionary keys -> bools
all mixed in the one dictionary. We could treat it as an unstructured bag (like Python), but that isn't how we usually do it in Swift, so we handle it with a custom Codable implementation.
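A sketch of the kind of custom decoding this takes (type and key names here are illustrative, not this PR's exact implementation): the fixed `group_size`/`bits` keys decode as `Int`, and every other key is tried first as a `Bool` and then as a nested quantization object.

```swift
import Foundation

// Illustrative nested quantization value -- names assumed, not the PR's exact types.
struct LayerQuant: Codable, Equatable {
    let groupSize: Int
    let bits: Int

    enum CodingKeys: String, CodingKey {
        case groupSize = "group_size"
        case bits
    }
}

struct QuantConfig: Decodable {
    let defaultQuant: LayerQuant
    // key present with nil value -> `false` in the JSON (do not quantize this layer)
    let perLayer: [String: LayerQuant?]

    // A CodingKey that accepts any string, so we can walk the mixed dictionary.
    private struct AnyKey: CodingKey {
        var stringValue: String
        init?(stringValue: String) { self.stringValue = stringValue }
        var intValue: Int? { nil }
        init?(intValue: Int) { return nil }
    }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: AnyKey.self)

        // Fixed top-level ints first, so the default is known.
        let groupSize = try c.decode(Int.self, forKey: AnyKey(stringValue: "group_size")!)
        let bits = try c.decode(Int.self, forKey: AnyKey(stringValue: "bits")!)
        let defaultQuant = LayerQuant(groupSize: groupSize, bits: bits)

        // Every other key is either a Bool or a nested quantization object.
        var perLayer = [String: LayerQuant?]()
        for key in c.allKeys where key.stringValue != "group_size" && key.stringValue != "bits" {
            if let flag = try? c.decode(Bool.self, forKey: key) {
                // `false` -> stored nil (skip), `true` -> the default parameters
                perLayer.updateValue(flag ? defaultQuant : nil, forKey: key.stringValue)
            } else {
                perLayer.updateValue(
                    try c.decode(LayerQuant.self, forKey: key), forKey: key.stringValue)
            }
        }
        self.defaultQuant = defaultQuant
        self.perLayer = perLayer
    }
}
```

Decoding a config like the doc comment's example then yields the default parameters plus a per-layer map where an explicit `false` is recorded as a stored nil.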
    }
}

var quanitzationContainer: QuantizationContainer?
internal bits -- this is the combined dictionary
var quanitzationContainer: QuantizationContainer?

@available(*, deprecated, message: "Please use perLayerQuantization instead")
public var quantization: Quantization? {
the single quantization value -- shouldn't use this
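A minimal sketch of how a deprecated bridge like this can look (type names assumed, not the PR's exact declarations): the old single-value property just reads through to the combined container, so existing callers keep working while the compiler steers them to the new API.

```swift
// Illustrative only -- names assumed, not the PR's exact types.
struct Quantization: Equatable {
    var groupSize: Int
    var bits: Int
}

struct QuantizationContainer {
    var quantization: Quantization  // the single, model-wide value
}

struct BaseConfigurationSketch {
    // internal combined dictionary (identifier spelling follows the PR)
    var quanitzationContainer: QuantizationContainer?

    // Old API kept alive, but flagged at every call site.
    @available(*, deprecated, message: "Please use perLayerQuantization instead")
    var quantization: Quantization? { quanitzationContainer?.quantization }
}
```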
quantize(model: model) { path, module in
    if weights["\(path).scales"] != nil {
        if let perLayerQuantization {
            return perLayerQuantization.quantization(layer: path)?.asTuple
Use the per layer values (this allows `false` to mean no quant, and falls back to the default for an unknown key).
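A sketch of that lookup rule (names assumed, not the PR's exact types): an explicit `false` entry returns nil (skip the layer), a known entry returns its own parameters, and an unknown key falls back to the model-wide default.

```swift
// Illustrative only -- names assumed, not the PR's exact types.
struct PerLayerQuantizationSketch {
    enum Entry {
        case quantize(groupSize: Int, bits: Int)  // per-layer override
        case skip                                 // explicit `false` in the config
    }

    var defaultGroupSize: Int
    var defaultBits: Int
    var perLayer: [String: Entry]

    /// Parameters to quantize the layer with, or nil to leave it alone.
    func quantization(layer path: String) -> (groupSize: Int, bits: Int)? {
        switch perLayer[path] {
        case .quantize(let groupSize, let bits)?:
            return (groupSize, bits)
        case .skip?:
            return nil  // `false` means: do not quantize this layer
        case nil:
            return (defaultGroupSize, defaultBits)  // unknown key: use the default
        }
    }
}
```

This keeps the three outcomes (override, skip, default) in one total switch, so adding a fourth entry kind later would be a compile error until every caller handles it.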
@@ -91,3 +100,26 @@ public func loadWeights(
    eval(model)
}

// TODO remove once mlx-swift update is adopted
func quantize(
@@ -40,7 +40,8 @@ struct ContentView: View {
     .frame(maxWidth: 350, alignment: .leading)
     Toggle(isOn: $llm.enableThinking) {
         Text("Thinking")
-            .help("Switches between thinking and non-thinking modes. Support: Qwen3")
+            .help(
+                "Switches between thinking and non-thinking modes. Support: Qwen3")
I am a little confused here -- #290 fixed this (though it seems not to have run CI either). swift-format complained about this code and I applied the fix again, but main actually has this code already: the `-` line doesn't match main.
Haha, I should have realized. Branched off the wrong point in main.
Force-pushed from 0322839 to f8850ae
- support per layer quant config
Looks great!! Thanks for adding that!