inference. Then, we recompile the network using this profile information to
convert the network into a quantized form, allowing for static optimization of
the quantized graph. We convert portions of the network into islands of integer
computation and aim to generate outputs in the range that the original
floating-point network produces. During the conversion, for the following types
of quantized nodes, we ignore the output's quantization params (if they are
provided) and force the output to have the same quantization params as the
input for performance reasons:
```
LocalResponseNormalizationNode
SliceNode
ReshapeNode
TopKNode
GatherNode
MaxPoolNode
```

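For illustration, here is a minimal sketch of this rule; the helper and its
types are hypothetical and are not Glow's actual conversion code:

```cpp
#include <cstdint>

// Hypothetical quantization parameters: float = scale * (q - offset).
struct QuantParams {
  float scale;
  int32_t offset;
};

// For the node kinds listed above, keeping the input's parameters lets the
// backend skip a requantization step; for pure data-movement nodes (Slice,
// Reshape, Gather, TopK, MaxPool) it is also lossless, since every output
// element is copied or selected from the input.
QuantParams chooseOutputParams(bool isListedKind,
                               const QuantParams &inputParams,
                               const QuantParams &profiledOutputParams) {
  return isListedKind ? inputParams : profiledOutputParams;
}
```
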
By default, target quantization precision is int8. However, precision can be
controlled via the command-line parameter `quantization-precision`. There are
two supported values: `Int8` and `Int16`.

## Caffe2 Quantized Model Support

Glow is able to support the Caffe2 quantized Resnet50 model:
https://github.com/caffe2/models/tree/master/resnet50_quantized

To support Caffe2 quantized models, Glow has:
```
Int8GivenTensorFill
```
- Supported int32 quantized bias.

In most cases, bias is quantized in int32 to improve precision (the partial
sum of the matrix-matrix multiplication is accumulated into int32, so an int32
bias can be added to the int32 partial sum for better accuracy). Glow now
supports int32 quantized bias in ```Convolution```, ```FullyConnected```
and ```RowwiseQuantizedFullyConnected``` nodes.

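As a worked example of this accumulation, here is a sketch of one output
element of an int8 matrix multiply with an int32 bias; offsets are assumed to
be zero for clarity, and the bias is assumed to be quantized with scale
`inputScale * weightScale`, a common convention for quantized GEMMs:

```cpp
#include <cstddef>
#include <cstdint>

// Computes one int32 accumulator of an int8 GEMM. The int8 products are
// widened to int32, so the int32-quantized bias can be added directly to
// the accumulator before the final requantization to int8.
int32_t quantizedDotWithBias(const int8_t *row, const int8_t *col, size_t k,
                             int32_t int32Bias) {
  int32_t acc = int32Bias; // start from the int32 bias
  for (size_t i = 0; i < k; ++i)
    acc += static_cast<int32_t>(row[i]) * static_cast<int32_t>(col[i]);
  return acc; // requantized to int8 later with the output scale/offset
}
```
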
- Supported the conversion from uint8 quantized activations to int8 quantized activations.

For the quantized Caffe2 ops, the activations are quantized to uint8. In Glow,
the activations are quantized to int8. Therefore, for the offset read from a
quantized Caffe2 model, we need to subtract 128 (i.e., add `INT8_MIN`) so that
the activations become int8.

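The conversion only shifts the zero point; the scale is unchanged. A sketch,
using the convention `float = scale * (q - offset)`:

```cpp
#include <cstdint>

struct QuantParams {
  float scale;
  int32_t offset;
};

// float = scale * (q_u - offset_u)
//       = scale * ((q_u - 128) - (offset_u - 128))
// so shifting both the stored values and the offset by -128 maps the uint8
// representation onto int8 without changing the decoded float values.
QuantParams uint8ToInt8Params(const QuantParams &uint8Params) {
  return {uint8Params.scale, uint8Params.offset - 128};
}
```
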
## Compiler Optimizations

For more specific graph optimizations check [here](Optimizations.md#quantization).

## Row-wise Quantization

Row-wise (or channel-wise) quantization is an important way to minimize
accuracy drop. Glow supports the row-wise quantized FullyConnected node
```RowwiseQuantizedFullyConnected```, which is enabled by the
image-classifier/loader option `-enable-rowwise`.

For the regular quantized FC, we quantize the whole weights tensor with the
same scale and offset, which are computed based on the max and min of the
entire tensor. But for row-wise quantization, after getting ```min_i``` and
```max_i``` for each row ```i```, we compute the pair ```(scale_i, offset_i)```
to quantize each element in row ```i```. The figure below shows the quantized
FC node and the RowwiseQuantizedFullyConnected node. Instead of using only one
tensor to represent the quantized weights, we need two extra vectors
```Scales``` and ```Offsets``` to store the ```(scale, offset)``` for each row.

![](rowwise_quantized_fc.png)

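To make the per-row computation concrete, here is a sketch of row-wise int8
quantization of a weights matrix (illustrative code, not Glow's
implementation):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Quantizes each row of a rows x cols float matrix to int8 with its own
// (scale_i, offset_i) derived from that row's min/max, using the
// convention float = scale * (q - offset).
void rowwiseQuantize(const std::vector<float> &weights, size_t rows,
                     size_t cols, std::vector<int8_t> &quantized,
                     std::vector<float> &scales,
                     std::vector<int32_t> &offsets) {
  quantized.resize(rows * cols);
  scales.resize(rows);
  offsets.resize(rows);
  for (size_t r = 0; r < rows; ++r) {
    const float *row = &weights[r * cols];
    auto mm = std::minmax_element(row, row + cols);
    float mn = *mm.first, mx = *mm.second;
    // Map [min_i, max_i] onto the int8 range [-128, 127].
    float scale = (mx - mn) / 255.0f;
    if (scale == 0.0f)
      scale = 1.0f; // guard against constant rows
    int32_t offset = -128 - static_cast<int32_t>(std::round(mn / scale));
    scales[r] = scale;
    offsets[r] = offset;
    for (size_t c = 0; c < cols; ++c) {
      int32_t q = static_cast<int32_t>(std::round(row[c] / scale)) + offset;
      quantized[r * cols + c] =
          static_cast<int8_t>(std::clamp(q, int32_t{-128}, int32_t{127}));
    }
  }
}
```
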
Row-wise quantized SparseLengthsWeightedSum is also supported. Similar to the
above, we compute scales and offsets per row, to be used with the `Data` input
for the `RowwiseQuantizedSparseLengthsSumNode`. Scales and Offsets are inputs
to the node. The output of this node is float, matching the Caffe2
implementation.

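A sketch of the node's semantics (not Glow's kernel): each row of `Data`
gathered by the indices is dequantized with its own scale and offset, scaled
by the corresponding weight, and accumulated into a float output row per
segment:

```cpp
#include <cstdint>
#include <vector>

// Row-wise quantized SparseLengthsWeightedSum: for each segment, sums the
// weighted, per-row-dequantized rows of `data` selected by `indices`.
// `lengths[i]` gives the number of indices belonging to segment i.
std::vector<float> rowwiseQuantizedSLWS(
    const std::vector<int8_t> &data, size_t cols,
    const std::vector<float> &scales, const std::vector<int32_t> &offsets,
    const std::vector<float> &weights, const std::vector<int64_t> &indices,
    const std::vector<int32_t> &lengths) {
  std::vector<float> out(lengths.size() * cols, 0.0f);
  size_t cur = 0; // running position in indices/weights
  for (size_t seg = 0; seg < lengths.size(); ++seg) {
    for (int32_t j = 0; j < lengths[seg]; ++j, ++cur) {
      const int64_t row = indices[cur];
      for (size_t c = 0; c < cols; ++c) {
        // Dequantize with this row's own (scale, offset), then accumulate.
        float v = scales[row] * (data[row * cols + c] - offsets[row]);
        out[seg * cols + c] += weights[cur] * v;
      }
    }
  }
  return out; // float output, as in the Caffe2 implementation
}
```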