inference. Then, we recompile the network using this profile information to
convert the network into a quantized form, allowing for static optimization of
the quantized graph. We convert portions of the network into islands of integer
computation and aim to generate outputs in the range that the original
floating-point network produces. During the conversion, for the following types
of quantized nodes, we ignore the output's quantization params (if they are
provided) and force the output to have the same quantization params as the
input for performance reasons:
```
LocalResponseNormalizationNode
SliceNode
ReshapeNode
TopKNode
GatherNode
MaxPoolNode
```

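For illustration, here is a minimal sketch of this rule; the helper and its
types are hypothetical and are not Glow's actual conversion code:

```cpp
#include <cstdint>

// Hypothetical quantization parameters: float = scale * (q - offset).
struct QuantParams {
  float scale;
  int32_t offset;
};

// For the node kinds listed above, keeping the input's parameters lets the
// backend skip a requantization step; for pure data-movement nodes (Slice,
// Reshape, Gather, TopK, MaxPool) it is also lossless, since every output
// element is copied or selected from the input.
QuantParams chooseOutputParams(bool isListedKind,
                               const QuantParams &inputParams,
                               const QuantParams &profiledOutputParams) {
  return isListedKind ? inputParams : profiledOutputParams;
}
```
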
By default, target quantization precision is int8. However, precision can be
controlled via the command-line parameter `quantization-precision`. There are
two supported values: `Int8` and `Int16`.

## Caffe2 Quantized Model Support

Glow is able to support the Caffe2 quantized Resnet50 model:
https://github.com/caffe2/models/tree/master/resnet50_quantized

To support Caffe2 quantized models, Glow has:
```
Int8GivenTensorFill
```
- Supported int32 quantized bias.

In most cases, bias is quantized in int32 to improve precision (the partial
sum of the matrix-matrix multiplication is accumulated into int32, so an int32
bias can be added to the int32 partial sum for better accuracy). Glow now
supports int32 quantized bias in ```Convolution```, ```FullyConnected```
and ```RowwiseQuantizedFullyConnected``` nodes.

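As a worked example of this accumulation, here is a sketch of one output
element of an int8 matrix multiply with an int32 bias; offsets are assumed to
be zero for clarity, and the bias is assumed to be quantized with scale
`inputScale * weightScale`, a common convention for quantized GEMMs:

```cpp
#include <cstddef>
#include <cstdint>

// Computes one int32 accumulator of an int8 GEMM. The int8 products are
// widened to int32, so the int32-quantized bias can be added directly to
// the accumulator before the final requantization to int8.
int32_t quantizedDotWithBias(const int8_t *row, const int8_t *col, size_t k,
                             int32_t int32Bias) {
  int32_t acc = int32Bias; // start from the int32 bias
  for (size_t i = 0; i < k; ++i)
    acc += static_cast<int32_t>(row[i]) * static_cast<int32_t>(col[i]);
  return acc; // requantized to int8 later with the output scale/offset
}
```
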
- Supported the conversion from uint8 quantized activations to int8 quantized activations.

For the quantized Caffe2 ops, the activations are quantized to uint8. In Glow,
the activations are quantized to int8. Therefore, for the offset read from a
quantized Caffe2 model, we need to subtract 128 (i.e., add `INT8_MIN`) so that
the activations become int8.

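The conversion only shifts the zero point; the scale is unchanged. A sketch,
using the convention `float = scale * (q - offset)`:

```cpp
#include <cstdint>

struct QuantParams {
  float scale;
  int32_t offset;
};

// float = scale * (q_u - offset_u)
//       = scale * ((q_u - 128) - (offset_u - 128))
// so shifting both the stored values and the offset by -128 maps the uint8
// representation onto int8 without changing the decoded float values.
QuantParams uint8ToInt8Params(const QuantParams &uint8Params) {
  return {uint8Params.scale, uint8Params.offset - 128};
}
```
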
## Compiler Optimizations

For more specific graph optimizations check [here](Optimizations.md#quantization).

## Row-wise Quantization

Row-wise (or channel-wise) quantization is an important way to minimize
accuracy drop. Glow supports the row-wise quantized FullyConnected node
```RowwiseQuantizedFullyConnected```, which is enabled by the
image-classifier/loader option `-enable-rowwise`.

For the regular quantized FC, we quantize the whole weights tensor with the
same scale and offset, which are computed based on the max and min of the
entire tensor. But for row-wise quantization, after getting ```min_i``` and
```max_i``` for each row ```i```, we compute the pair ```(scale_i, offset_i)```
to quantize each element in row ```i```. The figure below shows the quantized
FC node and the RowwiseQuantizedFullyConnected node. Instead of using only one
tensor to represent the quantized weights, we need two extra vectors
```Scales``` and ```Offsets``` to store the ```(scale, offset)``` for each row.

![](rowwise_quantized_fc.png)

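To make the per-row computation concrete, here is a sketch of row-wise int8
quantization of a weights matrix (illustrative code, not Glow's
implementation):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Quantizes each row of a rows x cols float matrix to int8 with its own
// (scale_i, offset_i) derived from that row's min/max, using the
// convention float = scale * (q - offset).
void rowwiseQuantize(const std::vector<float> &weights, size_t rows,
                     size_t cols, std::vector<int8_t> &quantized,
                     std::vector<float> &scales,
                     std::vector<int32_t> &offsets) {
  quantized.resize(rows * cols);
  scales.resize(rows);
  offsets.resize(rows);
  for (size_t r = 0; r < rows; ++r) {
    const float *row = &weights[r * cols];
    auto mm = std::minmax_element(row, row + cols);
    float mn = *mm.first, mx = *mm.second;
    // Map [min_i, max_i] onto the int8 range [-128, 127].
    float scale = (mx - mn) / 255.0f;
    if (scale == 0.0f)
      scale = 1.0f; // guard against constant rows
    int32_t offset = -128 - static_cast<int32_t>(std::round(mn / scale));
    scales[r] = scale;
    offsets[r] = offset;
    for (size_t c = 0; c < cols; ++c) {
      int32_t q = static_cast<int32_t>(std::round(row[c] / scale)) + offset;
      quantized[r * cols + c] =
          static_cast<int8_t>(std::clamp(q, int32_t{-128}, int32_t{127}));
    }
  }
}
```
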
Row-wise quantized SparseLengthsWeightedSum is also supported. Similar to the
above, we compute scales and offsets per row, to be used with the `Data` input
for the `RowwiseQuantizedSparseLengthsSumNode`. Scales and Offsets are inputs
to the node. The output of this node is float, matching the Caffe2
implementation.

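A sketch of the node's semantics (not Glow's kernel): each row of `Data`
gathered by the indices is dequantized with its own scale and offset, scaled
by the corresponding weight, and accumulated into a float output row per
segment:

```cpp
#include <cstdint>
#include <vector>

// Row-wise quantized SparseLengthsWeightedSum: for each segment, sums the
// weighted, per-row-dequantized rows of `data` selected by `indices`.
// `lengths[i]` gives the number of indices belonging to segment i.
std::vector<float> rowwiseQuantizedSLWS(
    const std::vector<int8_t> &data, size_t cols,
    const std::vector<float> &scales, const std::vector<int32_t> &offsets,
    const std::vector<float> &weights, const std::vector<int64_t> &indices,
    const std::vector<int32_t> &lengths) {
  std::vector<float> out(lengths.size() * cols, 0.0f);
  size_t cur = 0; // running position in indices/weights
  for (size_t seg = 0; seg < lengths.size(); ++seg) {
    for (int32_t j = 0; j < lengths[seg]; ++j, ++cur) {
      const int64_t row = indices[cur];
      for (size_t c = 0; c < cols; ++c) {
        // Dequantize with this row's own (scale, offset), then accumulate.
        float v = scales[row] * (data[row * cols + c] - offsets[row]);
        out[seg * cols + c] += weights[cur] * v;
      }
    }
  }
  return out; // float output, as in the Caffe2 implementation
}
```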