V3 Documentation

V3 Architecture

MRT V3 is the functional entry point of MRT that supports customized, user-defined attributes for different quantization stages. There are three required positional stages, which must be executed sequentially. The V3 architecture is outlined below:

Preparation: initialize the pre-trained model and apply the supported graph-level passes, including:
- standardizing the specified model
  - duplicate symbol name hacking and validation
  - parameter prefix removal
  - constant operator deduction
  - input shape attachment
  - input name replacement
  - multiple-output fusion
  - parameter unification
- equivalent operator substitution to make the model amenable to quantization and CVM compilation
  - transpose operator fusion
  - operator rewriting
- model splitting into top and base parts, which are processed separately
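
Constant operator deduction is, in spirit, constant folding: an operator whose inputs are all constants can be evaluated ahead of time and replaced by its result. The sketch below illustrates that idea on a hypothetical toy graph; `Node`, `fold_constants` and the `add`/`mul` op set are illustrative assumptions, not MRT's API, which operates on MXNet symbols.

```python
import operator
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Hypothetical toy graph IR; MRT's real passes work on MXNet symbols.
@dataclass
class Node:
    name: str
    op: str                                   # "const", "var", "add" or "mul"
    inputs: List[str] = field(default_factory=list)
    value: Optional[float] = None             # only set for "const" nodes


BINARY_OPS = {"add": operator.add, "mul": operator.mul}


def fold_constants(graph: Dict[str, Node], order: List[str]) -> Dict[str, Node]:
    """Replace every binary op whose inputs are all constants by a constant node.

    `order` must list node names in topological order. Dead constant inputs
    are kept; removing them would be a separate pass.
    """
    folded: Dict[str, Node] = {}
    for name in order:
        node = graph[name]
        if node.op in BINARY_OPS and all(folded[i].op == "const" for i in node.inputs):
            a, b = (folded[i].value for i in node.inputs)
            folded[name] = Node(name, "const", value=BINARY_OPS[node.op](a, b))
        else:
            folded[name] = node               # depends on a variable: keep as-is
    return folded


graph = {
    "c1": Node("c1", "const", value=2.0),
    "c2": Node("c2", "const", value=3.0),
    "s": Node("s", "add", ["c1", "c2"]),
    "x": Node("x", "var"),
    "y": Node("y", "mul", ["x", "s"]),
}
folded = fold_constants(graph, ["c1", "c2", "s", "x", "y"])
print(folded["s"].op, folded["s"].value)      # const 5.0
print(folded["y"].op)                         # mul
```
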
Calibration: calibrate the prepared model to acquire the operator-level thresholds that are further exploited in the quantization stage
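
As a rough illustration of what an operator-level threshold can be, the sketch below assumes the simplest calibrator: it records the largest absolute value each operator outputs over a set of calibration batches. The function name and the batch layout are hypothetical; MRT may collect and reduce its statistics differently.

```python
from typing import Dict, Iterable, List

# Hypothetical batch layout: operator name -> flattened output values.
Batch = Dict[str, List[float]]


def calibrate_thresholds(batches: Iterable[Batch]) -> Dict[str, float]:
    """Return the running max |x| seen for each operator (max-abs calibration)."""
    thresholds: Dict[str, float] = {}
    for batch in batches:
        for op_name, values in batch.items():
            peak = max(abs(v) for v in values)
            thresholds[op_name] = max(thresholds.get(op_name, 0.0), peak)
    return thresholds


batches = [
    {"conv0": [0.5, -1.25, 0.75], "relu0": [0.0, 0.5]},
    {"conv0": [2.0, -0.1], "relu0": [1.5, 0.2]},
]
thresholds = calibrate_thresholds(batches)
print(thresholds)  # {'conv0': 2.0, 'relu0': 1.5}
```
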
Quantization: perform the operator-level quantization procedure, which includes:
- operator restoration for tuning purposes
- operator quantization
  - quantization with respect to the acquired thresholds and input precisions
  - operator clipping when the output precision exceeds the tight precision
  - model output operator quantization with respect to the pre-defined output precision
- graph-level model merging
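
The quantization and clipping steps can be sketched under two assumptions: symmetric per-tensor quantization, where a threshold `t` maps to the largest signed `p`-bit integer 2^(p-1) - 1, and clipping implemented as a rounded right shift followed by saturation. All names here are hypothetical and only illustrate the idea, not MRT's exact procedure.

```python
from typing import List, Tuple

# Hypothetical helpers sketching symmetric quantization; not MRT's API.


def quantize(values: List[float], threshold: float, precision: int) -> Tuple[List[int], float]:
    """Map floats in [-threshold, threshold] to signed `precision`-bit integers."""
    qmax = 2 ** (precision - 1) - 1           # e.g. 127 for 8 bits
    scale = qmax / threshold                  # real value * scale -> integer
    # round() rounds half to even; results are saturated to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v * scale))) for v in values]
    return q, scale


def bit_width(q: List[int]) -> int:
    """Signed bit width needed for the symmetric range [-peak, peak]."""
    peak = max(abs(v) for v in q)
    return peak.bit_length() + 1              # +1 for the sign bit


def clip_to_precision(q: List[int], scale: float, precision: int) -> Tuple[List[int], float]:
    """Shrink integer outputs that need more than `precision` bits."""
    shift = bit_width(q) - precision
    if shift <= 0:
        return q, scale                       # already fits, nothing to do
    qmax = 2 ** (precision - 1) - 1
    half = 1 << (shift - 1)                   # rounding offset for the shift
    shifted = [(v + half) >> shift for v in q]
    clipped = [max(-qmax, min(qmax, v)) for v in shifted]
    return clipped, scale / (1 << shift)      # the scale shrinks by 2**shift


q, s = quantize([0.6, -1.0, 0.2], threshold=1.0, precision=8)
print(q, s)  # [76, -127, 25] 127.0

acc, acc_scale = clip_to_precision([3000, -1000, 40], scale=10.0, precision=8)
print(acc, acc_scale)  # [94, -31, 1] 0.3125
```

Dividing the scale by the same power of two keeps `value / scale` approximately equal to the original real number (3000 / 10 = 300 and 94 / 0.3125 = 300.8), which is why a shift-based clip is a cheap way to fit an accumulator back into the target precision.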
Evaluation: validate multiple models with a specified metric definition
- pre-compilation graph-level reduction
  - input shape attachment
  - operator attribute revision for the compilation stage
  - constant operator deduction
- accuracy comparison of the quantized and unquantized models using the provided metric function
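
Evaluation amounts to running the same labelled data through both models and scoring each with one metric function. The sketch below assumes models are plain callables returning class scores and uses a top-k accuracy metric; `compare_models`, `top_k_accuracy` and the two toy models are hypothetical, not part of MRT.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical types: a model maps one input to class scores,
# a metric maps (all scores, all labels) to a single number.
Model = Callable[[Sequence[float]], List[float]]
Metric = Callable[[List[List[float]], List[int]], float]


def top_k_accuracy(k: int) -> Metric:
    """Fraction of samples whose label is among the k highest scores."""
    def metric(scores: List[List[float]], labels: List[int]) -> float:
        hits = 0
        for s, label in zip(scores, labels):
            # Stable sort: ties keep the lower class index first.
            ranked = sorted(range(len(s)), key=lambda i: s[i], reverse=True)
            if label in ranked[:k]:
                hits += 1
        return hits / len(labels)
    return metric


def compare_models(models: Dict[str, Model], data: List[Sequence[float]],
                   labels: List[int], metric: Metric) -> Dict[str, float]:
    """Score every model on the same data with the same metric."""
    return {name: metric([model(x) for x in data], labels)
            for name, model in models.items()}


def float_model(x: Sequence[float]) -> List[float]:
    return list(x)                            # toy model: scores = input


def quantized_model(x: Sequence[float]) -> List[float]:
    return [float(round(v * 2)) for v in x]   # toy coarse-grained scores


data = [[1.0, 0.0], [0.0, 1.0], [0.4, 0.6]]
labels = [0, 1, 1]
results = compare_models({"float": float_model, "quantized": quantized_model},
                         data, labels, top_k_accuracy(1))
print(results)
```

Here the coarse toy "quantized" model turns the third sample's scores into a tie and loses that prediction, so its top-1 accuracy is 2/3 against 1.0 for the float model.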
Compilation: compile the MXNet model into the JSON and BIN format accepted by CVM
- pre-compilation graph-level reduction
  - input shape attachment
  - operator attribute revision for the compilation stage
  - constant operator deduction
- CVM graph compilation
  - operator shape inference
  - graph compilation
- CVM parameter precision reduction
- CVM deployed graph and parameter compilation
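
One plausible reading of parameter precision reduction is storing each integer parameter tensor in the narrowest signed type that holds its values. The sketch below assumes that reading; the candidate type list, the function names and the parameter names are hypothetical.

```python
from typing import Dict, List

# Hypothetical candidate storage types, narrowest first.
INT_TYPES = [("int8", 8), ("int16", 16), ("int32", 32), ("int64", 64)]


def min_signed_bits(values: List[int]) -> int:
    """Bits needed for the symmetric signed range [-peak, peak]."""
    peak = max((abs(v) for v in values), default=0)
    return peak.bit_length() + 1              # +1 for the sign bit


def reduce_param_precision(params: Dict[str, List[int]]) -> Dict[str, str]:
    """Pick the narrowest integer type for each parameter tensor."""
    chosen = {}
    for name, values in params.items():
        bits = min_signed_bits(values)
        # next() raises StopIteration if even int64 is too narrow.
        chosen[name] = next(t for t, width in INT_TYPES if bits <= width)
    return chosen


params = {
    "conv0_weight": [-127, 5, 64],
    "dense0_bias": [30000, -2],
    "fc_bias": [70000],
}
print(reduce_param_precision(params))
# {'conv0_weight': 'int8', 'dense0_bias': 'int16', 'fc_bias': 'int32'}
```
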

Benchmark Quantization Results

The following tables compare the accuracy of the original float models with that of the MRT V3 quantized models.

Top-1 Accuracy:

| Model Name | Original Float Model | MRT V3 Quantized Model |
| --- | --- | --- |
| resnet_v1 | 77.39% | 76.46% |
| resnet_v2 | 77.15% | 74.16% |
| resnet18_v1 | 70.96% | 70.11% |
| resnet18_v1b_0.89 | 67.21% | 63.79% |
| quickdraw | 81.66% | 81.57% |
| qd10_resnetv1_20 | 85.72% | 85.73% |
| densenet161 | 77.62% | 77.25% |
| alexnet | 55.91% | 51.54% |
| cifar_resnet20_v1 | 92.88% | 92.82% |
| mobilenet1_0 | 70.77% | 66.11% |
| mobilenetv2_1.0 | 71.51% | 69.39% |
| shufflenet_v1 | 63.48% | 60.45% |
| squeezenet1.0 | 57.20% | 54.92% |
| tf_inception_v3 | 45.16% | 49.62% |
| vgg19 | 74.13% | 73.29% |
| mnist | 99.00% | 98.96% |
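
To read the Top-1 table at a glance, the per-model accuracy drop (float minus quantized, in percentage points) can be computed directly from the figures above. The script only restates the table; it shows that two models score slightly higher after quantization.

```python
# Top-1 accuracy (%) copied from the table above: (float, quantized).
TOP1 = {
    "resnet_v1": (77.39, 76.46),
    "resnet_v2": (77.15, 74.16),
    "resnet18_v1": (70.96, 70.11),
    "resnet18_v1b_0.89": (67.21, 63.79),
    "quickdraw": (81.66, 81.57),
    "qd10_resnetv1_20": (85.72, 85.73),
    "densenet161": (77.62, 77.25),
    "alexnet": (55.91, 51.54),
    "cifar_resnet20_v1": (92.88, 92.82),
    "mobilenet1_0": (70.77, 66.11),
    "mobilenetv2_1.0": (71.51, 69.39),
    "shufflenet_v1": (63.48, 60.45),
    "squeezenet1.0": (57.20, 54.92),
    "tf_inception_v3": (45.16, 49.62),
    "vgg19": (74.13, 73.29),
    "mnist": (99.00, 98.96),
}

# Drop in percentage points, rounded to hide floating-point noise.
drops = {name: round(f - q, 2) for name, (f, q) in TOP1.items()}

worst = max(drops, key=drops.get)
improved = sorted(name for name, d in drops.items() if d < 0)
print(worst, drops[worst])   # mobilenet1_0 4.66
print(improved)              # ['qd10_resnetv1_20', 'tf_inception_v3']
```
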

Top-5 Accuracy:

| Model Name | Original Float Model | MRT V3 Quantized Model |
| --- | --- | --- |
| resnet_v1 | 93.59% | 93.29% |
| resnet_v2 | 93.44% | 91.74% |
| resnet18_v1 | 89.93% | 89.62% |
| resnet18_v1b_0.89 | 87.45% | 85.62% |
| quickdraw | 98.22% | 98.20% |
| qd10_resnetv1_20 | 98.71% | 98.70% |
| densenet161 | 93.82% | 93.60% |
| alexnet | 78.75% | 77.40% |
| cifar_resnet20_v1 | 99.78% | 99.75% |
| mobilenet1_0 | 89.97% | 87.35% |
| mobilenetv2_1.0 | 90.10% | 89.30% |
| shufflenet_v1 | 85.12% | 82.95% |
| squeezenet1.0 | 80.04% | 78.64% |
| tf_inception_v3 | 67.93% | 74.71% |
| vgg19 | 91.77% | 91.52% |
| mnist | 100.00% | 100.00% |

Accuracy:

| Model Name | Original Float Model | MRT V3 Quantized Model |
| --- | --- | --- |
| trec | 98.19% | 97.99% |
| yolo3_darknet53_voc | 81.51% | 81.51% |
| yolo3_mobilenet1.0_voc | 76.03% | 71.56% |
| ssd_512_resnet50_v1_voc | 80.30% | 80.05% |
| ssd_512_mobilenet1.0_voc | 75.58% | 71.32% |

For the most recent model quantization results, please refer to MRT Quantization Results.