V3 Documentation
[TOC]
V3 Architecture
MRT V3 is the functional entry point of MRT that supports customized, user-defined attributes for the different quantization stages. Of the stages listed below, three are required positional stages that must be executed sequentially. The V3 architecture is organized as follows:
- Preparation: initialize the pre-trained model. Supported graph-level passes include:
  - standardizing the specified model
  - duplicate symbol name hacking and validation
  - parameter prefix removal
  - constant operator deduction
  - input shape attachment
  - input name replacement
  - multiple outputs fusion
  - parameter unification
  - equivalent operator substitution for the availability of quantization and CVM compilation
  - transpose operator fusion
  - operator rewriting
  - model splitting into top and base, to be preprocessed respectively
- Calibration: calibrate the prepared model to acquire the operator-level thresholds that are further exploited in the quantization stage
- Quantization: perform the operator-level quantizing procedure, which includes:
  - operator restoration for tuning purposes
  - operator quantization
  - quantizing with respect to the acquired thresholds and input precisions
  - operator clipping on condition that the output precision exceeds the tight precision
  - model output operator quantization with respect to the pre-defined output precision
  - graph-level model merging
- Evaluation: validate multiple models with the specified metric definition
  - pre-compilation stage graph-level reduction
  - input shape attaching
  - operator attribute revising for the compilation stage
  - constant operator deduction
  - precision comparison of quantized and unquantized models with the provided metric function
- Compilation: compile the MXNet model into the CVM-accepted json and bin format
  - pre-compilation stage graph-level reduction
  - input shape attaching
  - operator attribute revising for the compilation stage
  - constant operator deduction
  - CVM graph compilation: op shape inference, graph compilation
  - CVM parameter precision reduction
  - CVM deployed graph and parameter compilation
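To make the relationship between the Calibration and Quantization stages more concrete, the following is a minimal sketch of generic symmetric quantization. It is not MRT's actual implementation: the function names (`calibrate_threshold`, `quantize`, `dequantize`) are hypothetical, and taking the maximum absolute value as the threshold is an assumption made purely for illustration. It only shows how a calibrated threshold and a target precision could together determine the integer values, with clipping applied to anything outside the calibrated range.

```python
# Illustrative sketch only: generic symmetric quantization, NOT MRT's code.
# All function names here are hypothetical.

def calibrate_threshold(batches):
    """Calibration: use the largest absolute value seen as the threshold.

    Assumption: a max-abs rule; real calibrators may use other statistics.
    """
    return max(abs(x) for batch in batches for x in batch)


def quantize(values, threshold, precision):
    """Map floats to signed integers that fit in `precision` bits."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Largest magnitude representable with `precision` bits (one sign bit).
    int_max = 2 ** (precision - 1) - 1
    scale = int_max / threshold
    quantized = []
    for v in values:
        q = round(v * scale)
        # Clip values that fall outside the calibrated range.
        q = max(-int_max, min(int_max, q))
        quantized.append(q)
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats for comparison with the originals."""
    return [q / scale for q in quantized]


calibration_batches = [[0.5, -1.0, 0.25], [2.0, -0.75, 1.5]]
threshold = calibrate_threshold(calibration_batches)  # 2.0
ints, scale = quantize([0.5, -2.0, 3.0], threshold, precision=8)
print(threshold, scale, ints)  # 2.0 63.5 [32, -127, 127]
```

In this toy run the input `3.0` lies above the calibrated threshold of `2.0`, so it saturates at the largest 8-bit magnitude, which is the kind of situation clipping is meant to handle.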
Benchmark Quantization Results
The tables below compare the original float models with their MRT V3 quantized counterparts.
Top 1 Accuracy:
| Model Name | Original Float Model | MRT V3 Quantized Model |
|---|---|---|
| resnet_v1 | 77.39% | 76.46% |
| resnet_v2 | 77.15% | 74.16% |
| resnet18_v1 | 70.96% | 70.11% |
| resnet18_v1b_0.89 | 67.21% | 63.79% |
| quickdraw | 81.66% | 81.57% |
| qd10_resnetv1_20 | 85.72% | 85.73% |
| densenet161 | 77.62% | 77.25% |
| alexnet | 55.91% | 51.54% |
| cifar_resnet20_v1 | 92.88% | 92.82% |
| mobilenet1_0 | 70.77% | 66.11% |
| mobilenetv2_1.0 | 71.51% | 69.39% |
| shufflenet_v1 | 63.48% | 60.45% |
| squeezenet1.0 | 57.20% | 54.92% |
| tf_inception_v3 | 45.16% | 49.62% |
| vgg19 | 74.13% | 73.29% |
| mnist | 99.00% | 98.96% |
Top 5 Accuracy:
| Model Name | Original Float Model | MRT V3 Quantized Model |
|---|---|---|
| resnet_v1 | 93.59% | 93.29% |
| resnet_v2 | 93.44% | 91.74% |
| resnet18_v1 | 89.93% | 89.62% |
| resnet18_v1b_0.89 | 87.45% | 85.62% |
| quickdraw | 98.22% | 98.20% |
| qd10_resnetv1_20 | 98.71% | 98.70% |
| densenet161 | 93.82% | 93.60% |
| alexnet | 78.75% | 77.40% |
| cifar_resnet20_v1 | 99.78% | 99.75% |
| mobilenet1_0 | 89.97% | 87.35% |
| mobilenetv2_1.0 | 90.10% | 89.30% |
| shufflenet_v1 | 85.12% | 82.95% |
| squeezenet1.0 | 80.04% | 78.64% |
| tf_inception_v3 | 67.93% | 74.71% |
| vgg19 | 91.77% | 91.52% |
| mnist | 100.00% | 100.00% |
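For readers unfamiliar with the Top 1 and Top 5 columns above, the sketch below shows the usual top-k accuracy definition: a sample counts as correct if its true label is among the k highest-scoring classes. This is a generic illustration, not MRT's metric code; the function name `top_k_accuracy` and the toy scores are hypothetical, and k=2 is used only because the toy example has three classes.

```python
# Illustrative sketch only: the standard top-k accuracy definition.
# `top_k_accuracy` and the toy data are hypothetical, not part of MRT.

def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


scores = [
    [0.10, 0.70, 0.20],  # highest score: class 1
    [0.50, 0.30, 0.20],  # highest score: class 0
    [0.20, 0.30, 0.50],  # highest score: class 2
    [0.40, 0.35, 0.25],  # highest score: class 0
]
labels = [1, 1, 0, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.2%}, top-2: {top2:.2%}")  # top-1: 50.00%, top-2: 75.00%
```

Running the same function on the float model's outputs and on the quantized model's outputs would give a pair of numbers comparable to one row of the tables.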
Accuracy:
| Model Name | Original Float Model | MRT V3 Quantized Model |
|---|---|---|
| trec | 98.19% | 97.99% |
| yolo3_darknet53_voc | 81.51% | 81.51% |
| yolo3_mobilenet1.0_voc | 76.03% | 71.56% |
| ssd_512_resnet50_v1_voc | 80.30% | 80.05% |
| ssd_512_mobilenet1.0_voc | 75.58% | 71.32% |
For the most recent model quantization results, refer to MRT Quantization Results.
