MRT Generalized Quantization API ¶

Quantizer API ¶

This module defines the optimizors used in MRT calibration, the quantizers used in MRT calibration and quantization, the feature types used in calibration and quantization, the buffer types used in quantization, and the granularity constants.

class mrt.V2.tfm_types.Feature(*args)¶

Data structure that specifies what is sampled from the data during the calibration stage.

A feature can be manipulated during the quantization stage.

get()¶

Get the value of feature.

Returns: ret – The feature value.
Return type: float or tuple

get_threshold()¶: Get the threshold of the feature.

serialize()¶

Serialize the feature into a list so that it is compatible with JSON.

Returns: ret – list of serialized features.
Return type: list

class mrt.V2.tfm_types.AFeature(*args)¶: AFeature is designed for uniform symmetric quantization. absmax is the maximum absolute value over all entries of the input tensor.

class mrt.V2.tfm_types.MMFeature(*args)¶: MMFeature is designed for uniform affine quantization. minv and maxv are the minimum and maximum entries of the input tensor, respectively.
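The two statistics described above can be sketched in plain Python. This is only an illustration of what absmax, minv and maxv mean; the helper names `absmax_feature` and `minmax_feature` are hypothetical and are not part of MRT, which operates on mxnet.NDArray inputs.

```python
def absmax_feature(values):
    """Largest absolute value over all entries (the statistic AFeature describes)."""
    return max(abs(v) for v in values)


def minmax_feature(values):
    """Smallest and largest entries (the statistics MMFeature describes)."""
    return min(values), max(values)


# A flattened "tensor" used only for illustration.
sample = [-0.75, 0.1, 2.5, -3.0]

absmax = absmax_feature(sample)
minv, maxv = minmax_feature(sample)
```

The single absmax value is enough for a range that is symmetric around zero, while the (minv, maxv) pair keeps the information needed for an asymmetric (affine) range.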

class mrt.V2.tfm_types.Buffer(*args)¶

Quantization buffer used to store the scale. For uniform affine quantizers, the zero point is also stored.

get()¶

Get the value of buffer.

Returns: ret – The buffer value.
Return type: float or tuple

serialize()¶

Serialize the buffer into a list so that it is compatible with JSON.

Returns: ret – list of serialized buffers.
Return type: list

class mrt.V2.tfm_types.SBuffer(*args)¶: SBuffer is designed for uniform symmetric quantizers; it stores the scale.

class mrt.V2.tfm_types.SZBuffer(*args)¶: SZBuffer is designed for uniform affine quantizers; it stores both the scale and the zero point.
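To illustrate why serialize() returns a list, the following sketch round-trips a buffer-like object through JSON. `SketchSZBuffer` is a hypothetical stand-in written for this example, and the layout of the serialized list (class name followed by the stored values) is an assumption, not MRT's actual format.

```python
import json


class SketchSZBuffer:
    """Hypothetical stand-in for a buffer holding a scale and a zero point."""

    def __init__(self, scale, zero_point):
        self.scale = scale
        self.zero_point = zero_point

    def get(self):
        # Mirrors the documented "float or tuple" return: here, a tuple.
        return self.scale, self.zero_point

    def serialize(self):
        # Only JSON-friendly types: a string, a float and an int in a list.
        return [type(self).__name__, self.scale, self.zero_point]


buf = SketchSZBuffer(0.5, 3)
text = json.dumps(buf.serialize())  # a plain JSON array
restored = json.loads(text)         # back to a Python list
```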

class mrt.V2.tfm_types.Quantizer¶

Helper class that executes the quantization process.

Current quantizer types supported by MRT GEN:

Uniform Symmetric Quantization
Uniform Affine Quantization

get_prec(val)¶

Get the quantizer precision with respect to the given value.

For quantizers such as uniform symmetric quantizers, the returned precision should be ‘int’ (signed).

For quantizers such as uniform affine quantizers, the returned precision should be ‘uint’ (unsigned).

Parameters: val (float) – The quantize precision of the node.
Returns: ret – The quantizer precision.
Return type: int

get_range(prec)¶

Get the quantizer range with respect to the given precision.

Parameters: prec (int) – The specified precision.
Returns: ret – The minimal and maximal possible value.
Return type: tuple
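As a rough illustration of how a precision maps to a (minimum, maximum) pair, the sketch below uses the common conventions for signed-symmetric and unsigned integer ranges. Both helper names are hypothetical, and the exact bounds returned by MRT's quantizers are an assumption here.

```python
def symmetric_range(prec):
    """Signed range symmetric around zero for a `prec`-bit integer."""
    bound = 2 ** (prec - 1) - 1
    return -bound, bound


def unsigned_range(prec):
    """Unsigned range for a `prec`-bit integer."""
    return 0, 2 ** prec - 1


int8_range = symmetric_range(8)
uint8_range = unsigned_range(8)
```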

get_scale(oprec, ft)¶

Get the quantizer scale.

Parameters

oprec (int) – The quantize precision of the node.
ft (mrt.V2.Feature) – The feature of the node to be quantized.

Returns

ret – The quantizer scale.

Return type

float
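For a uniform symmetric quantizer, a scale is commonly derived from the precision and the absmax threshold. The sketch below assumes the convention that a float value is multiplied by the scale to obtain its integer code; `symmetric_scale` is a hypothetical helper, and MRT's get_scale may differ in detail.

```python
def symmetric_scale(oprec, absmax):
    """Scale that maps [-absmax, absmax] onto the signed `oprec`-bit range."""
    if absmax <= 0:
        raise ValueError("absmax must be positive")
    bound = 2 ** (oprec - 1) - 1
    return bound / absmax


scale = symmetric_scale(8, 3.0)  # 127 / 3.0
code = round(2.5 * scale)        # integer code for the float value 2.5
```

With this convention, the threshold itself maps to the largest representable code, so no sampled value overflows the target precision.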

int_realize(data, prec, **kwargs)¶

Realize the given input with respect to the given precision bound.

Parameters

data (mxnet.NDArray) – The float weight to be realized.
prec (int) – The output precision bound.

Returns

ret – The realized result and the tight precision.

Return type

tuple
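The idea of realizing a float input under a precision bound and reporting the tight precision can be sketched as follows. `realize` is a hypothetical helper working on a plain list rather than an mxnet.NDArray; rounding then clipping, and counting one sign bit in the tight precision, are assumptions of this sketch.

```python
def realize(values, prec):
    """Round to integers, clip to the signed `prec`-bit range, report bits used."""
    bound = 2 ** (prec - 1) - 1
    # Python's round() rounds halves to the nearest even integer.
    out = [max(-bound, min(bound, round(v))) for v in values]
    largest = max(abs(v) for v in out)
    # Bits for the magnitude plus one sign bit.
    tight_prec = largest.bit_length() + 1
    return out, tight_prec


clipped, clipped_prec = realize([100.4, -3.6, 300.0], 8)
small, small_prec = realize([3.2, -1.0], 8)
```

The second call shows why the tight precision is useful: the data fit in far fewer bits than the bound allows.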

quantize(sym, oprec, oscale=None, **kwargs)¶

The interface where operator quantization is performed.

Parameters

sym (mxnet.symbol) – The expansion symbol or float weight symbol to be quantized.
oprec (int) – The output precision of the quantized symbol.
oscale (float or NoneType) – The output scale of the quantized symbol. If it is not None, the expansion operator is quantized by output scale. Otherwise, it is quantized by output precision.

Returns

ret – The quantized output symbol, the output precision, and the output scale, in that order. For quantizers such as the uniform affine quantizer, the zero point is also returned.

Return type

tuple

sample(data, **kwargs)¶

Create the feature with respect to the feature type.

Parameters: data (mxnet.NDArray) – The input data feature.
Returns: ret – The created feature.
Return type: Feature

class mrt.V2.tfm_types.USQuantizer¶: Uniform symmetric quantizer

class mrt.V2.tfm_types.UAQuantizer¶: Uniform affine quantizer
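The difference between the two quantizer families is easiest to see on the affine side, where a zero point shifts the integer range. The sketch below uses a textbook formulation of uniform affine quantization; the helper names are hypothetical, and MRT's UAQuantizer may define the scale and zero point differently.

```python
def affine_params(minv, maxv, prec):
    """Return (scale, zero_point) mapping [minv, maxv] onto [0, 2**prec - 1]."""
    if maxv <= minv:
        raise ValueError("maxv must be greater than minv")
    scale = (2 ** prec - 1) / (maxv - minv)
    zero_point = round(-minv * scale)
    return scale, zero_point


def affine_quantize(x, scale, zero_point, prec):
    """Scale, shift by the zero point, then clip to the unsigned range."""
    q = round(x * scale) + zero_point
    return max(0, min(2 ** prec - 1, q))


scale, zero_point = affine_params(-1.0, 3.0, 8)
codes = [affine_quantize(x, scale, zero_point, 8) for x in (-1.0, 0.0, 3.0)]
```

Here minv maps to the lowest code, maxv to the highest, and the float value 0.0 maps to the zero point itself.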

class mrt.V2.tfm_types.Optimizor(**attrs)¶

Currently supported optimizor types intended for sampling optimization:

historical value
moving average
kl divergence

Optimizor types to be implemented:

outlier removal

Notice:

Users can implement customized optimizors for specific features, e.g. by designing different optimizors for different components of a feature.

get_opt(raw_ft, out, **kwargs)¶

Get the optimized value of the calibrated feature.

Parameters

raw_ft (float) – The calibrated feature.
out (mxnet.NDArray) – The original data from which the raw_ft is calibrated.

Returns

ret – The optimized feature.

Return type

float

static list_supported_quant_types()¶: List the supported quantizer types.

class mrt.V2.tfm_types.HVOptimizor(**attrs)¶: Generalized historical value optimizor

class mrt.V2.tfm_types.MAOptimizor(**attrs)¶: Generalized moving average optimizor

class mrt.V2.tfm_types.KLDOptimizor(**attrs)¶: KL divergence optimizor for AFeature
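The historical value and moving average strategies can be sketched on a scalar threshold. Both functions are hypothetical illustrations: treating "historical value" as a running maximum and "moving average" as an exponential average with weight `coef` are assumptions, not a description of HVOptimizor or MAOptimizor internals.

```python
def historical_value(history, current):
    """Keep the largest threshold seen so far."""
    return current if history is None else max(history, current)


def moving_average(history, current, coef=0.9):
    """Exponential moving average; `coef` is the weight given to the history."""
    if history is None:
        return current
    return coef * history + (1 - coef) * current


hv = ma = None
# Thresholds observed on three successive calibration batches.
for absmax in [1.0, 4.0, 2.0]:
    hv = historical_value(hv, absmax)
    ma = moving_average(ma, absmax, coef=0.5)
```

The running maximum never clips a value it has already seen, while the moving average damps the effect of a single unusually large batch.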

Graph API ¶

Collection of MRT GEN pass functions. Stage-level symbol pass designation for MRT, compatible with the MRT architecture.

mrt.V2.tfm_pass.sym_config_infos(symbol, params, cfg_dict={}, logger=logging)¶

Customized graph-level topo pass definition.

Interface for MRT main2 configuration. Creates customized samplers and optimizors.

Use it just before calibration.

mrt.V2.tfm_pass.deserialize(cfg_groups)¶

Interface for MRT main2 configuration

Check the validity and compatibility of feature, sampler and optimizor configurations.

Parameters: cfg_groups (dict) – configuration information (quantizer type, optimizor information) maps to node names (before calibration).

mrt.V2.tfm_pass.sym_calibrate(symbol, params, data, cfg_dict, **kwargs)¶: Customized graph-level topo pass definition. Interface for MRT GEN Calibration.

mrt.V2.tfm_pass.sym_separate_pad(symbol, params)¶: Separate the pad attribute into an independent symbol in the rewrite stage.

mrt.V2.tfm_pass.sym_separate_bias(symbol, params)¶: Separate the bias attribute into an independent symbol in the rewrite stage.

mrt.V2.tfm_pass.sym_slice_channel(symbol, params, cfg_dict={})¶

Customized graph-level topo pass definition.

Interface for granularity control. Features are layer-wise by default; MRT also supports channel-wise features, which are specified in cfg_dict.
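The effect of granularity can be illustrated with absmax computed once per layer versus once per channel. The nested list below stands in for a weight tensor with two output channels; it is made-up data for illustration and does not use the MRT API.

```python
# Two output channels with very different magnitudes.
weight = [
    [0.5, -0.25, 0.125],
    [4.0, -2.0, 1.0],
]

# Layer-wise: one threshold shared by every channel.
layer_absmax = max(abs(v) for row in weight for v in row)

# Channel-wise: one threshold per output channel.
channel_absmax = [max(abs(v) for v in row) for row in weight]
```

With a shared threshold, the small-magnitude channel uses only a fraction of the available integer range, which is the situation channel-wise features are meant to improve.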

mrt.V2.tfm_pass.quantize(symbol, params, features, precs, buffers, cfg_dict, op_input_precs, restore_names, shift_bits, softmax_lambd)¶

Customized graph-level topo pass definition. Interface for MRT GEN Quantization.

Parameters

symbol (mxnet.symbol) – the grouped output symbol representing the graph to be quantized.
params (dict) – symbol name maps to mxnet.NDArray, representing the graph parameters
features (dict) – symbol name maps to mrt.V2.Feature
precs (dict) – symbol name maps to precision dict
buffers (dict) – symbol name maps to mrt.V2.Buffer
cfg_dict (dict) – symbol name maps to configuration dict
op_input_precs (dict) – symbol name maps to input precision
restore_names (set) – set of symbol names representing symbols to be restored
shift_bits (int) – hyperparameter for quantize precision control
softmax_lambd (float) – hyperparameter for feature optimization

Transformer API ¶

Customized Symbolic Pass Interfaces. Base passes with default operation settings for MRT GEN. Collection of transformer management functions.

class mrt.V2.tfm_base.Transformer¶

Generalized transformer that provides default slice_channel and quantize interfaces for specific ops. Other default transformer interfaces, such as fuse_transpose, are inherited.

quantize(op, **kwargs)¶: Generalized quantization interface for the operator.

slice_channel(op, **kwargs)¶

Operators are split into multiple channels so that they can be quantized with channel-wise granularity.

Do nothing by default.
