MRT Generalized Quantization API

Quantizer API

This module defines the optimizors used in MRT calibration, the quantizers used in MRT calibration and quantization, the feature types used in MRT calibration and quantization, the buffer types used in MRT quantization, and the granularity constants.

class mrt.V2.tfm_types.Feature(*args)

The data structure that specifies what is sampled from the data in the calibration stage.

A feature can also be manipulated in the quantization stage.

get()

Get the value of feature.

Returns

ret – The feature value.

Return type

float or tuple

get_threshold()

Get the threshold of the feature.

serialize()

Serialize the feature into a list so that it is compatible with JSON.

Returns

ret – list of serialized features.

Return type

list
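
As a rough illustration of why serialize() returns a list, the sketch below round-trips a feature-like object through the standard json module. The class AbsmaxFeature and its list layout are hypothetical stand-ins; the actual layout produced by mrt.V2.tfm_types.Feature.serialize() is not specified here.

```python
import json


class AbsmaxFeature:
    """Hypothetical feature-like object holding a single absmax value."""

    name = "absmax"  # assumed type tag, not taken from MRT

    def __init__(self, absmax):
        self.absmax = absmax

    def get(self):
        # Return the stored feature value.
        return self.absmax

    def serialize(self):
        # A plain list of JSON-native values can be dumped directly.
        return [self.name, self.absmax]


encoded = json.dumps(AbsmaxFeature(4.0).serialize())
decoded = json.loads(encoded)
```

Because the serialized form contains only JSON-native values, it survives the dump and load cycle unchanged.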

class mrt.V2.tfm_types.AFeature(*args)

AFeature is designed for uniform symmetric quantization. absmax stands for the max of the absolute value of every entry in the input tensor.

class mrt.V2.tfm_types.MMFeature(*args)

MMFeature is designed for uniform affine quantization. minv and maxv respectively stand for the min and max entries of the input tensor.
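
The two feature values can be sketched in plain Python as follows. The helper names sample_absmax and sample_minmax are hypothetical and operate on a flat list of floats instead of an mxnet.NDArray; they only illustrate the statistics that AFeature and MMFeature are described as holding.

```python
from typing import Sequence, Tuple


def sample_absmax(values: Sequence[float]) -> float:
    """Max of the absolute value over all entries (AFeature-style statistic)."""
    return max(abs(v) for v in values)


def sample_minmax(values: Sequence[float]) -> Tuple[float, float]:
    """Min and max entries (MMFeature-style statistic)."""
    return min(values), max(values)


tensor = [-1.5, 0.25, 3.0, -4.0]
absmax = sample_absmax(tensor)
minv, maxv = sample_minmax(tensor)
```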

class mrt.V2.tfm_types.Buffer(*args)

Quantization buffer used to store the scale. For uniform affine quantizers, the zero point is also stored.

get()

Get the value of buffer.

Returns

ret – The buffer value.

Return type

float or tuple

serialize()

Serialize the buffer into a list so that it is compatible with JSON.

Returns

ret – list of serialized buffers.

Return type

list

class mrt.V2.tfm_types.SBuffer(*args)

SBuffer is designed for uniform symmetric quantizers, where scale is stored.

class mrt.V2.tfm_types.SZBuffer(*args)

SZBuffer is designed for uniform affine quantizers, where both scale and zero point are stored.

class mrt.V2.tfm_types.Quantizer

Helper class to execute quantization process.

Current quantizer types supported by MRT GEN:
  1. Uniform Symmetric Quantization

  2. Uniform Affine Quantization

get_prec(val)

Get the quantizer precision with respect to the given value.

For quantizers like uniform symmetric quantizers, the returned precision should be ‘int’.

For quantizers like uniform affine quantizers, the returned precision should be ‘uint’.

Parameters

val (float) – The quantize precision of the node.

Returns

ret – The quantizer precision.

Return type

int

get_range(prec)

Get the quantizer range with respect to the given precision.

Parameters

prec (int) – The specified precision.

Returns

ret – The minimal and maximal possible value.

Return type

tuple
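
A minimal sketch of how a precision might map to a value range, assuming the common conventions of a sign-symmetric signed range for symmetric quantizers and an unsigned range for affine quantizers. The function names are hypothetical, and the exact bounds used by MRT may differ.

```python
def symmetric_range(prec: int):
    """Assumed 'int' range: [-(2^(prec-1) - 1), 2^(prec-1) - 1]."""
    bound = 2 ** (prec - 1) - 1
    return -bound, bound


def affine_range(prec: int):
    """Assumed 'uint' range: [0, 2^prec - 1]."""
    return 0, 2 ** prec - 1


int8_range = symmetric_range(8)
uint8_range = affine_range(8)
```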

get_scale(oprec, ft)

Get the quantizer scale.

Parameters
  • oprec (int) – The quantize precision of the node.

  • ft (mrt.V2.Feature) – The feature of the node to be quantized.

Returns

ret – The quantizer scale.

Return type

float
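
One common way to derive a scale from a feature is sketched below, assuming the scale is a multiplier that maps float values onto the integer range. Both helpers are hypothetical; in particular, the integer zero point shown for the affine case is one standard formulation and is not confirmed to match what MRT stores in SZBuffer.

```python
def symmetric_scale(oprec: int, absmax: float) -> float:
    """Scale such that absmax maps to the largest signed integer."""
    if absmax == 0:
        return 1.0  # degenerate tensor: any scale works
    return (2 ** (oprec - 1) - 1) / absmax


def affine_scale_and_zero_point(oprec: int, minv: float, maxv: float):
    """Scale and integer zero point such that [minv, maxv] maps to [0, 2^oprec - 1]."""
    if maxv == minv:
        return 1.0, 0  # degenerate tensor
    scale = (2 ** oprec - 1) / (maxv - minv)
    zero_point = round(-minv * scale)
    return scale, zero_point


s_scale = symmetric_scale(8, 4.0)
a_scale, a_zp = affine_scale_and_zero_point(8, -1.0, 3.0)
```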

int_realize(data, prec, **kwargs)

Realize the given input with respect to the given precision bound.

Parameters
  • data (mxnet.NDArray) – The float weight to be realized.

  • prec (int) – The output precision bound.

Returns

ret – The realized result and the tight precision.

Return type

tuple
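
The idea of realizing data under a precision bound and reporting the tight precision can be sketched as follows. The helper realize_ints is hypothetical, works on a plain list instead of an mxnet.NDArray, and assumes a signed symmetric range; it is not the MRT implementation.

```python
def realize_ints(data, prec):
    """Round to integers, clip to the signed range of `prec` bits,
    and report the smallest signed precision that still holds the result."""
    bound = 2 ** (prec - 1) - 1
    realized = [max(-bound, min(bound, round(v))) for v in data]
    absmax = max((abs(v) for v in realized), default=0)
    tight_prec = absmax.bit_length() + 1  # magnitude bits plus one sign bit
    return realized, tight_prec


clipped, clipped_prec = realize_ints([-3.2, 0.4, 100.7, 300.0], 8)
small, small_prec = realize_ints([1.2, -2.6], 8)
```

In the second call the values fit in far fewer bits than the bound allows, so the tight precision is smaller than the requested one.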

quantize(sym, oprec, oscale=None, **kwargs)

The interface where operator quantization is performed.

Parameters
  • sym (mxnet.symbol) – The expansion symbol or float weight symbol to be quantized.

  • oprec (int) – The output precision of the quantized symbol.

  • oscale (float or NoneType) – The output scale of the quantized symbol. If it is not None, the expansion operator will be quantized by the output scale. Otherwise, it will be quantized by the output precision.

Returns

ret – Respectively output quantized symbol, output precision, output scale. For quantizers like uniform affine quantizer, zero point is also returned.

Return type

tuple
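
The choice between quantizing by output scale and by output precision can be illustrated with a small symmetric sketch. The function quantize_values is hypothetical and operates on a list of floats; the real interface takes an mxnet symbol and may compute the returned precision differently when oscale is given.

```python
def quantize_values(values, oprec, oscale=None):
    """Symmetric quantization sketch.

    If oscale is None, derive the scale from oprec and the data's absmax;
    otherwise use the caller-provided scale as is.
    """
    bound = 2 ** (oprec - 1) - 1
    if oscale is None:
        absmax = max(abs(v) for v in values)
        scale = bound / absmax if absmax else 1.0
    else:
        scale = oscale
    quantized = [max(-bound, min(bound, round(v * scale))) for v in values]
    return quantized, oprec, scale


by_prec = quantize_values([2.0, -0.25], 3)
by_scale = quantize_values([2.0, -0.25], 3, oscale=1.0)
```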

sample(data, **kwargs)

Create the feature with respect to the feature type.

Parameters

data (mxnet.NDArray) – The input data feature.

Returns

ret – The created feature.

Return type

Feature

class mrt.V2.tfm_types.USQuantizer

Uniform symmetric quantizer

class mrt.V2.tfm_types.UAQuantizer

Uniform affine quantizer

class mrt.V2.tfm_types.Optimizor(**attrs)
Currently supported optimizor types intended for sampling optimization:
  1. historical value

  2. moving average

  3. kl divergence

Optimizor types to be implemented:
  1. outlier removal

Notice:

Users can implement customized optimizors with respect to the features, e.g. by designing different optimizors for different components of the feature.

get_opt(raw_ft, out, **kwargs)

Get the optimized value of the calibrated feature.

Parameters
  • raw_ft (float) – The calibrated feature.

  • out (mxnet.NDArray) – The original data from which the raw_ft is calibrated.

Returns

ret – The optimized feature.

Return type

float

static list_supported_quant_types()

List the supported quantizer types.

class mrt.V2.tfm_types.HVOptimizor(**attrs)

Generalized historical value optimizor

class mrt.V2.tfm_types.MAOptimizor(**attrs)

Generalized moving average optimizor
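
The historical value and moving average strategies can be sketched on scalar features as below. Both classes are hypothetical illustrations: the historical value variant is assumed to keep the running maximum, and the moving average variant is assumed to be an exponential average with a coefficient named coef. Neither the update rules nor the attribute names are confirmed to match HVOptimizor or MAOptimizor.

```python
class HistoricalMax:
    """Keep the largest feature value seen across calibration batches."""

    def __init__(self):
        self.value = None

    def update(self, raw):
        self.value = raw if self.value is None else max(self.value, raw)
        return self.value


class MovingAverage:
    """Exponentially weighted average of the feature value."""

    def __init__(self, coef=0.9):
        self.coef = coef
        self.value = None

    def update(self, raw):
        if self.value is None:
            self.value = raw
        else:
            self.value = self.coef * self.value + (1 - self.coef) * raw
        return self.value


hv = HistoricalMax()
for batch_absmax in [2.0, 5.0, 3.0]:
    hv_result = hv.update(batch_absmax)

ma = MovingAverage(coef=0.5)
for batch_absmax in [2.0, 4.0]:
    ma_result = ma.update(batch_absmax)
```

The running maximum is robust to under-estimating the range, while the moving average damps the effect of a single outlier batch.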

class mrt.V2.tfm_types.KLDOptimizor(**attrs)

KL divergence optimizor for AFeature

Graph API

Collection of MRT GEN pass functions. Stage-level symbol pass designation for MRT. Compatible with MRT architecture.

mrt.V2.tfm_pass.sym_config_infos(symbol, params, cfg_dict={}, logger=logging)

Customized graph-level topo pass definition.

Interface for MRT main2 configuration. Creates customized samplers and optimizors.

Use it just before calibration.

mrt.V2.tfm_pass.deserialize(cfg_groups)

Interface for MRT main2 configuration.

Check the validity and compatibility of feature, sampler and optimizor configurations.

Parameters

cfg_groups (dict) – configuration information (quantizer type, optimizor information) mapped to node names (before calibration).

mrt.V2.tfm_pass.sym_calibrate(symbol, params, data, cfg_dict, **kwargs)

Customized graph-level topo pass definition. Interface for MRT GEN Calibration.

mrt.V2.tfm_pass.sym_separate_pad(symbol, params)

Separate pad attribute as an independent symbol in rewrite stage.

mrt.V2.tfm_pass.sym_separate_bias(symbol, params)

Separate bias attribute as an independent symbol in rewrite stage.

mrt.V2.tfm_pass.sym_slice_channel(symbol, params, cfg_dict={})

Customized graph-level topo pass definition.

Interface for granularity control. Features are layer-wise by default, but MRT also supports channel-wise features where specified in cfg_dict.
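
The difference between the two granularities can be sketched with an absmax feature on a tiny weight matrix. The nested list below stands in for a tensor with two output channels; the variable names are illustrative only and do not correspond to MRT configuration keys.

```python
# Two output channels with very different magnitudes.
weights = [
    [0.5, -2.0],   # channel 0
    [0.1, 0.05],   # channel 1
]

# Layer-wise: a single feature shared by the whole tensor.
layer_absmax = max(abs(v) for channel in weights for v in channel)

# Channel-wise: one feature per channel, so small channels keep resolution.
channel_absmax = [max(abs(v) for v in channel) for channel in weights]
```

With a layer-wise feature, channel 1 would be quantized against a range twenty times larger than its own values, which is the kind of loss that channel-wise granularity is meant to avoid.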

mrt.V2.tfm_pass.quantize(symbol, params, features, precs, buffers, cfg_dict, op_input_precs, restore_names, shift_bits, softmax_lambd)

Customized graph-level topo pass definition. Interface for MRT GEN Quantization.

Parameters
  • symbol (mxnet.symbol) – the grouped output symbol representing the graph to be quantized.

  • params (dict) – symbol name maps to mxnet.NDArray, representing the graph parameters

  • features (dict) – symbol name maps to mrt.V2.Feature

  • precs (dict) – symbol name maps to precision dict

  • buffers (dict) – symbol name maps to mrt.V2.Buffer

  • cfg_dict (dict) – symbol name maps to configuration dict

  • op_input_precs (dict) – symbol name maps to input precision

  • restore_names (set) – set of symbol names representing symbols to be restored

  • shift_bits (int) – hyperparameter for quantize precision control

  • softmax_lambd (float) – hyperparameter for feature optimization

Transformer API

Customized Symbolic Pass Interfaces. Base passes with default operation settings for MRT GEN. Collection of transformer management functions.

class mrt.V2.tfm_base.Transformer

Generalized transformer which provides default slice_channel and quantize interfaces for specific ops. Other default transformer interfaces, such as fuse_transpose, are inherited.

quantize(op, **kwargs)

Generalized version of the quantize interface.

slice_channel(op, **kwargs)

Operators will be split into multiple channels for quantization at channel-wise granularity.

Do nothing by default.