Operator Math Formalization¶
Note
This section is written following the conventions of the doc: Math Format.
This section gives a full, exhaustive explanation of the CVM-Runtime operators. The source code of the FORMAL version corresponds closely to this mathematical description, while other versions such as CPU and CUDA only promise a consistent inference result and may use arbitrary process logic.
Note
All numbers referred to by the symbols are integers by default.
All the operators’ formalization obeys the unified format:
which means that, for the given value ranges, the formula in the first line always holds, subject to the constraints listed in the condition statements.
The quick operator reference is listed below:
Reduce Operators¶
A reduce operator applies a reduction function to the input data according to its parameters, and the process logic is the same across all such operators. Reduction is performed along the given axes; the other dimensions remain unchanged and the results are stored in the corresponding places. We abstract the common reduce logic into a single formalization here and specify the reduce function for each operator respectively.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)
Output: \(Y\)
Attribute:
axes (TShape), an \(M\)-length vector, where \(M \in [0, N+1)\)
keepdims (bool)
exclude (bool)
Case exclude is true and \(M = N\)
\[Y = X\]
Case exclude is false and \(M = 0\)
\[ \begin{align}\begin{aligned}Y[\underbrace{0, 0, \cdots, 0}_{K}] = \text{REDUCE_FUNC}(X),\\\begin{split}\text{where } K = \begin{cases} 1, & \text{if keepdims is false} \\ N, & \text{otherwise} \end{cases}\end{split}\end{aligned}\end{align} \]
Case keepdims is false
\[ \begin{align}\begin{aligned}Y[d_{I(0)}, d_{I(1)}, \cdots, d_{I(N-r-1)}] = \text{REDUCE_FUNC}(Z),\\\forall d_{I(i)} \in [0, n_{I(i)}) \wedge i \in [0, N-r),\\\begin{split}\text{where } I: [0, N-r) \to U-R, \text{s.t. } \forall i < j, I(i) < I(j) \text{ and} \\ J: [0, r) \to R, \text{s.t. } \forall i < j, J(i) < J(j) \text{ and} \\ Z = \{ X[d_0, d_1, \cdots, d_{N-1}] \mid d_i \in [0, n_i) \wedge i \in J \}\end{split}\end{aligned}\end{align} \]
Otherwise
\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = M[d_{I(0)}, d_{I(1)}, \cdots, d_{I(N-r-1)}], \\\end{split}\\\begin{split}\forall d_i \in [0, n_i) \wedge i \in U - R \wedge d_j = 0 \wedge j \in R, \\\end{split}\\\begin{split}\text{where } I: [0, N-r) \to U-R, \text{s.t. } \forall i < j, I(i) < I(j) \text{ and} \\ J: [0, r) \to R, \text{s.t. } \forall i < j, J(i) < J(j) \text{ and} \\ M = \text{OP_NAME}(X, \text{axes=axes, keepdims=false, exclude=exclude})\end{split}\end{aligned}\end{align} \]
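The case analysis above can be sketched in NumPy, whose axis/keepdims arguments line up with this formalization. The helper below is hypothetical, not CVM-Runtime API; func plays the role of \(\text{REDUCE_FUNC}\).

```python
import numpy as np

def reduce_op(X, axes, keepdims=False, exclude=False, func=np.sum):
    """Sketch of the shared reduce logic described above."""
    N = X.ndim
    axes = tuple(sorted(axes))
    if not exclude and len(axes) == 0:
        real_axes = tuple(range(N))        # M = 0: reduce over everything
    elif exclude:
        real_axes = tuple(i for i in range(N) if i not in axes)
        if not real_axes:                  # exclude is true and M = N: Y = X
            return X.copy()
    else:
        real_axes = axes
    return func(X, axis=real_axes, keepdims=keepdims)
```

With func=np.sum this reproduces the sum examples below; any other reduction (e.g. np.max) slots in the same way.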
sum¶
Set \(\text{OP_NAME}\) as sum
Set \(\text{REDUCE_FUNC}\) as
\[\text{REDUCE_FUNC}(Z) = \sum Z,\]
Example
data = [[[1, 2], [2, 3], [1, 3]],
        [[1, 4], [4, 3], [5, 2]],
        [[7, 1], [7, 2], [7, 3]]]

sum(data, axis=(1))
[[  4.   8.]
 [ 10.   9.]
 [ 21.   6.]]

sum(data, axis=[1,2])
[ 12.  19.  27.]
max¶
Set \(\text{OP_NAME}\) as max
Set \(\text{REDUCE_FUNC}\) as
\[\text{REDUCE_FUNC}(Z) = \max Z,\]
Broadcast Operators¶
A broadcast operator applies a broadcast function to the input data, and the process logic is the same across all such operators.
Math Formalization
Input: There are 2 inputs.
\(A\), a tensor of \(M\) dimensions, namely \((m_0, m_1, \cdots, m_{M-1})\)
\(B\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)
The shapes of the two input tensors must satisfy the assertions below:
Output: \(Y\), a tensor of \(Q\) dimensions, where \(Q\) is the larger of the two input dimension counts; its shape is identical to that of the higher-dimensional input.
We abstract the formalization here and introduce the details as below:
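As a sketch of the shape assertions, assuming the usual right-aligned broadcasting rule (each aligned pair of dimension lengths must be equal or contain a 1, and the longer length wins):

```python
def broadcast_shape(shape_a, shape_b):
    """Output shape of a broadcast operator: right-align the two
    shapes and stretch length-1 dimensions to match the other input."""
    M, N = len(shape_a), len(shape_b)
    out = []
    for i in range(max(M, N)):
        a = shape_a[M - 1 - i] if i < M else 1
        b = shape_b[N - 1 - i] if i < N else 1
        assert a == b or a == 1 or b == 1, "shapes cannot broadcast"
        out.append(max(a, b))
    return tuple(reversed(out))
```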
broadcast_add¶
set \(\text{OP}\) to \(\text{add}\).
Example
x = [[ 1.,  1.,  1.],
     [ 1.,  1.,  1.]]

y = [[ 0.],
     [ 1.]]

broadcast_add(x, y)
[[ 1.,  1.,  1.],
 [ 2.,  2.,  2.]]
broadcast_sub¶
set \(\text{OP}\) to \(\text{sub}\).
broadcast_mul¶
set \(\text{OP}\) to \(\text{multiply}\).
broadcast_div¶
set \(\text{OP}\) to \(\text{divide}\).
broadcast_max¶
set \(\text{OP}\) to \(\text{max}\).
NN Operators¶
NN operators stand for neural network operators, the core of the neural network learning mechanism. They have trainable parameters and perform linear or non-linear transformations in a model graph.
conv2d¶
Only the 2-D convolution operator is supported. It also serves as group-wise convolution.
Math Formalization
Input: there are 2 or 3 inputs.
\(X\), input data to be calculated whose shape is \((N, C, H, W)\)
\(W\), convolution kernel weight whose shape is \((OC, IC, KH, KW)\)
\(B\), bias, of type Optional<DLTensor>. If \(B\) is not None, its shape is \((\text{OC},)\).
Output: \(Y\)
Attributes:
padding, a TShape of length 2, namely \((PH, PW)\), \(PH, PW \in [min\_attr, max\_attr)\), indicating the padding size.
stride, a TShape of length 2, namely \((SH, SW) \in [1, max\_attr)\), indicating the strides.
dilation, a TShape of length 2, namely \((DH, DW) \in [1, max\_attr)\), a parameter used in dilated convolution.
groups, an int in range \([1, C]\), indicating the group number, s.t. \(C = IC \cdot \text{groups} \wedge OC \text{ mod } \text{groups} = 0\)
where \(\text{kernel}\) function does the 2D image convolution calculation, and the formulation is:
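A direct, unoptimized transcription of the grouped convolution can look as follows; it assumes the NCHW/OIHW layouts stated above and is a sketch, not the CVM-Runtime implementation.

```python
import numpy as np

def conv2d(X, W, B=None, padding=(0, 0), stride=(1, 1),
           dilation=(1, 1), groups=1):
    """Naive grouped 2-D convolution: X is (N, C, H, W), W is (OC, IC, KH, KW)."""
    N, C, H, Wd = X.shape
    OC, IC, KH, KW = W.shape
    PH, PW = padding; SH, SW = stride; DH, DW = dilation
    assert C == IC * groups and OC % groups == 0
    Xp = np.pad(X, ((0, 0), (0, 0), (PH, PH), (PW, PW)))
    OH = (H + 2 * PH - DH * (KH - 1) - 1) // SH + 1
    OW = (Wd + 2 * PW - DW * (KW - 1) - 1) // SW + 1
    Y = np.zeros((N, OC, OH, OW), dtype=X.dtype)
    opg = OC // groups                       # output channels per group
    for n in range(N):
        for oc in range(OC):
            g = oc // opg                    # group this output channel belongs to
            for oh in range(OH):
                for ow in range(OW):
                    s = 0
                    for ic in range(IC):
                        for kh in range(KH):
                            for kw in range(KW):
                                s += (Xp[n, g * IC + ic,
                                         oh * SH + kh * DH,
                                         ow * SW + kw * DW]
                                      * W[oc, ic, kh, kw])
                    Y[n, oc, oh, ow] = s + (B[oc] if B is not None else 0)
    return Y
```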
dense¶
The dense operator provides a fully connected layer.
Math Formalization
Input: there are 2 or 3 inputs.
\(X\), a matrix of shape \((M, K)\)
\(W\), a matrix of shape \((N, K)\)
\(B\), bias, of type Optional<DLTensor>. If \(B\) is not None, its shape is \((N,)\).
Output: \(Y\), a matrix of shape \((M, N)\)
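Given the shapes above, dense reduces to a matrix product against the transposed weight plus an optional bias; a minimal sketch:

```python
import numpy as np

def dense(X, W, B=None):
    """Fully connected layer: X is (M, K), W is (N, K), Y is (M, N)."""
    Y = X @ W.T
    if B is not None:
        Y = Y + B        # B has shape (N,), broadcast over rows
    return Y
```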
relu¶
Relu performs elementwise rectified linear unit function.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)
Output: \(Y\), the same shape as \(X\)
max_pool2d¶
Max_pool2d performs max pooling over every plane for each batch and channel.
Math Formalization
Input: \(X\), of shape \((N, C, H, W)\), indicating there are \(N\) batches, \(C\) channels and all the planes are of size \((H, W)\)
Output: \(Y\)
Attributes:
pool_size, a TShape of length 2, namely \((PSH, PSW)\)
padding, either a TShape of length 2, namely \((PH, PW) \in [min\_attr, max\_attr)\), or an int indicating \(PH = PW\)
strides, a TShape of length 2, namely \((SH, SW)\)
ceil_mode, a boolean.
upsampling¶
The upsampling operator upsamples the input data by copying the value at each position several times.
Math Formalization
Input \(X\), whose shape is \((N, C, H, W)\)
Output \(Y\)
Attributes:
scale, an int in range \([1, max\_attr)\).
Elemwise Operators¶
An elemwise operator applies an elementwise function to the input data, and the process logic is similar across all such operators. There may be 1 or 2 input tensors and the logic can be nontrivial, which is why we do not abstract them but describe each one separately.
abs¶
This operator calculates absolute value of input data.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).
Output: \(Y\), a tensor whose shape is same as \(X\)
cvm_precision¶
The precision operator gives how many bits the absolute value of a number takes. 1 takes 1 bit. 2, 3 take 2 bits, etc. A special case is that 0 always takes at least 1 bit.
Math Formalization
Input \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).
Output \(Y\), a tensor whose shape is same as \(X\)
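Elementwise, the precision of a value is the bit length of its absolute value, with 0 as the stated special case; a one-line sketch:

```python
def cvm_precision(x):
    # bits |x| occupies: 1 -> 1 bit, 2 and 3 -> 2 bits, etc.;
    # 0 is a special case that still takes 1 bit
    return max(1, abs(int(x)).bit_length())
```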
elemwise_add¶
This operator performs elementwise add to the 2 input tensors.
Math Formalization
Input: there are 2 inputs, of the same shape.
\(A\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).
\(B\), whose shape is same as \(A\).
Output: \(Y\), a tensor whose shape is same as \(A\) and \(B\).
elemwise_sub¶
This operator performs elementwise subtraction to the 2 input tensors.
Math Formalization
Input: there are 2 inputs, of the same shape.
\(A\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).
\(B\), whose shape is same as \(A\).
Output: \(Y\), a tensor whose shape is same as \(A\) and \(B\).
negative¶
This operator performs elementwise negative to the input tensor.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).
Output: \(Y\), a tensor whose shape is same as \(X\).
clip¶
This operator performs clip, cutting the data into a range, to the input tensor.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).
Output: \(Y\), a tensor whose shape is same as \(X\).
Attributes:
a_min
a_max
cvm_clip¶
This operator clips the input data into a certain CVM precision.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).
Output: \(Y\), a tensor whose shape is same as \(X\).
Attribute:
precision
, an int in range \([1, 33)\)
cvm_right_shift¶
This operator performs a right shift. Slightly different from the C right shift, the result of this operator is rounded to the nearest integer. As a special case, negative halves are rounded up; for example, -1.5 rounds to -1.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).
Output: \(Y\), a tensor whose shape is same as \(X\).
Attribute:
precision, an int in range \([1, 33)\)
shift_bit, an int in range \([1, 33)\)
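The rounding behaviour can be obtained with the common shift-add-shift trick; the sketch below assumes a symmetric clip range derived from precision, which may differ in detail from the actual CVM bounds.

```python
def cvm_right_shift(x, shift_bit, precision=32):
    # shift by (shift_bit - 1), add 1, shift once more: rounds to the
    # nearest integer with halves rounded up, so -1.5 -> -1
    y = ((x >> (shift_bit - 1)) + 1) >> 1
    bound = 2 ** (precision - 1) - 1      # assumed symmetric clip range
    return max(-bound, min(bound, y))
```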
cvm_left_shift¶
This operator performs left shift to the input tensor, same as C left shift operator.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).
Output: \(Y\), a tensor whose shape is same as \(X\).
Attribute:
precision, an int in range \([1, 33)\)
shift_bit, an int in range \([1, 33)\)
Transform Operators¶
A transform operator does not perform calculation on the data, but simply reshapes, copies, or selects part of it. The process logic differs considerably between operators.
repeat¶
This operator repeats the input data repeats times along the given axis. Each element is repeated right after itself.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{\text{axis}}, \cdots, n_{N-1})\)
Output: \(Y\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{\text{axis}} \cdot repeats, \cdots, n_{N-1})\)
Attribute:
axis, an int in range \([0, N)\), indicating on which axis the repetition is performed.
repeats, an int in range \([1, +\infty)\), indicating how many times the data is repeated.
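NumPy's np.repeat has the same element-after-itself semantics, so it serves as a quick illustration:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.repeat(x, repeats=2, axis=1)
# each element repeated right after itself along axis 1:
# [[1, 1, 2, 2],
#  [3, 3, 4, 4]]
```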
tile¶
This operator tiles the input data several times along several dimensions. Unlike the repeat operator, which repeats each element right after itself, this operator appends copies of the whole data after the whole data.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\),
Output: \(Y\)
Attribute:
reps
, a tensor of \(M\) dimensions, namely \((m_0, m_1, \cdots, m_{M-1})\).
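NumPy's np.tile matches this whole-data-after-whole-data behaviour and illustrates the contrast with repeat:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])        # shape (2, 2)
y = np.tile(x, reps=(2, 3))           # whole block copied, shape (4, 6)
# first row of y: [1, 2, 1, 2, 1, 2]
```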
flatten¶
This operator flattens the input tensor data to an array in a row-major order.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).
Output: \(Y\).
where \(\text{flatten_index}\) is
concatenate¶
This operator concatenates tensors along a given axis.
Math Formalization
Inputs: \(M\) tensors, namely \(I^0, I^1, \cdots, I^{M-1}\). They all have \(N\) dimensions, namely \(I^i\)’s shape is \((n^i_0, n^i_1, \cdots, n^i_{N-1})\). All dimensions except the axis to be concatenated along must have the same length.
Output: \(Y\)
Attribute:
axis
, an int in range \([0, N)\).
transpose¶
This operator transposes the input data according to a given permutation of axes.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)
Output: \(Y\),
Attribute:
axes, a TShape of length \(M \in \{0, N\}\), which means axes is either empty or a permutation of \(\{0, 1, \cdots, N-1\}\)
slice¶
This operator slices an input array according to the given attributes. For each dimension, the begin (defaults to 0), end (defaults to the length of that dimension), and strides (defaults to 1) coordinates can be assigned by the user.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)
Output: \(Y\),
Attributes:
begin, of \(B\) dimensions.
end, of \(E\) dimensions.
strides, of \(S\) dimensions.
\(B\), \(E\), \(S\) can be different.
slice_like¶
This operator slices the input \(X\) to a shape that looks like the other given input shape_like
.
TODO: need more consideration.
Math Formalization
Input: there are 2 inputs
\(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)
\(\text{shape_like}\), a tensor of \(M\) dimensions, namely \((m_0, m_1, \cdots, m_{M- 1 })\),
Output: \(Y\)
Attribute:
axes, a TShape of length \(K\). If axes is not empty, only the axes mentioned will be sliced on, and the other axes in shape_like will be ignored. \(M \ne N\) is allowed only if a non-empty axes is given. If \(M > N\), the dimensions higher than \(N\) are ignored; if \(M < N\), only the first \(M\) dimensions are sliced while the dimensions higher than \(M\) stay the same.
take¶
This operator takes some elements from the input data. If axis is not given, it flattens the input data and takes elements; if axis is given, it takes elements along this axis for every coordinate in the other axes.
Math Formalization
Input: there are 2 inputs
\(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)
\(indices\), a tensor of \(M\) dimensions, namely \((m_0, m_1, \cdots, m_{M- 1})\)
Output: \(Y\),
Attribute:
axis, an Optional<int>.
Case axis is None:
Case axis is int:
cvm_lut¶
This operator is an alias for take with axis set to None.
Math Formalization
Input: there are 2 inputs.
\(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)
\(indices\), a tensor of \(M\) dimensions, namely \((m_0, m_1, \cdots, m_{M- 1})\)
Output: \(Y\).
expand_dims¶
This operator inserts new dimensions starting at the given axis.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\),
Output: \(Y\)
Attributes:
axis, an int in range \([-N-1, N+1)\), indicating where the new dimensions start. Note that \(N+1\), instead of \(N\), is added to a negative axis.
num_newaxis, an int in range \([min\_attr, max\_attr)\)
reshape¶
This operator reshapes the input data.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\),
Output: \(Y\),
Attribute:
target_shape, a TShape of length \(M\), namely \((m_0, m_1, \cdots, m_{M-1})\), s.t. \(m_0 * m_1 * \cdots * m_{M-1} = n_0 * n_1 * \cdots * n_{N-1}\).
squeeze¶
This operator removes dimensions of length 1.
Math Formalization
Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)
Output: \(Y\).
Attribute:
axes, a TShape of length \(M\).
where¶
This operator selects data from 2 inputs with condition given.
Math Formalization
Input: there are 3 inputs
\(Cond\), either a tensor whose shape is same as others, or a 1d tensor whose length is same as the length of others’ first dimension.
\(A\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)
\(B\), a tensor whose shape is same as \(A\)
Output: \(Y\)
Case shape of \(Cond\) is same as others:
Case \(Cond\) is a 1d tensor:
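The two cases can be illustrated with NumPy. The same-shape case is exactly np.where; for the 1-D case we index explicitly, since np.where would broadcast a 1-D condition along the last axis rather than selecting whole slices along the first.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Case 1: Cond has the same shape as A and B (elementwise select)
cond = np.array([[True, False], [False, True]])
y1 = np.where(cond, A, B)

# Case 2: Cond is 1-D of length n_0; it selects whole first-axis slices
cond1d = np.array([1, 0])
y2 = np.array([A[i] if cond1d[i] != 0 else B[i]
               for i in range(len(cond1d))])
```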
Vision Operators¶
We provide 2 operators for vision scenarios. Since the scenario is narrow, the regulation of the input data is stricter than for other operators. Unless otherwise specified, the input data should be 3D, namely \((B, N, K)\), indicating the number of batches, the number of results per batch, and the length (\(K\)) of each result. The first value of a result should be the ID and the second should be the score.
get_valid_count¶
This operator counts how many results score above a threshold, and collects them.
Math Formalization
Input: \(X\), a 3D tensor of shape \((B, N, K), 2 \leqslant K \leqslant 32\)
Output:
\(\text{valid_count}\),
\(Y\),
Attribute:
score_threshold, an int.
non_max_suppression¶
This operator implements the non-maximum suppression (NMS) algorithm, finding valid bounding boxes.
Math Formalization
Input: there are 2 inputs.
\(X\), a 3D tensor of shape \((B, N, K), K = 6\)
valid_count, a 1D tensor of length \(B\)
Output: \(Y\)
Attributes:
iou_threshold, an int, representing the percentage of intersection over union, with value in range \((0, +\infty)\), where 101 specifically stands for full bounding-box overlap and any larger integer is equivalent to it.
max_output_size, an int
force_suppress, a boolean
top_k, an int.
Note
\(\text{OLR}\), which means overlap ratio, calculates the overlap ratio of two rectangles: the area of their intersection over the area of their union.
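A hypothetical helper computing \(\text{OLR}\) for two axis-aligned rectangles, each given as (x1, y1, x2, y2):

```python
def overlap_ratio(a, b):
    """Intersection area over union area of two axis-aligned rectangles."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))   # intersection width
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))   # intersection height
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```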