Operator Math Formalization

Note

When writing this section of the document, please refer to the doc: Math Format.

This section gives a full and exhaustive explanation of the CVM-Runtime operators. The source code of the FORMAL version has a strong correlation with this mathematical description, while other versions, such as CPU and CUDA, only promise consistent inference results and may use arbitrary process logic.

Note

All numbers referred to by symbols are integers by default.

All the operators’ formalization obeys the unified format:

\[ \begin{align}\begin{aligned}\begin{split}Y[y_\text{indices}] = X[x_\text{indices}], \\\end{split}\\\begin{split}\forall \text{given range}, \\\end{split}\\\text{where } \text{condition}_1 \text{ and } \text{condition}_2 \text{ and } \cdots \text{ and } \text{condition}_n\end{aligned}\end{align} \]

which means that, for the given value range, the formula in the first line always holds, subject to the constraints listed in the condition statements.

A quick reference to the operators is listed below:

Reduce Operators

A reduce operator applies a reduction function to the input data according to its parameters, and the process logic is the same across all operators of this kind. The reduction is performed on the given axes; the other dimensions remain the same and the results are stored in the corresponding positions. We abstract the common reduce logic into the formalization here and specify the reduce function for each operator respectively.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

  • Output: \(Y\)

  • Attribute:

    • axes (TShape), an \(M\)-length vector, where \(M \in [0, N+1)\)

    • keepdims (bool)

    • exclude (bool)

\[ \begin{align}\begin{aligned}\begin{split}T = \left\{x \mid i \in \text{axes} \wedge x = \begin{cases} i, & \text{if } i\geqslant 0 \\ i + N, & \text{otherwise} \end{cases} \right\}, \\\end{split}\\\text{where } card\{T\} = M \text{ and } j \in [0, N), \forall j \in \text{T}\end{aligned}\end{align} \]
\[U = \{ 0, 1, \cdots, N-1 \}\]
\[\begin{split}R = \begin{cases} U - T, & \text{if exclude is true} \\ T, & \text{otherwise} \end{cases}\end{split}\]
\[r = card\{R\}\]
  1. Case exclude is true and \(M = N\)

\[Y = X\]
  2. Case exclude is false and \(M = 0\)

\[ \begin{align}\begin{aligned}Y[\underbrace{0, 0, \cdots, 0}_{K}] = \text{REDUCE_FUNC}(X),\\\begin{split}\text{where } K = \begin{cases} 1, & \text{if keepdims is false} \\ N, & \text{otherwise} \end{cases}\end{split}\end{aligned}\end{align} \]
  3. Case keepdims is false

\[ \begin{align}\begin{aligned}Y[d_{I(0)}, d_{I(1)}, \cdots, d_{I(N-r-1)}] = \text{REDUCE_FUNC}(Z),\\\forall d_{I(i)} \in [0, n_{I(i)}) \wedge i \in [0, N-r),\\\begin{split}\text{where } I: [0, N-r) \to U-R, \text{s.t. } \forall i < j, I(i) < I(j) \text{ and} \\ Z = \{ X[d_0, d_1, \cdots, d_{N-1}] \mid d_i \in [0, n_i), \forall i \in R \}, \\ \text{with the coordinates } d_{I(0)}, d_{I(1)}, \cdots, d_{I(N-r-1)} \text{ fixed as given}\end{split}\end{aligned}\end{align} \]
  4. Otherwise

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = M[d_{I(0)}, d_{I(1)}, \cdots, d_{I(N-r-1)}], \\\end{split}\\\begin{split}\forall d_i \in [0, n_i) \wedge i \in U - R \wedge d_j = 0 \wedge j \in R, \\\end{split}\\\begin{split}\text{where } I: [0, N-r) \to U-R, \text{s.t. } \forall i < j, I(i) < I(j) \text{ and} \\ M = \text{OP_NAME}(X, \text{axes=axes, keepdims=false, exclude=exclude})\end{split}\end{aligned}\end{align} \]
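The four cases above can be sketched in NumPy as below. reduce_op and its reduce_func argument are hypothetical helpers for illustration only; the FORMAL source remains the authority.

import numpy as np

def reduce_op(X, axes, keepdims=False, exclude=False, reduce_func=np.sum):
    # T: normalized axes; R: the axes actually reduced over.
    N = X.ndim
    T = {a + N if a < 0 else a for a in axes}
    R = set(range(N)) - T if exclude else T
    if exclude and len(T) == N:      # case 1: Y = X
        return X.copy()
    if not exclude and len(T) == 0:  # case 2: reduce the whole tensor
        y = reduce_func(X)
        return np.full((1,) * N, y) if keepdims else np.array([y])
    # cases 3 and 4: reduce over the axes in R.
    return reduce_func(X, axis=tuple(sorted(R)), keepdims=keepdims)

data = np.array([[[1, 2], [2, 3], [1, 3]],
                 [[1, 4], [4, 3], [5, 2]],
                 [[7, 1], [7, 2], [7, 3]]])
print(reduce_op(data, axes=(1,)))  # [[ 4  8] [10  9] [21  6]]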

sum

  • Set \(\text{OP_NAME}\) as sum

  • Set \(\text{REDUCE_FUNC}\) as

    \[\text{REDUCE_FUNC}(Z) = \sum Z,\]

Example

data = [[[1, 2], [2, 3], [1, 3]],
        [[1, 4], [4, 3], [5, 2]],
        [[7, 1], [7, 2], [7, 3]]]

sum(data, axis=(1))
[[  4.   8.]
 [ 10.   9.]
 [ 21.   6.]]

sum(data, axis=[1,2])
[ 12.  19.  27.]

max

  • Set \(\text{OP_NAME}\) as max

  • Set \(\text{REDUCE_FUNC}\) as

    \[\text{REDUCE_FUNC}(Z) = \max Z,\]

Broadcast Operators

A broadcast operator applies the broadcast function to the input data, and the process logic is the same across all operators of this kind.

Math Formalization

  • Input: There are 2 inputs.

  • \(A\), a tensor of \(M\) dimensions, namely \((m_0, m_1, \cdots, m_{M-1})\)

  • \(B\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

The two input shapes of tensor must satisfy the assertions as below:

\[\begin{split}P = \min(M, N) \\ Q = \max(M, N)\end{split}\]
\[m_{M-1-i} = n_{N-1-i} \text{ or } m_{M-1-i} = 1 \text{ or } n_{N-1-i} = 1, \forall i \in [0, P)\]
  • Output: \(Y\), a tensor of \(Q\) dimensions, the larger rank of the two inputs; its shape is \((\max(em_0, en_0), \cdots, \max(em_{Q-1}, en_{Q-1}))\), with \(em\) and \(en\) the input shapes padded with leading 1s, as defined below.

We abstract the formalization here and introduce the details as below:

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{Q-1}] = A[a_0, a_1, \cdots, a_{M-1}] \text{ OP } B[b_0, b_1, \cdots, b_{N-1}], \\\end{split}\\\begin{split}\forall i \in [0, Q) \wedge d_i \in [0, \max(em_i, en_i)), \\\end{split}\\\begin{split}\text{where } a_j = d_{Q-M+j} \text{ if } d_{Q-M+j} < m_j \text{ else } 0, \forall j \in [0, M) \text{ and} \\ b_j = d_{Q-N+j} \text{ if } d_{Q-N+j} < n_j \text{ else } 0, \forall j \in [0, N) \text{ and} \\ em_i = \begin{cases} 1, & i < Q - M \\ m_{i-Q+M}, & \text{otherwise} \end{cases}, \forall i \in [0, Q) \text{ and} \\ en_i = \begin{cases} 1, & i < Q - N \\ n_{i-Q+N}, & \text{otherwise} \end{cases}, \forall i \in [0, Q)\end{split}\end{aligned}\end{align} \]
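The index mapping above matches NumPy-style broadcasting aligned on trailing dimensions. A minimal sketch of the shape rule (broadcast_shape is a hypothetical helper, for illustration only):

def broadcast_shape(m_shape, n_shape):
    # Pad the shorter shape with leading 1s, then take the per-dimension
    # maximum; each pair of dimensions must match or one of them must be 1.
    Q = max(len(m_shape), len(n_shape))
    em = (1,) * (Q - len(m_shape)) + tuple(m_shape)
    en = (1,) * (Q - len(n_shape)) + tuple(n_shape)
    assert all(a == b or a == 1 or b == 1 for a, b in zip(em, en))
    return tuple(max(a, b) for a, b in zip(em, en))

print(broadcast_shape((2, 1), (3,)))  # (2, 3)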

broadcast_add

set \(\text{OP}\) to \(\text{add}\).

Example

x = [[ 1.,  1.,  1.],
     [ 1.,  1.,  1.]]

y = [[ 0.],
     [ 1.]]

broadcast_add(x, y)
[[ 1.,  1.,  1.],
 [2.,  2.,  2.]]

broadcast_sub

set \(\text{OP}\) to \(\text{sub}\).

broadcast_mul

set \(\text{OP}\) to \(\text{multiply}\).

broadcast_div

set \(\text{OP}\) to \(\text{divide}\).

broadcast_max

set \(\text{OP}\) to \(\text{max}\).

NN Operators

We provide NN operators for users. NN operators stand for neural network operators, the core of the neural network mechanism. NN operators carry trained parameters and perform the linear or non-linear transformations in a model graph.

conv2d

Only the 2-D convolution operator is supported. It is also known as group-wise convolution.

Math Formalization

  • Input: there are 2 or 3 inputs.

    • \(X\), input data to be calculated whose shape is \((N, C, H, W)\)

    • \(W\), convolution kernel weight whose shape is \((OC, IC, KH, KW)\)

    • \(B\), bias, of type Optional<DLTensor>. If \(B\) is not None, its shape is \((\text{OC},)\).

  • Output: \(Y\)

  • Attributes:

    • padding, a TShape of length 2, namely \((PH, PW), PH,PW \in [min\_attr, max\_attr)\), indicating padding size.

    • stride, a TShape of length 2, namely \((SH, SW) \in [1, max\_attr)\), indicating strides.

    • dilation, a TShape of length 2, namely \((DH, DW) \in [1, max\_attr)\), parameter used in dilation convolution.

    • groups, an int in range \([1, C]\), indicating the number of groups; it must hold that \(C = IC \cdot \text{groups} \wedge OC \text{ mod } \text{groups} = 0\)

\[ \begin{align}\begin{aligned}\begin{split}Y[n,oc,p,q]= \sum_{ic = 0}^{IC-1} \text{kernel}(n,(oc \div OPG) * IC + ic, p, q, oc,ic) + \begin{cases} 0, & \text{if B is None}\\ B[oc], & \text{otherwise} \end{cases}, \\\end{split}\\\forall n \in [0, N) \wedge oc\in [0, OC) \wedge p \in \left[0, \text{Y_HMAX} \right) \wedge q \in \left[0, \text{Y_WMAX} \right),\\\begin{split}\text{where } \text{Y_HMAX} = \left\lfloor{ H+2 \cdot \text{PH}-\text{DH} \cdot (\text{KH}-1)-1 \over \text{SH} }\right\rfloor + 1 \text{ and} \\ \text{Y_WMAX} = \left\lfloor{ W+2 \cdot \text{PW}-\text{DW} \cdot (\text{KW}-1)-1 \over \text{SW} }\right\rfloor + 1 \text{ and} \\ OPG = OC / \text{groups, } OPG \in \mathbb N^+ \text{ since } OC \text{ mod } \text{groups} = 0\\\end{split}\end{aligned}\end{align} \]

where \(\text{kernel}\) function does the 2D image convolution calculation, and the formulation is:

\[ \begin{align}\begin{aligned}\begin{split}\text{kernel}(n, j, p, q, o, i) = \sum_{k_i=0}^{\text{KH}-1} \sum_{k_j = 0}^{\text{KW}-1} \text{pad}(p'+k_i*\text{DH},q'+k_j*\text{DW}) \cdot W[o, i, k_i, k_j], \\\end{split}\\\begin{split}\text{where } p' = p \cdot \text{SH} -\text{PH} \text{ and } q' = q \cdot \text{SW}-\text{PW} \text{ and} \\ \text{pad}(p, q) = \begin{cases} X[n, j, p, q], & \text{ if } p \in [0, H) \wedge q \in [0, W) \\ 0, & \text{otherwise} \end{cases}\end{split}\end{aligned}\end{align} \]
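For concreteness, here is a direct, unoptimized NumPy sketch of the two formulas above, assuming groups = 1 (so the weight channel index is simply ic); it mirrors the math, not the FORMAL source.

import numpy as np

def conv2d(X, W, B=None, padding=(0, 0), stride=(1, 1), dilation=(1, 1)):
    # Direct translation of the formalization, groups = 1 assumed.
    N, C, H, Wd = X.shape
    OC, IC, KH, KW = W.shape
    PH, PW = padding; SH, SW = stride; DH, DW = dilation
    YH = (H + 2 * PH - DH * (KH - 1) - 1) // SH + 1
    YW = (Wd + 2 * PW - DW * (KW - 1) - 1) // SW + 1
    Xp = np.zeros((N, C, H + 2 * PH, Wd + 2 * PW), dtype=X.dtype)
    Xp[:, :, PH:PH + H, PW:PW + Wd] = X   # zero padding, as in pad()
    Y = np.zeros((N, OC, YH, YW), dtype=X.dtype)
    for n in range(N):
        for oc in range(OC):
            for p in range(YH):
                for q in range(YW):
                    acc = 0
                    for ic in range(IC):
                        for ki in range(KH):
                            for kj in range(KW):
                                acc += Xp[n, ic, p * SH + ki * DH,
                                          q * SW + kj * DW] * W[oc, ic, ki, kj]
                    Y[n, oc, p, q] = acc + (0 if B is None else B[oc])
    return Y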

dense

The dense operator provides a fully connected layer.

Math Formalization

  • Input: there are 2 or 3 inputs.

    • \(X\), a matrix of shape \((M, K)\)

    • \(W\), a matrix of shape \((N, K)\)

    • \(B\), bias, of type Optional<DLTensor>. If \(B\) is not None, its shape is \((N,)\).

  • Output: \(Y\), a matrix of shape \((M, N)\)

\[\begin{split}Y=X W^T + \begin{cases} 0, & \text{if B is None} \\ B, & \text{otherwise} \end{cases}\end{split}\]
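Equivalently, as a NumPy sketch (for illustration; shapes as listed above):

import numpy as np

def dense(X, W, B=None):
    # (M, K) x (N, K) -> (M, N); B, if given, broadcasts over the M rows.
    Y = X @ W.T
    return Y if B is None else Y + B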

relu

Relu performs the elementwise rectified linear unit function.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

  • Output: \(Y\), the same shape as \(X\)

\[\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = \max(0, X[d_0, d_1, \cdots, d_{N-1}]), \\ \forall i \in [0, N) \wedge d_i \in [0, n_i)\end{split}\]

max_pool2d

Max_pool2d performs max pooling over every plane for each batch and channel.

Math Formalization

  • Input: \(X\), of shape \((N, C, H, W)\), indicating there are \(N\) batches, \(C\) channels and all the planes are of size \((H, W)\)

  • Output: \(Y\)

  • Attributes:

    • pool_size, a TShape of length 2, namely \((PSH, PSW)\)

    • padding, either a TShape of length 2, namely \((PH, PW) \in [min\_attr, max\_attr)\), or an int indicating \(PH=PW\)

    • strides, a TShape of length 2, namely \((SH, SW)\)

    • ceil_mode, boolean.

\[\begin{split}PSH \in [0, H + 2PH + 1), \\ PSW \in [0, W + 2PW + 1), \\ PSH > PH \wedge PSW > PW\end{split}\]
\[ \begin{align}\begin{aligned}\begin{split}Y[n,i,p,q] = \max\{\text{pad}(n, i, p', q') \\ \mid p' \in [p \cdot \text{SH} -\text{PH}, p \cdot \text{SH} -\text{PH}+\text{PSH}), q' \in [q \cdot \text{SW}-\text{PW}, q \cdot \text{SW}-\text{PW}+\text{PSW})\}, \\\end{split}\\\begin{split}\forall n \in [0, N) \wedge i \in [0, C) \wedge \\ p \in \left[0, \text{ceil_func}\left({H+2 \cdot \text{PH}- \text{PSH}\over\text{SH}}\right)+1 \right) \wedge \\ q \in \left[0, \text{ceil_func}\left({W+2 \cdot \text{PW}- \text{PSW} \over \text{SW}}\right)+1 \right), \\\end{split}\\\begin{split}\text{where } \text{ceil_func(val)} = \begin{cases} \lceil \text{val} \rceil, & \text{if ceil_mode is true} \\ \lfloor \text{val} \rfloor, & \text{otherwise} \end{cases} \text{ and} \\ \text{pad}(n, i, p, q) = \begin{cases} X[n, i, p, q], & \text{ if } p \in [0, H) \wedge q \in [0, W) \\ INT32\_MIN, & \text{otherwise} \end{cases}\end{split}\end{aligned}\end{align} \]
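A NumPy sketch of the pooling rule above; padding with INT32_MIN reproduces the pad function, so padded cells never win the max. max_pool2d here is an illustrative helper (int64 data assumed), not the FORMAL source.

import numpy as np

INT32_MIN = -(2 ** 31)

def max_pool2d(X, pool_size, padding, strides, ceil_mode=False):
    N, C, H, W = X.shape
    PSH, PSW = pool_size; PH, PW = padding; SH, SW = strides
    div = (lambda a, b: -(-a // b)) if ceil_mode else (lambda a, b: a // b)
    YH = div(H + 2 * PH - PSH, SH) + 1
    YW = div(W + 2 * PW - PSW, SW) + 1
    # Extra PSH/PSW margin keeps ceil-mode windows inside the padded array.
    Xp = np.full((N, C, H + 2 * PH + PSH, W + 2 * PW + PSW), INT32_MIN,
                 dtype=np.int64)
    Xp[:, :, PH:PH + H, PW:PW + W] = X
    Y = np.empty((N, C, YH, YW), dtype=np.int64)
    for p in range(YH):
        for q in range(YW):
            win = Xp[:, :, p * SH:p * SH + PSH, q * SW:q * SW + PSW]
            Y[:, :, p, q] = win.max(axis=(2, 3))
    return Y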

upsampling

The upsampling operator upsamples the input data by copying the value at each position several times.

Math Formalization

  • Input \(X\), whose shape is \((N, C, H, W)\)

  • Output \(Y\)

  • Attributes: scale, in range \([1, max\_attr)\).

\[\begin{split}Y[n, i, h, w] = X[n, i, \left\lfloor {h \over \text{scale}}\right\rfloor, \left\lfloor {w \over \text{scale}}\right\rfloor], \\ \forall n \in [0, N) \wedge i \in [0, C) \wedge h \in [0, H \cdot \text{scale}) \wedge w \in [0, W \cdot \text{scale})\end{split}\]
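This coincides with repeating each row and column scale times, as in the NumPy sketch below (illustrative only):

import numpy as np

def upsampling(X, scale):
    # X is (N, C, H, W); each pixel becomes a scale-by-scale block.
    return X.repeat(scale, axis=2).repeat(scale, axis=3)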

Elemwise Operators

An elemwise operator applies an elementwise function to the input data, and the process logic is alike across these operators. There might be 1 or 2 input tensors, and the logic can be intricate, so we do not abstract them but describe each one separately.

abs

This operator calculates the absolute value of the input data.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output: \(Y\), a tensor whose shape is same as \(X\)

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = \begin{cases} x, & x \geqslant 0 \\ -x, & x < 0 \end{cases},\\\end{split}\\\begin{split}\forall i \in [0, N) \wedge d_i \in [0, n_i),\\\end{split}\\\text{where } x \text{ denotes } X[d_0, d_1, \cdots, d_{N-1}]\end{aligned}\end{align} \]

cvm_precision

The precision operator gives how many bits the absolute value of a number takes: 1 takes 1 bit; 2 and 3 take 2 bits, and so on. As a special case, 0 takes 1 bit.

Math Formalization

  • Input \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output \(Y\), a tensor whose shape is same as \(X\)

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = \begin{cases} \lceil log_2(abs(x)+1) \rceil, & x \neq 0\\ 1, & x = 0 \end{cases},\\\end{split}\\\begin{split}\forall i \in [0, N) \wedge d_i \in [0, n_i),\\\end{split}\\\text{where } x \text{ denotes } X[d_0, d_1, \cdots, d_{N-1}]\end{aligned}\end{align} \]
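A NumPy sketch of this definition (illustrative only; an exact integer bit_length would avoid any floating-point concerns for very large values):

import numpy as np

def cvm_precision(X):
    # Bits needed for |x|; 0 is defined to take 1 bit.
    A = np.abs(X)
    return np.where(A == 0, 1, np.ceil(np.log2(A + 1)).astype(np.int64))

print(cvm_precision(np.array([0, 1, 2, 3, 4])))  # [1 1 2 2 3]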

elemwise_add

This operator performs elementwise add to the 2 input tensors.

Math Formalization

  • Input: there are 2 inputs, of the same shape.

    • \(A\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

    • \(B\), whose shape is same as \(A\).

  • Output: \(Y\), a tensor whose shape is same as \(A\) and \(B\).

\[Y = A + B\]

elemwise_sub

This operator performs elementwise subtraction to the 2 input tensors.

Math Formalization

  • Input: there are 2 inputs, of the same shape.

    • \(A\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

    • \(B\), whose shape is same as \(A\).

  • Output: \(Y\), a tensor whose shape is same as \(A\) and \(B\).

\[Y = A - B\]

negative

This operator performs elementwise negation of the input tensor.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output: \(Y\), a tensor whose shape is same as \(X\).

\[Y = -X\]

clip

This operator clips the input tensor, cutting the data into a given range.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output: \(Y\), a tensor whose shape is same as \(X\).

  • Attributes:

    • a_min

    • a_max

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = \begin{cases} \text{a_max}, & x \geqslant \text{a_max} \\ x, & x \in (\text{a_min}, \text{a_max}) \\ \text{a_min}, & x \leqslant \text{a_min} \end{cases},\\\end{split}\\\begin{split}\forall i \in [0, N) \wedge d_i \in [0, n_i), \\\end{split}\\\text{where } x \text{ denotes } X[d_0, d_1, \cdots, d_{N-1}]\end{aligned}\end{align} \]

cvm_clip

This operator clips the input data into a certain CVM precision.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output: \(Y\), a tensor whose shape is same as \(X\).

  • Attribute: precision, an int in range \([1, 33)\)

\[\begin{split}Y = clip(X, \text{a_min}=-\alpha, \text{a_max}=\alpha), \\ \text{where } \alpha = 2^{\text{precision}-1}-1\end{split}\]

cvm_right_shift

This operator performs a right shift. Slightly different from the C right shift, the result of this operator is rounded to the nearest integer. As a special case, negative halves are rounded up; for example, -1.5 rounds to -1.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output: \(Y\), a tensor whose shape is same as \(X\).

  • Attribute:

    • precision, an int in range \([1, 33)\)

    • shift_bit, an int in range \([1, 33)\)

\[\begin{split}Y = clip(T, \text{a_min} = -\alpha, \text{a_max}=\alpha), \\ \text{where } T = {\left\lfloor \left(\left\lfloor \frac{X}{2^{\text{shift_bit} - 1}} \right\rfloor + 1 \right) \div 2 \right\rfloor} \text{ and } \alpha = 2 ^ {\text{precision} - 1} - 1\end{split}\]
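A NumPy sketch of the formula; arithmetic right shifts on signed integers implement the floor divisions (illustrative only):

import numpy as np

def cvm_right_shift(X, precision, shift_bit):
    # Round-to-nearest shift (halves round up: -1.5 -> -1), then clip
    # into the signed precision-bit symmetric range.
    T = ((X >> (shift_bit - 1)) + 1) >> 1
    alpha = 2 ** (precision - 1) - 1
    return np.clip(T, -alpha, alpha)

print(cvm_right_shift(np.array([-3, 3]), precision=8, shift_bit=2))  # [-1  1]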

cvm_left_shift

This operator performs a left shift on the input tensor, like the C left shift operator, and then clips the result.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output: \(Y\), a tensor whose shape is same as \(X\).

  • Attribute:

    • precision, an int in range \([1, 33)\)

    • shift_bit, an int in range \([1, 33)\)

\[\begin{split}Y = clip(T, \text{a_min} = -\alpha, \text{a_max}=\alpha), \\ \text{where } T = X * 2^\text{shift_bit} \text{ and } \alpha = 2 ^ {\text{precision} - 1} - 1\end{split}\]

Transform Operators

A transform operator performs no arithmetic on the data; it simply reshapes, copies, or selects part of it. The process logic differs considerably across these operators.

repeat

This operator repeats the input data repeats times along the given axis. Each element is repeated right after itself.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{\text{axis}}, \cdots, n_{N-1})\)

  • Output: \(Y\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{\text{axis}} \cdot repeats, \cdots, n_{N-1})\)

  • Attribute:

    • axis, an int in range \([0, N)\), indicating the axis on which the repetition is performed.

    • repeats, an int in range \([1, +\infty)\), indicating how many times the data is repeated.

\[\begin{split}Y[d_0, d_1, \cdots, d_\text{axis}, \cdots, d_{N-1}] = X[d_0, d_1, \cdots, \left\lfloor{d_\text{axis} \over \text{repeats}}\right\rfloor, \cdots, d_{N-1}], \\ \forall d_0 \in [0, n_0) \wedge \cdots \wedge d_{axis-1} \in [0, n_{axis-1}) \wedge d_{axis} \in [0, n_{axis} \cdot \text{repeats}) \wedge \\ d_{axis+1} \in [0, n_{axis+1}) \wedge \cdots \wedge d_{N-1} \in [0, n_{N-1})\end{split}\]
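This matches NumPy's np.repeat, which can serve as a quick check:

import numpy as np

x = np.array([[1, 2], [3, 4]])
print(np.repeat(x, 2, axis=1))  # [[1 1 2 2] [3 3 4 4]]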

tile

This operator tiles the input data several times over several dimensions. Unlike the repeat operator, which repeats each element right after itself, this operator tiles the whole data block after the whole data block.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\),

  • Output: \(Y\)

  • Attribute: reps, a TShape of length \(M\), namely \((m_0, m_1, \cdots, m_{M-1})\).

\[r \in [1, max\_attr), \forall r \in \text{reps}\]
\[ \begin{align}\begin{aligned}\begin{split}Y[k_0, \cdots, k_{K-N-1}, k_{K-N}, k_{K-N+1}, \cdots, k_{K-1}] = \\ X[k_{K-N+0} \text{ mod } n_0, k_{K-N+1} \text{ mod } n_1, \cdots, k_{K-N+N-1} \text{ mod } n_{N-1}], \\ \forall k_0 \in [0, S_0) \wedge \cdots \wedge k_{K-1} \in [0, S_{K-1}), \\\end{split}\\\begin{split}\text{where } K = \max\{M, N\} \text{ and } S_i = SX_i \cdot SR_i \text{ and} \\ SX_p = \begin{cases} n_{p-K+N}, & p \in [K-N, K) \\ 1, & p \in [0, K-N) \end{cases} \text{ and} \\ SR_q = \begin{cases} m_{q-K+M}, & q \in [K-M, K) \\ 1, & q \in [0, K-M) \end{cases}\end{split}\end{aligned}\end{align} \]
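This matches NumPy's np.tile, which also pads a shorter reps with leading 1s, so it can serve as a quick check:

import numpy as np

x = np.array([[1, 2], [3, 4]])
print(np.tile(x, (2, 3)).shape)  # (4, 6): whole blocks are repeated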

flatten

This operator flattens the input tensor into a 1-D array in row-major order.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output: \(Y\).

\[\begin{split}Y[\text{flatten_index}(d_0, d_1, \cdots, d_{N-1}, n_0, n_1, \cdots, n_{N-1})] = \\ X[d_0, d_1, \cdots, d_{N-1}], \\ \forall d_0 \in [0, n_0) \wedge d_1 \in [0, n_1) \wedge \cdots \wedge d_{N-1} \in [0, n_{N-1})\end{split}\]

where \(\text{flatten_index}\) is

\[\begin{split}\text{flatten_index}(d_0, d_1, \cdots, d_{N-1}, n_0, n_1, \cdots, n_{N-1}) = \\ d_0 \cdot \prod_{i = 1}^{N-1} n_i + d_1 \cdot \prod_{i = 2}^{N-1} n_i + \cdots + d_{N-2} \cdot n_{N-1} + d_{N-1}\end{split}\]
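A sketch of flatten_index in Horner form, which is equivalent to the sum of products above:

def flatten_index(ds, ns):
    # Row-major (C-order) linearization of coordinates ds in shape ns.
    idx = 0
    for d, n in zip(ds, ns):
        idx = idx * n + d
    return idx

print(flatten_index((1, 2), (3, 4)))  # 1 * 4 + 2 = 6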

concatenate

This operator concatenates tensors along a given axis.

Math Formalization

  • Inputs: \(M\) tensors, namely \(I^0, I^1, \cdots, I^{M-1}\). They all have \(N\) dimensions, namely \(I^i\)’s shape is \((n^i_0, n^i_1, \cdots, n^i_{N-1})\). All dimensions except the axis to be concatenated along must have the same length.

  • Output: \(Y\)

  • Attribute: axis, an int in range \([0, N)\).

\[n^i_j = n^0_j, \forall i \in [1, M) \wedge j \in [0, N) \wedge j \neq \text{axis}\]
\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_\text{axis-1}, \text{new_idx}, d_\text{axis+1}, \cdots, d_{N-1}] = I^i[d_0, d_1, \cdots, d_{N-1}], \\ \forall d_0 \in [0, n^i_0) \wedge \cdots \wedge d_{N-1} \in [0, n^i_{N-1}) \wedge i \in [0, M), \\\end{split}\\\text{where new_idx} = \sum_{j=0}^{i-1} n^j_\text{axis} + d_\text{axis}\end{aligned}\end{align} \]
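For intuition, this is NumPy's np.concatenate; new_idx is simply the offset of input i along the concatenation axis:

import numpy as np

a = np.ones((2, 3), dtype=np.int32)
b = np.zeros((2, 2), dtype=np.int32)
print(np.concatenate([a, b], axis=1).shape)  # (2, 5)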

transpose

This operator transposes the input data according to a given permutation.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

  • Output: \(Y\),

  • Attribute: axes, a TShape of length \(M \in \{0, N\}\), which means axes is either empty or a permutation of \(\{0, 1, \cdots, N-1\}\)

\[\text{axis} \in [-N, N), \forall \text{axis} \in \text{axes}\]
\[ \begin{align}\begin{aligned}\begin{split}Y[d_{\text{real_axes}_0}, d_{\text{real_axes}_1}, \cdots, d_{\text{real_axes}_{N-1}}] = X[d_0, d_1, \cdots, d_{N-1}], \\\end{split}\\\begin{split}\forall d_0 \in [0, n_0) \wedge \cdots \wedge d_{N-1} \in [0, n_{N-1}), \\\end{split}\\\begin{split}\text{where real_axes}_i = \begin{cases} \text{axes}_i, & M = N \wedge \text{axes}_i \geqslant 0 \\ \text{axes}_i + N, & M = N \wedge \text{axes}_i < 0 \\ N-1-i, & M = 0 \end{cases} \text{ and} \\ card \; \{\text{real_axes}_i \mid i \in [0, N) \} = N\end{split}\end{aligned}\end{align} \]

slice

This operator slices an input array with the given attributes. For each dimension, strides (default 1), begin (default 0), and end (default the length of that dimension) coordinates can be assigned by the user.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

  • Output: \(Y\),

  • Attributes:

    • begin, a TShape of length \(B\).

    • end, a TShape of length \(E\).

    • strides, a TShape of length \(S\).

    \(B\), \(E\), \(S\) may differ.

\[\begin{split}\text{b_arr}[b] = \begin{cases} \text{begin}[b] + n_b, & b \in [0, N) \wedge b < B \wedge begin[b] < 0 \\ \text{begin}[b], & b \in [0, N) \wedge b < B \wedge begin[b] \geqslant 0 \\ 0, & b \in [0, N) \wedge b \geqslant B \end{cases}, b \in [0, N) \\\end{split}\]
\[\begin{split}\text{e_arr}[e] = \begin{cases} \text{end}[e] + n_e, & e \in [0, N) \wedge e < E \wedge end[e] < 0\\ \text{end}[e], & e \in [0, N) \wedge e < E \wedge end[e] \geqslant 0\\ n_{e}, & e \in [0, N) \wedge e \geqslant E \end{cases}, e \in [0, N) \\\end{split}\]
\[\begin{split}\text{s_arr}[s] = \begin{cases} \text{stride}[s], & s \in [0, N) \wedge s < S \\ 1, & s \in [0, N) \wedge s \geqslant S \end{cases}, s \in [0, N) \\\end{split}\]
\[\begin{split}\forall \{i \mid i \in [0, N)\}: \text{s_arr}[i] \ne 0 \\\end{split}\]
\[\begin{split}\text{b_range}(i) = \begin{cases} -1, & \text{s_arr}[i] < 0 \\ 0, & \text{s_arr}[i] \geqslant 0 \end{cases} \\\end{split}\]
\[\begin{split}\text{e_range}(i) = \begin{cases} n_i - 1, & \text{s_arr}[i] < 0 \\ n_i, & \text{s_arr}[i] \geqslant 0 \end{cases} \\\end{split}\]
\[\begin{split}\text{b_vec}[b] = clip(\text{b_arr}[b], \text{a_min}=\text{b_range}(b), \text{a_max}=\text{e_range}(b)-1), b \in [0, N) \\\end{split}\]
\[\begin{split}\text{e_vec}[e] = clip(\text{e_arr}[e], \text{a_min}=\text{b_range}(e), \text{a_max}=\text{e_range}(e)-1), e \in [0, N) \\\end{split}\]
\[\begin{split}\forall \{i \mid i \in [0, N) \}: \begin{cases} \text{b_vec}[i] < \text{e_vec}[i], & \text{s_arr}[i] > 0 \\ \text{e_vec}[i] < \text{b_vec}[i], & \text{s_arr}[i] < 0 \end{cases} \\\end{split}\]
\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = X[K[0], K[1], \cdots, K[N-1]], \\\end{split}\\\forall d_j \in [0, \left\lceil{ \text{e_vec}[j] - \text{b_vec}[j] \over \text{s_arr}[j] }\right\rceil ) \wedge j \in [0, N),\\\text{where } K[i] = \text{b_vec}[i] + \text{s_arr}[i] * d_i\end{aligned}\end{align} \]
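A sketch that normalizes the attributes and defers to Python slice semantics, which apply the same K[i] = b_vec[i] + s_arr[i] * d_i indexing rule; boundary clipping with negative strides may differ in corner cases from the clip rules above (cvm_slice is a hypothetical name, for illustration only).

import numpy as np

def cvm_slice(X, begin=(), end=(), strides=()):
    sl = []
    for i, n in enumerate(X.shape):
        b = begin[i] if i < len(begin) else 0      # b_arr default
        e = end[i] if i < len(end) else n          # e_arr default
        s = strides[i] if i < len(strides) else 1  # s_arr default
        sl.append(slice(b, e, s))
    return X[tuple(sl)]

x = np.arange(12).reshape(3, 4)
print(cvm_slice(x, begin=(0, 1), strides=(1, 2)))  # [[ 1  3] [ 5  7] [ 9 11]]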

slice_like

This operator slices the input \(X\) into a shape that looks like the other given input, shape_like.

TODO: need more consideration.

Math Formalization

  • Input: there are 2 inputs

    • \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

    • \(\text{shape_like}\), a tensor of \(M\) dimensions, namely \((m_0, m_1, \cdots, m_{M- 1 })\),

  • Output: \(Y\)

  • Attribute: axes, a TShape of length \(K\). If axes is not empty, only the axes mentioned are sliced on, and the other axes in shape_like are ignored. \(M \ne N\) is allowed only if a non-empty axes is given. If \(M > N\), the dimensions beyond \(N\) are ignored; if \(M < N\), only the first \(M\) dimensions are sliced while the dimensions beyond \(M\) stay the same.

\[ \begin{align}\begin{aligned}\begin{split}\text{sliced_axes} = \begin{cases} \{j \mid j \in axes \wedge j \geqslant 0\} \bigcup \{j + N \mid j \in axes \wedge j < 0\}, & K > 0\\ \{0, 1, \cdots, M-1\}, & K = 0 \end{cases}, \\\end{split}\\\begin{split}\text{where } \forall j \in \text{sliced_axes}: j < \min(M, N) \text{ and } m_j \leqslant n_j\\\end{split}\end{aligned}\end{align} \]
\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = X[d_0, d_1, \cdots, d_{N-1}], \\\end{split}\\\begin{split}\forall j \in [0, N) \wedge d_j \in \begin{cases} [0, m_j), & j \in \text{sliced_axes} \\ [0, n_j), & j \notin \text{sliced_axes} \end{cases},\\\end{split}\end{aligned}\end{align} \]

take

This operator takes some elements from the input data. If axis is not given, it flattens the input data and takes elements; if axis is given, it takes elements along this axis for every coordinate in the other axes.

Math Formalization

  • Input: there are 2 inputs

    • \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

    • \(indices\), a tensor of \(M\) dimensions, namely \((m_0, m_1, \cdots, m_{M- 1})\)

  • Output: \(Y\),

  • Attribute: axis an Optional<int>.

  1. Case axis is None:

\[ \begin{align}\begin{aligned}\begin{split}T = flatten(X) \\ Y[d_0, d_1, \cdots, d_{M-1}] = T[clip(\text{xidx}, \text{a_min}=0, \text{a_max}=|T|-1)],\\\end{split}\\\begin{split}\forall (d_0, d_1, \cdots, d_{M-1}), \\\end{split}\\\begin{split}\text{ where } d_j \in [0, m_j) \wedge j \in [0, M) \text{ and }\\ \text{xidx} = \text{indices}[d_0, d_1, \cdots, d_{M-1}]\end{split}\end{aligned}\end{align} \]
  2. Case axis is int:

\[ \begin{align}\begin{aligned}\begin{split}\text{axis} \in [-N, N) \\ \text{real_axis} = \begin{cases} \text{axis}, & \text{axis} \geqslant 0 \\ \text{axis} + N, & \text{axis} < 0 \end{cases} \\ Y[d_0, d_1, \cdots, d_{M+N-2}] = X[d_0, \cdots, d_{\text{real_axis}-1}, \text{xidx}, d_{\text{real_axis}+M}, \cdots, d_{M+N-2}], \\\end{split}\\\begin{split}\forall d_j \in \begin{cases} [0, n_j), & j < \text{real_axis} \\ [0, m_{j-\text{real_axis}}), & j \in [\text{real_axis}, \text{real_axis}+M) \\ [0, n_{j-M+1}), & j \in [\text{real_axis} + M, M+N-1) \end{cases},\\ \text{where } \text{xidx} = clip(\text{indices}[d_{\text{real_axis}}, d_{\text{real_axis}+1}, \cdots, d_{\text{real_axis}+M-1}],\\ \text{a_min}=0, \text{a_max}=n_{\text{real_axis}}-1)\end{split}\end{aligned}\end{align} \]
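Both cases coincide with NumPy's np.take with mode='clip', which clips out-of-range indices exactly as the formulas do:

import numpy as np

x = np.array([[1, 2], [3, 4]])
print(np.take(x, np.array([0, 9]), mode='clip'))          # flattened: [1 4]
print(np.take(x, np.array([0, 9]), axis=0, mode='clip'))  # rows 0 and (clipped) 1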

cvm_lut

This operator is an alias for take with axis set to None.

Math Formalization

  • Input: there are 2 inputs.

    • \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

    • \(indices\), a tensor of \(M\) dimensions, namely \((m_0, m_1, \cdots, m_{M- 1})\)

  • Output: \(Y\).

\[Y=take(X, \text{indices}, \text{axis}=\text{None})\]

expand_dims

This operator inserts new dimensions starting at the given axis.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\),

  • Output: \(Y\)

  • Attributes:

    • axis, an int in range \([-N-1, N+1)\), indicating where the new dimensions start. Note that \(N+1\), instead of \(N\), will be added to a negative axis.

    • num_newaxis, an int in range \([min\_attr, max\_attr)\)

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0,d_1, \cdots, d_{\text{real_axis}-1}, \underbrace{0, 0, \cdots, 0}_{\text{num_newaxis}}, d_\text{real_axis}, \cdots, d_{N-1}] = X[d_0, d_1, \cdots, d_{N-1}], \\\end{split}\\\begin{split}\forall d_0 \in [0, n_0) \wedge \cdots \wedge d_{N-1} \in [0, n_{N-1}), \\\end{split}\\\begin{split}\text{where real_axis} = \begin{cases} \text{axis},& \text{axis} \geqslant 0 \\ \text{axis} + N + 1,& \text{axis} < 0 \end{cases}\end{split}\end{aligned}\end{align} \]

reshape

This operator reshapes the input data.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\),

  • Output: \(Y\),

  • Attribute: target_shape, a TShape of length \(M\), namely \((m_0, m_1, \cdots, m_{M-1})\), s.t. \(m_0 \cdot m_1 \cdots m_{M-1} = n_0 \cdot n_1 \cdots n_{N-1}\).

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{M-1}] = T[\text{flatten_index}(d_0, d_1, \cdots, d_{M-1}, m_0, m_1, \cdots, m_{M-1})], \\\end{split}\\\begin{split}\forall d_0 \in [0, m_0) \wedge \cdots \wedge d_{M-1} \in [0, m_{M-1}), \\\end{split}\\\text{where } T = \text{flatten}(X)\end{aligned}\end{align} \]

squeeze

This operator removes dimensions of length 1.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

  • Output: \(Y\).

  • Attribute: axes, a TShape of length M.

\[\text{axis} \in [-N, N), \forall \text{axis} \in \text{axes}\]
\[\begin{split}\text{real_axes} = \begin{cases} \{\text{axis} \mid \text{axis} \geqslant 0 \wedge \text{axis} \in \text{axes} \} \bigcup \{\text{axis} + N \mid \text{axis} < 0 \wedge \text{axis} \in \text{axes}\}, & M > 0 \\ \{\text{axis} \mid n_\text{axis} = 1 \wedge \text{axis} \in [0, N) \}, & M = 0 \end{cases} \\\end{split}\]
\[n_\text{axis} = 1, \forall \text{axis} \in \text{real_axes}\]
\[ \begin{align}\begin{aligned}\begin{split}Y[d_{I(0)}, d_{I(1)}, \cdots, d_{I(N-K-1)}] = X[d_0, d_1, \cdots, d_{N-1}], \\\end{split}\\\begin{split}\forall d_0 \in [0, n_0) \wedge d_1 \in [0, n_1) \wedge \cdots \wedge d_{N-1} \in [0, n_{N-1}), \\\end{split}\\\begin{split}\text{where } K = card \; \text{\{real_axes\}} \text{ and} \\\end{split}\\\begin{split}I: \{i \mid i \in [0, N-K) \} \to \{i \mid i \in [0, N) \wedge i \notin \text{real_axes} \}, \\ \text{s.t. } I(i) < I(j), \forall 0 \leqslant i < j < N-K\end{split}\end{aligned}\end{align} \]

where

This operator selects elements from 2 inputs according to the given condition.

Math Formalization

  • Input: there are 3 inputs

    • \(Cond\), either a tensor whose shape is the same as the others', or a 1-D tensor whose length equals the length of the others' first dimension.

    • \(A\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

    • \(B\), a tensor whose shape is same as \(A\)

  • Output: \(Y\)

  1. Case the shape of \(Cond\) is the same as the others':

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = \begin{cases} A[d_0, d_1, \cdots, d_{N-1}], & Cond[d_0, d_1, \cdots, d_{N-1}] \neq 0\\ B[d_0, d_1, \cdots, d_{N-1}], & Cond[d_0, d_1, \cdots, d_{N-1}] = 0\end{cases},\\\end{split}\\\forall d_i \in [0, n_i) \wedge i \in [0, N)\end{aligned}\end{align} \]
  2. Case \(Cond\) is a 1d tensor:

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = \begin{cases} A[d_0, d_1, \cdots, d_{N-1}], & Cond[d_0] \neq 0\\ B[d_0, d_1, \cdots, d_{N-1}], & Cond[d_0] = 0 \end{cases},\\\end{split}\\\forall d_i \in [0, n_i) \wedge i \in [0, N)\end{aligned}\end{align} \]

Vision Operators

We provide 2 operators for the vision scenario. Since the scenario is narrow, regulation of the input data is stricter than for other operators. Unless otherwise specified, the input data should be 3D, namely \((B, N, K)\), indicating the number of batches, the number of results for each batch, and the length (\(K\)) of a result. The first value of a result should be the ID and the second should be the score.

get_valid_count

This operator counts how many results score above a threshold and collects them.

Math Formalization

  • Input: \(X\), a 3D tensor of shape \((B, N, K), 2 \leqslant K \leqslant 32\)

  • Output:

    • \(\text{valid_count}\),

    • \(Y\),

  • Attribute: score_threshold, an int.

\[\begin{split}\text{valid_count}[b] = card\{ q \mid q \in [0, N) \wedge X[b, q, 1] > \text{score_threshold} \}, \\ \quad \forall b \in [0, B)\end{split}\]
\[ \begin{align}\begin{aligned}\begin{split}Y[b, \text{idx}, k] = X[b, n, k], \\ \quad \forall b \in [0, B) \wedge n \in [0, N) \wedge k \in [0, K) \wedge X[b, n, 1] > \text{score_threshold}, \\\end{split}\\\quad \text{where idx = } card \{q \mid q \in [0, n) \wedge X[b, q, 1] > \text{score_threshold} \}\end{aligned}\end{align} \]
\[Y[b,n, k] = -1, \forall b \in [0, B) \wedge n \in [\text{valid_count}[b], N) \wedge k \in [0, K)\]
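A NumPy sketch of the three formulas above (illustrative only; the compaction is stable and the tail is filled with -1):

import numpy as np

def get_valid_count(X, score_threshold):
    # X: (B, N, K); X[:, :, 1] holds the scores.
    B, N, K = X.shape
    valid_count = np.zeros(B, dtype=np.int64)
    Y = np.full_like(X, -1)
    for b in range(B):
        keep = X[b, :, 1] > score_threshold  # preserves original order
        valid_count[b] = keep.sum()
        Y[b, :valid_count[b]] = X[b, keep]
    return valid_count, Y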

non_max_suppression

This operator implements the NMS (non-maximum suppression) algorithm, finding valid bounding boxes.

Math Formalization

  • Input: there are 2 inputs.

    • \(X\), a 3D tensor of shape \((B, N, K), K = 6\)

    • valid_count, a 1D tensor of length \(B\)

  • Output: \(Y\)

  • Attributes:

    • iou_threshold, an int representing the percentage of intersection over union, with value in range \((0, +\infty)\); 101 specifically stands for full overlap of bounding boxes, and any larger integer is equivalent to it.

    • max_output_size, an int

    • force_suppress, a boolean

    • top_k, an int.

\[ \begin{align}\begin{aligned}\begin{split}R[b, i, k] = X[b, I(i), k], \\\end{split}\\\begin{split}\forall b \in [0, B) \wedge i \in [0, T) \wedge k \in [0, K), \\\end{split}\\\begin{split}\text{where } T = \text{max}\{ \text{min}(N, \text{valid_count}[b]), 0\} \text{ and} \\ I: \{ i \mid i \in [0, T) \} \to \{ i \mid i \in [0, T) \}, \\ \text {s.t. } X[b, I(i), 1] > X[b, I(j), 1] \text{ or } \\ (X[b, I(i), 1] = X[b, I(j), 1] \wedge I(i) < I(j)), \forall 0 \leqslant i < j < T\end{split}\end{aligned}\end{align} \]
\[ \begin{align}\begin{aligned}\begin{split}Y[b, n, k] = R[b, \text{IDX}(n), k], \\\end{split}\\\begin{split}\forall b \in [0, B) \wedge n \in [0, \min\{T, \text{MOS}, card\{U\}\}) \wedge k \in [0, K), \\\end{split}\\\begin{split}\text{where } \text{TK} = \begin{cases} +\infty, & \text{if top_k < 0} \\[1ex] \text{top_k}, & \text{otherwise} \end{cases} \text{ and} \\ \text{MOS} = \begin{cases} +\infty, & \text{if max_output_size < 0} \\[1ex] \text{max_output_size}, & \text{otherwise} \end{cases} \text{ and} \\ \text{iou}(p, q) = \begin{cases} \text{OLR}(R[b, p, :], R[b, q, :]), & \begin{array}{} \text{force_suppress is true}\\ \text{ or } R[b, p, 0] = R[b, q, 0] \end{array} \\[1ex] 0, & \text{otherwise} \end{cases} \text{ and} \\ \text{U} = \{p \mid p \in [0, min\{TK, T\}) \wedge R[b,p,0] >= 0 \wedge \\ \text{iou}(p, q) < \text{iou_threshold}, \forall q \in U \wedge q < p\} \text{ and} \\ \text{IDX}: \{i \mid i \in [0, card\{U\})\} \to U, \text{s.t. } \text{IDX}(i) < \text{IDX}(j), \forall i < j\end{split}\end{aligned}\end{align} \]
\[\begin{split}Y[b, n, k] = -1, \\ \forall b \in [0, B) \wedge k \in [0, K) \wedge n \in [min\{T, \text{MOS},card\{U\}\}, N)\end{split}\]

Note

\(\text{OLR}\), which means overlap ratio, calculates the overlap ratio of two rectangles: area of their intersection over the area of their union.
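A per-batch sketch of the greedy selection that the set \(U\) defines. The (x1, y1, x2, y2) box layout in entries 2..5 is an assumption for illustration; the formalization itself only fixes entry 0 (ID) and entry 1 (score). OLR is scaled by 100 here because iou_threshold is a percentage.

def olr(p, q):
    # Overlap ratio: intersection area over union area, as a percentage.
    # Assumes box corners live at entries 2..5 as (x1, y1, x2, y2).
    x1, y1 = max(p[2], q[2]), max(p[3], q[3])
    x2, y2 = min(p[4], q[4]), min(p[5], q[5])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[4] - r[2]) * (r[5] - r[3])
    union = area(p) + area(q) - inter
    return 100.0 * inter / union if union > 0 else 0.0

def nms_batch(R, T, iou_threshold, max_output_size, force_suppress, top_k):
    # R: score-sorted rows of one batch; returns kept indices (the set U).
    tk = T if top_k < 0 else min(top_k, T)
    U = []
    for p in range(tk):
        if R[p][0] < 0:
            continue  # a negative ID marks an invalid box
        suppressed = any(
            (force_suppress or R[p][0] == R[q][0])
            and olr(R[p], R[q]) >= iou_threshold
            for q in U)
        if not suppressed:
            U.append(p)
        if 0 <= max_output_size <= len(U):
            break
    return U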