Operator Math Formalization

Note

When writing this section of the document, please refer to the doc: Math Format.

This section gives a full and exhaustive explanation of the CVM-Runtime operators. The source code of the FORMAL version has a strong correlation with this mathematical description, while other versions, such as CPU and CUDA, only promise consistent inference results and may use arbitrary process logic.

Note

All numbers referred to by symbols are integers by default.

All the operators’ formalization obeys the unified format:

\[ \begin{align}\begin{aligned}\begin{split}Y[y_\text{indices}] = X[x_\text{indices}], \\\end{split}\\\begin{split}\forall \text{given range}, \\\end{split}\\\text{where } \text{condition}_1 \text{ and } \text{condition}_2 \text{ and } \cdots \text{ and } \text{condition}_n\end{aligned}\end{align} \]

which means that, for the given value range, the formula in the first line always holds, subject to the constraints listed in the condition statements.

A quick reference to the operators is listed below:

Reduce Operators

A reduce operator applies a reduction function to the input data according to its parameters, and the process logic is the same across all operators of this kind. The reduction is performed on the given axes; the other dimensions remain the same and the results are stored in the corresponding positions. We abstract the common reduce logic into the formalization here and specify the reduce function for each operator respectively.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

  • Output: \(Y\)

  • Attribute:

    • axes (TShape), an \(M\)-length vector, where \(M \in [0, N+1)\)

    • keepdims (bool)

    • exclude (bool)

\[ \begin{align}\begin{aligned}\begin{split}T = \left\{x \mid i \in \text{axes} \wedge x = \begin{cases} i, & \text{if } i\geqslant 0 \\ i + N, & \text{otherwise} \end{cases} \right\}, \\\end{split}\\\text{where } card\{T\} = M \text{ and } j \in [0, N), \forall j \in \text{T}\end{aligned}\end{align} \]
\[U = \{ 0, 1, \cdots, N-1 \}\]
\[\begin{split}R = \begin{cases} U - T, & \text{if exclude is true} \\ T, & \text{otherwise} \end{cases}\end{split}\]
\[r = card\{R\}\]
  1. Case exclude is true and \(M = N\)

\[Y = X\]
  2. Case exclude is false and \(M = 0\)

\[ \begin{align}\begin{aligned}Y[\underbrace{0, 0, \cdots, 0}_{K}] = \text{REDUCE_FUNC}(X),\\\begin{split}\text{where } K = \begin{cases} 1, & \text{if keepdims is false} \\ N, & \text{otherwise} \end{cases}\end{split}\end{aligned}\end{align} \]
  3. Case keepdims is false

\[ \begin{align}\begin{aligned}Y[d_{I(0)}, d_{I(1)}, \cdots, d_{I(N-r-1)}] = \text{REDUCE_FUNC}(Z),\\\forall d_{I(i)} \in [0, n_{I(i)}) \wedge i \in [0, N-r),\\\begin{split}\text{where } I: [0, N-r) \to U-R, \text{s.t. } \forall i < j, I(i) < I(j) \text{ and} \\ Z = \{ X[d_0, d_1, \cdots, d_{N-1}] \mid d_i \in [0, n_i), \forall i \in R \}, \\ \text{with the coordinates } d_{I(0)}, d_{I(1)}, \cdots, d_{I(N-r-1)} \text{ fixed as given}\end{split}\end{aligned}\end{align} \]
  4. Otherwise

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = M[d_{I(0)}, d_{I(1)}, \cdots, d_{I(N-r-1)}], \\\end{split}\\\begin{split}\forall d_i \in [0, n_i) \wedge i \in U - R \wedge d_j = 0 \wedge j \in R, \\\end{split}\\\begin{split}\text{where } I: [0, N-r) \to U-R, \text{s.t. } \forall i < j, I(i) < I(j) \text{ and} \\ M = \text{OP_NAME}(X, \text{axes=axes, keepdims=false, exclude=exclude})\end{split}\end{aligned}\end{align} \]
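The four cases above can be sketched in NumPy as below. reduce_op and its reduce_func argument are hypothetical helpers for illustration only; the FORMAL source remains the authority.

import numpy as np

def reduce_op(X, axes, keepdims=False, exclude=False, reduce_func=np.sum):
    # T: normalized axes; R: the axes actually reduced over.
    N = X.ndim
    T = {a + N if a < 0 else a for a in axes}
    R = set(range(N)) - T if exclude else T
    if exclude and len(T) == N:      # case 1: Y = X
        return X.copy()
    if not exclude and len(T) == 0:  # case 2: reduce the whole tensor
        y = reduce_func(X)
        return np.full((1,) * N, y) if keepdims else np.array([y])
    # cases 3 and 4: reduce over the axes in R.
    return reduce_func(X, axis=tuple(sorted(R)), keepdims=keepdims)

data = np.array([[[1, 2], [2, 3], [1, 3]],
                 [[1, 4], [4, 3], [5, 2]],
                 [[7, 1], [7, 2], [7, 3]]])
print(reduce_op(data, axes=(1,)))  # [[ 4  8] [10  9] [21  6]]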

sum

  • Set \(\text{OP_NAME}\) as sum

  • Set \(\text{REDUCE_FUNC}\) as

    \[\text{REDUCE_FUNC}(Z) = \sum Z,\]

Example

data = [[[1, 2], [2, 3], [1, 3]],
        [[1, 4], [4, 3], [5, 2]],
        [[7, 1], [7, 2], [7, 3]]]

sum(data, axis=(1))
[[  4.   8.]
 [ 10.   9.]
 [ 21.   6.]]

sum(data, axis=[1,2])
[ 12.  19.  27.]

max

  • Set \(\text{OP_NAME}\) as max

  • Set \(\text{REDUCE_FUNC}\) as

    \[\text{REDUCE_FUNC}(Z) = \max Z,\]

Broadcast Operators

A broadcast operator applies the broadcast function to the input data, and the process logic is the same across all operators of this kind.

Math Formalization

  • Input: There are 2 inputs.

  • \(A\), a tensor of \(M\) dimensions, namely \((m_0, m_1, \cdots, m_{M-1})\)

  • \(B\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

The two input shapes of tensor must satisfy the assertions as below:

\[\begin{split}P = \min(M, N) \\ Q = \max(M, N)\end{split}\]
\[m_{M-1-i} = n_{N-1-i} \text{ or } m_{M-1-i} = 1 \text{ or } n_{N-1-i} = 1, \forall i \in [0, P)\]
  • Output: \(Y\), a tensor of \(Q\) dimensions, the larger rank of the two inputs; its shape is \((\max(em_0, en_0), \cdots, \max(em_{Q-1}, en_{Q-1}))\), with \(em\) and \(en\) the input shapes padded with leading 1s, as defined below.

We abstract the formalization here and introduce the details as below:

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{Q-1}] = A[a_0, a_1, \cdots, a_{M-1}] \text{ OP } B[b_0, b_1, \cdots, b_{N-1}], \\\end{split}\\\begin{split}\forall i \in [0, Q) \wedge d_i \in [0, \max(em_i, en_i)), \\\end{split}\\\begin{split}\text{where } a_j = d_{Q-M+j} \text{ if } d_{Q-M+j} < m_j \text{ else } 0, \forall j \in [0, M) \text{ and} \\ b_j = d_{Q-N+j} \text{ if } d_{Q-N+j} < n_j \text{ else } 0, \forall j \in [0, N) \text{ and} \\ em_i = \begin{cases} 1, & i < Q - M \\ m_{i-Q+M}, & \text{otherwise} \end{cases}, \forall i \in [0, Q) \text{ and} \\ en_i = \begin{cases} 1, & i < Q - N \\ n_{i-Q+N}, & \text{otherwise} \end{cases}, \forall i \in [0, Q)\end{split}\end{aligned}\end{align} \]
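The index mapping above matches NumPy-style broadcasting aligned on trailing dimensions. A minimal sketch of the shape rule (broadcast_shape is a hypothetical helper, for illustration only):

def broadcast_shape(m_shape, n_shape):
    # Pad the shorter shape with leading 1s, then take the per-dimension
    # maximum; each pair of dimensions must match or one of them must be 1.
    Q = max(len(m_shape), len(n_shape))
    em = (1,) * (Q - len(m_shape)) + tuple(m_shape)
    en = (1,) * (Q - len(n_shape)) + tuple(n_shape)
    assert all(a == b or a == 1 or b == 1 for a, b in zip(em, en))
    return tuple(max(a, b) for a, b in zip(em, en))

print(broadcast_shape((2, 1), (3,)))  # (2, 3)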

broadcast_add

set \(\text{OP}\) to \(\text{add}\).

Example

x = [[ 1.,  1.,  1.],
     [ 1.,  1.,  1.]]

y = [[ 0.],
     [ 1.]]

broadcast_add(x, y)
[[ 1.,  1.,  1.],
 [2.,  2.,  2.]]

broadcast_sub

set \(\text{OP}\) to \(\text{sub}\).

broadcast_mul

set \(\text{OP}\) to \(\text{multiply}\).

broadcast_div

set \(\text{OP}\) to \(\text{divide}\).

broadcast_max

set \(\text{OP}\) to \(\text{max}\).

NN Operators

We provide NN operators for users. NN operators stand for neural network operators, the core of the neural network mechanism. NN operators carry trained parameters and perform the linear or non-linear transformations in a model graph.

conv2d

Only the 2-D convolution operator is supported. It is also known as group-wise convolution.

Math Formalization

  • Input: there are 2 or 3 inputs.

    • \(X\), input data to be calculated whose shape is \((N, C, H, W)\)

    • \(W\), convolution kernel weight whose shape is \((OC, IC, KH, KW)\)

    • \(B\), bias, of type Optional<DLTensor>. If \(B\) is not None, its shape is \((\text{OC},)\).

  • Output: \(Y\)

  • Attributes:

    • padding, a TShape of length 2, namely \((PH, PW), PH,PW \in [min\_attr, max\_attr)\), indicating padding size.

    • stride, a TShape of length 2, namely \((SH, SW) \in [1, max\_attr)\), indicating strides.

    • dilation, a TShape of length 2, namely \((DH, DW) \in [1, max\_attr)\), parameter used in dilation convolution.

    • groups, an int in range \([1, C]\), indicating the number of groups; it must hold that \(C = IC \cdot \text{groups} \wedge OC \text{ mod } \text{groups} = 0\)

\[ \begin{align}\begin{aligned}\begin{split}Y[n,oc,p,q]= \sum_{ic = 0}^{IC-1} \text{kernel}(n,(oc \div OPG) * IC + ic, p, q, oc,ic) + \begin{cases} 0, & \text{if B is None}\\ B[oc], & \text{otherwise} \end{cases}, \\\end{split}\\\forall n \in [0, N) \wedge oc\in [0, OC) \wedge p \in \left[0, \text{Y_HMAX} \right) \wedge q \in \left[0, \text{Y_WMAX} \right),\\\begin{split}\text{where } \text{Y_HMAX} = \left\lfloor{ H+2 \cdot \text{PH}-\text{DH} \cdot (\text{KH}-1)-1 \over \text{SH} }\right\rfloor + 1 \text{ and} \\ \text{Y_WMAX} = \left\lfloor{ W+2 \cdot \text{PW}-\text{DW} \cdot (\text{KW}-1)-1 \over \text{SW} }\right\rfloor + 1 \text{ and} \\ OPG = OC / \text{groups, } OPG \in \mathbb N^+ \text{ since } OC \text{ mod } \text{groups} = 0\\\end{split}\end{aligned}\end{align} \]

where \(\text{kernel}\) function does the 2D image convolution calculation, and the formulation is:

\[ \begin{align}\begin{aligned}\begin{split}\text{kernel}(n, j, p, q, o, i) = \sum_{k_i=0}^{\text{KH}-1} \sum_{k_j = 0}^{\text{KW}-1} \text{pad}(p'+k_i*\text{DH},q'+k_j*\text{DW}) \cdot W[o, i, k_i, k_j], \\\end{split}\\\begin{split}\text{where } p' = p \cdot \text{SH} -\text{PH} \text{ and } q' = q \cdot \text{SW}-\text{PW} \text{ and} \\ \text{pad}(p, q) = \begin{cases} X[n, j, p, q], & \text{ if } p \in [0, H) \wedge q \in [0, W) \\ 0, & \text{otherwise} \end{cases}\end{split}\end{aligned}\end{align} \]
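For concreteness, here is a direct, unoptimized NumPy sketch of the two formulas above, assuming groups = 1 (so the weight channel index is simply ic); it mirrors the math, not the FORMAL source.

import numpy as np

def conv2d(X, W, B=None, padding=(0, 0), stride=(1, 1), dilation=(1, 1)):
    # Direct translation of the formalization, groups = 1 assumed.
    N, C, H, Wd = X.shape
    OC, IC, KH, KW = W.shape
    PH, PW = padding; SH, SW = stride; DH, DW = dilation
    YH = (H + 2 * PH - DH * (KH - 1) - 1) // SH + 1
    YW = (Wd + 2 * PW - DW * (KW - 1) - 1) // SW + 1
    Xp = np.zeros((N, C, H + 2 * PH, Wd + 2 * PW), dtype=X.dtype)
    Xp[:, :, PH:PH + H, PW:PW + Wd] = X   # zero padding, as in pad()
    Y = np.zeros((N, OC, YH, YW), dtype=X.dtype)
    for n in range(N):
        for oc in range(OC):
            for p in range(YH):
                for q in range(YW):
                    acc = 0
                    for ic in range(IC):
                        for ki in range(KH):
                            for kj in range(KW):
                                acc += Xp[n, ic, p * SH + ki * DH,
                                          q * SW + kj * DW] * W[oc, ic, ki, kj]
                    Y[n, oc, p, q] = acc + (0 if B is None else B[oc])
    return Y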

dense

The dense operator provides a fully connected layer.

Math Formalization

  • Input: there are 2 or 3 inputs.

    • \(X\), a matrix of shape \((M, K)\)

    • \(W\), a matrix of shape \((N, K)\)

    • \(B\), bias, of type Optional<DLTensor>. If \(B\) is not None, its shape is \((N,)\).

  • Output: \(Y\), a matrix of shape \((M, N)\)

\[\begin{split}Y=X W^T + \begin{cases} 0, & \text{if B is None} \\ B, & \text{otherwise} \end{cases}\end{split}\]
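Equivalently, as a NumPy sketch (for illustration; shapes as listed above):

import numpy as np

def dense(X, W, B=None):
    # (M, K) x (N, K) -> (M, N); B, if given, broadcasts over the M rows.
    Y = X @ W.T
    return Y if B is None else Y + B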

relu

Relu performs the elementwise rectified linear unit function.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

  • Output: \(Y\), the same shape as \(X\)

\[\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = \max(0, X[d_0, d_1, \cdots, d_{N-1}]), \\ \forall i \in [0, N) \wedge d_i \in [0, n_i)\end{split}\]

max_pool2d

Max_pool2d performs max pooling over every plane for each batch and channel.

Math Formalization

  • Input: \(X\), of shape \((N, C, H, W)\), indicating there are \(N\) batches, \(C\) channels and all the planes are of size \((H, W)\)

  • Output: \(Y\)

  • Attributes:

    • pool_size, a TShape of length 2, namely \((PSH, PSW)\)

    • padding, either a TShape of length 2, namely \((PH, PW) \in [min\_attr, max\_attr)\), or an int indicating \(PH=PW\)

    • strides, a TShape of length 2, namely \((SH, SW)\)

    • ceil_mode, boolean.

\[\begin{split}PSH \in [0, H + 2PH + 1), \\ PSW \in [0, W + 2PW + 1), \\ PSH > PH \wedge PSW > PW\end{split}\]
\[ \begin{align}\begin{aligned}\begin{split}Y[n,i,p,q] = \max\{\text{pad}(n, i, p', q') \\ \mid p' \in [p \cdot \text{SH} -\text{PH}, p \cdot \text{SH} -\text{PH}+\text{PSH}), q' \in [q \cdot \text{SW}-\text{PW}, q \cdot \text{SW}-\text{PW}+\text{PSW})\}, \\\end{split}\\\begin{split}\forall n \in [0, N) \wedge i \in [0, C) \wedge \\ p \in \left[0, \text{ceil_func}\left({H+2 \cdot \text{PH}- \text{PSH}\over\text{SH}}\right)+1 \right) \wedge \\ q \in \left[0, \text{ceil_func}\left({W+2 \cdot \text{PW}- \text{PSW} \over \text{SW}}\right)+1 \right), \\\end{split}\\\begin{split}\text{where } \text{ceil_func(val)} = \begin{cases} \lceil \text{val} \rceil, & \text{if ceil_mode is true} \\ \lfloor \text{val} \rfloor, & \text{otherwise} \end{cases} \text{ and} \\ \text{pad}(n, i, p, q) = \begin{cases} X[n, i, p, q], & \text{ if } p \in [0, H) \wedge q \in [0, W) \\ INT32\_MIN, & \text{otherwise} \end{cases}\end{split}\end{aligned}\end{align} \]
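A NumPy sketch of the pooling rule above; padding with INT32_MIN reproduces the pad function, so padded cells never win the max. max_pool2d here is an illustrative helper (int64 data assumed), not the FORMAL source.

import numpy as np

INT32_MIN = -(2 ** 31)

def max_pool2d(X, pool_size, padding, strides, ceil_mode=False):
    N, C, H, W = X.shape
    PSH, PSW = pool_size; PH, PW = padding; SH, SW = strides
    div = (lambda a, b: -(-a // b)) if ceil_mode else (lambda a, b: a // b)
    YH = div(H + 2 * PH - PSH, SH) + 1
    YW = div(W + 2 * PW - PSW, SW) + 1
    # Extra PSH/PSW margin keeps ceil-mode windows inside the padded array.
    Xp = np.full((N, C, H + 2 * PH + PSH, W + 2 * PW + PSW), INT32_MIN,
                 dtype=np.int64)
    Xp[:, :, PH:PH + H, PW:PW + W] = X
    Y = np.empty((N, C, YH, YW), dtype=np.int64)
    for p in range(YH):
        for q in range(YW):
            win = Xp[:, :, p * SH:p * SH + PSH, q * SW:q * SW + PSW]
            Y[:, :, p, q] = win.max(axis=(2, 3))
    return Y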

upsampling

The upsampling operator upsamples the input data by copying the value at each position several times.

Math Formalization

  • Input \(X\), whose shape is \((N, C, H, W)\)

  • Output \(Y\)

  • Attributes: scale, in range \([1, max\_attr)\).

\[\begin{split}Y[n, i, h, w] = X[n, i, \left\lfloor {h \over \text{scale}}\right\rfloor, \left\lfloor {w \over \text{scale}}\right\rfloor], \\ \forall n \in [0, N) \wedge i \in [0, C) \wedge h \in [0, H \cdot \text{scale}) \wedge w \in [0, W \cdot \text{scale})\end{split}\]
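This coincides with repeating each row and column scale times, as in the NumPy sketch below (illustrative only):

import numpy as np

def upsampling(X, scale):
    # X is (N, C, H, W); each pixel becomes a scale-by-scale block.
    return X.repeat(scale, axis=2).repeat(scale, axis=3)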

Elemwise Operators

An elemwise operator applies an elementwise function to the input data, and the process logic is alike across these operators. There might be 1 or 2 input tensors, and the logic can be intricate, so we do not abstract them but describe each one separately.

abs

This operator calculates the absolute value of the input data.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output: \(Y\), a tensor whose shape is same as \(X\)

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = \begin{cases} x, & x \geqslant 0 \\ -x, & x < 0 \end{cases},\\\end{split}\\\begin{split}\forall i \in [0, N) \wedge d_i \in [0, n_i),\\\end{split}\\\text{where } x \text{ denotes } X[d_0, d_1, \cdots, d_{N-1}]\end{aligned}\end{align} \]

cvm_precision

The precision operator gives how many bits the absolute value of a number takes: 1 takes 1 bit; 2 and 3 take 2 bits, and so on. As a special case, 0 takes 1 bit.

Math Formalization

  • Input \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output \(Y\), a tensor whose shape is same as \(X\)

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = \begin{cases} \lceil log_2(abs(x)+1) \rceil, & x \neq 0\\ 1, & x = 0 \end{cases},\\\end{split}\\\begin{split}\forall i \in [0, N) \wedge d_i \in [0, n_i),\\\end{split}\\\text{where } x \text{ denotes } X[d_0, d_1, \cdots, d_{N-1}]\end{aligned}\end{align} \]
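A NumPy sketch of this definition (illustrative only; an exact integer bit_length would avoid any floating-point concerns for very large values):

import numpy as np

def cvm_precision(X):
    # Bits needed for |x|; 0 is defined to take 1 bit.
    A = np.abs(X)
    return np.where(A == 0, 1, np.ceil(np.log2(A + 1)).astype(np.int64))

print(cvm_precision(np.array([0, 1, 2, 3, 4])))  # [1 1 2 2 3]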

elemwise_add

This operator performs elementwise add to the 2 input tensors.

Math Formalization

  • Input: there are 2 inputs, of the same shape.

    • \(A\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

    • \(B\), whose shape is same as \(A\).

  • Output: \(Y\), a tensor whose shape is same as \(A\) and \(B\).

\[Y = A + B\]

elemwise_sub

This operator performs elementwise subtraction to the 2 input tensors.

Math Formalization

  • Input: there are 2 inputs, of the same shape.

    • \(A\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

    • \(B\), whose shape is same as \(A\).

  • Output: \(Y\), a tensor whose shape is same as \(A\) and \(B\).

\[Y = A - B\]

negative

This operator performs elementwise negation of the input tensor.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output: \(Y\), a tensor whose shape is same as \(X\).

\[Y = -X\]

clip

This operator clips the input tensor, cutting the data into a given range.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output: \(Y\), a tensor whose shape is same as \(X\).

  • Attributes:

    • a_min

    • a_max

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = \begin{cases} \text{a_max}, & x \geqslant \text{a_max} \\ x, & x \in (\text{a_min}, \text{a_max}) \\ \text{a_min}, & x \leqslant \text{a_min} \end{cases},\\\end{split}\\\begin{split}\forall i \in [0, N) \wedge d_i \in [0, n_i), \\\end{split}\\\text{where } x \text{ denotes } X[d_0, d_1, \cdots, d_{N-1}]\end{aligned}\end{align} \]

cvm_clip

This operator clips the input data into a certain CVM precision.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output: \(Y\), a tensor whose shape is same as \(X\).

  • Attribute: precision, an int in range \([1, 33)\)

\[\begin{split}Y = clip(X, \text{a_min}=-\alpha, \text{a_max}=\alpha), \\ \text{where } \alpha = 2^{\text{precision}-1}-1\end{split}\]

cvm_right_shift

This operator performs a right shift. Slightly different from the C right shift, the result of this operator is rounded to the nearest integer. As a special case, negative halves are rounded up; for example, -1.5 rounds to -1.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output: \(Y\), a tensor whose shape is same as \(X\).

  • Attribute:

    • precision, an int in range \([1, 33)\)

    • shift_bit, an int in range \([1, 33)\)

\[\begin{split}Y = clip(T, \text{a_min} = -\alpha, \text{a_max}=\alpha), \\ \text{where } T = {\left\lfloor \left(\left\lfloor \frac{X}{2^{\text{shift_bit} - 1}} \right\rfloor + 1 \right) \div 2 \right\rfloor} \text{ and } \alpha = 2 ^ {\text{precision} - 1} - 1\end{split}\]
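A NumPy sketch of the formula; arithmetic right shifts on signed integers implement the floor divisions (illustrative only):

import numpy as np

def cvm_right_shift(X, precision, shift_bit):
    # Round-to-nearest shift (halves round up: -1.5 -> -1), then clip
    # into the signed precision-bit symmetric range.
    T = ((X >> (shift_bit - 1)) + 1) >> 1
    alpha = 2 ** (precision - 1) - 1
    return np.clip(T, -alpha, alpha)

print(cvm_right_shift(np.array([-3, 3]), precision=8, shift_bit=2))  # [-1  1]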

cvm_left_shift

This operator performs a left shift on the input tensor, like the C left shift operator, and then clips the result.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output: \(Y\), a tensor whose shape is same as \(X\).

  • Attribute:

    • precision, an int in range \([1, 33)\)

    • shift_bit, an int in range \([1, 33)\)

\[\begin{split}Y = clip(T, \text{a_min} = -\alpha, \text{a_max}=\alpha), \\ \text{where } T = X * 2^\text{shift_bit} \text{ and } \alpha = 2 ^ {\text{precision} - 1} - 1\end{split}\]

Transform Operators

A transform operator performs no arithmetic on the data; it simply reshapes, copies, or selects part of it. The process logic differs considerably across these operators.

repeat

This operator repeats the input data repeats times along the given axis. Each element is repeated right after itself.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{\text{axis}}, \cdots, n_{N-1})\)

  • Output: \(Y\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{\text{axis}} \cdot repeats, \cdots, n_{N-1})\)

  • Attribute:

    • axis, an int in range \([0, N)\), indicating the axis on which the repetition is performed.

    • repeats, an int in range \([1, +\infty)\), indicating how many times the data is repeated.

\[\begin{split}Y[d_0, d_1, \cdots, d_\text{axis}, \cdots, d_{N-1}] = X[d_0, d_1, \cdots, \left\lfloor{d_\text{axis} \over \text{repeats}}\right\rfloor, \cdots, d_{N-1}], \\ \forall d_0 \in [0, n_0) \wedge \cdots \wedge d_{axis-1} \in [0, n_{axis-1}) \wedge d_{axis} \in [0, n_{axis} \cdot \text{repeats}) \wedge \\ d_{axis+1} \in [0, n_{axis+1}) \wedge \cdots \wedge d_{N-1} \in [0, n_{N-1})\end{split}\]
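This matches NumPy's np.repeat, which can serve as a quick check:

import numpy as np

x = np.array([[1, 2], [3, 4]])
print(np.repeat(x, 2, axis=1))  # [[1 1 2 2] [3 3 4 4]]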

tile

This operator tiles the input data several times over several dimensions. Unlike the repeat operator, which repeats each element right after itself, this operator tiles the whole data block after the whole data block.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\),

  • Output: \(Y\)

  • Attribute: reps, a TShape of length \(M\), namely \((m_0, m_1, \cdots, m_{M-1})\).

\[r \in [1, max\_attr), \forall r \in \text{reps}\]
\[ \begin{align}\begin{aligned}\begin{split}Y[k_0, \cdots, k_{K-N-1}, k_{K-N}, k_{K-N+1}, \cdots, k_{K-1}] = \\ X[k_{K-N+0} \text{ mod } n_0, k_{K-N+1} \text{ mod } n_1, \cdots, k_{K-N+N-1} \text{ mod } n_{N-1}], \\ \forall k_0 \in [0, S_0) \wedge \cdots \wedge k_{K-1} \in [0, S_{K-1}), \\\end{split}\\\begin{split}\text{where } K = \max\{M, N\} \text{ and } S_i = SX_i \cdot SR_i \text{ and} \\ SX_p = \begin{cases} n_{p-K+N}, & p \in [K-N, K) \\ 1, & p \in [0, K-N) \end{cases} \text{ and} \\ SR_q = \begin{cases} m_{q-K+M}, & q \in [K-M, K) \\ 1, & q \in [0, K-M) \end{cases}\end{split}\end{aligned}\end{align} \]
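This matches NumPy's np.tile, which also pads a shorter reps with leading 1s, so it can serve as a quick check:

import numpy as np

x = np.array([[1, 2], [3, 4]])
print(np.tile(x, (2, 3)).shape)  # (4, 6): whole blocks are repeated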

flatten

This operator flattens the input tensor into a 1-D array in row-major order.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\).

  • Output: \(Y\).

\[\begin{split}Y[\text{flatten_index}(d_0, d_1, \cdots, d_{N-1}, n_0, n_1, \cdots, n_{N-1})] = \\ X[d_0, d_1, \cdots, d_{N-1}], \\ \forall d_0 \in [0, n_0) \wedge d_1 \in [0, n_1) \wedge \cdots \wedge d_{N-1} \in [0, n_{N-1})\end{split}\]

where \(\text{flatten_index}\) is

\[\begin{split}\text{flatten_index}(d_0, d_1, \cdots, d_{N-1}, n_0, n_1, \cdots, n_{N-1}) = \\ d_0 \cdot \prod_{i = 1}^{N-1} n_i + d_1 \cdot \prod_{i = 2}^{N-1} n_i + \cdots + d_{N-2} \cdot n_{N-1} + d_{N-1}\end{split}\]
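A sketch of flatten_index in Horner form, which is equivalent to the sum of products above:

def flatten_index(ds, ns):
    # Row-major (C-order) linearization of coordinates ds in shape ns.
    idx = 0
    for d, n in zip(ds, ns):
        idx = idx * n + d
    return idx

print(flatten_index((1, 2), (3, 4)))  # 1 * 4 + 2 = 6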

concatenate

This operator concatenates tensors along a given axis.

Math Formalization

  • Inputs: \(M\) tensors, namely \(I^0, I^1, \cdots, I^{M-1}\). They all have \(N\) dimensions, namely \(I^i\)’s shape is \((n^i_0, n^i_1, \cdots, n^i_{N-1})\). All dimensions except the axis to be concatenated along must have the same length.

  • Output: \(Y\)

  • Attribute: axis, an int in range \([0, N)\).

\[n^i_j = n^0_j, \forall i \in [1, M) \wedge j \in [0, N) \wedge j \neq \text{axis}\]
\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_\text{axis-1}, \text{new_idx}, d_\text{axis+1}, \cdots, d_{N-1}] = I^i[d_0, d_1, \cdots, d_{N-1}], \\ \forall d_0 \in [0, n^i_0) \wedge \cdots \wedge d_{N-1} \in [0, n^i_{N-1}) \wedge i \in [0, M), \\\end{split}\\\text{where new_idx} = \sum_{j=0}^{i-1} n^j_\text{axis} + d_\text{axis}\end{aligned}\end{align} \]
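For intuition, this is NumPy's np.concatenate; new_idx is simply the offset of input i along the concatenation axis:

import numpy as np

a = np.ones((2, 3), dtype=np.int32)
b = np.zeros((2, 2), dtype=np.int32)
print(np.concatenate([a, b], axis=1).shape)  # (2, 5)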

transpose

This operator transposes the input data according to a given permutation.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

  • Output: \(Y\),

  • Attribute: axes, a TShape of length \(M \in \{0, N\}\), which means axes is either empty or a permutation of \(\{0, 1, \cdots, N-1\}\)

\[\text{axis} \in [-N, N), \forall \text{axis} \in \text{axes}\]
\[ \begin{align}\begin{aligned}\begin{split}Y[d_{\text{real_axes}_0}, d_{\text{real_axes}_1}, \cdots, d_{\text{real_axes}_{N-1}}] = X[d_0, d_1, \cdots, d_{N-1}], \\\end{split}\\\begin{split}\forall d_0 \in [0, n_0) \wedge \cdots \wedge d_{N-1} \in [0, n_{N-1}), \\\end{split}\\\begin{split}\text{where real_axes}_i = \begin{cases} \text{axes}_i, & M = N \wedge \text{axes}_i \geqslant 0 \\ \text{axes}_i + N, & M = N \wedge \text{axes}_i < 0 \\ N-1-i, & M = 0 \end{cases} \text{ and} \\ card \; \{\text{real_axes}_i \mid i \in [0, N) \} = N\end{split}\end{aligned}\end{align} \]

slice

This operator slices an input array with the given attributes. For each dimension, strides (default 1), begin (default 0), and end (default the length of that dimension) coordinates can be assigned by the user.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

  • Output: \(Y\),

  • Attributes:

    • begin, a TShape of length \(B\).

    • end, a TShape of length \(E\).

    • strides, a TShape of length \(S\).

    \(B\), \(E\), \(S\) may differ.

\[\begin{split}\text{b_arr}[b] = \begin{cases} \text{begin}[b] + n_b, & b \in [0, N) \wedge b < B \wedge begin[b] < 0 \\ \text{begin}[b], & b \in [0, N) \wedge b < B \wedge begin[b] \geqslant 0 \\ 0, & b \in [0, N) \wedge b \geqslant B \end{cases}, b \in [0, N) \\\end{split}\]
\[\begin{split}\text{e_arr}[e] = \begin{cases} \text{end}[e] + n_e, & e \in [0, N) \wedge e < E \wedge end[e] < 0\\ \text{end}[e], & e \in [0, N) \wedge e < E \wedge end[e] \geqslant 0\\ n_{e}, & e \in [0, N) \wedge e \geqslant E \end{cases}, e \in [0, N) \\\end{split}\]
\[\begin{split}\text{s_arr}[s] = \begin{cases} \text{stride}[s], & s \in [0, N) \wedge s < S \\ 1, & s \in [0, N) \wedge s \geqslant S \end{cases}, s \in [0, N) \\\end{split}\]
\[\begin{split}\forall \{i \mid i \in [0, N)\}: \text{s_arr}[i] \ne 0 \\\end{split}\]
\[\begin{split}\text{b_range}(i) = \begin{cases} -1, & \text{s_arr}[i] < 0 \\ 0, & \text{s_arr}[i] \geqslant 0 \end{cases} \\\end{split}\]
\[\begin{split}\text{e_range}(i) = \begin{cases} n_i - 1, & \text{s_arr}[i] < 0 \\ n_i, & \text{s_arr}[i] \geqslant 0 \end{cases} \\\end{split}\]
\[\begin{split}\text{b_vec}[b] = clip(\text{b_arr}[b], \text{a_min}=\text{b_range}(b), \text{a_max}=\text{e_range}(b)-1), b \in [0, N) \\\end{split}\]
\[\begin{split}\text{e_vec}[e] = clip(\text{e_arr}[e], \text{a_min}=\text{b_range}(e), \text{a_max}=\text{e_range}(e)-1), e \in [0, N) \\\end{split}\]
\[\begin{split}\forall \{i \mid i \in [0, N) \}: \begin{cases} \text{b_vec}[i] < \text{e_vec}[i], & \text{s_arr}[i] > 0 \\ \text{e_vec}[i] < \text{b_vec}[i], & \text{s_arr}[i] < 0 \end{cases} \\\end{split}\]
\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = X[K[0], K[1], \cdots, K[N-1]], \\\end{split}\\\forall d_j \in [0, \left\lceil{ \text{e_vec}[j] - \text{b_vec}[j] \over \text{s_arr}[j] }\right\rceil ) \wedge j \in [0, N),\\\text{where } K[i] = \text{b_vec}[i] + \text{s_arr}[i] * d_i\end{aligned}\end{align} \]
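A sketch that normalizes the attributes and defers to Python slice semantics, which apply the same K[i] = b_vec[i] + s_arr[i] * d_i indexing rule; boundary clipping with negative strides may differ in corner cases from the clip rules above (cvm_slice is a hypothetical name, for illustration only).

import numpy as np

def cvm_slice(X, begin=(), end=(), strides=()):
    sl = []
    for i, n in enumerate(X.shape):
        b = begin[i] if i < len(begin) else 0      # b_arr default
        e = end[i] if i < len(end) else n          # e_arr default
        s = strides[i] if i < len(strides) else 1  # s_arr default
        sl.append(slice(b, e, s))
    return X[tuple(sl)]

x = np.arange(12).reshape(3, 4)
print(cvm_slice(x, begin=(0, 1), strides=(1, 2)))  # [[ 1  3] [ 5  7] [ 9 11]]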

slice_like

This operator slices the input \(X\) into a shape that looks like the other given input, shape_like.

TODO: need more consideration.

Math Formalization

  • Input: there are 2 inputs

    • \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

    • \(\text{shape_like}\), a tensor of \(M\) dimensions, namely \((m_0, m_1, \cdots, m_{M- 1 })\),

  • Output: \(Y\)

  • Attribute: axes, a TShape of length \(K\). If axes is not empty, only the axes mentioned are sliced on, and the other axes in shape_like are ignored. \(M \ne N\) is allowed only if a non-empty axes is given. If \(M > N\), the dimensions beyond \(N\) are ignored; if \(M < N\), only the first \(M\) dimensions are sliced while the dimensions beyond \(M\) stay the same.

\[ \begin{align}\begin{aligned}\begin{split}\text{sliced_axes} = \begin{cases} \{j \mid j \in axes \wedge j \geqslant 0\} \bigcup \{j + N \mid j \in axes \wedge j < 0\}, & K > 0\\ \{0, 1, \cdots, M-1\}, & K = 0 \end{cases}, \\\end{split}\\\begin{split}\text{where } \forall j \in \text{sliced_axes}: j < \min(M, N) \text{ and } m_j \leqslant n_j\\\end{split}\end{aligned}\end{align} \]
\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = X[d_0, d_1, \cdots, d_{N-1}], \\\end{split}\\\begin{split}\forall j \in [0, N) \wedge d_j \in \begin{cases} [0, m_j), & j \in \text{sliced_axes} \\ [0, n_j), & j \notin \text{sliced_axes} \end{cases},\\\end{split}\end{aligned}\end{align} \]

take

This operator takes some elements from the input data. If axis is not given, it flattens the input data and takes elements; if axis is given, it takes elements along this axis for every coordinate in the other axes.

Math Formalization

  • Input: there are 2 inputs

    • \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

    • \(indices\), a tensor of \(M\) dimensions, namely \((m_0, m_1, \cdots, m_{M- 1})\)

  • Output: \(Y\),

  • Attribute: axis an Optional<int>.

  1. Case axis is None:

\[ \begin{align}\begin{aligned}\begin{split}T = flatten(X) \\ Y[d_0, d_1, \cdots, d_{M-1}] = T[clip(\text{xidx}, \text{a_min}=0, \text{a_max}=|T|-1)],\\\end{split}\\\begin{split}\forall (d_0, d_1, \cdots, d_{M-1}), \\\end{split}\\\begin{split}\text{ where } d_j \in [0, m_j) \wedge j \in [0, M) \text{ and }\\ \text{xidx} = \text{indices}[d_0, d_1, \cdots, d_{M-1}]\end{split}\end{aligned}\end{align} \]
  2. Case axis is int:

\[ \begin{align}\begin{aligned}\begin{split}\text{axis} \in [-N, N) \\ \text{real_axis} = \begin{cases} \text{axis}, & \text{axis} \geqslant 0 \\ \text{axis} + N, & \text{axis} < 0 \end{cases} \\ Y[d_0, d_1, \cdots, d_{M+N-2}] = X[d_0, \cdots, d_{\text{real_axis}-1}, \text{xidx}, d_{\text{real_axis}+M}, \cdots, d_{M+N-2}], \\\end{split}\\\begin{split}\forall d_j \in \begin{cases} [0, n_j), & j < \text{real_axis} \\ [0, m_{j-\text{real_axis}}), & j \in [\text{real_axis}, \text{real_axis}+M) \\ [0, n_{j-M+1}), & j \in [\text{real_axis} + M, M+N-1) \end{cases},\\ \text{where } \text{xidx} = clip(\text{indices}[d_{\text{real_axis}}, d_{\text{real_axis}+1}, \cdots, d_{\text{real_axis}+M-1}],\\ \text{a_min}=0, \text{a_max}=n_{\text{real_axis}}-1)\end{split}\end{aligned}\end{align} \]
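Both cases coincide with NumPy's np.take with mode='clip', which clips out-of-range indices exactly as the formulas do:

import numpy as np

x = np.array([[1, 2], [3, 4]])
print(np.take(x, np.array([0, 9]), mode='clip'))          # flattened: [1 4]
print(np.take(x, np.array([0, 9]), axis=0, mode='clip'))  # rows 0 and (clipped) 1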

cvm_lut

This operator is an alias for take with axis set to None.

Math Formalization

  • Input: there are 2 inputs.

    • \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

    • \(indices\), a tensor of \(M\) dimensions, namely \((m_0, m_1, \cdots, m_{M- 1})\)

  • Output: \(Y\).

\[Y=take(X, \text{indices}, \text{axis}=\text{None})\]

expand_dims

This operator inserts new dimensions starting at the given axis.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\),

  • Output: \(Y\)

  • Attributes:

    • axis, an int in range \([-N-1, N+1)\), indicating where the new dimensions start. Note that \(N+1\), instead of \(N\), will be added to a negative axis.

    • num_newaxis, an int in range \([min\_attr, max\_attr)\)

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0,d_1, \cdots, d_{\text{real_axis}-1}, \underbrace{0, 0, \cdots, 0}_{\text{num_newaxis}}, d_\text{real_axis}, \cdots, d_{N-1}] = X[d_0, d_1, \cdots, d_{N-1}], \\\end{split}\\\begin{split}\forall d_0 \in [0, n_0) \wedge \cdots \wedge d_{N-1} \in [0, n_{N-1}), \\\end{split}\\\begin{split}\text{where real_axis} = \begin{cases} \text{axis},& \text{axis} \geqslant 0 \\ \text{axis} + N + 1,& \text{axis} < 0 \end{cases}\end{split}\end{aligned}\end{align} \]

reshape

This operator reshapes the input data.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\),

  • Output: \(Y\),

  • Attribute: target_shape, a TShape of length \(M\), namely \((m_0, m_1, \cdots, m_{M-1})\), s.t. \(m_0 \cdot m_1 \cdots m_{M-1} = n_0 \cdot n_1 \cdots n_{N-1}\).

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{M-1}] = T[\text{flatten_index}(d_0, d_1, \cdots, d_{M-1}, m_0, m_1, \cdots, m_{M-1})], \\\end{split}\\\begin{split}\forall d_0 \in [0, m_0) \wedge \cdots \wedge d_{M-1} \in [0, m_{M-1}), \\\end{split}\\\text{where } T = \text{flatten}(X)\end{aligned}\end{align} \]

squeeze

This operator removes dimensions of length 1.

Math Formalization

  • Input: \(X\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

  • Output: \(Y\).

  • Attribute: axes, a TShape of length M.

\[\text{axis} \in [-N, N), \forall \text{axis} \in \text{axes}\]
\[\begin{split}\text{real_axes} = \begin{cases} \{\text{axis} \mid \text{axis} \geqslant 0 \wedge \text{axis} \in \text{axes} \} \bigcup \{\text{axis} + N \mid \text{axis} < 0 \wedge \text{axis} \in \text{axes}\}, & M > 0 \\ \{\text{axis} \mid n_\text{axis} = 1 \wedge \text{axis} \in [0, N) \}, & M = 0 \end{cases} \\\end{split}\]
\[n_\text{axis} = 1, \forall \text{axis} \in \text{real_axes}\]
\[ \begin{align}\begin{aligned}\begin{split}Y[d_{I(0)}, d_{I(1)}, \cdots, d_{I(N-K-1)}] = X[d_0, d_1, \cdots, d_{N-1}], \\\end{split}\\\begin{split}\forall d_0 \in [0, n_0) \wedge d_1 \in [0, n_1) \wedge \cdots \wedge d_{N-1} \in [0, n_{N-1}), \\\end{split}\\\begin{split}\text{where } K = card \; \text{\{real_axes\}} \text{ and} \\\end{split}\\\begin{split}I: \{i \mid i \in [0, N-K) \} \to \{i \mid i \in [0, N) \wedge i \notin \text{real_axes} \}, \\ \text{s.t. } I(i) < I(j), \forall 0 \leqslant i < j < N-K\end{split}\end{aligned}\end{align} \]

where

This operator selects elements from 2 inputs according to the given condition.

Math Formalization

  • Input: there are 3 inputs

    • \(Cond\), either a tensor whose shape is the same as the others', or a 1-D tensor whose length equals the length of the others' first dimension.

    • \(A\), a tensor of \(N\) dimensions, namely \((n_0, n_1, \cdots, n_{N-1})\)

    • \(B\), a tensor whose shape is same as \(A\)

  • Output: \(Y\)

  1. Case the shape of \(Cond\) is the same as the others':

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = \begin{cases} A[d_0, d_1, \cdots, d_{N-1}], & Cond[d_0, d_1, \cdots, d_{N-1}] \neq 0\\ B[d_0, d_1, \cdots, d_{N-1}], & Cond[d_0, d_1, \cdots, d_{N-1}] = 0\end{cases},\\\end{split}\\\forall d_i \in [0, n_i) \wedge i \in [0, N)\end{aligned}\end{align} \]
  2. Case \(Cond\) is a 1d tensor:

\[ \begin{align}\begin{aligned}\begin{split}Y[d_0, d_1, \cdots, d_{N-1}] = \begin{cases} A[d_0, d_1, \cdots, d_{N-1}], & Cond[d_0] \neq 0\\ B[d_0, d_1, \cdots, d_{N-1}], & Cond[d_0] = 0 \end{cases},\\\end{split}\\\forall d_i \in [0, n_i) \wedge i \in [0, N)\end{aligned}\end{align} \]

Vision Operators

We provide 2 operators for the vision scenario. Since the scenario is narrow, regulation of the input data is stricter than for other operators. Unless otherwise specified, the input data should be 3D, namely \((B, N, K)\), indicating the number of batches, the number of results for each batch, and the length (\(K\)) of a result. The first value of a result should be the ID and the second should be the score.

get_valid_count

This operator counts how many results score above a threshold and collects them.

Math Formalization

  • Input: \(X\), a 3D tensor of shape \((B, N, K), 2 \leqslant K \leqslant 32\)

  • Output:

    • \(\text{valid_count}\),

    • \(Y\),

  • Attribute: score_threshold, an int.

\[\begin{split}\text{valid_count}[b] = card\{ q \mid q \in [0, N) \wedge X[b, q, 1] > \text{score_threshold} \}, \\ \quad \forall b \in [0, B)\end{split}\]
\[ \begin{align}\begin{aligned}\begin{split}Y[b, \text{idx}, k] = X[b, n, k], \\ \quad \forall b \in [0, B) \wedge n \in [0, N) \wedge k \in [0, K) \wedge X[b, n, 1] > \text{score_threshold}, \\\end{split}\\\quad \text{where idx = } card \{q \mid q \in [0, n) \wedge X[b, q, 1] > \text{score_threshold} \}\end{aligned}\end{align} \]
\[Y[b,n, k] = -1, \forall b \in [0, B) \wedge n \in [\text{valid_count}[b], N) \wedge k \in [0, K)\]
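A NumPy sketch of the three formulas above (illustrative only; the compaction is stable and the tail is filled with -1):

import numpy as np

def get_valid_count(X, score_threshold):
    # X: (B, N, K); X[:, :, 1] holds the scores.
    B, N, K = X.shape
    valid_count = np.zeros(B, dtype=np.int64)
    Y = np.full_like(X, -1)
    for b in range(B):
        keep = X[b, :, 1] > score_threshold  # preserves original order
        valid_count[b] = keep.sum()
        Y[b, :valid_count[b]] = X[b, keep]
    return valid_count, Y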

non_max_suppression

This operator implements the NMS (non-maximum suppression) algorithm, finding valid bounding boxes.

Math Formalization

  • Input: there are 2 inputs.

    • \(X\), a 3D tensor of shape \((B, N, K), K = 6\)

    • valid_count, a 1D tensor of length \(B\)

  • Output: \(Y\)

  • Attributes:

    • iou_threshold, an int representing the percentage of intersection over union, with value in range \((0, +\infty)\); 101 specifically stands for full overlap of bounding boxes, and any larger integer is equivalent to it.

    • max_output_size, an int

    • force_suppress, a boolean

    • top_k, an int.

\[ \begin{align}\begin{aligned}\begin{split}R[b, i, k] = X[b, I(i), k], \\\end{split}\\\begin{split}\forall b \in [0, B) \wedge i \in [0, T) \wedge k \in [0, K), \\\end{split}\\\begin{split}\text{where } T = \text{max}\{ \text{min}(N, \text{valid_count}[b]), 0\} \text{ and} \\ I: \{ i \mid i \in [0, T) \} \to \{ i \mid i \in [0, T) \}, \\ \text {s.t. } X[b, I(i), 1] > X[b, I(j), 1] \text{ or } \\ (X[b, I(i), 1] = X[b, I(j), 1] \wedge I(i) < I(j)), \forall 0 \leqslant i < j < T\end{split}\end{aligned}\end{align} \]
\[ \begin{align}\begin{aligned}\begin{split}Y[b, n, k] = R[b, \text{IDX}(n), k], \\\end{split}\\\begin{split}\forall b \in [0, B) \wedge n \in [0, \min\{T, \text{MOS}, card\{U\}\}) \wedge k \in [0, K), \\\end{split}\\\begin{split}\text{where } \text{TK} = \begin{cases} +\infty, & \text{if top_k < 0} \\[1ex] \text{top_k}, & \text{otherwise} \end{cases} \text{ and} \\ \text{MOS} = \begin{cases} +\infty, & \text{if max_output_size < 0} \\[1ex] \text{max_output_size}, & \text{otherwise} \end{cases} \text{ and} \\ \text{iou}(p, q) = \begin{cases} \text{OLR}(R[b, p, :], R[b, q, :]), & \begin{array}{} \text{force_suppress is true}\\ \text{ or } R[b, p, 0] = R[b, q, 0] \end{array} \\[1ex] 0, & \text{otherwise} \end{cases} \text{ and} \\ \text{U} = \{p \mid p \in [0, min\{TK, T\}) \wedge R[b,p,0] >= 0 \wedge \\ \text{iou}(p, q) < \text{iou_threshold}, \forall q \in U \wedge q < p\} \text{ and} \\ \text{IDX}: \{i \mid i \in [0, card\{U\})\} \to U, \text{s.t. } \text{IDX}(i) < \text{IDX}(j), \forall i < j\end{split}\end{aligned}\end{align} \]
\[\begin{split}Y[b, n, k] = -1, \\ \forall b \in [0, B) \wedge k \in [0, K) \wedge n \in [min\{T, \text{MOS},card\{U\}\}, N)\end{split}\]

Note

\(\text{OLR}\), which means overlap ratio, calculates the overlap ratio of two rectangles: area of their intersection over the area of their union.
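A per-batch sketch of the greedy selection that the set \(U\) defines. The (x1, y1, x2, y2) box layout in entries 2..5 is an assumption for illustration; the formalization itself only fixes entry 0 (ID) and entry 1 (score). OLR is scaled by 100 here because iou_threshold is a percentage.

def olr(p, q):
    # Overlap ratio: intersection area over union area, as a percentage.
    # Assumes box corners live at entries 2..5 as (x1, y1, x2, y2).
    x1, y1 = max(p[2], q[2]), max(p[3], q[3])
    x2, y2 = min(p[4], q[4]), min(p[5], q[5])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[4] - r[2]) * (r[5] - r[3])
    union = area(p) + area(q) - inter
    return 100.0 * inter / union if union > 0 else 0.0

def nms_batch(R, T, iou_threshold, max_output_size, force_suppress, top_k):
    # R: score-sorted rows of one batch; returns kept indices (the set U).
    tk = T if top_k < 0 else min(top_k, T)
    U = []
    for p in range(tk):
        if R[p][0] < 0:
            continue  # a negative ID marks an invalid box
        suppressed = any(
            (force_suppress or R[p][0] == R[q][0])
            and olr(R[p], R[q]) >= iou_threshold
            for q in U)
        if not suppressed:
            U.append(p)
        if 0 <= max_output_size <= len(U):
            break
    return U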