System Architecture
Integer Inference
All data flowing through inference in the Graph Runtime is pure integer.
As the Graph Runtime section points out, the runtime allocates the
necessary memory space and prepares the packed functions in order before
the actual inference runs. During memory allocation, it invokes the
NDArray Module API to create and hold a reference-counted memory manager
(that is, the NDArray instance or its internal DLTensor* pointer).
Inference follows two basic design ideas:
- Parameters stored on disk must be of INT type.
- The memory space that inference needs must be of INT32 type.
First, the runtime asserts that all loaded parameters, input data, and
pre-allocated memory that operators need are of INT type.
Here is some reference code:
auto dtype = data_in->dtype;
VERIFY((dtype.code == kDLInt) &&
       ((dtype.bits == 32) || (dtype.bits == 8)) &&
       (dtype.lanes == 1))
  << "cvm runtime only supported INT8 or INT32 NDArray, but ("
  << dtype.code << ", " << dtype.bits << ", "
  << dtype.lanes << ")";
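The same check can be tried outside the runtime. The following is a minimal sketch, not the cvm-runtime implementation: `DataType`, `kIntCode`, `IsSupportedIntType`, and `VerifyIntType` are hypothetical names introduced for illustration, and the struct only mirrors the three fields the excerpt reads from the tensor's dtype.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the dtype struct used in the excerpt;
// only the three fields that the check reads are modeled.
struct DataType {
  uint8_t code;    // type code; 0 stands for a signed integer here
  uint8_t bits;    // bits per element
  uint16_t lanes;  // vector lanes; 1 means a scalar element
};

// Assumption: mirrors the value of kDLInt in DLPack.
constexpr uint8_t kIntCode = 0;

// Returns true only for scalar INT8 and scalar INT32 element types.
bool IsSupportedIntType(const DataType& t) {
  return t.code == kIntCode && (t.bits == 8 || t.bits == 32) && t.lanes == 1;
}

// Throws std::logic_error for any other type, similar in spirit to VERIFY.
void VerifyIntType(const DataType& t) {
  if (!IsSupportedIntType(t)) {
    std::ostringstream msg;
    msg << "only INT8 or INT32 is supported, but got ("
        << static_cast<int>(t.code) << ", " << static_cast<int>(t.bits)
        << ", " << t.lanes << ")";
    throw std::logic_error(msg.str());
  }
}
```

The fields are cast to `int` before streaming because `uint8_t` would otherwise be printed as a character.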
Moreover, the operators' forward functions assume that the input data is
of INT32 type and then apply the processing logic given by the Operator
Math Formalization definition. The runtime therefore converts 8-bit INT
data to the INT32 type in some places. Here is some reference code:
NDArray nd_in(reinterpret_cast<NDArray::Container*>(data_in));
if (data_in->dtype.bits == 8) {
  // Allocate an INT32 array with the same shape as the INT8 input.
  NDArray nd32 = NDArray::Empty(
      std::vector<int64_t>(dshp, dshp+ndim),
      DLDataType{.code=kDLInt, .bits=32, .lanes=1},
      ctx);
  int8_t *data8 = static_cast<int8_t*>(data_in->data);
  int32_t *data32 = static_cast<int32_t*>(nd32->data);
  int64_t num_elems = 1;
  for (int i = 0; i < data_in->ndim; ++i)
    num_elems *= data_in->shape[i];
  // Widen each element; the 64-bit index matches the type of num_elems.
  for (int64_t i = 0; i < num_elems; i++)
    data32[i] = static_cast<int32_t>(data8[i]);
  nd_in.swap(nd32);
}
For more detail, please refer to the SetInput function in the file
src/runtime/graph_runtime.cc.
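The widening step can be illustrated in isolation. The sketch below is an assumption-laden stand-in rather than the SetInput code: it uses `std::vector` instead of NDArray, and `WidenToInt32` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <vector>

// Copies a buffer of INT8 values into a new INT32 buffer.
// static_cast from int8_t to int32_t sign-extends, so negative
// values keep their value after widening.
std::vector<int32_t> WidenToInt32(const std::vector<int8_t>& data8) {
  std::vector<int32_t> data32;
  data32.reserve(data8.size());
  for (int8_t v : data8) {
    data32.push_back(static_cast<int32_t>(v));
  }
  return data32;
}
```

The INT32 copy occupies four times the memory of the INT8 input, which is the price of letting every operator assume a single element type.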
When processing the mathematical logic, the operators' forward functions
obtain the necessary DLTensor* pointers and cast their data to INT32
pointers, relying on the assumption verified above.
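As an illustration of that cast, here is a hedged sketch of what such a forward function might look like for element-wise addition. `Tensor` is a hypothetical, trimmed-down stand-in for DLTensor, and `ElemwiseAddForward` is not taken from the cvm-runtime source.

```cpp
#include <cstdint>

// Hypothetical stand-in for DLTensor with only the fields used below.
struct Tensor {
  void* data;            // untyped buffer, assumed to hold INT32 elements
  int ndim;              // number of dimensions
  const int64_t* shape;  // extent of each dimension
};

// Total number of elements described by the tensor's shape.
int64_t NumElements(const Tensor& t) {
  int64_t n = 1;
  for (int i = 0; i < t.ndim; ++i) n *= t.shape[i];
  return n;
}

// Hypothetical forward function: out = a + b, element by element.
// The casts are valid only because the runtime has already verified
// (and, for INT8 input, converted) every buffer to INT32.
void ElemwiseAddForward(const Tensor& a, const Tensor& b, Tensor* out) {
  const int32_t* x = static_cast<const int32_t*>(a.data);
  const int32_t* y = static_cast<const int32_t*>(b.data);
  int32_t* z = static_cast<int32_t*>(out->data);
  const int64_t n = NumElements(a);
  for (int64_t i = 0; i < n; ++i) {
    z[i] = x[i] + y[i];
  }
}
```

The sketch assumes that all three tensors share the same shape and performs no overflow handling.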
Assertion and Log
Source code: include/utils/logging.h.
Assertions perform runtime checks and may raise one of two exceptions:
a runtime error, which inherits from std::runtime_error, and a logic
error, which inherits from std::logic_error. For the declarations of the
corresponding exception classes, refer to utils::LogMessageFatal and
utils::ValueVerifyFatal.
Developers should assert conditions with the predefined macros, such as
CHECK() and VERIFY(). The CHECK() macro throws a runtime exception and
the VERIFY() macro throws a logic error. An example usage looks like
this:
CHECK(condition) << "error information";
VERIFY(condition) << "error information";
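To show how a streaming macro of this shape can end in an exception, here is a minimal sketch. It is not the content of logging.h: `FatalMessage`, `SKETCH_CHECK`, and `SKETCH_VERIFY` are hypothetical names, and the approach (a temporary whose destructor throws once the message has been streamed) is an assumption about one common way to build such macros.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: collects a message and throws Error when the
// temporary is destroyed at the end of the full expression.
template <typename Error>
class FatalMessage {
 public:
  explicit FatalMessage(const char* expr) {
    stream_ << "Check failed: " << expr << " ";
  }
  std::ostringstream& stream() { return stream_; }
  // Destructors are noexcept by default; noexcept(false) is required
  // so that the exception can leave the destructor.
  ~FatalMessage() noexcept(false) { throw Error(stream_.str()); }

 private:
  std::ostringstream stream_;
};

// Runtime-error flavour, analogous in spirit to CHECK().
#define SKETCH_CHECK(cond) \
  if (!(cond)) FatalMessage<std::runtime_error>(#cond).stream()

// Logic-error flavour, analogous in spirit to VERIFY().
#define SKETCH_VERIFY(cond) \
  if (!(cond)) FatalMessage<std::logic_error>(#cond).stream()
```

With these definitions, `SKETCH_CHECK(ptr != nullptr) << "null input";` throws a `std::runtime_error` whose message contains both the failed expression and the streamed text, and evaluates nothing after the condition when the check passes.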
It is important to understand the difference between the two exceptions. The cvm-runtime project is integral to the CVM in the Cortex Foundation's full node, CortexTheseus. An inference call on the Cortex blockchain costs Endorphin, a unit that measures the resources a model inference takes up, including memory and time. Under this cost model, a logic error consumes the invoker's CTXC tokens even if the inference fails, whereas a runtime error does not.
Briefly, a logic error is usually caused by the model supplier or the invoker, so the user bears responsibility for the failure. A runtime error, by contrast, generally stems from a bug in the source code.
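The distinction can be made concrete from the caller's side. The sketch below is purely illustrative: `Outcome`, `Classify`, and `InvokerPaysForFailure` are hypothetical names, and the real accounting in the full node is not shown in this document.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical classification of one inference call.
enum class Outcome { kSuccess, kLogicError, kRuntimeError };

// Runs the call and maps the two exception families to an outcome.
// std::logic_error and std::runtime_error are unrelated siblings under
// std::exception, so the order of the handlers does not matter.
Outcome Classify(const std::function<void()>& infer) {
  try {
    infer();
    return Outcome::kSuccess;
  } catch (const std::logic_error&) {
    return Outcome::kLogicError;
  } catch (const std::runtime_error&) {
    return Outcome::kRuntimeError;
  }
}

// Per the text above: a failed call is charged to the invoker only
// when the failure is a logic error.
bool InvokerPaysForFailure(Outcome o) {
  return o == Outcome::kLogicError;
}
```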
Another notable point is that cvm-runtime uses exceptions to record errors, so a segmentation fault or core dump is a serious offense. Do your best to avoid core dumps, and use the CHECK macro whenever you are uncertain whether a condition holds.