Inference¶
The inference plugin performs detection and classification using deep neural networks. Its interface simplifies interaction with various cross-platform inference engines. Each supported engine is contained in its own plugin that registers its capabilities with the system.
Engine selection¶
Engines can be explicitly specified by using a URI scheme in the following format:
scheme://./models/mymodel.ext
Alternatively, the file's extension is used to automatically select the backend.
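For illustration, the sketch below mirrors the C++ usage examples further down this page: it creates the ONNX backend and loads a model with an explicit engine scheme, with the extension-based form shown as a comment. The model path is a placeholder.
#include "Plugins/ONNX/include/onnx.h"
using namespace cvediart::plugin::inference;
void engineSelectionExample() {
// Create the ONNX backend and assign it to the inference plugin
auto onnx = std::make_unique<ONNX>();
auto inference = std::make_unique<Inference>();
inference->setBackend(onnx);
// Explicit engine selection through the URI scheme
inference->loadModel("onnx://./models/mymodel.onnx");
// Equivalent: the .onnx extension selects the ONNX Runtime backend automatically
// inference->loadModel("./models/mymodel.onnx");
}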
Model configuration¶
Each AI model requires an accompanying .json file that specifies its loading parameters. This configuration is imported directly into the inference plugin's State Dictionary and is accessible from the Managed or Lua interface.
Besides the engine-specific parameters, you are free to add user-specific data (a read-back sketch follows the example below).
Example
mymodel.onnx.json
{
"channel_layout": "NCHW",
"normalize_input": false,
"transpose_input": false,
"mean": 0,
"stddev": 0
}
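Because the configuration is imported into the State Dictionary, user-specific fields added to the JSON can be read back at runtime. The sketch below is only an illustration: the user_label field and the getValue accessor (assumed here as the read counterpart of the setValue calls shown further down) are not part of the documented API.
auto inference = std::reinterpret_pointer_cast<InferenceManaged>(rtapi->loadPlugin("Inference", "MyPlugin", ""));
auto conf = inference->getConfig();
// "user_label" was added to mymodel.onnx.json and imported together with the model configuration
auto userLabel = conf->getValue("user_label"); // getValue is an assumed accessor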
Plugin configuration¶
The main configuration for the inference plugin deals with the filtering of detector and classifier results, irrespective of the inference engine used.
Plugin configuration
{
"<plugin instance name>": {
"config": {
"class_ids": [
1
],
"conf_threshold": 0.5,
"enabled": true,
"filter_edge_detections": false,
"nms_eta": 1.0,
"nms_iou_threshold": 0.5,
"nms_merge_batches": false,
"nms_score_threshold": 0.1,
"nms_topk": 0.0
}
}
}
auto inference = new Inference();
inference->pluginConf.class_ids.push_back(1);
inference->pluginConf.conf_threshold = 0.5;
inference->pluginConf.enabled = true;
inference->pluginConf.filter_edge_detections = true;
inference->pluginConf.nms_eta = 1.0;
inference->pluginConf.nms_iou_threshold = 0.5;
inference->pluginConf.nms_merge_batches = false;
inference->pluginConf.nms_score_threshold = 0.1;
inference->pluginConf.nms_topk = 0.0;
inference = std::reinterpret_pointer_cast<InferenceManaged>(rtapi->loadPlugin("Inference", "MyPlugin", ""));
auto conf = inference->getConfig();
conf->setValue("conf_threshold", std::make_shared<cvediart::CValue>(0.5));
conf->setValue("enabled", std::make_shared<cvediart::CValue>(true));
conf->setValue("filter_edge_detections", std::make_shared<cvediart::CValue>(true));
conf->setValue("nms_eta", std::make_shared<cvediart::CValue>(1.0));
conf->setValue("nms_iou_threshold", std::make_shared<cvediart::CValue>(0.5));
conf->setValue("nms_merge_batches", std::make_shared<cvediart::CValue>(false));
conf->setValue("nms_score_threshold", std::make_shared<cvediart::CValue>(0.1));
conf->setValue("nms_topk", std::make_shared<cvediart::CValue>(0.0));
---@param self inferencemanaged
function Classifier.onInit(self)
local conf = self:getConfig()
conf.class_ids = {1};
conf.conf_threshold = 0.5;
conf.enabled = true;
conf.filter_edge_detections = true;
conf.nms_eta = 1.0;
conf.nms_iou_threshold = 0.5;
conf.nms_merge_batches = false;
conf.nms_score_threshold = 0.1;
conf.nms_topk = 0.0;
end
Supported engines¶
ONNX Runtime¶
Scheme: onnx://
Extensions: .onnx
ONNX is a standard model format and does not provide an engine of its own. However, its runtime library supports a wide range of backends, making it a very flexible format for deployment.
The currently supported backends are: CPU, CUDA, DML, TRT.
Warning
Using TensorRT as a backend will cause the system to compile the ONNX file into a native TensorRT model.
This can take up to 30 minutes on edge devices. Make sure the trt_cache_path is set and trt_cache_enabled is true.
Configuration¶
ONNX at a minimum requires the following parameters to be present for each AI model:
{
"channel_layout": "NCHW",
"normalize_input": false,
"transpose_input": false,
"mean": 0,
"stddev": 0
}
Depending on how the ONNX plugin is used, it can be configured in different ways:
Plugin configuration
"<plugin instance name>": {
"config": {
"ONNX": {
"backend": "CUDA",
"device_id": 0,
"execution_mode": "PARALLEL",
"profiling": false,
"profiling_prefix": "",
"trt_cache_enabled": true,
"trt_cache_path": "",
"ort_intraop_num_threads": 8,
"ort_interop_num_threads": 8
}
}
}
auto onnx = new ONNX();
onnx->pluginConf.backend = "CPU";
onnx->pluginConf.execution_mode = "PARALLEL";
onnx->pluginConf.profiling = false;
onnx->pluginConf.profiling_prefix = "";
onnx->pluginConf.device_id = 0;
onnx->pluginConf.trt_cache_path = "";
onnx->pluginConf.trt_cache_enabled = true;
onnx->pluginConf.ort_intraop_num_threads = 8;
onnx->pluginConf.ort_interop_num_threads = 8;
using namespace cvediart::api::SAPI;
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/backend", "\"CPU\"");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/execution_mode", "\"PARALLEL\"");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/profiling", "false");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/profiling_prefix", "\"\"");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/device_id", "0");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/trt_cache_path", "\"\"");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/trt_cache_enabled", "true");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/ort_intraop_num_threads", "8");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/ort_interop_num_threads", "8");
---@param self inferencemanaged
function Classifier.onInit(self)
local conf = self:getConfig()
conf.ONNX.backend = "CPU";
conf.ONNX.execution_mode = "PARALLEL";
conf.ONNX.profiling = false;
conf.ONNX.profiling_prefix = "";
conf.ONNX.device_id = 0;
conf.ONNX.trt_cache_path = "";
conf.ONNX.trt_cache_enabled = true;
conf.ONNX.ort_intraop_num_threads = 8;
conf.ONNX.ort_interop_num_threads = 8;
end
Usage¶
The ONNX engine can be used directly as a core interface for low-level embedding in existing C++ projects.
Usage
#include "Plugins/ONNX/include/onnx.h"
using namespace cvediart::plugin::inference;
void example() {
cv::Mat img, img_resized;
// Load ONNX backend
auto onnx = std::make_unique<ONNX>();
// Create inference plugin
auto inference = std::make_unique<Inference>();
// Assign backend to plugin
inference->setBackend(onnx);
inference->pluginConf.class_ids.push_back(1);
inference->loadModel("models/mymodel.onnx");
// Load image and resize to network dimensions
img = cv::imread("assets/tests/images/2_people.jpg");
cv::resize(img, img_resized, cv::Size(inference->inputWidth_, inference->inputHeight_));
std::vector<cv::Mat> mats;
mats.push_back(img_resized);
auto detections = inference->runDetectorInference(mats).value();
inference->detsToAbsCoords(detections, std::vector<cv::Rect>{cv::Rect(cv::Point(0, 0), img.size())}, false, true);
detections = inference->NMS(detections);
}
TensorRT¶
Registered handlers:
Scheme: tensorrt://
Extensions: .plan, .engine
TensorRT is the preferred engine for deployment on any device using NVIDIA acceleration. This includes the Jetson Nano, Jetson Xavier AGX and Jetson Xavier NX, as well as any server using Quadro or RTX/GTX video cards.
Note
Converting an AI model to TensorRT format requires compilation on a system with the exact same CUDA version and compute capability as the target system. Conversion is done through the trtexec executable, which is part of TensorRT.
Configuration¶
TensorRT at a minimum requires the following parameters to be present for each AI model:
{
"channel_layout": "",
"mean": 0,
"stddev": 0,
"normalize_input": false,
"transpose_input": false,
"input_binding": "",
"output_binding": [
"Bounding Boxes",
"Classification"
]
}
Depending on how the TensorRT plugin is used, it can be configured in different ways:
Plugin configuration
"<plugin instance name>": {
"config": {
"TRT": {
"device_id": 0
}
}
}
auto trt = new TRT();
trt->pluginConf.device_id = 0;
using namespace cvediart::api::SAPI;
Server::setStateDictEntry(instanceName, "<plugin name>/config/TRT/device_id", "0");
---@param self inferencemanaged
function Classifier.onInit(self)
local conf = self:getConfig()
conf.TRT.device_id = 0;
end
Usage¶
The TensorRT engine can be used directly as a core interface for low-level embedding in existing C++ projects.
Usage
#include "Plugins/TensorRT/include/trt.h"
using namespace cvediart::plugin::inference;
void example() {
cv::Mat img, img_resized;
// Load TensorRT backend
auto trt = std::make_unique<TRT>();
// Create inference plugin
auto inference = std::make_unique<Inference>();
// Assign backend to plugin
inference->setBackend(trt);
inference->pluginConf.class_ids.push_back(1);
inference->loadModel("models/mymodel.engine");
// Load image and resize to network dimensions
img = cv::imread("assets/tests/images/2_people.jpg");
cv::resize(img, img_resized, cv::Size(inference->inputWidth_, inference->inputHeight_));
std::vector<cv::Mat> mats;
mats.push_back(img_resized);
auto detections = inference->runDetectorInference(mats).value();
inference->detsToAbsCoords(detections, std::vector<cv::Rect>{cv::Rect(cv::Point(0, 0), img.size())}, false, true);
detections = inference->NMS(detections);
}
Amba CVFlow¶
Not yet implemented
OpenVINO¶
Not yet implemented