Inference

The inference plugin performs detection and classification using deep neural networks. Its interface simplifies interaction with various cross-platform inference engines. Each supported engine is contained in its own plugin, which registers its capabilities with the system.

Engine selection

Engines can be explicitly specified by using a URI scheme in the following format:

scheme://./models/mymodel.ext

Alternatively, if no scheme is given, the file's extension is used to select the backend automatically.
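
For illustration, the two selection mechanisms look like this (the model paths are placeholders; the onnx:// and tensorrt:// schemes are described under Supported engines below):

onnx://./models/mymodel.onnx       explicit: the scheme selects the ONNX Runtime engine
tensorrt://./models/mymodel.plan   explicit: the scheme selects the TensorRT engine
./models/mymodel.onnx              implicit: the .onnx extension selects the ONNX Runtime engine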

Model configuration

Each AI model requires an accompanying .json file that specifies its loading parameters. This configuration is imported into the inference plugin's State Dictionary and is accessible from the Managed or Lua interface. Besides the engine-specific parameters, you are free to add user-specific data (see the extended example below).

Example

mymodel.onnx.json

{
  "channel_layout": "NCHW",
  "normalize_input": false,
  "transpose_input": false,
  "mean": 0,
  "stddev": 0
}
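
A minimal sketch of adding user-specific data alongside the required parameters (the min_object_area key is hypothetical and is ignored by the engines; it is simply carried into the State Dictionary with the rest of the configuration):

{
  "channel_layout": "NCHW",
  "normalize_input": false,
  "transpose_input": false,
  "mean": 0,
  "stddev": 0,
  "min_object_area": 400
}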

Plugin configuration

The main configuration of the inference plugin deals with filtering detector and classifier results, irrespective of the inference engine used.

Plugin configuration

{
  "<plugin instance name>": {
    "config": {
      "class_ids": [
        1
      ],
      "conf_threshold": 0.5,
      "enabled": true,
      "filter_edge_detections": false,
      "nms_eta": 1.0,
      "nms_iou_threshold": 0.5,
      "nms_merge_batches": false,
      "nms_score_threshold": 0.1,
      "nms_topk": 0.0
    }
  }
}

C++

auto inference = new Inference();

inference->pluginConf.class_ids.push_back(1);
inference->pluginConf.conf_threshold = 0.5;
inference->pluginConf.enabled = true;
inference->pluginConf.filter_edge_detections = true;
inference->pluginConf.nms_eta = 1.0;
inference->pluginConf.nms_iou_threshold = 0.5;
inference->pluginConf.nms_merge_batches = false;
inference->pluginConf.nms_score_threshold = 0.1;
inference->pluginConf.nms_topk = 0.0;

C++ (Managed)

auto inference = std::reinterpret_pointer_cast<InferenceManaged>(rtapi->loadPlugin("Inference", "MyPlugin", ""));

auto conf = inference->getConfig();
conf->setValue("conf_threshold", std::make_shared<cvediart::CValue>(0.5));
conf->setValue("enabled", std::make_shared<cvediart::CValue>(true));
conf->setValue("filter_edge_detections", std::make_shared<cvediart::CValue>(true));
conf->setValue("nms_eta", std::make_shared<cvediart::CValue>(1.0));
conf->setValue("nms_iou_threshold", std::make_shared<cvediart::CValue>(0.5));
conf->setValue("nms_merge_batches", std::make_shared<cvediart::CValue>(false));
conf->setValue("nms_score_threshold", std::make_shared<cvediart::CValue>(0.1));
conf->setValue("nms_topk", std::make_shared<cvediart::CValue>(0.0));

Lua

---@param self inferencemanaged
function Classifier.onInit(self)

    local conf = self:getConfig()

    conf.class_ids = {1};
    conf.conf_threshold = 0.5;
    conf.enabled = true;
    conf.filter_edge_detections = true;
    conf.nms_eta = 1.0;
    conf.nms_iou_threshold = 0.5;
    conf.nms_merge_batches = false;
    conf.nms_score_threshold = 0.1;
    conf.nms_topk = 0.0;

end

Supported engines

ONNX Runtime

Scheme: onnx://
Extensions: .onnx

ONNX is a standard model format and does not provide an engine of its own. However, its runtime library supports a wide range of backends, making it a very flexible format for deployment.

The currently supported backends are: CPU, CUDA, DML, TRT.

Warning

Using TensorRT as a backend will cause the system to compile the ONNX file into a native TensorRT model. This can take up to 30 minutes on edge devices. Make sure trt_cache_path is set and trt_cache_enabled is true so the compiled model is cached and reused on subsequent runs.
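
A minimal excerpt of the relevant keys (the cache directory is a placeholder; the full ONNX plugin configuration is listed below):

"ONNX": {
  "backend": "TRT",
  "trt_cache_enabled": true,
  "trt_cache_path": "/var/cache/trt"
}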

Configuration

ONNX at a minimum requires the following parameters to be present for each AI model:

{
  "channel_layout": "NCHW",
  "normalize_input": false,
  "transpose_input": false,
  "mean": 0,
  "stddev": 0
}

Depending on how the ONNX plugin is used, it can be configured in different ways:

Plugin configuration

"<plugin instance name>": {
 "config": {
  "ONNX": {
   "backend": "CUDA",
   "device_id": 0,
   "execution_mode": "PARALLEL",
   "profiling": false,
   "profiling_prefix": "",
   "trt_cache_enabled": true,
   "trt_cache_path": "",
   "ort_intraop_num_threads": 8,
   "ort_interop_num_threads": 8
  }
 }
}

C++

auto onnx = new ONNX();

onnx->pluginConf.backend = "CPU";
onnx->pluginConf.execution_mode = "PARALLEL";
onnx->pluginConf.profiling = false;
onnx->pluginConf.profiling_prefix = "";
onnx->pluginConf.device_id = 0;
onnx->pluginConf.trt_cache_path = "";
onnx->pluginConf.trt_cache_enabled = true;
onnx->pluginConf.ort_intraop_num_threads = 8;
onnx->pluginConf.ort_interop_num_threads = 8;

SAPI

using namespace cvediart::api::SAPI;

Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/backend", "\"CPU\"");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/execution_mode", "\"PARALLEL\"");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/profiling", "false");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/profiling_prefix", "\"\"");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/device_id", "0");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/trt_cache_path", "\"\"");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/trt_cache_enabled", "true");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/ort_intraop_num_threads", "8");
Server::setStateDictEntry(instanceName, "<plugin name>/config/ONNX/ort_interop_num_threads", "8");

Lua

---@param self inferencemanaged
function Classifier.onInit(self)

    local conf = self:getConfig()

    conf.ONNX.backend = "CPU";
    conf.ONNX.execution_mode = "PARALLEL";
    conf.ONNX.profiling = false;
    conf.ONNX.profiling_prefix = "";
    conf.ONNX.device_id = 0;
    conf.ONNX.trt_cache_path = "";
    conf.ONNX.trt_cache_enabled = true;
    conf.ONNX.ort_intraop_num_threads = 8;
    conf.ONNX.ort_interop_num_threads = 8;
end

Usage

The ONNX engine can be used directly as a core interface for low-level embedding in existing C++ projects.

Usage

#include "Plugins/ONNX/include/onnx.h"

using namespace cvediart::plugin::inference;

void example() {
    cv::Mat img, img_resized;

    // Load ONNX backend
    auto onnx = std::make_unique<ONNX>();

    // Create inference plugin
    auto inference = std::make_unique<Inference>();

    // Assign backend to plugin
    inference->setBackend(onnx);

    inference->pluginConf.class_ids.push_back(1);

    inference->loadModel("models/mymodel.onnx");

    // Load image and resize to network dimensions
    img = cv::imread("assets/tests/images/2_people.jpg");
    cv::resize(img, img_resized, cv::Size(inference->inputWidth_, inference->inputHeight_));

    std::vector<cv::Mat> mats;
    mats.push_back(img_resized);

    auto detections = inference->runDetectorInference(mats).value();
    inference->detsToAbsCoords(detections, std::vector<cv::Rect>{cv::Rect(cv::Point(0, 0), img.size())}, false, true);
    detections = inference->NMS(detections);
}

TensorRT

Registered handlers:
Scheme: tensorrt://
Extensions: .plan .engine

TensorRT is the preferred engine for deployment on any device using NVIDIA acceleration. This includes the Jetson Nano, Jetson Xavier AGX and Jetson Xavier NX, as well as any server using Quadro or RTX/GTX video cards.

Note

Converting an AI model to TensorRT format requires compilation on a system with the same CUDA version and compute capability as the target system. Conversion is done through the trtexec executable, which is part of TensorRT.
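
For illustration, a typical conversion of an ONNX model into a TensorRT engine looks like this (file names are placeholders; the flags are standard trtexec options):

trtexec --onnx=models/mymodel.onnx --saveEngine=models/mymodel.engine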

Configuration

TensorRT at a minimum requires the following parameters to be present for each AI model:

{
  "channel_layout": "",
  "mean": 0,
  "stddev": 0,
  "normalize_input": false,
  "transpose_input": false,
  "input_binding": "",
  "output_binding": [
    "Bounding Boxes",
    "Classification"
  ]
}
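
The input_binding and output_binding values must match the tensor names baked into the compiled engine. As a sketch (the binding names shown are hypothetical and depend on how the model was exported):

{
  "channel_layout": "NCHW",
  "mean": 0,
  "stddev": 0,
  "normalize_input": false,
  "transpose_input": false,
  "input_binding": "images",
  "output_binding": [
    "Bounding Boxes",
    "Classification"
  ]
}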

Depending on how the TensorRT plugin is used, it can be configured in different ways:

Plugin configuration

"<plugin instance name>": {
 "config": {
  "TRT": {
    "device_id": 0
  }
 }
}

C++

auto trt = new TRT();

trt->pluginConf.device_id = 0;

SAPI

using namespace cvediart::api::SAPI;

Server::setStateDictEntry(instanceName, "<plugin name>/config/TRT/device_id", "0");

Lua

---@param self inferencemanaged
function Classifier.onInit(self)

    local conf = self:getConfig()

    conf.TRT.device_id = 0;
end

Usage

The TensorRT engine can be used directly as a core interface for low-level embedding in existing C++ projects.

Usage

#include "Plugins/TensorRT/include/trt.h"

using namespace cvediart::plugin::inference;

void example() {
    cv::Mat img, img_resized;

    // Load TensorRT backend
    auto trt = std::make_unique<TRT>();

    // Create inference plugin
    auto inference = std::make_unique<Inference>();

    // Assign backend to plugin
    inference->setBackend(trt);

    inference->pluginConf.class_ids.push_back(1);

    inference->loadModel("models/mymodel.engine");

    // Load image and resize to network dimensions
    img = cv::imread("assets/tests/images/2_people.jpg");
    cv::resize(img, img_resized, cv::Size(inference->inputWidth_, inference->inputHeight_));

    std::vector<cv::Mat> mats;
    mats.push_back(img_resized);

    auto detections = inference->runDetectorInference(mats).value();
    inference->detsToAbsCoords(detections, std::vector<cv::Rect>{cv::Rect(cv::Point(0, 0), img.size())}, false, true);
    detections = inference->NMS(detections);
}

Amba CVFlow

Not yet implemented

OpenVINO

Not yet implemented