Processing Plugin Collection¶
Description¶
The Processing plugin collection provides comprehensive pre-processing and post-processing capabilities for AI inference within CVEDIA-RT. It includes specialized processors for a broad range of AI model architectures, enabling state-of-the-art computer vision models to be integrated with optimized performance and accuracy.
Key Features¶
- Comprehensive YOLO Support: Complete support for all major YOLO variants (v4, v5, v6, v7, v10, v11, YOLOX) with optimized post-processing
- Multi-Model Architecture Support: Specialized processors for classification, OCR, face detection, pose estimation, and style transfer
- High-Performance Processing: XTensor-based operations with SIMD optimization for maximum throughput
- Flexible Configuration: Configurable confidence thresholds, NMS parameters, and class filtering
- Memory Optimization: Thread-safe tensor caching and efficient memory management
- Standardized Interface: Unified processing interface across all model types
- Debug and Profiling: Built-in Tracy profiling and comprehensive logging
Use Cases¶
- Real-Time Object Detection: High-performance YOLO-based detection pipelines
- Multi-Class Classification: Image classification with support for both single-label and multi-label scenarios
- Text Recognition: OCR processing with advanced text detection and recognition
- Pose Estimation: Keypoint detection and pose analysis for human activity recognition
- Custom AI Models: Framework for integrating custom neural network architectures
Requirements¶
Software Dependencies¶
Core Requirements:

- RTCORE (CVEDIA-RT core libraries)
- xtensor with SIMD support for high-performance tensor operations
- Tracy profiler for performance monitoring
- plog for structured logging

Optional Dependencies:

- OpenCV for advanced image processing operations
- CUDA/cuDNN for GPU acceleration (when available)
- Intel MKL for optimized CPU operations
Hardware Requirements¶
Minimum:

- x64 CPU with SSE4.2 support
- 4GB RAM for basic processing
- Storage for model weights and temporary data

Recommended:

- Multi-core CPU with AVX2 support
- 8GB+ RAM for high-throughput processing
- GPU with CUDA support for acceleration
- SSD storage for optimal I/O performance
Supported AI Models¶
YOLO Family Support¶
YOLO Version | Plugin | Architecture Type | Key Features |
---|---|---|---|
YOLOv4 | PostYoloV4 | Anchor-based | Classic YOLO with CSPDarknet53 backbone |
YOLOv5 | PostYoloV5 | Anchor-based | Improved anchor system with auto-anchor support |
YOLOv6 | PostYoloV6 | Anchor-free | Enhanced multi-scale detection |
YOLOv7 | PostYoloV7 | Anchor-based | Advanced training techniques, improved accuracy |
YOLOv10 | PostYoloV10 | NMS-free | Dual assignment, end-to-end detection |
YOLOv11 | PostYoloV11 | Next-generation | Latest architectural innovations |
YOLOX | PostYoloX | Anchor-free | Decoupled head design, advanced augmentation |
Specialized Model Support¶
Model Type | Plugin | Supported Architectures | Applications |
---|---|---|---|
Classification | PostClassifier | ResNet, EfficientNet, MobileNet, ViT | Multi-class/multi-label classification |
OCR | post_ocr | CRNN, EAST, DBNet, PaddleOCR | Text detection and recognition |
Face Detection | PostUltraface | Ultraface, MTCNN, RetinaFace | Lightweight face detection |
Point Detection | PostPoint | OpenPose, PoseNet, AlphaPose | Keypoint and pose estimation |
GFL Detection | PostGFL | GFL-based architectures | Distribution-based object regression |
Style Transfer | PostNeuralstyle | Neural style transfer models | Real-time artistic effects |
Custom Processing | PostPassthru | Any architecture | Framework for custom processing |
Configuration¶
Basic YOLO Configuration¶
{
"processing": {
"yolo": {
"confThreshold": 0.2,
"nmsIouThreshold": 0.5,
"nmsScoreThreshold": 0.1,
"filterEdgeDetections": false,
"class_ids": [0, 1, 2, 15, 16]
}
}
}
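Within a processor, these values arrive through the inference configuration pointer. A minimal sketch of reading them, assuming the CValue::getValue(key, default) accessor used in the custom-processing example later on this page:

// Illustrative only: reading the YOLO block above inside a post-processor
float confThreshold = inferenceConf->getValue("confThreshold", 0.2f);
float nmsIouThreshold = inferenceConf->getValue("nmsIouThreshold", 0.5f);
bool filterEdgeDetections = inferenceConf->getValue("filterEdgeDetections", false);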
Advanced Multi-Model Configuration¶
{
"processing": {
"detection": {
"model_type": "yolov8",
"confThreshold": 0.25,
"nmsIouThreshold": 0.45,
"nmsScoreThreshold": 0.15,
"filterEdgeDetections": true,
"nmsMergeBatches": false,
"class_filtering": {
"enabled": true,
"class_ids": [0, 1, 2, 3, 5, 7],
"label_remapping": {
"0": "person",
"1": "bicycle",
"2": "car"
}
}
},
"classification": {
"model_type": "resnet50",
"threshold": 0.5,
"top_k": 5,
"multi_label": false
},
"ocr": {
"text_detection": {
"threshold": 0.7,
"link_threshold": 0.4
},
"text_recognition": {
"beam_width": 5,
"dictionary_enabled": true
}
},
"performance": {
"enable_profiling": true,
"cache_anchors": true,
"simd_optimization": true,
"batch_processing": true
}
}
}
Configuration Schema¶
Parameter | Type | Default | Description |
---|---|---|---|
confThreshold | float | 0.2 | Minimum confidence threshold for detections |
nmsIouThreshold | float | 0.5 | IoU threshold for Non-Maximum Suppression |
nmsScoreThreshold | float | 0.1 | Score threshold for NMS filtering |
filterEdgeDetections | boolean | false | Filter detections near image edges |
nmsMergeBatches | boolean | false | Merge batches during NMS processing |
class_ids | array | [] | Filter specific class IDs (empty = all classes) |
label_remapping | object | {} | Custom label name mapping |
regMax | integer | 16 | DFL regression maximum (reg_max) for supported models |
strides | array | [8, 16, 32] | Multi-scale detection strides |
enable_profiling | boolean | false | Enable Tracy profiling |
cache_anchors | boolean | true | Cache anchor grids for performance |
simd_optimization | boolean | true | Enable SIMD acceleration |
batch_processing | boolean | true | Enable batch processing optimization |
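For models with Distribution Focal Loss heads (such as the GFL processor), regMax and strides control box decoding. A hedged sketch of how these keys could sit alongside the standard detection parameters; the exact nesting may vary per deployment:

{
  "processing": {
    "detection": {
      "confThreshold": 0.25,
      "nmsIouThreshold": 0.45,
      "regMax": 16,
      "strides": [8, 16, 32]
    }
  }
}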
API Reference¶
C++ Processing Interface¶
All post-processing plugins implement a standardized interface:
namespace cvedia::rt {
// Standard post-processing interface
template<typename T>
expected<cvec> postProcess(InferenceContext& ctx,
std::vector<xt::xarray<float>>& output,
CValue* inferenceConf);
// YOLO-specific processing
class YoloProcessor {
public:
struct Config {
float confThreshold = 0.2f;
float nmsIouThreshold = 0.5f;
float nmsScoreThreshold = 0.1f;
bool filterEdgeDetections = false;
bool nmsMergeBatches = false;
std::vector<int> class_ids;
std::map<int, std::string> label_remapping;
};
static expected<cvec> process(const Config& config,
const std::vector<xt::xarray<float>>& outputs,
const ImageMetadata& metadata);
};
// Classification processing
class ClassificationProcessor {
public:
struct Result {
int class_id;
float confidence;
std::string label;
};
static expected<std::vector<Result>> classify(
const xt::xarray<float>& output,
float threshold = 0.5f,
int top_k = 5,
bool multi_label = false);
};
}
Tensor Specifications¶
YOLO Models¶
// Input tensor format
struct YoloInput {
xt::xarray<float> image; // Shape: [B, C, H, W] or [B, H, W, C]
ImageMetadata metadata; // Original image dimensions and preprocessing info
};
// Output tensor formats
struct YoloOutput {
// Grid-based format (YOLOv4, YOLOv5, YOLOv7)
xt::xarray<float> grid_output; // Shape: [B, A, H, W, C] where C = classes + 5
// Flattened format (YOLOv6, YOLOv8)
xt::xarray<float> flat_output; // Shape: [B, N, C] where N = H*W*anchors
// NMS-free format (YOLOv10)
xt::xarray<float> direct_output; // Shape: [B, detections, 7] (x1,y1,x2,y2,conf,cls,batch_id)
};
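Grid-based and flattened YOLO heads conventionally regress boxes as (cx, cy, w, h) plus objectness and per-class scores, while the NMS-free format already carries corner coordinates. A minimal sketch of the center-to-corner conversion a decoder typically performs (the helper name is illustrative):

struct CornerBox { float x1, y1, x2, y2; };

// Convert a decoded (cx, cy, w, h) box to (x1, y1, x2, y2) corners
inline CornerBox centerToCorners(float cx, float cy, float w, float h) {
    return { cx - 0.5f * w, cy - 0.5f * h, cx + 0.5f * w, cy + 0.5f * h };
}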
Classification Models¶
struct ClassificationOutput {
xt::xarray<float> logits; // Shape: [B, num_classes]
xt::xarray<float> probabilities; // Softmax applied
std::vector<std::string> labels; // Class labels
};
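The probabilities field is derived from the raw logits. A minimal, numerically stable softmax sketch with xtensor, assuming the [B, num_classes] layout above:

#include <xtensor/xarray.hpp>
#include <xtensor/xmath.hpp>
#include <xtensor/xmanipulation.hpp>

xt::xarray<float> softmax(const xt::xarray<float>& logits) {
    // Subtract the per-row maximum for numerical stability
    auto shifted = logits - xt::expand_dims(xt::amax(logits, {1}), 1);
    xt::xarray<float> e = xt::exp(shifted);
    return e / xt::expand_dims(xt::sum(e, {1}), 1);
}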
Specialized Processing APIs¶
// OCR processing
namespace ocr {
struct TextDetection {
std::vector<cv::Point2f> bbox;
float confidence;
std::string text;
};
expected<std::vector<TextDetection>> processOCR(
const xt::xarray<float>& detection_output,
const xt::xarray<float>& recognition_output,
float det_threshold = 0.7f,
float rec_threshold = 0.5f);
}
// Point detection
namespace pose {
struct Keypoint {
cv::Point2f point;
float confidence;
int joint_id;
};
struct PoseResult {
std::vector<Keypoint> keypoints;
float overall_confidence;
cv::Rect2f bbox;
};
expected<std::vector<PoseResult>> processPose(
const xt::xarray<float>& heatmaps,
const xt::xarray<float>& pafs,
float keypoint_threshold = 0.1f);
}
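A usage sketch for the pose API, assuming heatmap and PAF tensors produced by an OpenPose-style model:

// Illustrative usage of pose::processPose declared above
auto poses = pose::processPose(heatmaps, pafs, /*keypoint_threshold=*/0.1f);
if (poses) {
    for (const auto& p : poses.value()) {
        std::cout << "Pose with " << p.keypoints.size() << " keypoints, "
                  << "confidence " << p.overall_confidence << std::endl;
    }
}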
Examples¶
Basic YOLO Object Detection¶
#include "processing/yolo/postprocessor.h"
#include "inference/inference_context.h"
// Configure YOLO post-processing
YoloProcessor::Config config;
config.confThreshold = 0.25f;
config.nmsIouThreshold = 0.45f;
config.nmsScoreThreshold = 0.15f;
config.class_ids = {0, 1, 2, 3, 5, 7}; // person, bicycle, car, motorcycle, bus, truck
// Process inference results
InferenceContext ctx;
std::vector<xt::xarray<float>> model_outputs = runInference(input_image);
// Apply post-processing
auto detections = YoloProcessor::process(config, model_outputs, image_metadata);
if (detections) {
for (const auto& detection : detections.value()) {
std::cout << "Class: " << detection.class_id
<< ", Confidence: " << detection.confidence
<< ", BBox: [" << detection.bbox.x << ", "
<< detection.bbox.y << ", "
<< detection.bbox.width << ", "
<< detection.bbox.height << "]" << std::endl;
}
} else {
std::cerr << "Detection failed: " << detections.error() << std::endl;
}
Multi-Label Classification¶
#include "processing/classification/classifier.h"
// Configure classification
float threshold = 0.3f; // Lower threshold for multi-label
int top_k = 10;
bool multi_label = true;
// Process classification output
xt::xarray<float> logits = runClassificationModel(input_image);
auto results = ClassificationProcessor::classify(logits, threshold, top_k, multi_label);
if (results) {
std::cout << "Multi-label classification results:" << std::endl;
for (const auto& result : results.value()) {
std::cout << "Label: " << result.label
<< ", Confidence: " << result.confidence << std::endl;
}
}
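For multi-label outputs, per-class scores conventionally come from an element-wise sigmoid rather than a softmax, since the classes are not mutually exclusive. A minimal sketch with xtensor:

#include <xtensor/xarray.hpp>
#include <xtensor/xmath.hpp>

// Element-wise sigmoid over raw logits for multi-label scoring
xt::xarray<float> sigmoid(const xt::xarray<float>& logits) {
    return 1.0f / (1.0f + xt::exp(-logits));
}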
Custom Processing Pipeline¶
#include "processing/common/post_common.h"
class CustomProcessor {
public:
static expected<cvec> processCustomModel(
InferenceContext& ctx,
std::vector<xt::xarray<float>>& outputs,
CValue* config) {
// Extract configuration
float custom_threshold = config->getValue("custom_threshold", 0.5f);
bool enable_filtering = config->getValue("enable_filtering", true);
// Process first output tensor
const auto& primary_output = outputs[0];
auto shape = primary_output.shape();
cvec detections;
// Custom processing logic
for (size_t i = 0; i < shape[0]; ++i) {
for (size_t j = 0; j < shape[1]; ++j) {
float confidence = primary_output(i, j, 4); // Confidence channel
if (confidence > custom_threshold) {
// Extract detection data
Detection detection;
detection.confidence = confidence;
detection.class_id = static_cast<int>(primary_output(i, j, 5));
// Convert normalized coordinates to absolute
detection.bbox.x = primary_output(i, j, 0) * ctx.imageWidth;
detection.bbox.y = primary_output(i, j, 1) * ctx.imageHeight;
detection.bbox.width = primary_output(i, j, 2) * ctx.imageWidth;
detection.bbox.height = primary_output(i, j, 3) * ctx.imageHeight;
detections.push_back(std::make_shared<CValue>(detection.toCValue()));
}
}
}
// Apply custom filtering if enabled
if (enable_filtering) {
detections = applyCustomFiltering(detections);
}
return detections;
}
private:
static cvec applyCustomFiltering(const cvec& input) {
// Custom filtering logic
cvec filtered;
for (const auto& detection : input) {
// Apply custom filtering criteria
if (meetsCustomCriteria(detection)) {
filtered.push_back(detection);
}
}
return filtered;
}
};
Performance Monitoring¶
#include "tracy/Tracy.hpp"
#include "processing/performance/monitor.h"
class PerformanceOptimizedProcessor {
public:
static expected<cvec> processWithProfiling(
InferenceContext& ctx,
std::vector<xt::xarray<float>>& outputs,
CValue* config) {
ZoneScoped; // Tracy profiling zone
cvec results; // accumulated detections returned below
{
ZoneScopedN("Preprocessing");
// Pre-processing operations
}
{
ZoneScopedN("Core Processing");
// Main processing logic populates results
}
{
ZoneScopedN("Post-filtering");
// NMS and filtering operations
}
return results;
}
};
// Performance monitoring
class ProcessingMonitor {
public:
void recordProcessingTime(const std::string& processor,
std::chrono::milliseconds duration) {
processing_times_[processor].push_back(duration);
// Keep only last 100 measurements
if (processing_times_[processor].size() > 100) {
processing_times_[processor].erase(processing_times_[processor].begin());
}
}
double getAverageProcessingTime(const std::string& processor) const {
const auto& times = processing_times_.at(processor);
if (times.empty()) return 0.0;
auto total = std::accumulate(times.begin(), times.end(), std::chrono::milliseconds{0});
return total.count() / static_cast<double>(times.size());
}
double getProcessingFps(const std::string& processor) const {
double avg_time = getAverageProcessingTime(processor);
return avg_time > 0.0 ? 1000.0 / avg_time : 0.0;
}
private:
std::unordered_map<std::string, std::vector<std::chrono::milliseconds>> processing_times_;
};
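A usage sketch pairing the monitor with a scoped timer; the RAII helper below is illustrative rather than part of the plugin API:

#include <chrono>
#include <string>

// Hypothetical RAII timer that reports into ProcessingMonitor on destruction
class ScopedTimer {
public:
    ScopedTimer(ProcessingMonitor& monitor, std::string name)
        : monitor_(monitor), name_(std::move(name)),
          start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - start_);
        monitor_.recordProcessingTime(name_, elapsed);
    }
private:
    ProcessingMonitor& monitor_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Usage: times the enclosing scope and records it as "yolo_postprocess"
// {
//     ScopedTimer timer(monitor, "yolo_postprocess");
//     // ... run post-processing ...
// }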
Performance Optimization¶
Memory Optimization¶
// Thread-safe anchor caching
class AnchorCache {
public:
static xt::xarray<float> getAnchors(const std::vector<int>& strides,
int input_height, int input_width) {
std::lock_guard<std::mutex> lock(cache_mutex_);
std::string key = createCacheKey(strides, input_height, input_width);
auto it = anchor_cache_.find(key);
if (it != anchor_cache_.end()) {
return it->second;
}
// Generate and cache anchors
auto anchors = generateAnchors(strides, input_height, input_width);
anchor_cache_[key] = anchors;
return anchors;
}
private:
// C++17 inline statics keep the sketch self-contained
// (no separate out-of-class definitions needed)
inline static std::mutex cache_mutex_;
inline static std::unordered_map<std::string, xt::xarray<float>> anchor_cache_;
};
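Usage is a single lookup per input geometry; repeated calls with the same strides and resolution return the cached grid instead of regenerating it:

// Fetch (or lazily build and cache) the anchor grid for a 640x640 input
auto anchors = AnchorCache::getAnchors({8, 16, 32}, 640, 640);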
SIMD Optimization¶
// XTensor SIMD-optimized operations
namespace simd_ops {

// Vectorized confidence thresholding
xt::xarray<bool> threshold_confidence(const xt::xarray<float>& confidences,
                                      float threshold) {
    return xt::greater(confidences, threshold);
}

// Vectorized IoU calculation over [N, 4] boxes in (x1, y1, x2, y2) format
xt::xarray<float> compute_iou_matrix(const xt::xarray<float>& boxes1,
                                     const xt::xarray<float>& boxes2) {
    // SIMD-optimized IoU computation using xtensor column views
    // (operator() takes scalar indices, so xt::view is used for columns)
    auto areas1 = (xt::view(boxes1, xt::all(), 2) - xt::view(boxes1, xt::all(), 0)) *
                  (xt::view(boxes1, xt::all(), 3) - xt::view(boxes1, xt::all(), 1));
    auto areas2 = (xt::view(boxes2, xt::all(), 2) - xt::view(boxes2, xt::all(), 0)) *
                  (xt::view(boxes2, xt::all(), 3) - xt::view(boxes2, xt::all(), 1));

    // Pairwise intersections via broadcasting: [N1, 1] against [1, N2]
    auto inter_x1 = xt::maximum(xt::expand_dims(xt::view(boxes1, xt::all(), 0), 1),
                                xt::expand_dims(xt::view(boxes2, xt::all(), 0), 0));
    auto inter_y1 = xt::maximum(xt::expand_dims(xt::view(boxes1, xt::all(), 1), 1),
                                xt::expand_dims(xt::view(boxes2, xt::all(), 1), 0));
    auto inter_x2 = xt::minimum(xt::expand_dims(xt::view(boxes1, xt::all(), 2), 1),
                                xt::expand_dims(xt::view(boxes2, xt::all(), 2), 0));
    auto inter_y2 = xt::minimum(xt::expand_dims(xt::view(boxes1, xt::all(), 3), 1),
                                xt::expand_dims(xt::view(boxes2, xt::all(), 3), 0));

    auto inter_area = xt::maximum(inter_x2 - inter_x1, 0.0f) *
                      xt::maximum(inter_y2 - inter_y1, 0.0f);
    auto union_area = xt::expand_dims(areas1, 1) + xt::expand_dims(areas2, 0) - inter_area;
    return inter_area / xt::maximum(union_area, 1e-6f);
}

}  // namespace simd_ops
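The IoU matrix feeds directly into greedy NMS. A minimal sketch, assuming detections are pre-sorted by descending score so earlier indices win overlaps:

#include <cstddef>
#include <vector>

// Greedy NMS over a precomputed IoU matrix; returns indices to keep
std::vector<std::size_t> greedy_nms(const xt::xarray<float>& iou,
                                    float iou_threshold) {
    const std::size_t n = iou.shape()[0];
    std::vector<bool> suppressed(n, false);
    std::vector<std::size_t> keep;
    for (std::size_t i = 0; i < n; ++i) {
        if (suppressed[i]) continue;
        keep.push_back(i);
        for (std::size_t j = i + 1; j < n; ++j) {
            if (iou(i, j) > iou_threshold) suppressed[j] = true;
        }
    }
    return keep;
}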
Batch Processing Optimization¶
class BatchProcessor {
public:
static expected<cvec> processBatch(const std::vector<InferenceContext>& contexts,
const std::vector<std::vector<xt::xarray<float>>>& batch_outputs,
const ProcessingConfig& config) {
cvec all_detections;
// Process batches in parallel when possible
if (config.parallel_processing && batch_outputs.size() > 1) {
std::vector<std::future<expected<cvec>>> futures;
for (size_t i = 0; i < batch_outputs.size(); ++i) {
futures.push_back(std::async(std::launch::async, [&, i]() {
return processSingle(contexts[i], batch_outputs[i], config);
}));
}
for (auto& future : futures) {
auto result = future.get();
if (result) {
all_detections.insert(all_detections.end(),
result.value().begin(),
result.value().end());
}
}
} else {
// Sequential processing
for (size_t i = 0; i < batch_outputs.size(); ++i) {
auto result = processSingle(contexts[i], batch_outputs[i], config);
if (result) {
all_detections.insert(all_detections.end(),
result.value().begin(),
result.value().end());
}
}
}
return all_detections;
}
};
Troubleshooting¶
Common Issues¶
Low Detection Accuracy
// Adjust confidence thresholds
config.confThreshold = 0.1f; // Lower threshold for more detections
config.nmsScoreThreshold = 0.05f; // Lower NMS threshold
// Disable edge filtering
config.filterEdgeDetections = false;
// Check class filtering
config.class_ids.clear(); // Allow all classes
High Memory Usage
// Enable anchor caching
config.cache_anchors = true;
// Optimize tensor operations
config.simd_optimization = true;
// Reduce batch size
config.max_batch_size = 1;
Performance Issues
// Enable profiling to identify bottlenecks
config.enable_profiling = true;
// Optimize NMS parameters
config.nmsIouThreshold = 0.7f; // Higher threshold = fewer comparisons
config.nmsScoreThreshold = 0.2f; // Higher threshold = fewer candidates
Error Recovery¶
class RobustProcessor {
public:
static expected<cvec> processWithFallback(
InferenceContext& ctx,
std::vector<xt::xarray<float>>& outputs,
CValue* config) {
try {
// Primary processing
return primaryProcessor(ctx, outputs, config);
} catch (const std::exception& e) {
PLOG_WARNING << "Primary processor failed: " << e.what();
try {
// Fallback processing
return fallbackProcessor(ctx, outputs, config);
} catch (const std::exception& e2) {
PLOG_ERROR << "Fallback processor failed: " << e2.what();
return tl::unexpected(ProcessingError::TOTAL_FAILURE);
}
}
}
private:
static expected<cvec> fallbackProcessor(
InferenceContext& ctx,
std::vector<xt::xarray<float>>& outputs,
CValue* config) {
// Simplified processing with relaxed parameters
ProcessingConfig fallback_config;
fallback_config.confThreshold = 0.1f;
fallback_config.nmsIouThreshold = 0.8f;
fallback_config.enable_simd = false;
return basicProcessor(ctx, outputs, &fallback_config);
}
};
Debug Tools¶
#include "debug/visualization.h"
#include "debug/tensor_inspector.h"
class ProcessingDebugger {
public:
static void debugDetections(const cvec& detections,
const cv::Mat& image,
const std::string& save_path) {
cv::Mat debug_image = image.clone();
for (const auto& det : detections) {
auto detection = extractDetection(det);
// Draw bounding box
cv::rectangle(debug_image, detection.bbox, cv::Scalar(0, 255, 0), 2);
// Draw confidence and class
std::string label = std::to_string(detection.class_id) +
": " + std::to_string(detection.confidence);
cv::putText(debug_image, label,
cv::Point(detection.bbox.x, detection.bbox.y - 10),
cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 255, 0), 1);
}
cv::imwrite(save_path, debug_image);
}
static void inspectTensor(const xt::xarray<float>& tensor,
const std::string& name) {
PLOG_INFO << "Tensor " << name << " inspection:";
PLOG_INFO << " Shape: " << xt::adapt(tensor.shape());
PLOG_INFO << " Min: " << xt::amin(tensor)[0];
PLOG_INFO << " Max: " << xt::amax(tensor)[0];
PLOG_INFO << " Mean: " << xt::mean(tensor)[0];
PLOG_INFO << " Std: " << xt::stddev(tensor)[0];
// Check for NaN or infinity values
bool has_nan = xt::any(xt::isnan(tensor));
bool has_inf = xt::any(xt::isinf(tensor));
if (has_nan) PLOG_WARNING << " Contains NaN values!";
if (has_inf) PLOG_WARNING << " Contains infinite values!";
}
};
Best Practices¶
Configuration Management¶
- Threshold Tuning: Start with default values and adjust based on precision/recall requirements
- Class Filtering: Use class filtering to improve performance when only specific objects are needed
- Memory Management: Enable caching and SIMD optimization for production deployments
- Profiling: Use Tracy profiling to identify performance bottlenecks
Integration Guidelines¶
- Error Handling: Implement comprehensive error handling with fallback mechanisms
- Performance Monitoring: Track processing times and memory usage in production
- Model Compatibility: Verify tensor shapes and formats match expected model outputs
- Testing: Validate processing accuracy across different model architectures
Performance Optimization¶
- Batch Processing: Use batch processing when handling multiple images simultaneously
- Parallel Processing: Leverage multi-threading for independent processing tasks
- Memory Pooling: Implement tensor memory pooling for high-throughput scenarios (a sketch follows this list)
- Hardware Acceleration: Utilize SIMD instructions and GPU acceleration when available
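The memory pooling mentioned above can be as simple as recycling buffers by shape. A minimal sketch, illustrative only and not part of the plugin API:

#include <map>
#include <mutex>
#include <vector>
#include <xtensor/xarray.hpp>
#include <xtensor/xbuilder.hpp>

// Hypothetical pool that recycles float tensors keyed by shape
class TensorPool {
public:
    xt::xarray<float> acquire(const std::vector<std::size_t>& shape) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto& bucket = free_[shape];
        if (!bucket.empty()) {
            xt::xarray<float> t = std::move(bucket.back());
            bucket.pop_back();
            return t;
        }
        return xt::zeros<float>(shape); // allocate when the pool is empty
    }
    void release(xt::xarray<float> t) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<std::size_t> shape(t.shape().begin(), t.shape().end());
        free_[shape].push_back(std::move(t));
    }
private:
    std::mutex mutex_;
    std::map<std::vector<std::size_t>, std::vector<xt::xarray<float>>> free_;
};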