Processing Plugin Collection

Description

The Processing plugin collection provides comprehensive pre-processing and post-processing capabilities for AI inference within CVEDIA-RT. It includes specialized processors for various AI model architectures, enabling seamless integration of state-of-the-art computer vision models with optimized performance and accuracy.

Key Features

  • Comprehensive YOLO Support: Complete support for all major YOLO variants (v4, v5, v6, v7, v10, v11, YOLOX) with optimized post-processing
  • Multi-Model Architecture Support: Specialized processors for classification, OCR, face detection, pose estimation, and style transfer
  • High-Performance Processing: XTensor-based operations with SIMD optimization for maximum throughput
  • Flexible Configuration: Configurable confidence thresholds, NMS parameters, and class filtering
  • Memory Optimization: Thread-safe tensor caching and efficient memory management
  • Standardized Interface: Unified processing interface across all model types
  • Debug and Profiling: Built-in Tracy profiling and comprehensive logging

Use Cases

  • Real-Time Object Detection: High-performance YOLO-based detection pipelines
  • Multi-Class Classification: Image classification with support for both single-label and multi-label scenarios
  • Text Recognition: OCR processing with advanced text detection and recognition
  • Pose Estimation: Keypoint detection and pose analysis for human activity recognition
  • Custom AI Models: Framework for integrating custom neural network architectures

Requirements

Software Dependencies

Core Requirements:

  • RTCORE (CVEDIA-RT core libraries)
  • xtensor with SIMD support for high-performance tensor operations
  • Tracy profiler for performance monitoring
  • plog for structured logging

Optional Dependencies:

  • OpenCV for advanced image processing operations
  • CUDA/cuDNN for GPU acceleration (when available)
  • Intel MKL for optimized CPU operations

Hardware Requirements

Minimum:

  • x64 CPU with SSE4.2 support
  • 4GB RAM for basic processing
  • Storage for model weights and temporary data

Recommended:

  • Multi-core CPU with AVX2 support
  • 8GB+ RAM for high-throughput processing
  • GPU with CUDA support for acceleration
  • SSD storage for optimal I/O performance

Supported AI Models

YOLO Family Support

| YOLO Version | Plugin | Architecture Type | Key Features |
|--------------|--------|-------------------|--------------|
| YOLOv4 | PostYoloV4 | Anchor-based | Classic YOLO with CSPDarknet53 backbone |
| YOLOv5 | PostYoloV5 | Anchor-based | Improved anchor system with auto-anchor support |
| YOLOv6 | PostYoloV6 | Anchor-free | Enhanced multi-scale detection |
| YOLOv7 | PostYoloV7 | Anchor-based | Advanced training techniques, improved accuracy |
| YOLOv10 | PostYoloV10 | NMS-free | Dual assignment, end-to-end detection |
| YOLOv11 | PostYoloV11 | Next-generation | Latest architectural innovations |
| YOLOX | PostYoloX | Anchor-free | Decoupled head design, advanced augmentation |
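
For NMS-free variants such as YOLOv10, the NMS thresholds are unnecessary because the head emits final detections directly. A minimal detection block, assuming the same model_type key used in the Advanced Multi-Model Configuration below (illustrative, not a verified schema), might look like:

{
  "processing": {
    "detection": {
      "model_type": "yolov10",
      "confThreshold": 0.25
    }
  }
}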

Specialized Model Support

| Model Type | Plugin | Supported Architectures | Applications |
|------------|--------|-------------------------|--------------|
| Classification | PostClassifier | ResNet, EfficientNet, MobileNet, ViT | Multi-class/multi-label classification |
| OCR | post_ocr | CRNN, EAST, DBNet, PaddleOCR | Text detection and recognition |
| Face Detection | PostUltraface | Ultraface, MTCNN, RetinaFace | Lightweight face detection |
| Point Detection | PostPoint | OpenPose, PoseNet, AlphaPose | Keypoint and pose estimation |
| GFL Detection | PostGFL | GFL-based architectures | Distribution-based object regression |
| Style Transfer | PostNeuralstyle | Neural style transfer models | Real-time artistic effects |
| Custom Processing | PostPassthru | Any architecture | Framework for custom processing |

Configuration

Basic YOLO Configuration

{
  "processing": {
    "yolo": {
      "confThreshold": 0.2,
      "nmsIouThreshold": 0.5,
      "nmsScoreThreshold": 0.1,
      "filterEdgeDetections": false,
      "class_ids": [0, 1, 2, 15, 16]
    }
  }
}

Advanced Multi-Model Configuration

{
  "processing": {
    "detection": {
      "model_type": "yolov8",
      "confThreshold": 0.25,
      "nmsIouThreshold": 0.45,
      "nmsScoreThreshold": 0.15,
      "filterEdgeDetections": true,
      "nmsMergeBatches": false,
      "class_filtering": {
        "enabled": true,
        "class_ids": [0, 1, 2, 3, 5, 7],
        "label_remapping": {
          "0": "person",
          "1": "bicycle",
          "2": "car"
        }
      }
    },
    "classification": {
      "model_type": "resnet50",
      "threshold": 0.5,
      "top_k": 5,
      "multi_label": false
    },
    "ocr": {
      "text_detection": {
        "threshold": 0.7,
        "link_threshold": 0.4
      },
      "text_recognition": {
        "beam_width": 5,
        "dictionary_enabled": true
      }
    },
    "performance": {
      "enable_profiling": true,
      "cache_anchors": true,
      "simd_optimization": true,
      "batch_processing": true
    }
  }
}

Configuration Schema

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| confThreshold | float | 0.2 | Minimum confidence threshold for detections |
| nmsIouThreshold | float | 0.5 | IoU threshold for Non-Maximum Suppression |
| nmsScoreThreshold | float | 0.1 | Score threshold for NMS filtering |
| filterEdgeDetections | boolean | false | Filter detections near image edges |
| nmsMergeBatches | boolean | false | Merge batches during NMS processing |
| class_ids | array | [] | Filter specific class IDs (empty = all classes) |
| label_remapping | object | {} | Custom label name mapping |
| regMax | integer | 16 | DFL regression maximum for supported models |
| strides | array | [8, 16, 32] | Multi-scale detection strides |
| enable_profiling | boolean | false | Enable Tracy profiling |
| cache_anchors | boolean | true | Cache anchor grids for performance |
| simd_optimization | boolean | true | Enable SIMD acceleration |
| batch_processing | boolean | true | Enable batch processing optimization |
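
The schema maps naturally onto a plain configuration struct. The sketch below is illustrative only; the field names mirror the table above, not a published CVEDIA-RT header:

#include <map>
#include <string>
#include <vector>

// Illustrative struct mirroring the configuration schema above.
struct ProcessingConfig {
    float confThreshold = 0.2f;          // minimum detection confidence
    float nmsIouThreshold = 0.5f;        // IoU threshold for NMS
    float nmsScoreThreshold = 0.1f;      // score threshold for NMS filtering
    bool filterEdgeDetections = false;   // drop detections near image edges
    bool nmsMergeBatches = false;        // merge batches during NMS
    std::vector<int> class_ids;          // empty = all classes
    std::map<int, std::string> label_remapping;
    int regMax = 16;                     // DFL regression maximum
    std::vector<int> strides = {8, 16, 32};
    bool enable_profiling = false;
    bool cache_anchors = true;
    bool simd_optimization = true;
    bool batch_processing = true;
};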

API Reference

C++ Processing Interface

All post-processing plugins implement a standardized interface:

namespace cvedia::rt {
    // Standard post-processing interface implemented by every plugin
    expected<cvec> postProcess(InferenceContext& ctx,
                               std::vector<xt::xarray<float>>& output,
                               CValue* inferenceConf);

    // YOLO-specific processing
    class YoloProcessor {
    public:
        struct Config {
            float confThreshold = 0.2f;
            float nmsIouThreshold = 0.5f;
            float nmsScoreThreshold = 0.1f;
            bool filterEdgeDetections = false;
            bool nmsMergeBatches = false;
            std::vector<int> class_ids;
            std::map<int, std::string> label_remapping;
        };

        static expected<cvec> process(const Config& config,
                                    const std::vector<xt::xarray<float>>& outputs,
                                    const ImageMetadata& metadata);
    };

    // Classification processing
    class ClassificationProcessor {
    public:
        struct Result {
            int class_id;
            float confidence;
            std::string label;
        };

        static expected<std::vector<Result>> classify(
            const xt::xarray<float>& output,
            float threshold = 0.5f,
            int top_k = 5,
            bool multi_label = false);
    };
}

Tensor Specifications

YOLO Models

// Input tensor format
struct YoloInput {
    xt::xarray<float> image;  // Shape: [B, C, H, W] or [B, H, W, C]
    ImageMetadata metadata;   // Original image dimensions and preprocessing info
};

// Output tensor formats
struct YoloOutput {
    // Grid-based format (YOLOv4, YOLOv5, YOLOv7)
    xt::xarray<float> grid_output;  // Shape: [B, A, H, W, C] where C = classes + 5

    // Flattened format (YOLOv6, YOLOv8)
    xt::xarray<float> flat_output;  // Shape: [B, N, C] where N = H*W*anchors

    // NMS-free format (YOLOv10)
    xt::xarray<float> direct_output; // Shape: [B, detections, 7] (x1,y1,x2,y2,conf,cls,batch_id)
};
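
As a concrete illustration of the flattened [B, N, C] layout, the sketch below thresholds raw rows into pre-NMS candidates. It assumes C = 4 box values + 1 objectness + class scores, with activations already applied; real model heads may require a sigmoid first:

#include <cstddef>
#include <vector>
#include <xtensor/xarray.hpp>

struct Candidate {
    float x, y, w, h;   // box in model coordinates
    float score;        // objectness * best class score
    int class_id;
};

// Minimal decode of a flattened [B, N, C] YOLO head (C = 5 + num_classes),
// batch 0 only. Produces pre-NMS candidates.
std::vector<Candidate> decodeFlat(const xt::xarray<float>& out, float confThreshold) {
    std::vector<Candidate> candidates;
    const std::size_t N = out.shape()[1];
    const std::size_t C = out.shape()[2];

    for (std::size_t i = 0; i < N; ++i) {
        const float obj = out(0, i, 4);
        if (obj < confThreshold) continue;

        // Pick the best class score for this row
        int best = 0;
        float best_score = 0.f;
        for (std::size_t c = 5; c < C; ++c) {
            if (out(0, i, c) > best_score) {
                best_score = out(0, i, c);
                best = static_cast<int>(c - 5);
            }
        }

        const float score = obj * best_score;
        if (score >= confThreshold) {
            candidates.push_back({out(0, i, 0), out(0, i, 1),
                                  out(0, i, 2), out(0, i, 3), score, best});
        }
    }
    return candidates;
}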

Classification Models

struct ClassificationOutput {
    xt::xarray<float> logits;     // Shape: [B, num_classes]
    xt::xarray<float> probabilities; // Softmax applied
    std::vector<std::string> labels;  // Class labels
};
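
For reference, converting logits to probabilities and selecting the top-k follows the usual pattern. A self-contained sketch in standard C++, independent of the plugin API:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Softmax over raw logits followed by top-k selection; numerically
// stabilized by subtracting the max logit before exponentiation.
std::vector<std::pair<int, float>> topK(const std::vector<float>& logits, int k) {
    const float max_logit = *std::max_element(logits.begin(), logits.end());

    std::vector<float> probs(logits.size());
    float sum = 0.f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (auto& p : probs) p /= sum;

    std::vector<std::pair<int, float>> indexed;
    for (std::size_t i = 0; i < probs.size(); ++i)
        indexed.emplace_back(static_cast<int>(i), probs[i]);

    const std::size_t top = std::min<std::size_t>(k, indexed.size());
    std::partial_sort(indexed.begin(), indexed.begin() + top, indexed.end(),
                      [](const auto& a, const auto& b) { return a.second > b.second; });
    indexed.resize(top);
    return indexed;
}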

Specialized Processing APIs

// OCR processing
namespace ocr {
    struct TextDetection {
        std::vector<cv::Point2f> bbox;
        float confidence;
        std::string text;
    };

    expected<std::vector<TextDetection>> processOCR(
        const xt::xarray<float>& detection_output,
        const xt::xarray<float>& recognition_output,
        float det_threshold = 0.7f,
        float rec_threshold = 0.5f);
}
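
// For CRNN-style recognition heads, a common decode is greedy CTC:
// take the per-timestep argmax, collapse consecutive repeats, and drop
// the blank symbol. Minimal illustrative sketch (assumes blank at index 0
// and logits shaped [T, num_symbols]; not the plugin's actual decoder):
std::string ctcGreedyDecode(const xt::xarray<float>& logits,
                            const std::vector<char>& alphabet) {
    std::string text;
    int prev = -1;
    const std::size_t T = logits.shape()[0];
    const std::size_t S = logits.shape()[1];

    for (std::size_t t = 0; t < T; ++t) {
        int best = 0;
        for (std::size_t s = 1; s < S; ++s)
            if (logits(t, s) > logits(t, best)) best = static_cast<int>(s);

        if (best != 0 && best != prev)   // skip blank and repeats
            text += alphabet[best - 1];  // alphabet excludes the blank symbol
        prev = best;
    }
    return text;
}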

// Point detection
namespace pose {
    struct Keypoint {
        cv::Point2f point;
        float confidence;
        int joint_id;
    };

    struct PoseResult {
        std::vector<Keypoint> keypoints;
        float overall_confidence;
        cv::Rect2f bbox;
    };

    expected<std::vector<PoseResult>> processPose(
        const xt::xarray<float>& heatmaps,
        const xt::xarray<float>& pafs,
        float keypoint_threshold = 0.1f);
}
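
Extracting keypoints from heatmaps usually reduces to a per-channel argmax with a confidence gate. A minimal sketch against the Keypoint struct above, assuming heatmaps shaped [J, H, W]; PAF-based multi-person grouping is omitted:

#include <cstddef>
#include <vector>
#include <opencv2/core.hpp>
#include <xtensor/xarray.hpp>

// Per-joint argmax over [J, H, W] heatmaps, keeping peaks above threshold.
std::vector<pose::Keypoint> extractKeypoints(const xt::xarray<float>& heatmaps,
                                             float threshold) {
    std::vector<pose::Keypoint> kps;
    const std::size_t J = heatmaps.shape()[0];
    const std::size_t H = heatmaps.shape()[1];
    const std::size_t W = heatmaps.shape()[2];

    for (std::size_t j = 0; j < J; ++j) {
        float best = threshold;
        int bx = -1, by = -1;
        for (std::size_t y = 0; y < H; ++y)
            for (std::size_t x = 0; x < W; ++x)
                if (heatmaps(j, y, x) > best) {
                    best = heatmaps(j, y, x);
                    bx = static_cast<int>(x);
                    by = static_cast<int>(y);
                }

        if (bx >= 0)
            kps.push_back({cv::Point2f(static_cast<float>(bx),
                                       static_cast<float>(by)),
                           best, static_cast<int>(j)});
    }
    return kps;
}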

Examples

Basic YOLO Object Detection

#include "processing/yolo/postprocessor.h"
#include "inference/inference_context.h"

// Configure YOLO post-processing
YoloProcessor::Config config;
config.confThreshold = 0.25f;
config.nmsIouThreshold = 0.45f;
config.nmsScoreThreshold = 0.15f;
config.class_ids = {0, 1, 2, 3, 5, 7}; // person, bicycle, car, motorcycle, bus, truck

// Process inference results
InferenceContext ctx;
std::vector<xt::xarray<float>> model_outputs = runInference(input_image);

// Apply post-processing
auto detections = YoloProcessor::process(config, model_outputs, image_metadata);

if (detections) {
    for (const auto& detection : detections.value()) {
        std::cout << "Class: " << detection.class_id 
                  << ", Confidence: " << detection.confidence
                  << ", BBox: [" << detection.bbox.x << ", " 
                  << detection.bbox.y << ", " 
                  << detection.bbox.width << ", " 
                  << detection.bbox.height << "]" << std::endl;
    }
} else {
    std::cerr << "Detection failed: " << detections.error() << std::endl;
}

Multi-Label Classification

#include "processing/classification/classifier.h"

// Configure classification
float threshold = 0.3f;  // Lower threshold for multi-label
int top_k = 10;
bool multi_label = true;

// Process classification output
xt::xarray<float> logits = runClassificationModel(input_image);
auto results = ClassificationProcessor::classify(logits, threshold, top_k, multi_label);

if (results) {
    std::cout << "Multi-label classification results:" << std::endl;
    for (const auto& result : results.value()) {
        std::cout << "Label: " << result.label 
                  << ", Confidence: " << result.confidence << std::endl;
    }
}

Custom Processing Pipeline

#include "processing/common/post_common.h"

class CustomProcessor {
public:
    static expected<cvec> processCustomModel(
        InferenceContext& ctx,
        std::vector<xt::xarray<float>>& outputs,
        CValue* config) {

        // Extract configuration
        float custom_threshold = config->getValue("custom_threshold", 0.5f);
        bool enable_filtering = config->getValue("enable_filtering", true);

        // Process first output tensor
        const auto& primary_output = outputs[0];
        auto shape = primary_output.shape();

        cvec detections;

        // Custom processing logic
        for (size_t i = 0; i < shape[0]; ++i) {
            for (size_t j = 0; j < shape[1]; ++j) {
                float confidence = primary_output(i, j, 4); // Confidence channel

                if (confidence > custom_threshold) {
                    // Extract detection data
                    Detection detection;
                    detection.confidence = confidence;
                    detection.class_id = static_cast<int>(primary_output(i, j, 5));

                    // Convert normalized coordinates to absolute
                    detection.bbox.x = primary_output(i, j, 0) * ctx.imageWidth;
                    detection.bbox.y = primary_output(i, j, 1) * ctx.imageHeight;
                    detection.bbox.width = primary_output(i, j, 2) * ctx.imageWidth;
                    detection.bbox.height = primary_output(i, j, 3) * ctx.imageHeight;

                    detections.push_back(std::make_shared<CValue>(detection.toCValue()));
                }
            }
        }

        // Apply custom filtering if enabled
        if (enable_filtering) {
            detections = applyCustomFiltering(detections);
        }

        return detections;
    }

private:
    // Filtering predicate; implementation omitted for brevity
    static bool meetsCustomCriteria(const std::shared_ptr<CValue>& detection);

    static cvec applyCustomFiltering(const cvec& input) {
        // Custom filtering logic
        cvec filtered;
        for (const auto& detection : input) {
            // Apply custom filtering criteria
            if (meetsCustomCriteria(detection)) {
                filtered.push_back(detection);
            }
        }
        return filtered;
    }
};

Performance Monitoring

#include "tracy/Tracy.hpp"
#include "processing/performance/monitor.h"

class PerformanceOptimizedProcessor {
public:
    static expected<cvec> processWithProfiling(
        InferenceContext& ctx,
        std::vector<xt::xarray<float>>& outputs,
        CValue* config) {

        ZoneScoped;  // Tracy profiling zone

        cvec results;  // accumulates detections produced in the stages below

        {
            ZoneScopedN("Preprocessing");
            // Pre-processing operations
        }

        {
            ZoneScopedN("Core Processing");
            // Main processing logic
        }

        {
            ZoneScopedN("Post-filtering");
            // NMS and filtering operations
        }

        return results;
    }
};

// Performance monitoring helper (standard C++ only)
#include <chrono>
#include <numeric>
#include <string>
#include <unordered_map>
#include <vector>

class ProcessingMonitor {
public:
    void recordProcessingTime(const std::string& processor, 
                             std::chrono::milliseconds duration) {
        processing_times_[processor].push_back(duration);

        // Keep only last 100 measurements
        if (processing_times_[processor].size() > 100) {
            processing_times_[processor].erase(processing_times_[processor].begin());
        }
    }

    double getAverageProcessingTime(const std::string& processor) const {
        auto it = processing_times_.find(processor);
        if (it == processing_times_.end() || it->second.empty()) return 0.0;

        const auto& times = it->second;
        auto total = std::accumulate(times.begin(), times.end(), std::chrono::milliseconds{0});
        return total.count() / static_cast<double>(times.size());
    }

    double getProcessingFps(const std::string& processor) const {
        double avg_time = getAverageProcessingTime(processor);
        return avg_time > 0.0 ? 1000.0 / avg_time : 0.0;
    }

private:
    std::unordered_map<std::string, std::vector<std::chrono::milliseconds>> processing_times_;
};
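
Typical usage wraps a processing call with a steady clock and feeds the duration to the monitor; the snippet below reuses config, model_outputs, and image_metadata from the YoloProcessor example above:

#include <chrono>

ProcessingMonitor monitor;

auto t0 = std::chrono::steady_clock::now();
auto detections = YoloProcessor::process(config, model_outputs, image_metadata);
auto t1 = std::chrono::steady_clock::now();

monitor.recordProcessingTime("yolo",
    std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0));

PLOG_INFO << "yolo avg: " << monitor.getAverageProcessingTime("yolo") << " ms"
          << " (" << monitor.getProcessingFps("yolo") << " FPS)";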

Performance Optimization

Memory Optimization

// Thread-safe anchor caching
class AnchorCache {
public:
    static xt::xarray<float> getAnchors(const std::vector<int>& strides, 
                                       int input_height, int input_width) {
        std::lock_guard<std::mutex> lock(cache_mutex_);

        std::string key = createCacheKey(strides, input_height, input_width);
        auto it = anchor_cache_.find(key);

        if (it != anchor_cache_.end()) {
            return it->second;
        }

        // Generate and cache anchors
        auto anchors = generateAnchors(strides, input_height, input_width);
        anchor_cache_[key] = anchors;

        return anchors;
    }

private:
    // Helpers; implementations omitted for brevity
    static std::string createCacheKey(const std::vector<int>& strides,
                                      int input_height, int input_width);
    static xt::xarray<float> generateAnchors(const std::vector<int>& strides,
                                             int input_height, int input_width);

    static std::mutex cache_mutex_;
    static std::unordered_map<std::string, xt::xarray<float>> anchor_cache_;
};

// Out-of-line definitions required for the static members
std::mutex AnchorCache::cache_mutex_;
std::unordered_map<std::string, xt::xarray<float>> AnchorCache::anchor_cache_;

SIMD Optimization

// XTensor SIMD-optimized operations
namespace simd_ops {
    // Vectorized confidence thresholding
    xt::xarray<bool> threshold_confidence(const xt::xarray<float>& confidences, 
                                         float threshold) {
        return xt::greater(confidences, threshold);
    }

    // Vectorized IoU calculation
    xt::xarray<float> compute_iou_matrix(const xt::xarray<float>& boxes1,
                                        const xt::xarray<float>& boxes2) {
        // SIMD-optimized IoU computation using xtensor
        auto areas1 = (boxes1(xt::all(), 2) - boxes1(xt::all(), 0)) * 
                     (boxes1(xt::all(), 3) - boxes1(xt::all(), 1));
        auto areas2 = (boxes2(xt::all(), 2) - boxes2(xt::all(), 0)) * 
                     (boxes2(xt::all(), 3) - boxes2(xt::all(), 1));

        // Intersection computation
        auto inter_x1 = xt::maximum(xt::expand_dims(boxes1(xt::all(), 0), 1), 
                                   xt::expand_dims(boxes2(xt::all(), 0), 0));
        auto inter_y1 = xt::maximum(xt::expand_dims(boxes1(xt::all(), 1), 1), 
                                   xt::expand_dims(boxes2(xt::all(), 1), 0));
        auto inter_x2 = xt::minimum(xt::expand_dims(boxes1(xt::all(), 2), 1), 
                                   xt::expand_dims(boxes2(xt::all(), 2), 0));
        auto inter_y2 = xt::minimum(xt::expand_dims(boxes1(xt::all(), 3), 1), 
                                   xt::expand_dims(boxes2(xt::all(), 3), 0));

        auto inter_area = xt::maximum(0.0f, inter_x2 - inter_x1) * 
                         xt::maximum(0.0f, inter_y2 - inter_y1);

        auto union_area = xt::expand_dims(areas1, 1) + xt::expand_dims(areas2, 0) - inter_area;

        return inter_area / xt::maximum(union_area, 1e-6f);
    }
}
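
A quick check of the IoU helper with two small box sets in corner format (x1, y1, x2, y2):

#include <iostream>
#include <xtensor/xarray.hpp>
#include <xtensor/xio.hpp>

xt::xarray<float> boxes_a {{0.f, 0.f, 10.f, 10.f},
                           {5.f, 5.f, 15.f, 15.f}};
xt::xarray<float> boxes_b {{0.f, 0.f, 10.f, 10.f}};

// Yields a 2x1 matrix: IoU(a0, b0) = 1.0, IoU(a1, b0) = 25/175 ≈ 0.143
auto iou = simd_ops::compute_iou_matrix(boxes_a, boxes_b);
std::cout << iou << std::endl;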

Batch Processing Optimization

#include <future>

class BatchProcessor {
public:
    static expected<cvec> processBatch(const std::vector<InferenceContext>& contexts,
                                      const std::vector<std::vector<xt::xarray<float>>>& batch_outputs,
                                      const ProcessingConfig& config) {

        cvec all_detections;

        // Process batches in parallel when possible
        if (config.parallel_processing && batch_outputs.size() > 1) {
            std::vector<std::future<expected<cvec>>> futures;

            for (size_t i = 0; i < batch_outputs.size(); ++i) {
                futures.push_back(std::async(std::launch::async, [&, i]() {
                    return processSingle(contexts[i], batch_outputs[i], config);
                }));
            }

            for (auto& future : futures) {
                auto result = future.get();
                if (result) {
                    all_detections.insert(all_detections.end(), 
                                         result.value().begin(), 
                                         result.value().end());
                }
            }
        } else {
            // Sequential processing
            for (size_t i = 0; i < batch_outputs.size(); ++i) {
                auto result = processSingle(contexts[i], batch_outputs[i], config);
                if (result) {
                    all_detections.insert(all_detections.end(), 
                                         result.value().begin(), 
                                         result.value().end());
                }
            }
        }

        return all_detections;
    }

private:
    // Per-image processing; implementation omitted for brevity
    static expected<cvec> processSingle(const InferenceContext& ctx,
                                        const std::vector<xt::xarray<float>>& outputs,
                                        const ProcessingConfig& config);
};

Troubleshooting

Common Issues

Low Detection Accuracy

// Adjust confidence thresholds
config.confThreshold = 0.1f;  // Lower threshold for more detections
config.nmsScoreThreshold = 0.05f;  // Lower NMS threshold

// Disable edge filtering
config.filterEdgeDetections = false;

// Check class filtering
config.class_ids.clear();  // Allow all classes

High Memory Usage

// Enable anchor caching
config.cache_anchors = true;

// Optimize tensor operations
config.simd_optimization = true;

// Reduce batch size
config.max_batch_size = 1;

Performance Issues

// Enable profiling to identify bottlenecks
config.enable_profiling = true;

// Optimize NMS parameters
config.nmsIouThreshold = 0.7f;  // Higher threshold = fewer comparisons
config.nmsScoreThreshold = 0.2f;  // Higher threshold = fewer candidates

Error Recovery

class RobustProcessor {
public:
    static expected<cvec> processWithFallback(
        InferenceContext& ctx,
        std::vector<xt::xarray<float>>& outputs,
        CValue* config) {

        try {
            // Primary processing
            return primaryProcessor(ctx, outputs, config);
        } catch (const std::exception& e) {
            PLOG_WARNING << "Primary processor failed: " << e.what();

            try {
                // Fallback processing
                return fallbackProcessor(ctx, outputs, config);
            } catch (const std::exception& e2) {
                PLOG_ERROR << "Fallback processor failed: " << e2.what();
                return tl::unexpected(ProcessingError::TOTAL_FAILURE);
            }
        }
    }

private:
    static expected<cvec> fallbackProcessor(
        InferenceContext& ctx,
        std::vector<xt::xarray<float>>& outputs,
        CValue* config) {

        // Simplified processing with relaxed parameters
        ProcessingConfig fallback_config;
        fallback_config.confThreshold = 0.1f;
        fallback_config.nmsIouThreshold = 0.8f;
        fallback_config.enable_simd = false;

        return basicProcessor(ctx, outputs, &fallback_config);
    }
};

Debug Tools

#include "debug/visualization.h"
#include "debug/tensor_inspector.h"

class ProcessingDebugger {
public:
    static void debugDetections(const cvec& detections, 
                               const cv::Mat& image,
                               const std::string& save_path) {

        cv::Mat debug_image = image.clone();

        for (const auto& det : detections) {
            auto detection = extractDetection(det);

            // Draw bounding box
            cv::rectangle(debug_image, detection.bbox, cv::Scalar(0, 255, 0), 2);

            // Draw confidence and class
            std::string label = std::to_string(detection.class_id) + 
                               ": " + std::to_string(detection.confidence);
            cv::putText(debug_image, label, 
                       cv::Point(detection.bbox.x, detection.bbox.y - 10),
                       cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 255, 0), 1);
        }

        cv::imwrite(save_path, debug_image);
    }

    static void inspectTensor(const xt::xarray<float>& tensor, 
                             const std::string& name) {

        PLOG_INFO << "Tensor " << name << " inspection:";
        PLOG_INFO << "  Shape: " << xt::adapt(tensor.shape());
        PLOG_INFO << "  Min: " << xt::amin(tensor)[0];
        PLOG_INFO << "  Max: " << xt::amax(tensor)[0];
        PLOG_INFO << "  Mean: " << xt::mean(tensor)[0];
        PLOG_INFO << "  Std: " << xt::stddev(tensor)[0];

        // Check for NaN or infinity values
        bool has_nan = xt::any(xt::isnan(tensor));
        bool has_inf = xt::any(xt::isinf(tensor));

        if (has_nan) PLOG_WARNING << "  Contains NaN values!";
        if (has_inf) PLOG_WARNING << "  Contains infinite values!";
    }
};

Best Practices

Configuration Management

  1. Threshold Tuning: Start with default values and adjust based on precision/recall requirements (see the sweep sketch after this list)
  2. Class Filtering: Use class filtering to improve performance when only specific objects are needed
  3. Memory Management: Enable caching and SIMD optimization for production deployments
  4. Profiling: Use Tracy profiling to identify performance bottlenecks
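
One practical way to tune thresholds is a coarse sweep over a labelled validation clip, counting detections at each setting. A minimal sketch reusing the YoloProcessor example above (counts only; precision/recall bookkeeping is left out):

for (float t = 0.1f; t <= 0.6f; t += 0.1f) {
    YoloProcessor::Config cfg;
    cfg.confThreshold = t;

    auto dets = YoloProcessor::process(cfg, model_outputs, image_metadata);
    if (dets)
        std::cout << "confThreshold=" << t
                  << " -> " << dets.value().size() << " detections\n";
}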

Integration Guidelines

  1. Error Handling: Implement comprehensive error handling with fallback mechanisms
  2. Performance Monitoring: Track processing times and memory usage in production
  3. Model Compatibility: Verify tensor shapes and formats match expected model outputs
  4. Testing: Validate processing accuracy across different model architectures

Performance Optimization

  1. Batch Processing: Use batch processing when handling multiple images simultaneously
  2. Parallel Processing: Leverage multi-threading for independent processing tasks
  3. Memory Pooling: Implement tensor memory pooling for high-throughput scenarios (see the sketch after this list)
  4. Hardware Acceleration: Utilize SIMD instructions and GPU acceleration when available
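
For item 3, a minimal reusable buffer pool might look like the sketch below. It is generic and illustrative; CVEDIA-RT may ship its own allocator:

#include <cstddef>
#include <mutex>
#include <vector>

// Minimal thread-safe pool of reusable float buffers.
// acquire() hands out a buffer of at least `size` elements;
// release() returns it for reuse instead of freeing.
class BufferPool {
public:
    std::vector<float> acquire(std::size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!free_.empty()) {
            std::vector<float> buf = std::move(free_.back());
            free_.pop_back();
            buf.resize(size);
            return buf;
        }
        return std::vector<float>(size);
    }

    void release(std::vector<float> buf) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(std::move(buf));
    }

private:
    std::mutex mutex_;
    std::vector<std::vector<float>> free_;
};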

See Also