
Decoders Plugin Collection

Description

The Decoders plugin collection provides a unified, hardware-accelerated video decoding system for CVEDIA-RT with automatic fallback capabilities. It offers comprehensive codec support across multiple hardware platforms while maintaining optimal performance through platform-specific acceleration technologies.

Key Features

  • Multi-Platform Hardware Acceleration: Support for NVIDIA GPUs, Intel processors, Blaize AI chips, and GStreamer backends
  • Automatic Decoder Selection: Runtime hardware detection with intelligent fallback to software decoding
  • Zero-Copy Operations: Direct GPU memory access and efficient surface management
  • Comprehensive Codec Support: H.264, H.265/HEVC, AV1, VP8, VP9, MJPEG, and more
  • Performance Optimization: Configurable FPS limits, resolution constraints, and memory management
  • Cross-Platform Compatibility: Windows and Linux support with platform-specific optimizations

Use Cases

  • High-Performance Video Processing: Hardware-accelerated decoding for real-time AI inference
  • Edge Computing: Optimized decoding for embedded systems and edge AI applications
  • Multi-Stream Processing: Concurrent decoding of multiple video streams
  • Universal Compatibility: Reliable software fallback when hardware acceleration is unavailable

Requirements

Hardware Requirements

NVIDIA Decoder:

  • NVIDIA GPUs with NVDEC support (compute capability 5.2+)
  • CUDA Runtime and compatible drivers
  • NVIDIA Video Codec SDK

Intel Decoders:

  • Intel processors with integrated graphics (legacy or modern)
  • Intel Media SDK or Video Processing Library (VPL)
  • DirectX 11 support (Windows) or VA-API (Linux)

Blaize Decoder:

  • Blaize AI processors with graph streaming architecture
  • Blaize SDK and hardware acceleration libraries

GStreamer Decoder:

  • GStreamer framework 1.0+ with video plugins
  • Platform-specific hardware acceleration plugins (optional)

Software Decoder:

  • Any CPU architecture
  • No additional hardware requirements

Software Dependencies

# Required libraries (platform-dependent)
# NVIDIA: CUDA Runtime, NVDEC
# Intel: Intel Media SDK/VPL, DirectX 11/VA-API
# GStreamer: GStreamer 1.0+, video plugins
# FFmpeg: Software decoding fallback

Configuration

Basic Configuration

{
  "decoder": {
    "targetFps": 30,
    "maxPixels": 2073600,
    "outputFormat": "NV12",
    "devicePreference": "nvidia.0",
    "fallbackEnabled": true
  }
}
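
The same settings can also be applied programmatically. The sketch below maps each field of the basic configuration onto the VideoDecoder API described under API Reference; the exact entry point for loading the JSON itself depends on your CVEDIA-RT integration.

// Sketch: basic configuration values applied through the C++ API
auto decoder = VideoDecoderRegistry::createDecoder("nvidia.0");   // devicePreference
if (decoder) {
    decoder->setTargetFps(30);                                    // targetFps
    decoder->setMaxPixels(2073600);                               // maxPixels (1920x1080)
    decoder->setOutputFormat(PixelFormat::NV12);                  // outputFormat
    // fallbackEnabled: see "Automatic Fallback Implementation" under Examples
}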

Advanced Configuration

{
  "decoder": {
    "targetFps": 60,
    "maxPixels": 8294400,
    "outputFormat": "BGR",
    "nvidia": {
      "deviceId": "nvidia.1",
      "cudaStream": true,
      "gpuMemoryPool": true
    },
    "intel": {
      "useDirectX11": true,
      "asyncDepth": 4,
      "gpuAllocation": true
    },
    "gstreamer": {
      "pipelineTemplate": "custom",
      "enableHardwareAccel": true,
      "restartOnError": true
    },
    "performance": {
      "rateLimiting": true,
      "memoryOptimization": true,
      "errorRecovery": "automatic"
    }
  }
}

Configuration Schema

| Parameter | Type | Default | Description |
|---|---|---|---|
| targetFps | integer | 30 | Target frames per second for rate limiting |
| maxPixels | integer | 2073600 | Maximum pixel count per frame (1920x1080) |
| outputFormat | string | "NV12" | Output pixel format (NV12, BGR, YUV420P) |
| devicePreference | string | "auto" | Preferred decoder device (nvidia.0, intel.0, software) |
| fallbackEnabled | boolean | true | Enable automatic fallback to software decoder |
| nvidia.deviceId | string | "nvidia.0" | NVIDIA GPU device identifier |
| nvidia.cudaStream | boolean | true | Enable CUDA stream processing |
| nvidia.gpuMemoryPool | boolean | true | Use GPU memory pool for efficiency |
| intel.useDirectX11 | boolean | true | Enable DirectX 11 surface sharing (Windows) |
| intel.asyncDepth | integer | 4 | Asynchronous processing depth |
| intel.gpuAllocation | boolean | true | Allocate surfaces in GPU memory |
| gstreamer.pipelineTemplate | string | "default" | GStreamer pipeline template |
| gstreamer.enableHardwareAccel | boolean | true | Enable hardware acceleration plugins |
| gstreamer.restartOnError | boolean | true | Restart pipeline on errors |

Decoder Implementations

NVIDIA Decoder

Hardware Acceleration: NVDEC with CUDA integration
Supported Codecs: H.264, H.265/HEVC, MJPEG, VP8, VP9, AV1, MPEG2, VC1
Key Features:

  • Zero-copy GPU operations
  • CUDA-accelerated color space conversion
  • Multi-stream concurrent decoding
  • Direct integration with CUDA processing pipelines

Intel VPL MFX Decoder

Hardware Acceleration: Intel Quick Sync Video via VPL
Supported Codecs: H.264, H.265/HEVC, AV1, MJPEG, MPEG2, VC1, VP8, VP9
Key Features:

  • Modern Intel hardware optimization
  • Asynchronous processing with configurable depth
  • Advanced surface management
  • Cross-platform support (Windows/Linux)

Intel Legacy Decoder

Hardware Acceleration: Intel Media SDK with DirectX 11
Supported Codecs: H.264, H.265/HEVC, AV1, MJPEG, MPEG2, VC1
Key Features:

  • Legacy Intel hardware support
  • DirectX 11 surface integration
  • Multi-surface allocation and management
  • Windows-optimized implementation

GStreamer Decoder

Hardware Acceleration: Platform-dependent via GStreamer plugins
Supported Codecs: Comprehensive support based on installed plugins
Key Features:

  • Cross-platform compatibility
  • Pipeline-based processing with error recovery
  • Dynamic pipeline reconfiguration
  • Multi-backend hardware acceleration

Blaize Decoder

Hardware Acceleration: Blaize AI processor graph streaming
Supported Codecs: Multi-format via GStreamer backend
Key Features:

  • Edge AI optimization
  • Memory-efficient embedded deployment
  • Integration with Blaize AI inference pipelines
  • Real-time performance tuning

Software Decoder

Hardware Acceleration: None (CPU-based)
Supported Codecs: H.264, H.265/HEVC, MPEG2, MPEG4, MJPEG, VC1
Key Features:

  • Universal platform compatibility
  • CPU-optimized algorithms
  • Multi-threaded processing
  • Reliable fallback option

API Reference

C++ API

All decoders implement the iface::VideoDecoder interface:

namespace cvedia::rt::iface {
    class VideoDecoder {
    public:
        // Core decoding interface
        virtual expected<void> initialize() = 0;
        virtual expected<void> decode(InputBuffer const& input, OutputBuffer& output) = 0;
        virtual expected<void> flush() = 0;

        // Performance configuration
        virtual void setTargetFps(int fps) = 0;
        virtual void setMaxPixels(int pixels) = 0;

        // Format management
        virtual std::vector<CodecInfo> getSupportedCodecs() const = 0;
        virtual expected<void> setOutputFormat(PixelFormat format) = 0;

        // Device management
        virtual std::string getDeviceInfo() const = 0;
        virtual bool isHardwareAccelerated() const = 0;
    };
}

// Decoder registry for automatic selection
class VideoDecoderRegistry {
public:
    static void registerDecoder(std::string const& name, 
                               std::function<std::unique_ptr<VideoDecoder>()> factory);
    static std::unique_ptr<VideoDecoder> createDecoder(std::string const& preference = "auto");
    static std::vector<std::string> getAvailableDecoders();
};
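
A brief usage sketch for the registry, based only on the declarations above (device names such as "nvidia.0" are illustrative):

#include <iostream>

// List the decoders detected on this machine, then create the preferred one
for (auto const& name : VideoDecoderRegistry::getAvailableDecoders()) {
    std::cout << "available decoder: " << name << "\n";  // e.g. "nvidia.0", "software"
}

auto decoder = VideoDecoderRegistry::createDecoder();  // defaults to "auto" selection
if (decoder && decoder->isHardwareAccelerated()) {
    // A hardware-accelerated implementation was chosen
}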

Performance Control API

// Rate limiting functionality
class RateLimiter {
public:
    void setTargetFps(double fps);
    bool shouldDrop() const;
    void recordFrame();
};

// Memory optimization
class SurfaceManager {
public:
    expected<Surface> allocateSurface(int width, int height, PixelFormat format);
    void releaseSurface(Surface& surface);
    void optimizeMemoryUsage();
};
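
A minimal SurfaceManager sketch, assuming only the declarations above; the surface dimensions and format are illustrative:

// Allocate a decode surface, use it, then release it back to the manager
SurfaceManager surfaces;

auto surface = surfaces.allocateSurface(1920, 1080, PixelFormat::NV12);
if (surface) {
    // ... hand *surface to the decoder or a post-processing stage ...
    surfaces.releaseSurface(*surface);
}

// In long-running pipelines, periodically trim pooled memory
surfaces.optimizeMemoryUsage();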

Examples

Basic Hardware-Accelerated Decoding

#include "interface/videodecoder.h"
#include "videodecoder_registry.h"

// Create decoder with automatic hardware selection
auto decoder = VideoDecoderRegistry::createDecoder();
if (!decoder) {
    // Handle decoder creation failure
    return;
}

// Configure performance parameters
decoder->setTargetFps(30);
decoder->setMaxPixels(1920 * 1080);
decoder->setOutputFormat(PixelFormat::NV12);

// Initialize decoder
if (auto result = decoder->initialize(); !result) {
    // Handle initialization failure
    return;
}

// Process video frames
InputBuffer inputBuffer;
OutputBuffer outputBuffer;

while (hasMoreFrames()) {
    // Fill input buffer with encoded data
    inputBuffer = getNextEncodedFrame();

    // Decode frame
    if (auto result = decoder->decode(inputBuffer, outputBuffer); result) {
        // Process decoded frame
        processDecodedFrame(outputBuffer);
    } else {
        // Handle decoding error
        handleError(result.error());
    }
}

// Flush remaining frames
decoder->flush();

NVIDIA-Specific Configuration

#include "decoders/nvidia/nvidiadecoder.h"

// Create NVIDIA decoder specifically
auto nvidiaDecoder = std::make_unique<NvidiaDecoder>();

// Configure NVIDIA-specific features
nvidiaDecoder->setDevice("nvidia.1");  // Use second GPU
nvidiaDecoder->enableCudaStream(true);
nvidiaDecoder->setGpuMemoryPool(true);

// Initialize with CUDA context
CudaContext context;
if (auto result = nvidiaDecoder->initialize(context); !result) {
    // Handle NVIDIA-specific initialization
    fallbackToSoftwareDecoder();
}

Multi-Stream Processing

#include <memory>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

class MultiStreamDecoder {
public:
    void addStream(std::string const& streamId, std::string const& decoderPreference = "auto") {
        auto decoder = VideoDecoderRegistry::createDecoder(decoderPreference);
        if (decoder) {
            decoders_[streamId] = std::move(decoder);

            // Start processing thread for this stream
            threads_[streamId] = std::thread([this, streamId]() {
                processStream(streamId);
            });
        }
    }

private:
    std::unordered_map<std::string, std::unique_ptr<VideoDecoder>> decoders_;
    std::unordered_map<std::string, std::thread> threads_;

    void processStream(std::string const& streamId) {
        auto& decoder = decoders_[streamId];

        // Stream-specific processing loop
        while (isStreamActive(streamId)) {
            auto inputBuffer = getStreamInput(streamId);
            OutputBuffer outputBuffer;

            if (auto result = decoder->decode(inputBuffer, outputBuffer); result) {
                processStreamOutput(streamId, outputBuffer);
            }
        }
    }
};
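
A usage sketch for the class above (stream identifiers and decoder preferences are illustrative):

MultiStreamDecoder multi;

// Pin two camera streams to specific GPUs and let a third pick automatically
multi.addStream("camera-entrance", "nvidia.0");
multi.addStream("camera-parking", "nvidia.1");
multi.addStream("camera-lobby");   // "auto" selection

Note that the class above does not join or detach its worker threads on shutdown; production code would need to handle that explicitly.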

Automatic Fallback Implementation

class ResilientDecoder {
public:
    ResilientDecoder() {
        // Try hardware decoders first
        std::vector<std::string> preferences = {"nvidia.0", "intel.0", "gstreamer", "software"};

        for (auto const& pref : preferences) {
            decoder_ = VideoDecoderRegistry::createDecoder(pref);
            if (decoder_ && decoder_->initialize()) {
                currentDecoderType_ = pref;
                break;
            }
        }
    }

    expected<void> decode(InputBuffer const& input, OutputBuffer& output) {
        auto result = decoder_->decode(input, output);

        if (!result && currentDecoderType_ != "software") {
            // Hardware decoder failed, fallback to software
            decoder_ = VideoDecoderRegistry::createDecoder("software");
            if (decoder_ && decoder_->initialize()) {
                currentDecoderType_ = "software";
                result = decoder_->decode(input, output);
            }
        }

        return result;
    }

private:
    std::unique_ptr<VideoDecoder> decoder_;
    std::string currentDecoderType_;
};

Platform Compatibility

Windows Support

| Decoder | Status | Requirements |
|---|---|---|
| NVIDIA | ✅ Full | NVIDIA GPU + CUDA drivers |
| Intel Legacy | ✅ Full | Intel iGPU + DirectX 11 |
| Intel VPL MFX | ✅ Full | Modern Intel CPU/GPU + VPL |
| GStreamer | ✅ Full | GStreamer 1.0+ runtime |
| Blaize | ✅ Full | Blaize hardware + SDK |
| Software | ✅ Full | Any CPU |

Linux Support

| Decoder | Status | Requirements |
|---|---|---|
| NVIDIA | ✅ Full | NVIDIA GPU + CUDA drivers |
| Intel Legacy | ⚠️ Limited | Intel iGPU + VA-API |
| Intel VPL MFX | ✅ Full | Modern Intel CPU/GPU + VPL |
| GStreamer | ✅ Full | GStreamer 1.0+ runtime |
| Blaize | ✅ Full | Blaize hardware + SDK |
| Software | ✅ Full | Any CPU |

Performance Optimization

Memory Management

// Optimize memory allocation for high-throughput scenarios
decoder->setMaxPixels(1920 * 1080);  // Limit resolution for memory efficiency
decoder->enableGpuMemoryPool(true);  // Use GPU memory pool
decoder->setAsyncDepth(4);           // Configure async processing depth

Rate Limiting

// Configure rate limiting to match display refresh rate
decoder->setTargetFps(60);  // Match display refresh rate

// Use RateLimiter for fine-grained control
RateLimiter rateLimiter;
rateLimiter.setTargetFps(30.0);

while (processFrames) {
    if (!rateLimiter.shouldDrop()) {
        // Process frame
        decoder->decode(input, output);
        rateLimiter.recordFrame();
    }
}

Multi-GPU Utilization

// Distribute streams across multiple GPUs
std::vector<std::string> gpuDevices = {"nvidia.0", "nvidia.1", "nvidia.2"};
int currentGpu = 0;

for (auto const& stream : videoStreams) {
    auto decoder = VideoDecoderRegistry::createDecoder(gpuDevices[currentGpu]);
    streamDecoders[stream.id] = std::move(decoder);

    currentGpu = (currentGpu + 1) % gpuDevices.size();
}

Troubleshooting

Common Issues

Hardware Decoder Initialization Failure

// Check hardware availability before initialization
if (!decoder->isHardwareAccelerated()) {
    // Hardware not available, use software decoder
    decoder = VideoDecoderRegistry::createDecoder("software");
}

Memory Allocation Issues

// Reduce memory usage for resource-constrained environments
decoder->setMaxPixels(1280 * 720);  // Lower resolution limit
decoder->enableMemoryOptimization(true);

Multi-Stream Performance Issues

// Distribute load across multiple decoders
if (streamCount > 4) {
    // Create multiple decoder instances
    for (int i = 0; i < streamCount; i++) {
        auto decoder = VideoDecoderRegistry::createDecoder("nvidia." + std::to_string(i % gpuCount));
        decoders.push_back(std::move(decoder));
    }
}

Driver Version Compatibility

# Check NVIDIA driver version
nvidia-smi

# Check Intel graphics driver
intel_gpu_top

# Verify CUDA installation
nvcc --version

Error Recovery

class ErrorRecoveryDecoder {
public:
    expected<void> decode(InputBuffer const& input, OutputBuffer& output) {
        auto result = decoder_->decode(input, output);

        if (!result) {
            errorCount_++;

            if (errorCount_ > maxErrors_) {
                // Reset decoder
                decoder_->flush();
                decoder_->initialize();
                errorCount_ = 0;
            }
        } else {
            errorCount_ = 0;  // Reset on successful decode
        }

        return result;
    }

private:
    std::unique_ptr<VideoDecoder> decoder_;
    int errorCount_ = 0;
    int maxErrors_ = 5;
};

Performance Monitoring

#include <chrono>
#include <numeric>
#include <vector>

class DecoderPerformanceMonitor {
public:
    void recordDecode(std::chrono::milliseconds duration) {
        decodeTimes_.push_back(duration);

        if (decodeTimes_.size() > 100) {
            decodeTimes_.erase(decodeTimes_.begin());
        }

        // Calculate average decode time
        auto total = std::accumulate(decodeTimes_.begin(), decodeTimes_.end(), std::chrono::milliseconds{0});
        averageDecodeTime_ = total / decodeTimes_.size();
    }

    double getCurrentFps() const {
        if (averageDecodeTime_.count() == 0) return 0.0;
        return 1000.0 / averageDecodeTime_.count();
    }

private:
    std::vector<std::chrono::milliseconds> decodeTimes_;
    std::chrono::milliseconds averageDecodeTime_{0};
};
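
A sketch of feeding the monitor from a decode loop, assuming the decoder and buffers from the earlier examples; the 30 fps threshold is illustrative:

DecoderPerformanceMonitor monitor;

auto start = std::chrono::steady_clock::now();
auto result = decoder->decode(inputBuffer, outputBuffer);
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
    std::chrono::steady_clock::now() - start);

monitor.recordDecode(elapsed);
if (monitor.getCurrentFps() < 30.0) {
    // Throughput below target: consider lowering maxPixels or adding a decoder
}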

Best Practices

Decoder Selection Strategy

  1. Automatic Selection: Use VideoDecoderRegistry::createDecoder() for automatic hardware detection
  2. Performance Priority: Prefer NVIDIA > Intel VPL > GStreamer > Software for performance
  3. Compatibility Priority: Use GStreamer or Software decoders for maximum compatibility
  4. Resource Management: Consider memory and power constraints in decoder selection

Error Handling

  1. Graceful Degradation: Implement automatic fallback from hardware to software decoding
  2. Resource Cleanup: Always flush decoders before destruction
  3. Error Recovery: Implement retry mechanisms with exponential backoff (see the sketch after this list)
  4. Monitoring: Track decoder performance and error rates
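
A minimal backoff sketch, assuming the VideoDecoder interface from the API Reference; the retry count and base delay are illustrative:

#include <chrono>
#include <thread>

// Retry a failed decode a few times, doubling the wait between attempts
expected<void> decodeWithBackoff(VideoDecoder& decoder,
                                 InputBuffer const& input, OutputBuffer& output) {
    constexpr int maxRetries = 4;                  // illustrative
    auto delay = std::chrono::milliseconds(10);    // illustrative base delay

    auto result = decoder.decode(input, output);
    for (int attempt = 0; !result && attempt < maxRetries; ++attempt) {
        std::this_thread::sleep_for(delay);
        delay *= 2;                                // exponential backoff
        result = decoder.decode(input, output);
    }
    return result;                                 // success or final error
}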

Performance Optimization

  1. Memory Efficiency: Use GPU memory pools and optimize surface allocation
  2. Rate Limiting: Match decoder output to downstream processing capabilities
  3. Multi-Threading: Process multiple streams concurrently when possible
  4. Hardware Utilization: Distribute load across available hardware resources

Integration Guidelines

  1. Unified Interface: Use the common VideoDecoder interface for decoder abstraction
  2. Configuration Management: Centralize decoder configuration through CVEDIA-RT config system
  3. Resource Sharing: Coordinate resource usage with other CVEDIA-RT components
  4. Testing: Validate decoder functionality across target hardware platforms

See Also