
Decoders Plugin Collection

Description

The Decoders plugin collection provides a unified, hardware-accelerated video decoding system for CVEDIA-RT with automatic fallback capabilities. It offers comprehensive codec support across multiple hardware platforms while maintaining optimal performance through platform-specific acceleration technologies.

Key Features

  • Multi-Platform Hardware Acceleration: Support for NVIDIA GPUs, Intel processors, Blaize AI chips, and GStreamer backends
  • Automatic Decoder Selection: Runtime hardware detection with intelligent fallback to software decoding
  • Zero-Copy Operations: Direct GPU memory access and efficient surface management
  • Comprehensive Codec Support: H.264, H.265/HEVC, AV1, VP8, VP9, MJPEG, and more
  • Performance Optimization: Configurable FPS limits, resolution constraints, and memory management
  • Cross-Platform Compatibility: Windows and Linux support with platform-specific optimizations

Use Cases

  • High-Performance Video Processing: Hardware-accelerated decoding for real-time AI inference
  • Edge Computing: Optimized decoding for embedded systems and edge AI applications
  • Multi-Stream Processing: Concurrent decoding of multiple video streams
  • Universal Compatibility: Reliable software fallback when hardware acceleration is unavailable

Requirements

Hardware Requirements

NVIDIA Decoder:

  • NVIDIA GPUs with NVDEC support (compute capability 5.2+)
  • CUDA Runtime and compatible drivers
  • NVIDIA Video Codec SDK

Intel Decoders:

  • Intel processors with integrated graphics (legacy or modern)
  • Intel Media SDK or Video Processing Library (VPL)
  • DirectX 11 support (Windows) or VA-API (Linux)

Blaize Decoder:

  • Blaize AI processors with graph streaming architecture
  • Blaize SDK and hardware acceleration libraries

GStreamer Decoder:

  • GStreamer framework 1.0+ with video plugins
  • Platform-specific hardware acceleration plugins (optional)

Software Decoder:

  • Any CPU architecture
  • No additional hardware requirements

Software Dependencies

# Required libraries (platform-dependent)
# NVIDIA: CUDA Runtime, NVDEC
# Intel: Intel Media SDK/VPL, DirectX 11/VA-API
# GStreamer: GStreamer 1.0+, video plugins
# FFmpeg: Software decoding fallback

Configuration

Basic Configuration

{
  "decoder": {
    "targetFps": 30,
    "maxPixels": 2073600,
    "outputFormat": "NV12",
    "devicePreference": "nvidia.0",
    "fallbackEnabled": true
  }
}
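
The same settings can also be applied programmatically. The sketch below maps each field of the basic configuration onto the VideoDecoder API described under API Reference; the exact entry point for loading the JSON itself depends on your CVEDIA-RT integration.

// Sketch: basic configuration values applied through the C++ API
auto decoder = VideoDecoderRegistry::createDecoder("nvidia.0");   // devicePreference
if (decoder) {
    decoder->setTargetFps(30);                                    // targetFps
    decoder->setMaxPixels(2073600);                               // maxPixels (1920x1080)
    decoder->setOutputFormat(PixelFormat::NV12);                  // outputFormat
    // fallbackEnabled: see "Automatic Fallback Implementation" under Examples
}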

Advanced Configuration

{
  "decoder": {
    "targetFps": 60,
    "maxPixels": 8294400,
    "outputFormat": "BGR",
    "nvidia": {
      "deviceId": "nvidia.1",
      "cudaStream": true,
      "gpuMemoryPool": true
    },
    "intel": {
      "useDirectX11": true,
      "asyncDepth": 4,
      "gpuAllocation": true
    },
    "gstreamer": {
      "pipelineTemplate": "custom",
      "enableHardwareAccel": true,
      "restartOnError": true
    },
    "performance": {
      "rateLimiting": true,
      "memoryOptimization": true,
      "errorRecovery": "automatic"
    }
  }
}

Configuration Schema

| Parameter | Type | Default | Description |
|---|---|---|---|
| targetFps | integer | 30 | Target frames per second for rate limiting |
| maxPixels | integer | 2073600 | Maximum pixel count per frame (1920x1080) |
| outputFormat | string | "NV12" | Output pixel format (NV12, BGR, YUV420P) |
| devicePreference | string | "auto" | Preferred decoder device (nvidia.0, intel.0, software) |
| fallbackEnabled | boolean | true | Enable automatic fallback to software decoder |
| nvidia.deviceId | string | "nvidia.0" | NVIDIA GPU device identifier |
| nvidia.cudaStream | boolean | true | Enable CUDA stream processing |
| nvidia.gpuMemoryPool | boolean | true | Use GPU memory pool for efficiency |
| intel.useDirectX11 | boolean | true | Enable DirectX 11 surface sharing (Windows) |
| intel.asyncDepth | integer | 4 | Asynchronous processing depth |
| intel.gpuAllocation | boolean | true | Allocate surfaces in GPU memory |
| gstreamer.pipelineTemplate | string | "default" | GStreamer pipeline template |
| gstreamer.enableHardwareAccel | boolean | true | Enable hardware acceleration plugins |
| gstreamer.restartOnError | boolean | true | Restart pipeline on errors |

Decoder Implementations

NVIDIA Decoder

Hardware Acceleration: NVDEC with CUDA integration
Supported Codecs: H.264, H.265/HEVC, MJPEG, VP8, VP9, AV1, MPEG2, VC1
Key Features:

  • Zero-copy GPU operations
  • CUDA-accelerated color space conversion
  • Multi-stream concurrent decoding
  • Direct integration with CUDA processing pipelines

Intel VPL MFX Decoder

Hardware Acceleration: Intel Quick Sync Video via VPL
Supported Codecs: H.264, H.265/HEVC, AV1, MJPEG, MPEG2, VC1, VP8, VP9
Key Features:

  • Modern Intel hardware optimization
  • Asynchronous processing with configurable depth
  • Advanced surface management
  • Cross-platform support (Windows/Linux)

Intel Legacy Decoder

Hardware Acceleration: Intel Media SDK with DirectX 11
Supported Codecs: H.264, H.265/HEVC, AV1, MJPEG, MPEG2, VC1
Key Features:

  • Legacy Intel hardware support
  • DirectX 11 surface integration
  • Multi-surface allocation and management
  • Windows-optimized implementation

GStreamer Decoder

Hardware Acceleration: Platform-dependent via GStreamer plugins
Supported Codecs: Comprehensive support based on installed plugins
Key Features:

  • Cross-platform compatibility
  • Pipeline-based processing with error recovery
  • Dynamic pipeline reconfiguration
  • Multi-backend hardware acceleration

Blaize Decoder

Hardware Acceleration: Blaize AI processor graph streaming
Supported Codecs: Multi-format via GStreamer backend
Key Features:

  • Edge AI optimization
  • Memory-efficient embedded deployment
  • Integration with Blaize AI inference pipelines
  • Real-time performance tuning

Software Decoder

Hardware Acceleration: None (CPU-based)
Supported Codecs: H.264, H.265/HEVC, MPEG2, MPEG4, MJPEG, VC1
Key Features:

  • Universal platform compatibility
  • CPU-optimized algorithms
  • Multi-threaded processing
  • Reliable fallback option

API Reference

C++ API

All decoders implement the iface::VideoDecoder interface:

namespace cvedia::rt::iface {
    class VideoDecoder {
    public:
        // Core decoding interface
        virtual expected<void> initialize() = 0;
        virtual expected<void> decode(InputBuffer const& input, OutputBuffer& output) = 0;
        virtual expected<void> flush() = 0;

        // Performance configuration
        virtual void setTargetFps(int fps) = 0;
        virtual void setMaxPixels(int pixels) = 0;

        // Format management
        virtual std::vector<CodecInfo> getSupportedCodecs() const = 0;
        virtual expected<void> setOutputFormat(PixelFormat format) = 0;

        // Device management
        virtual std::string getDeviceInfo() const = 0;
        virtual bool isHardwareAccelerated() const = 0;
    };
}

// Decoder registry for automatic selection
class VideoDecoderRegistry {
public:
    static void registerDecoder(std::string const& name, 
                               std::function<std::unique_ptr<VideoDecoder>()> factory);
    static std::unique_ptr<VideoDecoder> createDecoder(std::string const& preference = "auto");
    static std::vector<std::string> getAvailableDecoders();
};
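
A brief usage sketch for the registry, based only on the declarations above (device names such as "nvidia.0" are illustrative):

#include <iostream>

// List the decoders detected on this machine, then create the preferred one
for (auto const& name : VideoDecoderRegistry::getAvailableDecoders()) {
    std::cout << "available decoder: " << name << "\n";  // e.g. "nvidia.0", "software"
}

auto decoder = VideoDecoderRegistry::createDecoder();  // defaults to "auto" selection
if (decoder && decoder->isHardwareAccelerated()) {
    // A hardware-accelerated implementation was chosen
}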

Performance Control API

// Rate limiting functionality
class RateLimiter {
public:
    void setTargetFps(double fps);
    bool shouldDrop() const;
    void recordFrame();
};

// Memory optimization
class SurfaceManager {
public:
    expected<Surface> allocateSurface(int width, int height, PixelFormat format);
    void releaseSurface(Surface& surface);
    void optimizeMemoryUsage();
};
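
A minimal SurfaceManager sketch, assuming only the declarations above; the surface dimensions and format are illustrative:

// Allocate a decode surface, use it, then release it back to the manager
SurfaceManager surfaces;

auto surface = surfaces.allocateSurface(1920, 1080, PixelFormat::NV12);
if (surface) {
    // ... hand *surface to the decoder or a post-processing stage ...
    surfaces.releaseSurface(*surface);
}

// In long-running pipelines, periodically trim pooled memory
surfaces.optimizeMemoryUsage();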

Examples

Basic Hardware-Accelerated Decoding

#include "interface/videodecoder.h"
#include "videodecoder_registry.h"

// Create decoder with automatic hardware selection
auto decoder = VideoDecoderRegistry::createDecoder();
if (!decoder) {
    // Handle decoder creation failure
    return;
}

// Configure performance parameters
decoder->setTargetFps(30);
decoder->setMaxPixels(1920 * 1080);
decoder->setOutputFormat(PixelFormat::NV12);

// Initialize decoder
if (auto result = decoder->initialize(); !result) {
    // Handle initialization failure
    return;
}

// Process video frames
InputBuffer inputBuffer;
OutputBuffer outputBuffer;

while (hasMoreFrames()) {
    // Fill input buffer with encoded data
    inputBuffer = getNextEncodedFrame();

    // Decode frame
    if (auto result = decoder->decode(inputBuffer, outputBuffer); result) {
        // Process decoded frame
        processDecodedFrame(outputBuffer);
    } else {
        // Handle decoding error
        handleError(result.error());
    }
}

// Flush remaining frames
decoder->flush();

NVIDIA-Specific Configuration

#include "decoders/nvidia/nvidiadecoder.h"

// Create NVIDIA decoder specifically
auto nvidiaDecoder = std::make_unique<NvidiaDecoder>();

// Configure NVIDIA-specific features
nvidiaDecoder->setDevice("nvidia.1");  // Use second GPU
nvidiaDecoder->enableCudaStream(true);
nvidiaDecoder->setGpuMemoryPool(true);

// Initialize with CUDA context
CudaContext context;
if (auto result = nvidiaDecoder->initialize(context); !result) {
    // Handle NVIDIA-specific initialization
    fallbackToSoftwareDecoder();
}

Multi-Stream Processing

#include <memory>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

class MultiStreamDecoder {
public:
    void addStream(std::string const& streamId, std::string const& decoderPreference = "auto") {
        auto decoder = VideoDecoderRegistry::createDecoder(decoderPreference);
        if (decoder) {
            decoders_[streamId] = std::move(decoder);

            // Start processing thread for this stream
            threads_[streamId] = std::thread([this, streamId]() {
                processStream(streamId);
            });
        }
    }

private:
    std::unordered_map<std::string, std::unique_ptr<VideoDecoder>> decoders_;
    std::unordered_map<std::string, std::thread> threads_;

    void processStream(std::string const& streamId) {
        auto& decoder = decoders_[streamId];

        // Stream-specific processing loop
        while (isStreamActive(streamId)) {
            auto inputBuffer = getStreamInput(streamId);
            OutputBuffer outputBuffer;

            if (auto result = decoder->decode(inputBuffer, outputBuffer); result) {
                processStreamOutput(streamId, outputBuffer);
            }
        }
    }
};
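
A usage sketch for the class above (stream identifiers and decoder preferences are illustrative):

MultiStreamDecoder multi;

// Pin two camera streams to specific GPUs and let a third pick automatically
multi.addStream("camera-entrance", "nvidia.0");
multi.addStream("camera-parking", "nvidia.1");
multi.addStream("camera-lobby");   // "auto" selection

Note that the class above does not join or detach its worker threads on shutdown; production code would need to handle that explicitly.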

Automatic Fallback Implementation

class ResilientDecoder {
public:
    ResilientDecoder() {
        // Try hardware decoders first
        std::vector<std::string> preferences = {"nvidia.0", "intel.0", "gstreamer", "software"};

        for (auto const& pref : preferences) {
            decoder_ = VideoDecoderRegistry::createDecoder(pref);
            if (decoder_ && decoder_->initialize()) {
                currentDecoderType_ = pref;
                break;
            }
        }
    }

    expected<void> decode(InputBuffer const& input, OutputBuffer& output) {
        auto result = decoder_->decode(input, output);

        if (!result && currentDecoderType_ != "software") {
            // Hardware decoder failed, fallback to software
            decoder_ = VideoDecoderRegistry::createDecoder("software");
            if (decoder_ && decoder_->initialize()) {
                currentDecoderType_ = "software";
                result = decoder_->decode(input, output);
            }
        }

        return result;
    }

private:
    std::unique_ptr<VideoDecoder> decoder_;
    std::string currentDecoderType_;
};

Platform Compatibility

Windows Support

| Decoder | Status | Requirements |
|---|---|---|
| NVIDIA | ✅ Full | NVIDIA GPU + CUDA drivers |
| Intel Legacy | ✅ Full | Intel iGPU + DirectX 11 |
| Intel VPL MFX | ✅ Full | Modern Intel CPU/GPU + VPL |
| GStreamer | ✅ Full | GStreamer 1.0+ runtime |
| Blaize | ✅ Full | Blaize hardware + SDK |
| Software | ✅ Full | Any CPU |

Linux Support

| Decoder | Status | Requirements |
|---|---|---|
| NVIDIA | ✅ Full | NVIDIA GPU + CUDA drivers |
| Intel Legacy | ⚠️ Limited | Intel iGPU + VA-API |
| Intel VPL MFX | ✅ Full | Modern Intel CPU/GPU + VPL |
| GStreamer | ✅ Full | GStreamer 1.0+ runtime |
| Blaize | ✅ Full | Blaize hardware + SDK |
| Software | ✅ Full | Any CPU |

Performance Optimization

Memory Management

// Optimize memory allocation for high-throughput scenarios
decoder->setMaxPixels(1920 * 1080);  // Limit resolution for memory efficiency
decoder->enableGpuMemoryPool(true);  // Use GPU memory pool
decoder->setAsyncDepth(4);           // Configure async processing depth

Rate Limiting

// Configure rate limiting to match display refresh rate
decoder->setTargetFps(60);  // Match display refresh rate

// Use RateLimiter for fine-grained control
RateLimiter rateLimiter;
rateLimiter.setTargetFps(30.0);

while (processFrames) {
    if (!rateLimiter.shouldDrop()) {
        // Process frame
        decoder->decode(input, output);
        rateLimiter.recordFrame();
    }
}

Multi-GPU Utilization

// Distribute streams across multiple GPUs
std::vector<std::string> gpuDevices = {"nvidia.0", "nvidia.1", "nvidia.2"};
int currentGpu = 0;

for (auto const& stream : videoStreams) {
    auto decoder = VideoDecoderRegistry::createDecoder(gpuDevices[currentGpu]);
    streamDecoders[stream.id] = std::move(decoder);

    currentGpu = (currentGpu + 1) % gpuDevices.size();
}

Troubleshooting

Common Issues

Hardware Decoder Initialization Failure

// Check hardware availability before initialization
if (!decoder->isHardwareAccelerated()) {
    // Hardware not available, use software decoder
    decoder = VideoDecoderRegistry::createDecoder("software");
}

Memory Allocation Issues

// Reduce memory usage for resource-constrained environments
decoder->setMaxPixels(1280 * 720);  // Lower resolution limit
decoder->enableMemoryOptimization(true);

Multi-Stream Performance Issues

// Distribute load across multiple decoders
if (streamCount > 4) {
    // Create multiple decoder instances
    for (int i = 0; i < streamCount; i++) {
        auto decoder = VideoDecoderRegistry::createDecoder("nvidia." + std::to_string(i % gpuCount));
        decoders.push_back(std::move(decoder));
    }
}

Driver Version Compatibility

# Check NVIDIA driver version
nvidia-smi

# Check Intel graphics driver
intel_gpu_top

# Verify CUDA installation
nvcc --version

Error Recovery

class ErrorRecoveryDecoder {
public:
    expected<void> decode(InputBuffer const& input, OutputBuffer& output) {
        auto result = decoder_->decode(input, output);

        if (!result) {
            errorCount_++;

            if (errorCount_ > maxErrors_) {
                // Reset decoder
                decoder_->flush();
                decoder_->initialize();
                errorCount_ = 0;
            }
        } else {
            errorCount_ = 0;  // Reset on successful decode
        }

        return result;
    }

private:
    std::unique_ptr<VideoDecoder> decoder_;
    int errorCount_ = 0;
    int maxErrors_ = 5;
};

Performance Monitoring

#include <chrono>
#include <numeric>
#include <vector>

class DecoderPerformanceMonitor {
public:
    void recordDecode(std::chrono::milliseconds duration) {
        decodeTimes_.push_back(duration);

        if (decodeTimes_.size() > 100) {
            decodeTimes_.erase(decodeTimes_.begin());
        }

        // Calculate average decode time
        auto total = std::accumulate(decodeTimes_.begin(), decodeTimes_.end(), std::chrono::milliseconds{0});
        averageDecodeTime_ = total / decodeTimes_.size();
    }

    double getCurrentFps() const {
        if (averageDecodeTime_.count() == 0) return 0.0;
        return 1000.0 / averageDecodeTime_.count();
    }

private:
    std::vector<std::chrono::milliseconds> decodeTimes_;
    std::chrono::milliseconds averageDecodeTime_{0};
};
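
A sketch of feeding the monitor from a decode loop, assuming the decoder and buffers from the earlier examples; the 30 fps threshold is illustrative:

DecoderPerformanceMonitor monitor;

auto start = std::chrono::steady_clock::now();
auto result = decoder->decode(inputBuffer, outputBuffer);
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
    std::chrono::steady_clock::now() - start);

monitor.recordDecode(elapsed);
if (monitor.getCurrentFps() < 30.0) {
    // Throughput below target: consider lowering maxPixels or adding a decoder
}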

Best Practices

Decoder Selection Strategy

  1. Automatic Selection: Use VideoDecoderRegistry::createDecoder() for automatic hardware detection
  2. Performance Priority: Prefer NVIDIA > Intel VPL > GStreamer > Software for performance
  3. Compatibility Priority: Use GStreamer or Software decoders for maximum compatibility
  4. Resource Management: Consider memory and power constraints in decoder selection

Error Handling

  1. Graceful Degradation: Implement automatic fallback from hardware to software decoding
  2. Resource Cleanup: Always flush decoders before destruction
  3. Error Recovery: Implement retry mechanisms with exponential backoff (see the sketch after this list)
  4. Monitoring: Track decoder performance and error rates
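
A minimal backoff sketch, assuming the VideoDecoder interface from the API Reference; the retry count and base delay are illustrative:

#include <chrono>
#include <thread>

// Retry a failed decode a few times, doubling the wait between attempts
expected<void> decodeWithBackoff(VideoDecoder& decoder,
                                 InputBuffer const& input, OutputBuffer& output) {
    constexpr int maxRetries = 4;                  // illustrative
    auto delay = std::chrono::milliseconds(10);    // illustrative base delay

    auto result = decoder.decode(input, output);
    for (int attempt = 0; !result && attempt < maxRetries; ++attempt) {
        std::this_thread::sleep_for(delay);
        delay *= 2;                                // exponential backoff
        result = decoder.decode(input, output);
    }
    return result;                                 // success or final error
}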

Performance Optimization

  1. Memory Efficiency: Use GPU memory pools and optimize surface allocation
  2. Rate Limiting: Match decoder output to downstream processing capabilities
  3. Multi-Threading: Process multiple streams concurrently when possible
  4. Hardware Utilization: Distribute load across available hardware resources

Integration Guidelines

  1. Unified Interface: Use the common VideoDecoder interface for decoder abstraction
  2. Configuration Management: Centralize decoder configuration through CVEDIA-RT config system
  3. Resource Sharing: Coordinate resource usage with other CVEDIA-RT components
  4. Testing: Validate decoder functionality across target hardware platforms

See Also