Decoders Plugin Collection¶
Description¶
The Decoders plugin collection provides a unified, hardware-accelerated video decoding system for CVEDIA-RT with automatic fallback to software decoding. It covers a broad range of codecs across multiple hardware platforms and relies on platform-specific acceleration technologies to keep decoding performance high.
Key Features¶
- Multi-Platform Hardware Acceleration: Support for NVIDIA GPUs, Intel processors, Blaize AI chips, and GStreamer backends
- Automatic Decoder Selection: Runtime hardware detection with intelligent fallback to software decoding
- Zero-Copy Operations: Direct GPU memory access and efficient surface management
- Comprehensive Codec Support: H.264, H.265/HEVC, AV1, VP8, VP9, MJPEG, and more
- Performance Optimization: Configurable FPS limits, resolution constraints, and memory management
- Cross-Platform Compatibility: Windows and Linux support with platform-specific optimizations
Use Cases¶
- High-Performance Video Processing: Hardware-accelerated decoding for real-time AI inference
- Edge Computing: Optimized decoding for embedded systems and edge AI applications
- Multi-Stream Processing: Concurrent decoding of multiple video streams
- Universal Compatibility: Reliable software fallback when hardware acceleration is unavailable
Requirements¶
Hardware Requirements¶
NVIDIA Decoder:
- NVIDIA GPUs with NVDEC support (compute capability 5.2+; a runtime check is sketched after this list)
- CUDA Runtime and compatible drivers
- NVIDIA Video Codec SDK

Intel Decoders:
- Intel processors with integrated graphics (legacy or modern)
- Intel Media SDK or Video Processing Library (VPL)
- DirectX 11 support (Windows) or VA-API (Linux)

Blaize Decoder:
- Blaize AI processors with graph streaming architecture
- Blaize SDK and hardware acceleration libraries

GStreamer Decoder:
- GStreamer framework 1.0+ with video plugins
- Platform-specific hardware acceleration plugins (optional)

Software Decoder:
- Any CPU architecture
- No additional hardware requirements
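For the NVIDIA backend, the compute-capability requirement can be verified at runtime. A minimal sketch using the standard CUDA runtime API; it only checks compute capability, not NVDEC availability itself:

```cpp
// Sketch: check for a GPU with compute capability 5.2 or newer.
// Uses the standard CUDA runtime API; it does not probe NVDEC directly.
#include <cuda_runtime.h>
#include <iostream>

bool hasSuitableNvidiaGpu() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return false;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop{};
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        // Accept 5.2+: major above 5, or major 5 with minor >= 2
        if (prop.major > 5 || (prop.major == 5 && prop.minor >= 2)) {
            std::cout << "Suitable GPU found: " << prop.name << "\n";
            return true;
        }
    }
    return false;
}
```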
Software Dependencies¶
# Required libraries (platform-dependent)
# NVIDIA: CUDA Runtime, NVDEC
# Intel: Intel Media SDK/VPL, DirectX 11/VA-API
# GStreamer: GStreamer 1.0+, video plugins
# FFmpeg: Software decoding fallback
Configuration¶
Basic Configuration¶
{
"decoder": {
"targetFps": 30,
"maxPixels": 2073600,
"outputFormat": "NV12",
"devicePreference": "nvidia.0",
"fallbackEnabled": true
}
}
Advanced Configuration¶
{
"decoder": {
"targetFps": 60,
"maxPixels": 8294400,
"outputFormat": "BGR",
"nvidia": {
"deviceId": "nvidia.1",
"cudaStream": true,
"gpuMemoryPool": true
},
"intel": {
"useDirectX11": true,
"asyncDepth": 4,
"gpuAllocation": true
},
"gstreamer": {
"pipelineTemplate": "custom",
"enableHardwareAccel": true,
"restartOnError": true
},
"performance": {
"rateLimiting": true,
"memoryOptimization": true,
"errorRecovery": "automatic"
}
}
}
Configuration Schema¶
Parameter | Type | Default | Description |
---|---|---|---|
targetFps | integer | 30 | Target frames per second for rate limiting |
maxPixels | integer | 2073600 | Maximum pixel count per frame (default corresponds to 1920x1080) |
outputFormat | string | "NV12" | Output pixel format (NV12, BGR, YUV420P) |
devicePreference | string | "auto" | Preferred decoder device (nvidia.0, intel.0, software) |
fallbackEnabled | boolean | true | Enable automatic fallback to the software decoder |
nvidia.deviceId | string | "nvidia.0" | NVIDIA GPU device identifier |
nvidia.cudaStream | boolean | true | Enable CUDA stream processing |
nvidia.gpuMemoryPool | boolean | true | Use GPU memory pool for efficiency |
intel.useDirectX11 | boolean | true | Enable DirectX 11 surface sharing (Windows) |
intel.asyncDepth | integer | 4 | Asynchronous processing depth |
intel.gpuAllocation | boolean | true | Allocate surfaces in GPU memory |
gstreamer.pipelineTemplate | string | "default" | GStreamer pipeline template |
gstreamer.enableHardwareAccel | boolean | true | Enable hardware acceleration plugins |
gstreamer.restartOnError | boolean | true | Restart the pipeline on errors |
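As a rough illustration of how these keys map onto the decoder API, the sketch below parses the "decoder" block from the Basic Configuration above and applies the core parameters. nlohmann::json stands in here for whatever configuration mechanism CVEDIA-RT actually uses, and PixelFormat::BGR is assumed to exist alongside the documented NV12 value:

```cpp
// Sketch: map the documented "decoder" keys onto the decoder API.
// nlohmann::json is used only for illustration (the actual CVEDIA-RT config
// mechanism is not shown here), and PixelFormat::BGR is an assumed enumerator.
#include <memory>
#include <string>
#include <nlohmann/json.hpp>
#include "videodecoder_registry.h"

std::unique_ptr<VideoDecoder> decoderFromConfig(std::string const& jsonText) {
    auto cfg = nlohmann::json::parse(jsonText)["decoder"];

    auto decoder = VideoDecoderRegistry::createDecoder(
        cfg.value("devicePreference", std::string("auto")));
    if (!decoder) return nullptr;

    decoder->setTargetFps(cfg.value("targetFps", 30));
    decoder->setMaxPixels(cfg.value("maxPixels", 1920 * 1080));

    auto const format = cfg.value("outputFormat", std::string("NV12"));
    decoder->setOutputFormat(format == "BGR" ? PixelFormat::BGR
                                             : PixelFormat::NV12);
    return decoder;
}
```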
Decoder Implementations¶
NVIDIA Decoder¶
Hardware Acceleration: NVDEC with CUDA integration
Supported Codecs: H.264, H.265/HEVC, MJPEG, VP8, VP9, AV1, MPEG2, VC1
Key Features:
- Zero-copy GPU operations
- CUDA-accelerated color space conversion
- Multi-stream concurrent decoding
- Direct integration with CUDA processing pipelines
Intel VPL MFX Decoder¶
Hardware Acceleration: Intel Quick Sync Video via VPL
Supported Codecs: H.264, H.265/HEVC, AV1, MJPEG, MPEG2, VC1, VP8, VP9
Key Features:
- Modern Intel hardware optimization
- Asynchronous processing with configurable depth
- Advanced surface management
- Cross-platform support (Windows/Linux)
Intel Legacy Decoder¶
Hardware Acceleration: Intel Media SDK with DirectX 11
Supported Codecs: H.264, H.265/HEVC, AV1, MJPEG, MPEG2, VC1
Key Features:
- Legacy Intel hardware support
- DirectX 11 surface integration
- Multi-surface allocation and management
- Windows-optimized implementation
GStreamer Decoder¶
Hardware Acceleration: Platform-dependent via GStreamer plugins
Supported Codecs: Comprehensive support based on installed plugins
Key Features:
- Cross-platform compatibility
- Pipeline-based processing with error recovery
- Dynamic pipeline reconfiguration
- Multi-backend hardware acceleration
Blaize Decoder¶
Hardware Acceleration: Blaize AI processor graph streaming
Supported Codecs: Multi-format via GStreamer backend
Key Features:
- Edge AI optimization
- Memory-efficient embedded deployment
- Integration with Blaize AI inference pipelines
- Real-time performance tuning
Software Decoder¶
Hardware Acceleration: None (CPU-based)
Supported Codecs: H.264, H.265/HEVC, MPEG2, MPEG4, MJPEG, VC1
Key Features:
- Universal platform compatibility
- CPU-optimized algorithms
- Multi-threaded processing
- Reliable fallback option
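Which of these backends are actually present on a given machine, and how many codecs each reports, can be queried through the registry. A minimal sketch, assuming the VideoDecoderRegistry and VideoDecoder methods documented in the API Reference below; the fields of CodecInfo are not shown here, so only the codec count is printed:

```cpp
// Sketch: list the backends the registry knows about on this machine.
// Assumes the VideoDecoderRegistry / VideoDecoder API from the API Reference
// below; CodecInfo fields are not documented here, so only the count is shown.
#include <iostream>
#include "videodecoder_registry.h"

void listAvailableBackends() {
    for (auto const& name : VideoDecoderRegistry::getAvailableDecoders()) {
        auto decoder = VideoDecoderRegistry::createDecoder(name);
        if (!decoder) continue;  // listed but could not be instantiated
        std::cout << name << ": "
                  << decoder->getSupportedCodecs().size() << " codecs, "
                  << (decoder->isHardwareAccelerated() ? "hardware-accelerated" : "software")
                  << ", device: " << decoder->getDeviceInfo() << "\n";
    }
}
```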
API Reference¶
C++ API¶
All decoders implement the iface::VideoDecoder interface:
namespace cvedia::rt::iface {
class VideoDecoder {
public:
// Core decoding interface
virtual expected<void> initialize() = 0;
virtual expected<void> decode(InputBuffer const& input, OutputBuffer& output) = 0;
virtual expected<void> flush() = 0;
// Performance configuration
virtual void setTargetFps(int fps) = 0;
virtual void setMaxPixels(int pixels) = 0;
// Format management
virtual std::vector<CodecInfo> getSupportedCodecs() const = 0;
virtual expected<void> setOutputFormat(PixelFormat format) = 0;
// Device management
virtual std::string getDeviceInfo() const = 0;
virtual bool isHardwareAccelerated() const = 0;
};
}
// Decoder registry for automatic selection
class VideoDecoderRegistry {
public:
static void registerDecoder(std::string const& name,
std::function<std::unique_ptr<VideoDecoder>()> factory);
static std::unique_ptr<VideoDecoder> createDecoder(std::string const& preference = "auto");
static std::vector<std::string> getAvailableDecoders();
};
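Custom or experimental decoders can be plugged in through registerDecoder(). A minimal sketch, assuming the declarations above; StubDecoder is a hypothetical placeholder and, as in the examples below, namespace qualifiers for types such as expected and PixelFormat are omitted:

```cpp
// Sketch: registering a custom decoder with the registry declared above.
// StubDecoder is a hypothetical placeholder used only to illustrate the
// factory hook; a real implementation would perform actual decoding.
#include <memory>
#include <string>
#include <vector>
#include "interface/videodecoder.h"
#include "videodecoder_registry.h"

class StubDecoder : public cvedia::rt::iface::VideoDecoder {
public:
    expected<void> initialize() override { return {}; }
    expected<void> decode(InputBuffer const&, OutputBuffer&) override { return {}; }
    expected<void> flush() override { return {}; }
    void setTargetFps(int fps) override { targetFps_ = fps; }
    void setMaxPixels(int pixels) override { maxPixels_ = pixels; }
    std::vector<CodecInfo> getSupportedCodecs() const override { return {}; }
    expected<void> setOutputFormat(PixelFormat format) override { format_ = format; return {}; }
    std::string getDeviceInfo() const override { return "stub.0"; }
    bool isHardwareAccelerated() const override { return false; }
private:
    int targetFps_ = 30;
    int maxPixels_ = 1920 * 1080;
    PixelFormat format_{};
};

// Register the factory so createDecoder("stub") can produce instances.
void registerStubDecoder() {
    VideoDecoderRegistry::registerDecoder("stub", [] {
        return std::make_unique<StubDecoder>();
    });
}
```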
Performance Control API¶
// Rate limiting functionality
class RateLimiter {
public:
void setTargetFps(double fps);
bool shouldDrop() const;
void recordFrame();
};
// Memory optimization
class SurfaceManager {
public:
expected<Surface> allocateSurface(int width, int height, PixelFormat format);
void releaseSurface(Surface& surface);
void optimizeMemoryUsage();
};
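A minimal usage sketch for the surface manager, assuming the declaration above and the documented NV12 format; error handling is deliberately simple:

```cpp
// Sketch: explicit allocate/release cycle with the SurfaceManager above.
SurfaceManager surfaces;

auto surface = surfaces.allocateSurface(1920, 1080, PixelFormat::NV12);
if (surface) {
    // ... hand *surface to the decoder or downstream processing ...
    surfaces.releaseSurface(*surface);
} else {
    // Allocation failed (e.g. GPU memory exhausted); try trimming pools
    surfaces.optimizeMemoryUsage();
}
```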
Examples¶
Basic Hardware-Accelerated Decoding¶
#include "interface/videodecoder.h"
#include "videodecoder_registry.h"
// Create decoder with automatic hardware selection
auto decoder = VideoDecoderRegistry::createDecoder();
if (!decoder) {
// Handle decoder creation failure
return;
}
// Configure performance parameters
decoder->setTargetFps(30);
decoder->setMaxPixels(1920 * 1080);
decoder->setOutputFormat(PixelFormat::NV12);
// Initialize decoder
if (auto result = decoder->initialize(); !result) {
// Handle initialization failure
return;
}
// Process video frames
InputBuffer inputBuffer;
OutputBuffer outputBuffer;
while (hasMoreFrames()) {
// Fill input buffer with encoded data
inputBuffer = getNextEncodedFrame();
// Decode frame
if (auto result = decoder->decode(inputBuffer, outputBuffer); result) {
// Process decoded frame
processDecodedFrame(outputBuffer);
} else {
// Handle decoding error
handleError(result.error());
}
}
// Flush remaining frames
decoder->flush();
NVIDIA-Specific Configuration¶
#include "decoders/nvidia/nvidiadecoder.h"
// Create NVIDIA decoder specifically
auto nvidiaDecoder = std::make_unique<NvidiaDecoder>();
// Configure NVIDIA-specific features
nvidiaDecoder->setDevice("nvidia.1"); // Use second GPU
nvidiaDecoder->enableCudaStream(true);
nvidiaDecoder->setGpuMemoryPool(true);
// Initialize with CUDA context
CudaContext context;
if (auto result = nvidiaDecoder->initialize(context); !result) {
// Handle NVIDIA-specific initialization
fallbackToSoftwareDecoder();
}
Multi-Stream Processing¶
#include <memory>
#include <string>
#include <thread>
#include <unordered_map>

class MultiStreamDecoder {
public:
    void addStream(std::string const& streamId, std::string const& decoderPreference = "auto") {
        auto decoder = VideoDecoderRegistry::createDecoder(decoderPreference);
        if (decoder) {
            decoders_[streamId] = std::move(decoder);
            // Start processing thread for this stream
            threads_[streamId] = std::thread([this, streamId]() {
                processStream(streamId);
            });
        }
    }

    ~MultiStreamDecoder() {
        // Join all worker threads before the decoders they use are destroyed
        for (auto& [id, thread] : threads_) {
            if (thread.joinable()) thread.join();
        }
    }

private:
    std::unordered_map<std::string, std::unique_ptr<VideoDecoder>> decoders_;
    std::unordered_map<std::string, std::thread> threads_;

    void processStream(std::string const& streamId) {
        auto& decoder = decoders_[streamId];
        // Stream-specific processing loop; isStreamActive(), getStreamInput() and
        // processStreamOutput() are application-provided helpers
        while (isStreamActive(streamId)) {
            auto inputBuffer = getStreamInput(streamId);
            OutputBuffer outputBuffer;
            if (auto result = decoder->decode(inputBuffer, outputBuffer); result) {
                processStreamOutput(streamId, outputBuffer);
            }
        }
    }
};
Automatic Fallback Implementation¶
class ResilientDecoder {
public:
ResilientDecoder() {
// Try hardware decoders first
std::vector<std::string> preferences = {"nvidia.0", "intel.0", "gstreamer", "software"};
for (auto const& pref : preferences) {
decoder_ = VideoDecoderRegistry::createDecoder(pref);
if (decoder_ && decoder_->initialize()) {
currentDecoderType_ = pref;
break;
}
}
}
expected<void> decode(InputBuffer const& input, OutputBuffer& output) {
auto result = decoder_->decode(input, output);
if (!result && currentDecoderType_ != "software") {
// Hardware decoder failed, fallback to software
decoder_ = VideoDecoderRegistry::createDecoder("software");
if (decoder_ && decoder_->initialize()) {
currentDecoderType_ = "software";
result = decoder_->decode(input, output);
}
}
return result;
}
private:
std::unique_ptr<VideoDecoder> decoder_;
std::string currentDecoderType_;
};
Platform Compatibility¶
Windows Support¶
Decoder | Status | Requirements |
---|---|---|
NVIDIA | ✅ Full | NVIDIA GPU + CUDA drivers |
Intel Legacy | ✅ Full | Intel iGPU + DirectX 11 |
Intel VPL MFX | ✅ Full | Modern Intel CPU/GPU + VPL |
GStreamer | ✅ Full | GStreamer 1.0+ runtime |
Blaize | ✅ Full | Blaize hardware + SDK |
Software | ✅ Full | Any CPU |
Linux Support¶
Decoder | Status | Requirements |
---|---|---|
NVIDIA | ✅ Full | NVIDIA GPU + CUDA drivers |
Intel Legacy | ⚠️ Limited | Intel iGPU + VA-API |
Intel VPL MFX | ✅ Full | Modern Intel CPU/GPU + VPL |
GStreamer | ✅ Full | GStreamer 1.0+ runtime |
Blaize | ✅ Full | Blaize hardware + SDK |
Software | ✅ Full | Any CPU |
Performance Optimization¶
Memory Management¶
// Optimize memory allocation for high-throughput scenarios
decoder->setMaxPixels(1920 * 1080); // Limit resolution for memory efficiency
decoder->enableGpuMemoryPool(true); // Use GPU memory pool
decoder->setAsyncDepth(4); // Configure async processing depth
Rate Limiting¶
// Configure rate limiting to match display refresh rate
decoder->setTargetFps(60); // Match display refresh rate
// Use RateLimiter for fine-grained control
RateLimiter rateLimiter;
rateLimiter.setTargetFps(30.0);
while (processFrames) {
if (!rateLimiter.shouldDrop()) {
// Process frame
decoder->decode(input, output);
rateLimiter.recordFrame();
}
}
Multi-GPU Utilization¶
// Distribute streams across multiple GPUs
std::vector<std::string> gpuDevices = {"nvidia.0", "nvidia.1", "nvidia.2"};
int currentGpu = 0;
for (auto const& stream : videoStreams) {
auto decoder = VideoDecoderRegistry::createDecoder(gpuDevices[currentGpu]);
streamDecoders[stream.id] = std::move(decoder);
currentGpu = (currentGpu + 1) % gpuDevices.size();
}
Troubleshooting¶
Common Issues¶
Hardware Decoder Initialization Failure
// Check hardware availability before initialization
if (!decoder->isHardwareAccelerated()) {
// Hardware not available, use software decoder
decoder = VideoDecoderRegistry::createDecoder("software");
}
Memory Allocation Issues
// Reduce memory usage for resource-constrained environments
decoder->setMaxPixels(1280 * 720); // Lower resolution limit
decoder->enableMemoryOptimization(true);
Multi-Stream Performance Issues
// Distribute load across multiple decoders
if (streamCount > 4) {
// Create multiple decoder instances
for (int i = 0; i < streamCount; i++) {
auto decoder = VideoDecoderRegistry::createDecoder("nvidia." + std::to_string(i % gpuCount));
decoders.push_back(std::move(decoder));
}
}
Driver Version Compatibility
# Check NVIDIA driver version
nvidia-smi
# Check Intel graphics driver
intel_gpu_top
# Verify CUDA installation
nvcc --version
Error Recovery¶
class ErrorRecoveryDecoder {
public:
expected<void> decode(InputBuffer const& input, OutputBuffer& output) {
auto result = decoder_->decode(input, output);
if (!result) {
errorCount_++;
if (errorCount_ > maxErrors_) {
// Reset decoder
decoder_->flush();
decoder_->initialize();
errorCount_ = 0;
}
} else {
errorCount_ = 0; // Reset on successful decode
}
return result;
}
private:
std::unique_ptr<VideoDecoder> decoder_;
int errorCount_ = 0;
int maxErrors_ = 5;
};
Performance Monitoring¶
#include <chrono>
#include <numeric>
#include <vector>
class DecoderPerformanceMonitor {
public:
void recordDecode(std::chrono::milliseconds duration) {
decodeTimes_.push_back(duration);
if (decodeTimes_.size() > 100) {
decodeTimes_.erase(decodeTimes_.begin());
}
// Calculate average decode time
auto total = std::accumulate(decodeTimes_.begin(), decodeTimes_.end(), std::chrono::milliseconds{0});
averageDecodeTime_ = total / decodeTimes_.size();
}
double getCurrentFps() const {
if (averageDecodeTime_.count() == 0) return 0.0;
return 1000.0 / averageDecodeTime_.count();
}
private:
std::vector<std::chrono::milliseconds> decodeTimes_;
std::chrono::milliseconds averageDecodeTime_{0};
};
Best Practices¶
Decoder Selection Strategy¶
- Automatic Selection: Use VideoDecoderRegistry::createDecoder() for automatic hardware detection
- Performance Priority: Prefer NVIDIA > Intel VPL > GStreamer > Software for performance
- Compatibility Priority: Use GStreamer or Software decoders for maximum compatibility
- Resource Management: Consider memory and power constraints in decoder selection
Error Handling¶
- Graceful Degradation: Implement automatic fallback from hardware to software decoding
- Resource Cleanup: Always flush decoders before destruction
- Error Recovery: Implement retry mechanisms with exponential backoff (see the sketch after this list)
- Monitoring: Track decoder performance and error rates
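A minimal retry sketch with exponential backoff, assuming the VideoDecoder interface documented above; decodeWithRetry() is a hypothetical helper and the delays and retry count are illustrative:

```cpp
// Sketch: retry a failed decode with exponentially growing delays.
// decodeWithRetry() is a hypothetical helper, not part of the plugin API.
#include <chrono>
#include <thread>

expected<void> decodeWithRetry(VideoDecoder& decoder,
                               InputBuffer const& input,
                               OutputBuffer& output,
                               int maxRetries = 3) {
    auto delay = std::chrono::milliseconds(10);
    expected<void> result = decoder.decode(input, output);
    for (int attempt = 0; !result && attempt < maxRetries; ++attempt) {
        std::this_thread::sleep_for(delay);  // back off before the next attempt
        delay *= 2;                          // 10 ms, 20 ms, 40 ms, ...
        result = decoder.decode(input, output);
    }
    return result;
}
```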
Performance Optimization¶
- Memory Efficiency: Use GPU memory pools and optimize surface allocation
- Rate Limiting: Match decoder output to downstream processing capabilities
- Multi-Threading: Process multiple streams concurrently when possible
- Hardware Utilization: Distribute load across available hardware resources
Integration Guidelines¶
- Unified Interface: Use the common VideoDecoder interface for decoder abstraction
- Configuration Management: Centralize decoder configuration through the CVEDIA-RT config system
- Resource Sharing: Coordinate resource usage with other CVEDIA-RT components
- Testing: Validate decoder functionality across target hardware platforms