JetsonUtils Plugin¶
Description¶
JetsonUtils is a specialized utility plugin designed for NVIDIA Jetson platforms. It provides hardware-specific optimizations, utility functions, and platform integration capabilities to leverage the unique features of Jetson embedded AI computing devices.
This plugin enables direct access to Jetson-specific hardware features including GPU memory management, CUDA acceleration, power management, thermal monitoring, and platform detection. It serves as the foundation for optimal performance on Jetson embedded AI systems.
Key Features¶
- Jetson Hardware Integration: Direct integration with Jetson-specific hardware features and capabilities
- GPU Memory Management: Optimized CUDA memory allocation, deallocation, and unified memory support
- Hardware Acceleration: Access to Jetson's specialized AI acceleration units (DLA, GPU, VIC)
- Power Management: Integration with Jetson power management systems and NVPMODEL utilities
- Performance Optimization: Platform-specific performance tuning and optimization utilities
- Thermal Management: Hardware thermal monitoring, control, and throttling management
- Platform Detection: Automatic Jetson platform detection and configuration (Nano, TX2, Xavier, Orin)
- CUDA Integration: Deep CUDA runtime integration for GPU-accelerated operations
- Memory Optimization: Efficient memory allocation strategies for embedded AI workloads
- System Monitoring: Real-time monitoring of hardware resources and performance metrics
Requirements¶
Hardware Requirements¶
- NVIDIA Jetson Platform: Jetson Nano, TX1, TX2, Xavier NX, AGX Xavier, Orin Nano, AGX Orin
- GPU Memory: Sufficient GPU memory for CUDA operations (varies by platform)
- System Memory: Minimum 2GB RAM (4GB+ recommended for complex operations)
- Storage: Access to CUDA libraries and Jetson system utilities
Software Dependencies¶
- JetPack SDK: NVIDIA JetPack with appropriate version for target platform
- CUDA Runtime: NVIDIA CUDA runtime libraries (typically bundled with JetPack)
- cuDNN: NVIDIA cuDNN library for deep learning acceleration
- L4T (Linux for Tegra): Jetson Linux operating system
- NVML: NVIDIA Management Library for hardware monitoring
- TensorRT: NVIDIA TensorRT for optimized inference (recommended)
- Jetson Libraries: Platform-specific Jetson libraries and utilities
Platform-Specific Requirements¶
Jetson Nano/TX1¶
- Maxwell GPU architecture support
- 4GB shared memory configuration
- CUDA Compute Capability 5.3
Jetson TX2¶
- Pascal GPU architecture support
- 8GB shared memory configuration
- CUDA Compute Capability 6.2
Jetson Xavier Series¶
- Volta GPU architecture support
- DLA (Deep Learning Accelerator) hardware
- CUDA Compute Capability 7.2
Jetson Orin Series¶
- Ampere GPU architecture support
- Second-generation DLA hardware acceleration
- CUDA Compute Capability 8.7
Configuration¶
Basic Configuration¶
{
    "jetsonutils": {
        "platform_detection": true,
        "gpu_memory_fraction": 0.8,
        "power_mode": "auto",
        "thermal_monitoring": true,
        "cuda_device": 0
    }
}
Advanced Configuration¶
{
    "jetsonutils": {
        "platform_detection": true,
        "gpu_memory_fraction": 0.75,
        "power_mode": "MAXN",
        "thermal_monitoring": true,
        "thermal_threshold": 80.0,
        "cuda_device": 0,
        "memory_management": {
            "unified_memory": true,
            "memory_pool_size": "512MB",
            "cache_config": "prefer_shared"
        },
        "performance_optimization": {
            "cpu_governor": "performance",
            "gpu_clock_boost": true,
            "dla_enabled": true,
            "vic_enabled": true
        },
        "monitoring": {
            "enable_profiling": true,
            "sample_interval": 1000,
            "log_performance": true
        }
    }
}
Configuration Schema¶
Parameter | Type | Default | Description
---|---|---|---
platform_detection | bool | true | Enable automatic Jetson platform detection
gpu_memory_fraction | float | 0.8 | Fraction of GPU memory to allocate (0.1-1.0)
power_mode | string | "auto" | Power mode ("auto", "MAXN", "5W", "10W", "15W", "30W")
thermal_monitoring | bool | true | Enable thermal monitoring and protection
thermal_threshold | float | 85.0 | Temperature threshold for thermal throttling (°C)
cuda_device | int | 0 | CUDA device ID to use
unified_memory | bool | true | Enable CUDA unified memory management
memory_pool_size | string | "256MB" | Memory pool size for efficient allocation
cache_config | string | "prefer_shared" | CUDA cache configuration
cpu_governor | string | "ondemand" | CPU frequency governor
gpu_clock_boost | bool | false | Enable GPU clock boosting
dla_enabled | bool | true | Enable DLA (Deep Learning Accelerator)
vic_enabled | bool | true | Enable VIC (Video Image Compositor)
enable_profiling | bool | false | Enable performance profiling
sample_interval | int | 1000 | Monitoring sample interval (ms)
log_performance | bool | false | Log performance metrics
API Reference¶
C++ API (JetsonUtils)¶
Platform Detection and Information¶
class JetsonUtils {
public:
    // Platform detection
    static JetsonPlatform detectPlatform();
    static std::string getPlatformName();
    static std::string getJetPackVersion();
    static std::string getL4TVersion();

    // Hardware capabilities
    static bool hasDLA();
    static bool hasVIC();
    static int getGPUArchitecture();
    static size_t getTotalMemory();
    static size_t getGPUMemory();
};
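The detection calls are typically combined into a small helper that picks per-platform defaults. The sketch below mirrors the Lua optimizeForPlatform() example later on this page; the precision and batch-size values are illustrative assumptions, not mandated settings.

#include <string>
#include "jetsonutils.h"

// Illustrative per-platform defaults; tune for your model and thermal budget.
struct PlatformDefaults {
    const char* precision;    // e.g. "FP16" or "INT8"
    int         maxBatchSize;
};

PlatformDefaults pickDefaults() {
    const std::string name = JetsonUtils::getPlatformName();
    if (name.find("Orin") != std::string::npos)   return {"INT8", 16}; // Ampere, CC 8.7, DLA
    if (name.find("Xavier") != std::string::npos) return {"FP16", 8};  // Volta, CC 7.2, DLA
    if (name.find("TX2") != std::string::npos)    return {"FP16", 2};  // Pascal, CC 6.2
    return {"FP16", 1};                                                // Nano/TX1, CC 5.3
}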
GPU Memory Management¶
class JetsonUtils {
public:
    // Memory allocation
    static expected<void*> allocateGPUMemory(size_t size);
    static expected<void*> allocateUnifiedMemory(size_t size);
    static expected<void> freeGPUMemory(void* ptr);

    // Memory information
    static expected<MemoryInfo> getMemoryInfo();
    static expected<size_t> getAvailableGPUMemory();
    static expected<float> getMemoryUtilization();

    // Memory optimization
    static expected<void> optimizeMemoryUsage();
    static expected<void> clearMemoryCache();
};

struct MemoryInfo {
    size_t total_memory;
    size_t free_memory;
    size_t used_memory;
    float utilization_percent;
};
Power and Thermal Management¶
class JetsonUtils {
public:
    // Power management
    static expected<void> setPowerMode(PowerMode mode);
    static expected<PowerMode> getCurrentPowerMode();
    static expected<std::vector<PowerMode>> getAvailablePowerModes();

    // Thermal monitoring
    static expected<float> getCPUTemperature();
    static expected<float> getGPUTemperature();
    static expected<ThermalInfo> getThermalInfo();
    static expected<void> setThermalThreshold(float threshold);

    // Performance monitoring
    static expected<PerformanceInfo> getPerformanceInfo();
    static expected<void> enablePerformanceMonitoring(bool enable);
};

enum class PowerMode {
    Mode5W, Mode10W, Mode15W, Mode30W, MAXN
};

struct ThermalInfo {
    float cpu_temp;
    float gpu_temp;
    float thermal_throttling;
    bool is_throttling;
};

struct PerformanceInfo {
    float cpu_usage;
    float gpu_usage;
    float memory_usage;
    int cpu_frequency;
    int gpu_frequency;
};
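A short sketch of combining these calls defensively: query the modes the current module actually offers before requesting MAXN. The fallback behaviour shown is an assumption for illustration, not part of the plugin API.

#include "jetsonutils.h"

// Request maximum performance, but fall back gracefully if MAXN is not
// offered on the current platform (power modes vary by module).
void requestMaxPerformance() {
    auto modes = JetsonUtils::getAvailablePowerModes();
    if (!modes) {
        LOGW << "Could not query power modes: " << modes.error().message();
        return;
    }
    for (auto mode : modes.value()) {
        if (mode == PowerMode::MAXN) {
            auto result = JetsonUtils::setPowerMode(PowerMode::MAXN);
            if (!result) {
                LOGW << "Failed to set MAXN: " << result.error().message();
            }
            return;
        }
    }
    LOGI << "MAXN not available, keeping current power mode";
}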
CUDA Integration¶
class JetsonUtils {
public:
    // CUDA device management
    static expected<void> initializeCUDA();
    static expected<void> setCUDADevice(int device);
    static expected<int> getCUDADevice();
    static expected<cudaDeviceProp> getCUDADeviceProperties();

    // CUDA memory operations
    static expected<void> cudaMemcpyAsync(void* dst, const void* src, size_t count,
                                          cudaMemcpyKind kind, cudaStream_t stream);
    static expected<cudaStream_t> createCUDAStream();
    static expected<void> destroyCUDAStream(cudaStream_t stream);

    // CUDA performance
    static expected<void> cudaSynchronize();
    static expected<float> measureCUDAKernelTime(std::function<void()> kernel);
};
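A minimal timing sketch using only the calls declared above: create a stream, time an asynchronous host-to-device copy with measureCUDAKernelTime(), then release the resources. Buffer contents and logging are illustrative.

#include <vector>
#include "jetsonutils.h"

// Time a simple asynchronous host-to-device copy with the helpers above.
void timeHostToDeviceCopy(const std::vector<float>& host) {
    auto stream = JetsonUtils::createCUDAStream();
    auto device = JetsonUtils::allocateGPUMemory(host.size() * sizeof(float));
    if (!stream || !device) {
        LOGE << "CUDA stream or memory setup failed";
        return;
    }

    auto elapsed = JetsonUtils::measureCUDAKernelTime([&]() {
        JetsonUtils::cudaMemcpyAsync(device.value(), host.data(),
                                     host.size() * sizeof(float),
                                     cudaMemcpyHostToDevice, stream.value());
    });
    JetsonUtils::cudaSynchronize();

    LOGI << "Copy took " << elapsed.value_or(0.0f) << " ms";

    JetsonUtils::freeGPUMemory(device.value());
    JetsonUtils::destroyCUDAStream(stream.value());
}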
Lua API¶
Platform Information¶
-- Create JetsonUtils instance
local jetson = api.factory.jetsonutils.create(instance, "jetson_platform")
-- Get platform information
local platform = jetson:detectPlatform()
local platformName = jetson:getPlatformName()
local jetpackVersion = jetson:getJetPackVersion()
print("Platform:", platformName)
print("JetPack Version:", jetpackVersion)
print("Has DLA:", jetson:hasDLA())
print("Has VIC:", jetson:hasVIC())
Memory Management¶
-- Memory information
local memInfo = jetson:getMemoryInfo()
print(string.format("Memory: %.1f%% used (%.1fMB / %.1fMB)",
memInfo.utilization_percent,
memInfo.used_memory / 1024 / 1024,
memInfo.total_memory / 1024 / 1024))
-- Optimize memory usage
jetson:optimizeMemoryUsage()
jetson:clearMemoryCache()
Power and Performance¶
-- Set power mode for optimal performance
jetson:setPowerMode("MAXN")
-- Monitor thermal conditions
local thermal = jetson:getThermalInfo()
print(string.format("Temperatures - CPU: %.1f°C, GPU: %.1f°C",
thermal.cpu_temp, thermal.gpu_temp))
if thermal.is_throttling then
print("WARNING: Thermal throttling active")
end
-- Performance monitoring
local perf = jetson:getPerformanceInfo()
print(string.format("Performance - CPU: %.1f%%, GPU: %.1f%%, Memory: %.1f%%",
perf.cpu_usage, perf.gpu_usage, perf.memory_usage))
Examples¶
Basic Jetson Platform Initialization¶
#include "jetsonutils.h"
#include "jetsonutilsmanaged.h"
// Initialize Jetson platform optimizations
class JetsonPlatformManager {
public:
void initialize() {
// Detect platform and capabilities
auto platform = JetsonUtils::detectPlatform();
auto platformName = JetsonUtils::getPlatformName();
LOGI << "Detected Jetson Platform: " << platformName;
// Check hardware capabilities
bool hasDLA = JetsonUtils::hasDLA();
bool hasVIC = JetsonUtils::hasVIC();
int gpuArch = JetsonUtils::getGPUArchitecture();
LOGI << "Hardware Capabilities:";
LOGI << " DLA Support: " << (hasDLA ? "Yes" : "No");
LOGI << " VIC Support: " << (hasVIC ? "Yes" : "No");
LOGI << " GPU Architecture: " << gpuArch;
// Initialize CUDA
auto cudaResult = JetsonUtils::initializeCUDA();
if (!cudaResult) {
LOGE << "Failed to initialize CUDA: " << cudaResult.error().message();
return;
}
// Set optimal power mode
auto powerResult = JetsonUtils::setPowerMode(PowerMode::MAXN);
if (!powerResult) {
LOGW << "Failed to set MAXN power mode: " << powerResult.error().message();
}
// Configure memory management
configureMemoryManagement();
// Start thermal monitoring
startThermalMonitoring();
LOGI << "Jetson platform initialized successfully";
}
void configureMemoryManagement() {
// Get memory information
auto memInfo = JetsonUtils::getMemoryInfo();
if (memInfo) {
LOGI << "Memory Status:";
LOGI << " Total: " << memInfo->total_memory / 1024 / 1024 << " MB";
LOGI << " Free: " << memInfo->free_memory / 1024 / 1024 << " MB";
LOGI << " Utilization: " << memInfo->utilization_percent << "%";
}
// Optimize memory usage
JetsonUtils::optimizeMemoryUsage();
}
void startThermalMonitoring() {
// Enable thermal monitoring
JetsonUtils::setThermalThreshold(80.0f); // 80°C threshold
// Start monitoring thread
thermalMonitorThread_ = std::thread([this]() {
monitorThermalConditions();
});
}
private:
std::thread thermalMonitorThread_;
void monitorThermalConditions() {
while (running_) {
auto thermal = JetsonUtils::getThermalInfo();
if (thermal) {
if (thermal->is_throttling) {
LOGW << "Thermal throttling active - CPU: " << thermal->cpu_temp
<< "°C, GPU: " << thermal->gpu_temp << "°C";
}
// Adjust performance based on temperature
if (thermal->cpu_temp > 85.0f || thermal->gpu_temp > 85.0f) {
// Switch to lower power mode
JetsonUtils::setPowerMode(PowerMode::Mode15W);
}
}
std::this_thread::sleep_for(std::chrono::seconds(5));
}
}
std::atomic<bool> running_{true};
};
GPU Memory Management and CUDA Operations¶
// Efficient GPU memory management for AI workloads
class JetsonMemoryManager {
public:
    void initializeMemoryPools() {
        // Allocate memory pools for different purposes
        allocateInferenceMemoryPool();
        allocateImageProcessingPool();
        allocateTemporaryPool();
    }

    void* allocateInferenceMemory(size_t size) {
        // Allocate unified memory for inference operations
        auto memory = JetsonUtils::allocateUnifiedMemory(size);
        if (memory) {
            inferenceAllocations_.push_back({memory.value(), size});
            return memory.value();
        }
        return nullptr;
    }

    void processWithGPUAcceleration(const std::vector<float>& input,
                                    std::vector<float>& output) {
        // Create CUDA stream for async operations
        auto stream = JetsonUtils::createCUDAStream();
        if (!stream) {
            LOGE << "Failed to create CUDA stream";
            return;
        }

        // Allocate GPU memory
        size_t inputSize = input.size() * sizeof(float);
        size_t outputSize = output.size() * sizeof(float);

        auto inputGPU = JetsonUtils::allocateGPUMemory(inputSize);
        auto outputGPU = JetsonUtils::allocateGPUMemory(outputSize);
        if (!inputGPU || !outputGPU) {
            LOGE << "Failed to allocate GPU memory";
            return;
        }

        // Copy data to GPU asynchronously
        JetsonUtils::cudaMemcpyAsync(inputGPU.value(), input.data(), inputSize,
                                     cudaMemcpyHostToDevice, stream.value());

        // Measure kernel execution time
        float kernelTime = JetsonUtils::measureCUDAKernelTime([&]() {
            // Launch custom CUDA kernel (user-provided launcher)
            launchProcessingKernel(inputGPU.value(), outputGPU.value(),
                                   input.size(), stream.value());
        }).value_or(0.0f);

        // Copy results back
        JetsonUtils::cudaMemcpyAsync(output.data(), outputGPU.value(), outputSize,
                                     cudaMemcpyDeviceToHost, stream.value());

        // Synchronize and cleanup
        JetsonUtils::cudaSynchronize();
        JetsonUtils::freeGPUMemory(inputGPU.value());
        JetsonUtils::freeGPUMemory(outputGPU.value());
        JetsonUtils::destroyCUDAStream(stream.value());

        LOGI << "GPU processing completed in " << kernelTime << "ms";
    }

private:
    struct MemoryAllocation {
        void* ptr;
        size_t size;
    };

    std::vector<MemoryAllocation> inferenceAllocations_;
    std::vector<MemoryAllocation> imageProcessingAllocations_;
    std::vector<MemoryAllocation> temporaryAllocations_;

    void allocateInferenceMemoryPool() {
        // Pre-allocate memory for inference operations
        const size_t poolSize = 512 * 1024 * 1024; // 512 MB
        auto pool = JetsonUtils::allocateUnifiedMemory(poolSize);
        if (pool) {
            inferenceMemoryPool_ = pool.value();
            LOGI << "Allocated " << poolSize / 1024 / 1024 << "MB inference memory pool";
        }
    }

    // Pool setup for image processing and scratch buffers follows the same
    // pattern as allocateInferenceMemoryPool() and is omitted here.
    void allocateImageProcessingPool() {}
    void allocateTemporaryPool() {}

    void* inferenceMemoryPool_ = nullptr;
};
Complete Jetson AI Pipeline Integration¶
-- Complete AI processing pipeline optimized for Jetson
local jetson = api.factory.jetsonutils.create(instance, "jetson_ai_pipeline")
local inference = api.factory.inference.create(instance, "jetson_inference")
local video = api.factory.videoreader.create(instance, "jetson_video")

-- Initialize Jetson optimizations
function initializeJetsonPipeline()
    -- Detect platform and configure accordingly
    local platform = jetson:detectPlatform()
    local platformName = jetson:getPlatformName()
    print("Initializing AI pipeline for: " .. platformName)

    -- Set optimal power mode
    local powerModes = jetson:getAvailablePowerModes()
    jetson:setPowerMode("MAXN") -- Maximum performance

    -- Configure memory management
    jetson:optimizeMemoryUsage()

    -- Configure inference engine for Jetson
    local inferenceConfig = {
        engine = "tensorrt",
        device = "GPU",
        precision = "FP16", -- Use FP16 for better performance on Jetson
        max_batch_size = jetson:hasDLA() and 8 or 4,
        dla_enabled = jetson:hasDLA(),
        gpu_memory_fraction = 0.7
    }

    if jetson:hasDLA() then
        -- Use DLA for compatible models
        inferenceConfig.dla_core = 0
        inferenceConfig.allow_gpu_fallback = true
        print("DLA acceleration enabled")
    end

    inference:configure(inferenceConfig)
    inference:loadModel("/models/detection_jetson_optimized.trt")

    -- Configure video input with hardware acceleration
    video:configure({
        hardware_acceleration = true,
        nvdec_enabled = true,
        vic_enabled = jetson:hasVIC(),
        memory_type = "unified" -- Use unified memory for efficiency
    })

    -- Start thermal monitoring
    startThermalMonitoring()

    print("Jetson AI pipeline initialized successfully")
end
-- Thermal monitoring and adaptive performance
function startThermalMonitoring()
    -- Periodic monitoring should be driven from the main processing loop,
    -- e.g. every N frames or at intervals measured with os.time()
    checkThermalStatus = function()
        local thermal = jetson:getThermalInfo()
        local perf = jetson:getPerformanceInfo()

        -- Log thermal and performance status
        print(string.format("Jetson Status - CPU: %.1f°C (%.1f%%), GPU: %.1f°C (%.1f%%), Memory: %.1f%%",
            thermal.cpu_temp, perf.cpu_usage, thermal.gpu_temp, perf.gpu_usage, perf.memory_usage))

        -- Adaptive thermal management
        if thermal.cpu_temp > 80 or thermal.gpu_temp > 80 then
            print("High temperature detected, reducing performance")
            jetson:setPowerMode("15W")
            -- Reduce inference batch size
            inference:setMaxBatchSize(2)
        elseif thermal.cpu_temp < 70 and thermal.gpu_temp < 70 then
            -- Temperature is safe, can increase performance
            jetson:setPowerMode("MAXN")
            inference:setMaxBatchSize(jetson:hasDLA() and 8 or 4)
        end

        -- Memory management
        if perf.memory_usage > 85 then
            print("High memory usage, clearing cache")
            jetson:clearMemoryCache()
        end
    end

    -- Run an initial check immediately
    checkThermalStatus()
end
-- Main AI processing function
function processWithJetsonOptimization(frame)
    local startTime = os.clock()

    -- Run inference with Jetson optimizations
    local detections = inference:runInference(frame)

    if detections then
        local inferenceTime = os.clock() - startTime

        -- Log performance metrics
        local perf = jetson:getPerformanceInfo()
        print("Inference completed in " .. (inferenceTime * 1000) .. "ms, GPU: " .. perf.gpu_usage .. "%, " .. #detections .. " detections")

        -- Process results
        return processDetections(detections)
    end

    return nil
end

-- Performance optimization based on platform
function optimizeForPlatform()
    local platformName = jetson:getPlatformName()

    if string.find(platformName, "Nano") then
        -- Jetson Nano optimizations
        print("Applying Jetson Nano optimizations")
        jetson:setPowerMode("10W")
        inference:setMaxBatchSize(1)
        inference:setPrecision("FP16")
    elseif string.find(platformName, "TX2") then
        -- Jetson TX2 optimizations
        print("Applying Jetson TX2 optimizations")
        jetson:setPowerMode("MAXN")
        inference:setMaxBatchSize(2)
    elseif string.find(platformName, "Xavier") then
        -- Jetson Xavier optimizations
        print("Applying Jetson Xavier optimizations")
        jetson:setPowerMode("MAXN")
        inference:setMaxBatchSize(8)
        inference:enableDLA(true)
    elseif string.find(platformName, "Orin") then
        -- Jetson Orin optimizations
        print("Applying Jetson Orin optimizations")
        jetson:setPowerMode("MAXN")
        inference:setMaxBatchSize(16)
        inference:enableDLA(true)
        inference:setPrecision("INT8") -- Orin has strong INT8 support
    end
end

-- Initialize and start the pipeline
initializeJetsonPipeline()
optimizeForPlatform()
Best Practices¶
Platform Optimization¶
- Power Mode Selection: Choose appropriate power modes based on performance requirements vs. thermal constraints
- Memory Management: Use unified memory when possible for automatic CPU-GPU memory management
- Thermal Awareness: Implement thermal monitoring and adaptive performance scaling
- Hardware Utilization: Leverage DLA and VIC hardware accelerators when available
Performance Guidelines¶
Memory Optimization¶
- Pre-allocate memory pools for frequently used operations (see the sketch after this list)
- Use unified memory for data shared between CPU and GPU
- Implement proper memory cleanup to prevent leaks
- Monitor memory usage and implement adaptive strategies
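A minimal sketch of the pool idea from the first bullet, built on allocateUnifiedMemory(): one block is reserved up front and handed out with a bump pointer. Alignment handling and per-allocation frees are omitted, and the code assumes freeGPUMemory() also releases unified allocations, so treat it as illustration rather than a drop-in pool.

#include <cstddef>
#include "jetsonutils.h"

// Minimal bump allocator over one pre-allocated unified-memory block.
class UnifiedMemoryPool {
public:
    bool init(size_t bytes) {
        auto mem = JetsonUtils::allocateUnifiedMemory(bytes);
        if (!mem) return false;
        base_ = static_cast<char*>(mem.value());
        capacity_ = bytes;
        offset_ = 0;
        return true;
    }
    void* allocate(size_t bytes) {
        if (offset_ + bytes > capacity_) return nullptr;  // pool exhausted
        void* ptr = base_ + offset_;
        offset_ += bytes;
        return ptr;
    }
    void reset() { offset_ = 0; }  // reuse the pool between frames
    ~UnifiedMemoryPool() {
        if (base_) JetsonUtils::freeGPUMemory(base_);  // assumption: also frees unified memory
    }
private:
    char* base_ = nullptr;
    size_t capacity_ = 0;
    size_t offset_ = 0;
};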
CUDA Best Practices¶
- Use asynchronous operations with CUDA streams
- Optimize kernel launch configurations for Jetson architecture
- Leverage Jetson-specific CUDA libraries when available
- Implement proper error checking for CUDA operations (a checking helper is sketched after this list)
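A small sketch of the error-checking pattern from the last bullet, assuming the expected<>/error().message() interface used throughout this page; the helper name is illustrative.

#include "jetsonutils.h"

// Log and report failures from JetsonUtils CUDA calls in one place.
template <typename Result>
bool checkCuda(const Result& result, const char* what) {
    if (!result) {
        LOGE << what << " failed: " << result.error().message();
        return false;
    }
    return true;
}

// Usage: abort the processing step as soon as a CUDA call fails.
bool copyAsyncChecked(void* dst, const void* src, size_t bytes, cudaStream_t stream) {
    return checkCuda(
        JetsonUtils::cudaMemcpyAsync(dst, src, bytes, cudaMemcpyHostToDevice, stream),
        "cudaMemcpyAsync");
}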
Integration Strategies¶
- AI Inference: Integrate with TensorRT for optimal inference performance
- Video Processing: Use hardware-accelerated video decode/encode when available
- Computer Vision: Leverage VIC (Video Image Compositor) for image processing
- Deep Learning: Utilize DLA for supported neural network operations
Troubleshooting¶
Common Issues¶
CUDA Initialization Problems¶
// Check CUDA availability and version
auto cudaProps = JetsonUtils::getCUDADeviceProperties();
if (!cudaProps) {
    LOGE << "CUDA device not available or not properly initialized";
    // Check: JetPack installation, CUDA runtime, device permissions
    return;
}

LOGI << "CUDA Device: " << cudaProps->name;
LOGI << "Compute Capability: " << cudaProps->major << "." << cudaProps->minor;
Memory Allocation Failures¶
- Insufficient GPU Memory: Reduce batch sizes or model complexity (a retry fallback is sketched after this list)
- Memory Fragmentation: Implement memory pool management
- Unified Memory Issues: Check L4T version compatibility
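A hedged sketch of one recovery path for allocation failures: clear the cache, then retry with progressively smaller requests and let the caller shrink its batch size to match. The halving strategy and function name are illustrative assumptions.

#include <cstddef>
#include "jetsonutils.h"

// Fallback allocation: clear the cache and retry with smaller sizes.
void* allocateWithFallback(size_t bytes, size_t minBytes) {
    while (bytes >= minBytes) {
        auto mem = JetsonUtils::allocateGPUMemory(bytes);
        if (mem) return mem.value();

        LOGW << "Allocation of " << bytes << " bytes failed, clearing cache and retrying";
        JetsonUtils::clearMemoryCache();
        bytes /= 2;  // halve the request; the caller must adapt its batch size
    }
    return nullptr;
}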
Thermal Throttling¶
- Cooling Solutions: Ensure adequate cooling for sustained performance
- Power Mode Adjustment: Use lower power modes in thermal-constrained environments
- Workload Scheduling: Implement thermal-aware workload scheduling
Platform Detection Issues¶
// Verify platform detection
auto platform = JetsonUtils::detectPlatform();
if (platform == JetsonPlatform::Unknown) {
    LOGE << "Failed to detect Jetson platform";
    // Check: /proc/device-tree compatibility, L4T version
}
Debugging Tools¶
// Comprehensive Jetson diagnostics
void diagnoseJetsonSystem() {
    // Platform information
    LOGI << "Platform: " << JetsonUtils::getPlatformName();
    LOGI << "JetPack: " << JetsonUtils::getJetPackVersion();
    LOGI << "L4T: " << JetsonUtils::getL4TVersion();

    // Hardware capabilities
    LOGI << "DLA Support: " << (JetsonUtils::hasDLA() ? "Yes" : "No");
    LOGI << "VIC Support: " << (JetsonUtils::hasVIC() ? "Yes" : "No");

    // Memory status
    auto memInfo = JetsonUtils::getMemoryInfo();
    if (memInfo) {
        LOGI << "Memory Usage: " << memInfo->utilization_percent << "%";
    }

    // Thermal status
    auto thermal = JetsonUtils::getThermalInfo();
    if (thermal) {
        LOGI << "CPU Temperature: " << thermal->cpu_temp << "°C";
        LOGI << "GPU Temperature: " << thermal->gpu_temp << "°C";
        LOGI << "Thermal Throttling: " << (thermal->is_throttling ? "Active" : "Inactive");
    }

    // Performance status
    auto perf = JetsonUtils::getPerformanceInfo();
    if (perf) {
        LOGI << "CPU Usage: " << perf->cpu_usage << "%";
        LOGI << "GPU Usage: " << perf->gpu_usage << "%";
        LOGI << "CPU Frequency: " << perf->cpu_frequency << "MHz";
        LOGI << "GPU Frequency: " << perf->gpu_frequency << "MHz";
    }
}
Integration Examples¶
High-Performance AI Pipeline¶
// Complete high-performance AI pipeline for Jetson
class JetsonAIPipeline {
public:
    void initialize() {
        // Initialize Jetson utilities
        initializeJetsonOptimizations();

        // Setup inference engines
        setupTensorRTInference();

        // Configure video processing
        setupVideoProcessing();

        // Start monitoring
        startSystemMonitoring();
    }

    void processPipeline(const VideoFrame& frame) {
        auto startTime = std::chrono::high_resolution_clock::now();

        // Pre-process with VIC if available
        auto preprocessed = preprocessWithVIC(frame);

        // Run inference with DLA/GPU
        auto detections = runInferenceOptimized(preprocessed);

        // Post-process results
        auto results = postprocessResults(detections);

        // Measure and log performance
        auto endTime = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime);
        logPerformanceMetrics(duration.count(), results.size());
    }

private:
    void initializeJetsonOptimizations() {
        // Detect platform and optimize accordingly
        auto platform = JetsonUtils::detectPlatform();
        optimizeForPlatform(platform);

        // Set optimal power mode
        JetsonUtils::setPowerMode(PowerMode::MAXN);

        // Configure thermal management
        JetsonUtils::setThermalThreshold(80.0f);

        // Optimize memory usage
        JetsonUtils::optimizeMemoryUsage();
    }
};
See Also¶
- Ambarella Plugin - Ambarella SoC platform integration
- TensorRT Engine - NVIDIA TensorRT inference optimization
- Platform Plugins Overview - All platform-specific plugins
- Plugin Overview - Complete plugin ecosystem
- Inference Plugins - AI inference integration