COCOEval Plugin¶
Description¶
COCOEval is a model evaluation plugin for CVEDIA-RT that implements the COCO (Common Objects in Context) evaluation metrics. It assesses object detection model performance using the standard COCO evaluation protocol and benchmarking methodology.
The plugin scores detections against ground truth data with industry-standard COCO metrics, including Average Precision (AP), Average Recall (AR), precision-recall curves, and size-based performance analysis. Both single-image evaluation and batch processing are supported for model validation and benchmarking.
Key Features¶
- COCO-Compliant Evaluation: Full implementation of official COCO evaluation protocols and metrics
- Average Precision Calculation: Comprehensive AP metrics at multiple IoU thresholds (0.5:0.95)
- Size-Based Analysis: Performance evaluation across small, medium, and large object categories
- Multi-Class Support: Evaluation across multiple object classes with per-class metrics
- Precision-Recall Curves: Generation of detailed precision-recall curves for analysis
- F1 Score Calculation: Combined precision and recall metrics for balanced assessment
- Batch Evaluation: Efficient processing of large evaluation datasets
- Statistical Reporting: Comprehensive statistical summaries and performance reports
- IoU Matching: Configurable Intersection over Union thresholds for detection matching
- Ground Truth Validation: Robust validation against annotated ground truth data
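Every metric in this list ultimately rests on the IoU matching step: a detection can only count as a true positive when its Intersection over Union with a ground-truth box meets the configured threshold. A minimal illustrative sketch of that computation (the `{x, y, w, h}` box shape and the helper below are assumptions for illustration, not the plugin's internal API):
-- Minimal IoU sketch for axis-aligned boxes given as {x, y, w, h} tables.
local function iou(a, b)
    local x1 = math.max(a.x, b.x)
    local y1 = math.max(a.y, b.y)
    local x2 = math.min(a.x + a.w, b.x + b.w)
    local y2 = math.min(a.y + a.h, b.y + b.h)
    local inter = math.max(0, x2 - x1) * math.max(0, y2 - y1)
    local union = a.w * a.h + b.w * b.h - inter
    if union <= 0 then return 0 end
    return inter / union
end

-- Example: this pair overlaps well, so it matches at an IoU threshold of 0.5
local det = {x = 10, y = 10, w = 50, h = 50}
local gt  = {x = 12, y = 8,  w = 48, h = 52}
print(iou(det, gt) >= 0.5)  -- true (IoU is roughly 0.92)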
Requirements¶
Hardware Requirements¶
- CPU: Multi-core processor for efficient batch processing
- Memory: Minimum 4GB RAM (8GB+ recommended for large datasets)
- Storage: Sufficient storage for evaluation datasets and result caching
Software Dependencies¶
- RTCORE: CVEDIA-RT core library for plugin infrastructure
- Mathematical Libraries: Statistical computation and linear algebra libraries
- JSON Library: JSON parsing for COCO format annotations
- Image Processing: Basic image processing capabilities for bounding box operations
- Threading Library: Multi-threading support for parallel evaluation
Data Requirements¶
- COCO Format Annotations: Ground truth annotations in COCO JSON format
- Detection Results: Model predictions in compatible format
- Image Metadata: Image dimensions and reference information
- Category Definitions: Object class definitions and category mappings
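For reference, a ground-truth file in the required COCO format reduces to three top-level arrays. The sketch below shows the minimal fields as a Lua table, i.e. what the JSON looks like after decoding; field names follow the public COCO specification, and only a subset of optional fields is shown:
-- Minimal COCO-format ground truth, as it appears after JSON decoding.
local annotations = {
    images = {
        {id = 1, file_name = "000000000001.jpg", width = 640, height = 480}
    },
    annotations = {
        {id = 1, image_id = 1, category_id = 1,
         bbox = {100, 120, 50, 80},   -- [x, y, width, height] in pixels
         area = 4000, iscrowd = 0}
    },
    categories = {
        {id = 1, name = "person", supercategory = "person"}
    }
}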
Configuration¶
Basic Configuration¶
{
"cocoeval": {
"iou_thresholds": [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95],
"area_ranges": {
"small": [0, 32],
"medium": [32, 96],
"large": [96, 10000000000]
},
"max_detections": [1, 10, 100],
"confidence_threshold": 0.1
}
}
Advanced Configuration¶
{
"cocoeval": {
"iou_thresholds": [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95],
"area_ranges": {
"small": [0, 32],
"medium": [32, 96],
"large": [96, 10000000000],
"extra_large": [256, 10000000000]
},
"max_detections": [1, 10, 100, 300],
"confidence_threshold": 0.05,
"evaluation_settings": {
"use_segm": false,
"use_crowd": true,
"area_rng_lbl": ["all", "small", "medium", "large"],
"max_dets_lbl": ["all", "small", "medium", "large"]
},
"output_settings": {
"save_results": true,
"result_format": "json",
"include_curves": true,
"detailed_stats": true
},
"performance": {
"parallel_processing": true,
"batch_size": 100,
"cache_results": true
}
}
}
Configuration Schema¶
Parameter | Type | Default | Description |
---|---|---|---|
iou_thresholds | array | [0.5:0.95] | IoU thresholds for evaluation |
area_ranges | object | COCO standard | Object size ranges (pixels²) |
max_detections | array | [1, 10, 100] | Maximum detections per image |
confidence_threshold | float | 0.1 | Minimum confidence for evaluation |
use_segm | bool | false | Use segmentation masks |
use_crowd | bool | true | Include crowd annotations |
save_results | bool | true | Save evaluation results |
result_format | string | "json" | Output format ("json", "csv") |
include_curves | bool | true | Include precision-recall curves |
detailed_stats | bool | true | Generate detailed statistics |
parallel_processing | bool | true | Enable parallel evaluation |
batch_size | int | 100 | Evaluation batch size |
cache_results | bool | true | Cache intermediate results |
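The headline AP reported for the default iou_thresholds of 0.5:0.95 is the mean of the per-threshold AP values, and the F1 score is the harmonic mean of precision and recall. A small arithmetic sketch (the numbers are made up for illustration):
-- AP@[0.5:0.95] is the mean of the APs at each configured IoU threshold.
local perThresholdAP = {0.62, 0.60, 0.57, 0.54, 0.50, 0.45, 0.38, 0.29, 0.18, 0.06}
local sum = 0
for _, ap in ipairs(perThresholdAP) do sum = sum + ap end
local meanAP = sum / #perThresholdAP          -- 0.419

-- F1 is the harmonic mean of precision and recall.
local function f1(precision, recall)
    if precision + recall == 0 then return 0 end
    return 2 * precision * recall / (precision + recall)
end
print(meanAP, f1(0.72, 0.61))                 -- 0.419  ~0.660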
API Reference¶
C++ API (CocoEvalImpl)¶
Core Evaluation Methods¶
class CocoEvalImpl : public CocoEval {
public:
// Main evaluation functions
expected<CocoSummary> evaluate(const cvec& detections,
const cvec& groundtruth,
const std::vector<std::string>& categories,
int refWidth, int refHeight) override;
expected<CocoSummary> evaluateImages(const std::vector<ImageEvaluation>& imageEvals) override;
// Statistical analysis
expected<void> accumulate() override;
expected<std::string> summaryToString(const CocoSummary& summary) override;
// Configuration and setup
expected<void> setCategories(const std::vector<std::string>& categories) override;
expected<void> setEvaluationParams(const EvaluationParams& params) override;
expected<EvaluationParams> getEvaluationParams() override;
};
Data Structures¶
struct CocoSummary {
std::vector<APAR> apar; // Average Precision/Average Recall results
float f1Score = 0.0f; // F1 score
float ap = 0.0f; // Average Precision (AP @0.5:0.95)
float ap50 = 0.0f; // AP @ IoU=0.5
float ap75 = 0.0f; // AP @ IoU=0.75
float apSmall = 0.0f; // AP for small objects
float apMedium = 0.0f; // AP for medium objects
float apLarge = 0.0f; // AP for large objects
float ar = 0.0f; // Average Recall (AR)
float arSmall = 0.0f; // AR for small objects
float arMedium = 0.0f; // AR for medium objects
float arLarge = 0.0f; // AR for large objects
// Per-class metrics
std::map<std::string, float> classAP; // Per-class Average Precision
std::map<std::string, float> classAR; // Per-class Average Recall
};
struct APAR {
float ap = 0.0f; // Average Precision
float ar = 0.0f; // Average Recall
float f1 = 0.0f; // F1 Score
std::string category; // Object category
std::string areaRange; // Size range ("small", "medium", "large")
float iouThreshold = 0.5f; // IoU threshold
};
struct EvaluationParams {
std::vector<float> iouThresholds = {0.5f, 0.55f, 0.6f, 0.65f, 0.7f, 0.75f,
0.8f, 0.85f, 0.9f, 0.95f};
std::vector<int64_t> areaRanges = {0, 32, 96, 10000000000LL}; // 64-bit: the open-ended upper bound does not fit in int
std::vector<int> maxDetections = {1, 10, 100};
float confidenceThreshold = 0.1f;
bool useCrowd = true;
};
Detection and Ground Truth Structures¶
struct Detection {
cv::Rect2f bbox; // Bounding box (normalized or pixel coordinates)
float confidence = 0.0f; // Detection confidence
std::string category; // Object category/class
int imageId = -1; // Image identifier
std::vector<cv::Point2f> segmentation; // Segmentation points (optional)
};
struct GroundTruth {
cv::Rect2f bbox; // Ground truth bounding box
std::string category; // Object category/class
int imageId = -1; // Image identifier
bool isCrowd = false; // Crowd annotation flag
float area = 0.0f; // Object area
std::vector<cv::Point2f> segmentation; // Ground truth segmentation
};
Lua API¶
COCO Evaluation Setup¶
-- Create COCO evaluation instance
local cocoEval = api.factory.cocoeval.create(instance, "model_evaluator")
-- Configure COCO evaluation parameters
cocoEval:configure({
iou_thresholds = {0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95},
area_ranges = {
small = {0, 32},
medium = {32, 96},
large = {96, 10000000000}
},
max_detections = {1, 10, 100},
confidence_threshold = 0.1,
save_results = true,
detailed_stats = true
})
-- Set object categories
cocoEval:setCategories({"person", "car", "bicycle", "dog", "cat"})
Model Evaluation Processing¶
-- Evaluate model performance against ground truth
function evaluateModelPerformance(detections, groundTruth, imageWidth, imageHeight)
-- Perform COCO evaluation
local results = cocoEval:evaluate(detections, groundTruth,
{"person", "car", "bicycle"},
imageWidth, imageHeight)
if results then
api.logging.LogInfo("COCO Evaluation Results:")
api.logging.LogInfo(" Average Precision (AP): " .. results.ap)
api.logging.LogInfo(" AP @ IoU=0.5: " .. results.ap50)
api.logging.LogInfo(" AP @ IoU=0.75: " .. results.ap75)
api.logging.LogInfo(" AP (small objects): " .. results.apSmall)
api.logging.LogInfo(" AP (medium objects): " .. results.apMedium)
api.logging.LogInfo(" AP (large objects): " .. results.apLarge)
api.logging.LogInfo(" Average Recall (AR): " .. results.ar)
api.logging.LogInfo(" F1 Score: " .. results.f1Score)
-- Print per-class results
if results.classAP then
api.logging.LogInfo("\nPer-Class Average Precision:")
for class, ap in pairs(results.classAP) do
api.logging.LogInfo(string.format(" %s: %.3f", class, ap))
end
end
return results
else
api.logging.LogError("Evaluation failed")
return nil
end
end
-- Process detection results for evaluation
function processDetectionResults(modelOutput)
local detections = {}
for _, detection in ipairs(modelOutput) do
table.insert(detections, {
bbox = {
x = detection.x,
y = detection.y,
w = detection.w,
h = detection.h
},
confidence = detection.confidence,
category = detection.class,
image_id = detection.image_id
})
end
return detections
end
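The ground truth can be prepared in the same table shape. The sketch below assumes annotations were already decoded from COCO JSON (for example by the loadCocoAnnotations helper used later on this page) and that a categoryNames map from category ID to name is available; the exact field names expected by the plugin may differ:
-- Convert decoded COCO annotations into ground-truth tables for evaluation.
-- categoryNames maps category_id -> name (illustrative helper argument).
function processGroundTruth(cocoAnnotations, categoryNames)
    local groundTruth = {}
    for _, ann in ipairs(cocoAnnotations.annotations) do
        table.insert(groundTruth, {
            bbox = {
                x = ann.bbox[1],
                y = ann.bbox[2],
                w = ann.bbox[3],
                h = ann.bbox[4]
            },
            category = categoryNames[ann.category_id],
            image_id = ann.image_id,
            is_crowd = (ann.iscrowd == 1),
            area = ann.area
        })
    end
    return groundTruth
end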
Batch Evaluation¶
-- Batch evaluation for dataset validation
function performBatchEvaluation(datasetPath, modelResultsPath)
api.logging.LogInfo("Starting batch COCO evaluation...")
-- Load ground truth annotations
local groundTruth = loadCocoAnnotations(datasetPath .. "/annotations.json")
-- Load model detection results
local detections = loadDetectionResults(modelResultsPath)
-- Perform batch evaluation
local batchResults = {}
local totalImages = #groundTruth
for i = 1, totalImages do
local imageGT = groundTruth[i]
local imageDets = filterDetectionsByImage(detections, imageGT.image_id)
-- Evaluate single image
local imageResult = cocoEval:evaluate(imageDets, {imageGT},
cocoEval:getCategories(),
imageGT.width, imageGT.height)
if imageResult then
table.insert(batchResults, imageResult)
end
-- Progress reporting
if i % 100 == 0 then
api.logging.LogInfo(string.format("Processed %d/%d images", i, totalImages))
end
end
-- Accumulate results
cocoEval:accumulate()
-- Generate final summary
local finalSummary = calculateBatchSummary(batchResults)
api.logging.LogInfo("Batch evaluation completed:")
api.logging.LogInfo(" Dataset size: " .. totalImages)
api.logging.LogInfo(" Overall AP: " .. finalSummary.ap)
api.logging.LogInfo(" Overall AR: " .. finalSummary.ar)
return finalSummary
end
-- Calculate summary statistics from batch results
function calculateBatchSummary(batchResults)
local summary = {
ap = 0,
ap50 = 0,
ap75 = 0,
ar = 0,
f1Score = 0,
totalImages = #batchResults
}
for _, result in ipairs(batchResults) do
summary.ap = summary.ap + result.ap
summary.ap50 = summary.ap50 + result.ap50
summary.ap75 = summary.ap75 + result.ap75
summary.ar = summary.ar + result.ar
summary.f1Score = summary.f1Score + result.f1Score
end
-- Calculate averages
if summary.totalImages > 0 then
summary.ap = summary.ap / summary.totalImages
summary.ap50 = summary.ap50 / summary.totalImages
summary.ap75 = summary.ap75 / summary.totalImages
summary.ar = summary.ar / summary.totalImages
summary.f1Score = summary.f1Score / summary.totalImages
end
return summary
end
Examples¶
Basic Model Evaluation¶
#include "cocoeval.h"
#include "rtcore.h"
// Basic COCO evaluation implementation
class ModelEvaluator {
public:
void initialize() {
// Create COCO evaluator
cocoEval_ = std::unique_ptr<CocoEval>(
static_cast<CocoEval*>(
CocoEval::create("model_evaluator").release()
)
);
// Set evaluation categories
std::vector<std::string> categories = {
"person", "bicycle", "car", "motorcycle", "airplane",
"bus", "train", "truck", "boat", "traffic light"
};
cocoEval_->setCategories(categories);
// Configure evaluation parameters
EvaluationParams params;
params.iouThresholds = {0.5f, 0.55f, 0.6f, 0.65f, 0.7f,
0.75f, 0.8f, 0.85f, 0.9f, 0.95f};
params.confidenceThreshold = 0.1f;
params.useCrowd = true;
cocoEval_->setEvaluationParams(params);
LOGI << "COCO evaluator initialized with " << categories.size() << " categories";
}
CocoSummary evaluateModel(const std::vector<Detection>& detections,
const std::vector<GroundTruth>& groundTruth,
int imageWidth, int imageHeight) {
// Convert data to plugin format
cvec detectionsVec = convertDetections(detections);
cvec groundTruthVec = convertGroundTruth(groundTruth);
// Get category names
auto categories = getCategoryNames();
// Perform evaluation
auto result = cocoEval_->evaluate(detectionsVec, groundTruthVec,
categories, imageWidth, imageHeight);
if (!result) {
LOGE << "COCO evaluation failed: " << result.error().message();
return CocoSummary{};
}
auto summary = result.value();
// Log results
LOGI << "COCO Evaluation Results:";
LOGI << " Average Precision (AP): " << summary.ap;
LOGI << " AP @ IoU=0.5: " << summary.ap50;
LOGI << " AP @ IoU=0.75: " << summary.ap75;
LOGI << " AP (small): " << summary.apSmall;
LOGI << " AP (medium): " << summary.apMedium;
LOGI << " AP (large): " << summary.apLarge;
LOGI << " Average Recall (AR): " << summary.ar;
LOGI << " F1 Score: " << summary.f1Score;
return summary;
}
private:
std::unique_ptr<CocoEval> cocoEval_;
cvec convertDetections(const std::vector<Detection>& detections) {
cvec result;
for (const auto& det : detections) {
auto detection = CValue::create();
// Bounding box
auto bbox = CValue::create();
bbox->set("x", det.bbox.x);
bbox->set("y", det.bbox.y);
bbox->set("w", det.bbox.width);
bbox->set("h", det.bbox.height);
detection->set("bbox", bbox);
// Detection properties
detection->set("confidence", det.confidence);
detection->set("category", det.category);
detection->set("image_id", det.imageId);
result.push_back(detection);
}
return result;
}
cvec convertGroundTruth(const std::vector<GroundTruth>& groundTruth) {
cvec result;
for (const auto& gt : groundTruth) {
auto truth = CValue::create();
// Bounding box
auto bbox = CValue::create();
bbox->set("x", gt.bbox.x);
bbox->set("y", gt.bbox.y);
bbox->set("w", gt.bbox.width);
bbox->set("h", gt.bbox.height);
truth->set("bbox", bbox);
// Ground truth properties
truth->set("category", gt.category);
truth->set("image_id", gt.imageId);
truth->set("is_crowd", gt.isCrowd);
truth->set("area", gt.area);
result.push_back(truth);
}
return result;
}
std::vector<std::string> getCategoryNames() {
return {"person", "bicycle", "car", "motorcycle", "airplane",
"bus", "train", "truck", "boat", "traffic light"};
}
};
Advanced Dataset Evaluation¶
// Advanced dataset evaluation with detailed analysis
class DatasetEvaluator {
public:
void evaluateDataset(const std::string& datasetPath,
const std::string& resultsPath) {
// Initialize evaluation
initialize();
// Load dataset
auto dataset = loadCocoDataset(datasetPath);
auto detections = loadDetectionResults(resultsPath);
// Perform comprehensive evaluation
auto overallResults = performComprehensiveEvaluation(dataset, detections);
// Generate detailed report
generateEvaluationReport(overallResults);
// Save results
saveEvaluationResults(overallResults);
}
private:
std::unique_ptr<CocoEval> cocoEval_;
struct DatasetResults {
CocoSummary overallSummary;
std::map<std::string, CocoSummary> perClassResults;
std::map<std::string, CocoSummary> perSizeResults;
std::vector<ImageEvaluationResult> imageResults;
EvaluationStatistics statistics;
};
struct EvaluationStatistics {
int totalImages = 0;
int totalDetections = 0;
int totalGroundTruth = 0;
int truePositives = 0;
int falsePositives = 0;
int falseNegatives = 0;
float averageConfidence = 0.0f;
std::map<std::string, int> categoryDistribution;
};
DatasetResults performComprehensiveEvaluation(const CocoDataset& dataset,
const std::vector<Detection>& detections) {
DatasetResults results;
// Overall evaluation
results.overallSummary = evaluateOverall(dataset, detections);
// Per-class evaluation
results.perClassResults = evaluatePerClass(dataset, detections);
// Per-size evaluation
results.perSizeResults = evaluatePerSize(dataset, detections);
// Per-image evaluation
results.imageResults = evaluatePerImage(dataset, detections);
// Calculate statistics
results.statistics = calculateStatistics(dataset, detections);
return results;
}
void generateEvaluationReport(const DatasetResults& results) {
LOGI << "=== COCO Dataset Evaluation Report ===";
LOGI << "";
LOGI << "Overall Performance:";
LOGI << " Average Precision (AP): " << results.overallSummary.ap;
LOGI << " Average Recall (AR): " << results.overallSummary.ar;
LOGI << " F1 Score: " << results.overallSummary.f1Score;
LOGI << "";
LOGI << "Size-based Performance:";
LOGI << " Small objects AP: " << results.overallSummary.apSmall;
LOGI << " Medium objects AP: " << results.overallSummary.apMedium;
LOGI << " Large objects AP: " << results.overallSummary.apLarge;
LOGI << "";
LOGI << "Per-Class Performance:";
for (const auto& [className, summary] : results.perClassResults) {
LOGI << " " << className << " AP: " << summary.ap;
}
LOGI << "";
LOGI << "Dataset Statistics:";
LOGI << " Total Images: " << results.statistics.totalImages;
LOGI << " Total Detections: " << results.statistics.totalDetections;
LOGI << " Total Ground Truth: " << results.statistics.totalGroundTruth;
LOGI << " True Positives: " << results.statistics.truePositives;
LOGI << " False Positives: " << results.statistics.falsePositives;
LOGI << " False Negatives: " << results.statistics.falseNegatives;
// Calculate precision and recall
int tp = results.statistics.truePositives;
int fp = results.statistics.falsePositives;
int fn = results.statistics.falseNegatives;
float precision = (tp + fp) > 0 ? static_cast<float>(tp) / (tp + fp) : 0.0f;
float recall = (tp + fn) > 0 ? static_cast<float>(tp) / (tp + fn) : 0.0f;
LOGI << " Precision: " << precision;
LOGI << " Recall: " << recall;
LOGI << "================================";
}
};
Complete Model Benchmarking System¶
-- Complete model benchmarking system with COCO evaluation
local cocoEval = api.factory.cocoeval.create(instance, "benchmark_evaluator")
local modelManager = api.factory.inference.create(instance, "benchmark_models")
-- Benchmarking configuration
local benchmarkConfig = {
models_to_test = {
{name = "yolov5s", path = "/models/yolov5s.onnx", type = "detection"},
{name = "yolov5m", path = "/models/yolov5m.onnx", type = "detection"},
{name = "efficientdet", path = "/models/efficientdet.trt", type = "detection"}
},
datasets = {
{name = "coco_val", path = "/datasets/coco/val2017", annotations = "/datasets/coco/annotations/instances_val2017.json"},
{name = "custom_test", path = "/datasets/custom/test", annotations = "/datasets/custom/annotations.json"}
},
evaluation_settings = {
iou_thresholds = {0.5, 0.75, 0.9},
confidence_thresholds = {0.1, 0.3, 0.5},
max_detections = {100, 300, 1000}
}
}
-- Model benchmarking results
local benchmarkResults = {
model_results = {},
comparative_analysis = {},
performance_metrics = {}
}
-- Initialize comprehensive benchmarking system
function initializeBenchmarkingSystem()
api.logging.LogInfo("Initializing COCO model benchmarking system")
-- Configure COCO evaluation
cocoEval:configure({
iou_thresholds = benchmarkConfig.evaluation_settings.iou_thresholds,
area_ranges = {
small = {0, 32},
medium = {32, 96},
large = {96, 10000000000}
},
max_detections = benchmarkConfig.evaluation_settings.max_detections,
confidence_threshold = 0.05, -- Low threshold for comprehensive analysis
save_results = true,
detailed_stats = true,
include_curves = true
})
-- Set COCO categories (80 classes)
local cocoCategories = {
"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck",
"boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
"bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra",
"giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee"
-- ... full COCO category list
}
cocoEval:setCategories(cocoCategories)
api.logging.LogInfo("Benchmarking system initialized with " .. #benchmarkConfig.models_to_test .. " models")
return true
end
-- Run comprehensive model benchmarking
function runModelBenchmarking()
api.logging.LogInfo("Starting comprehensive model benchmarking")
for _, dataset in ipairs(benchmarkConfig.datasets) do
api.logging.LogInfo("Evaluating on dataset: " .. dataset.name)
-- Load ground truth for dataset
local groundTruth = loadCocoGroundTruth(dataset.annotations)
for _, model in ipairs(benchmarkConfig.models_to_test) do
api.logging.LogInfo("Benchmarking model: " .. model.name)
-- Initialize model
local modelResults = benchmarkModel(model, dataset, groundTruth)
-- Store results
benchmarkResults.model_results[model.name] = benchmarkResults.model_results[model.name] or {}
benchmarkResults.model_results[model.name][dataset.name] = modelResults
end
end
-- Perform comparative analysis
performComparativeAnalysis()
-- Generate benchmark report
generateBenchmarkReport()
end
-- Benchmark individual model
function benchmarkModel(model, dataset, groundTruth)
local modelResults = {
model_name = model.name,
dataset_name = dataset.name,
evaluation_results = {},
performance_metrics = {},
detailed_analysis = {}
}
-- Load and configure model
modelManager:loadModel(model.path)
-- Test different confidence thresholds
for _, confidenceThreshold in ipairs(benchmarkConfig.evaluation_settings.confidence_thresholds) do
api.logging.LogInfo(string.format("Testing %s with confidence threshold %.2f",
model.name, confidenceThreshold))
-- Run inference on dataset
local detections = runInferenceOnDataset(model, dataset, confidenceThreshold)
-- Evaluate with COCO metrics
local cocoResults = cocoEval:evaluate(detections, groundTruth,
cocoEval:getCategories(),
dataset.image_width or 640,
dataset.image_height or 480)
if cocoResults then
modelResults.evaluation_results[tostring(confidenceThreshold)] = cocoResults
-- Log intermediate results
api.logging.LogInfo(string.format(
"Results for %s @ %.2f: AP=%.3f, AR=%.3f, F1=%.3f",
model.name, confidenceThreshold,
cocoResults.ap, cocoResults.ar, cocoResults.f1Score
))
end
end
-- Calculate performance metrics
modelResults.performance_metrics = calculateModelPerformanceMetrics(modelResults.evaluation_results)
-- Detailed analysis per category and size
modelResults.detailed_analysis = performDetailedModelAnalysis(model, dataset, groundTruth)
return modelResults
end
-- Perform comparative analysis across models
function performComparativeAnalysis()
api.logging.LogInfo("Performing comparative analysis across models")
local analysis = {
best_overall_ap = {model = "", dataset = "", ap = 0},
best_small_objects = {model = "", dataset = "", ap = 0},
best_large_objects = {model = "", dataset = "", ap = 0},
fastest_inference = {model = "", fps = 0},
model_rankings = {}
}
-- Compare models across datasets
for modelName, modelData in pairs(benchmarkResults.model_results) do
for datasetName, datasetResults in pairs(modelData) do
-- Find best confidence threshold for each model
local bestResult = findBestConfidenceThreshold(datasetResults.evaluation_results)
if bestResult then
-- Track best overall performance
if bestResult.ap > analysis.best_overall_ap.ap then
analysis.best_overall_ap = {
model = modelName,
dataset = datasetName,
ap = bestResult.ap
}
end
-- Track best small object performance
if bestResult.apSmall > analysis.best_small_objects.ap then
analysis.best_small_objects = {
model = modelName,
dataset = datasetName,
ap = bestResult.apSmall
}
end
-- Track best large object performance
if bestResult.apLarge > analysis.best_large_objects.ap then
analysis.best_large_objects = {
model = modelName,
dataset = datasetName,
ap = bestResult.apLarge
}
end
-- Add to rankings
table.insert(analysis.model_rankings, {
model = modelName,
dataset = datasetName,
ap = bestResult.ap,
ar = bestResult.ar,
f1 = bestResult.f1Score,
performance = datasetResults.performance_metrics
})
end
end
end
-- Sort rankings by AP
table.sort(analysis.model_rankings, function(a, b)
return a.ap > b.ap
end)
benchmarkResults.comparative_analysis = analysis
end
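-- findBestConfidenceThreshold() is referenced above but left undefined in this
-- example. A minimal sketch (assumes evaluation_results maps threshold strings
-- to COCO summaries and simply picks the entry with the highest AP):
function findBestConfidenceThreshold(evaluationResults)
    local best = nil
    for _, result in pairs(evaluationResults) do
        if not best or result.ap > best.ap then
            best = result
        end
    end
    return best
end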
-- Generate comprehensive benchmark report
function generateBenchmarkReport()
api.logging.LogInfo("=== COCO Model Benchmarking Report ===")
api.logging.LogInfo("")
-- Overall best performers
local analysis = benchmarkResults.comparative_analysis
api.logging.LogInfo("Best Overall Performance:")
api.logging.LogInfo(string.format(" Model: %s on %s",
analysis.best_overall_ap.model,
analysis.best_overall_ap.dataset))
api.logging.LogInfo(string.format(" Average Precision: %.3f", analysis.best_overall_ap.ap))
api.logging.LogInfo("")
api.logging.LogInfo("Best Small Object Detection:")
api.logging.LogInfo(string.format(" Model: %s on %s",
analysis.best_small_objects.model,
analysis.best_small_objects.dataset))
api.logging.LogInfo(string.format(" Small Objects AP: %.3f", analysis.best_small_objects.ap))
api.logging.LogInfo("")
api.logging.LogInfo("Model Rankings (by Average Precision):")
for i, ranking in ipairs(analysis.model_rankings) do
if i <= 10 then -- Top 10
api.logging.LogInfo(string.format("%d. %s (%s): AP=%.3f, AR=%.3f, F1=%.3f",
i, ranking.model, ranking.dataset,
ranking.ap, ranking.ar, ranking.f1))
end
end
api.logging.LogInfo("")
-- Detailed per-model analysis
api.logging.LogInfo("Detailed Model Analysis:")
for modelName, modelData in pairs(benchmarkResults.model_results) do
api.logging.LogInfo(string.format("Model: %s", modelName))
for datasetName, results in pairs(modelData) do
local bestResult = findBestConfidenceThreshold(results.evaluation_results)
if bestResult then
api.logging.LogInfo(string.format(" Dataset %s: AP=%.3f, AP50=%.3f, AP75=%.3f",
datasetName, bestResult.ap,
bestResult.ap50, bestResult.ap75))
api.logging.LogInfo(string.format(" Small/Medium/Large: %.3f/%.3f/%.3f",
bestResult.apSmall, bestResult.apMedium,
bestResult.apLarge))
end
end
api.logging.LogInfo("")
end
api.logging.LogInfo("=== End of Benchmark Report ===")
-- Save detailed results to file
saveBenchmarkResults()
end
-- Save benchmark results to JSON file
function saveBenchmarkResults()
local resultsFile = "/tmp/coco_benchmark_results.json"
local jsonResults = api.json.encode(benchmarkResults)
local file = io.open(resultsFile, "w")
if file then
file:write(jsonResults)
file:close()
api.logging.LogInfo("Benchmark results saved to: " .. resultsFile)
else
api.logging.LogError("Failed to save benchmark results")
end
end
-- Initialize and run benchmarking
initializeBenchmarkingSystem()
runModelBenchmarking()
api.logging.LogInfo("COCO model benchmarking completed")
Best Practices¶
Evaluation Accuracy¶
- Ground Truth Quality: Ensure high-quality, consistent ground truth annotations
- IoU Threshold Selection: Use appropriate IoU thresholds based on application requirements
- Category Consistency: Maintain consistent category definitions across datasets
- Annotation Validation: Validate annotations for completeness and accuracy
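As a lightweight form of the annotation validation recommended above, a pre-check can catch unknown categories and degenerate boxes before evaluation. This is an illustrative sketch; the helper name and checks are assumptions, not part of the plugin API:
-- Sanity-check decoded COCO annotations before evaluation.
function checkAnnotations(cocoAnnotations, knownCategories)
    local categorySet = {}
    for _, c in ipairs(knownCategories) do categorySet[c] = true end

    local idToName = {}
    for _, cat in ipairs(cocoAnnotations.categories) do
        idToName[cat.id] = cat.name
    end

    local problems = 0
    for _, ann in ipairs(cocoAnnotations.annotations) do
        local name = idToName[ann.category_id]
        if not name or not categorySet[name] then
            api.logging.LogError("Unknown category in annotation " .. ann.id)
            problems = problems + 1
        end
        if ann.bbox[3] <= 0 or ann.bbox[4] <= 0 then
            api.logging.LogError("Degenerate bbox in annotation " .. ann.id)
            problems = problems + 1
        end
    end
    return problems == 0
end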
Performance Optimization¶
- Batch Processing: Use batch evaluation for large datasets to improve efficiency
- Parallel Processing: Enable parallel processing for multi-core systems
- Result Caching: Cache intermediate results to avoid redundant calculations
- Memory Management: Optimize memory usage for large-scale evaluations
Statistical Rigor¶
- Multiple Metrics: Use multiple evaluation metrics for comprehensive assessment
- Cross-Validation: Perform cross-validation across different datasets
- Statistical Significance: Ensure statistical significance in comparisons
- Confidence Intervals: Report confidence intervals for performance metrics
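For the confidence intervals mentioned above, a simple nonparametric bootstrap over per-image scores is often enough. A sketch, assuming perImageAP is a plain array of per-image AP values (for example, collected from the batch evaluation results earlier on this page):
-- Bootstrap a 95% confidence interval for the mean of per-image AP values.
function bootstrapCI(perImageAP, iterations)
    iterations = iterations or 1000
    local n = #perImageAP
    local means = {}
    for i = 1, iterations do
        local sum = 0
        for _ = 1, n do
            sum = sum + perImageAP[math.random(n)]
        end
        means[i] = sum / n
    end
    table.sort(means)
    local lower = means[math.max(1, math.floor(0.025 * iterations))]
    local upper = means[math.ceil(0.975 * iterations)]
    return lower, upper
end

local lo, hi = bootstrapCI({0.41, 0.38, 0.52, 0.47, 0.44}, 1000)
api.logging.LogInfo(string.format("AP 95%% CI: [%.3f, %.3f]", lo, hi))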
Integration Guidelines¶
- Model Comparison: Use consistent evaluation protocols for fair model comparison
- Threshold Optimization: Systematically optimize confidence thresholds
- Error Analysis: Perform detailed error analysis to identify model weaknesses
- Reporting Standards: Follow standard reporting practices for reproducibility
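The threshold optimization recommended above can be done by sweeping the confidence threshold, re-evaluating at each value, and keeping the threshold with the best F1 (or AP, depending on the deployment goal). A sketch using the Lua API shown earlier; detections, groundTruth, and the image dimensions are assumed to be prepared already:
-- Sweep confidence thresholds and keep the one with the best F1 score.
function findBestThreshold(detections, groundTruth, width, height)
    local best = {threshold = nil, f1 = -1}
    for _, threshold in ipairs({0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7}) do
        local filtered = {}
        for _, det in ipairs(detections) do
            if det.confidence >= threshold then
                table.insert(filtered, det)
            end
        end
        local result = cocoEval:evaluate(filtered, groundTruth,
                                         cocoEval:getCategories(),
                                         width, height)
        if result and result.f1Score > best.f1 then
            best = {threshold = threshold, f1 = result.f1Score}
        end
    end
    return best
end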
Troubleshooting¶
Common Issues¶
Annotation Format Problems¶
// Validate COCO annotation format
bool validateCocoAnnotations(const std::string& annotationFile) {
// Check required fields
auto annotations = loadJsonFile(annotationFile);
if (!annotations.contains("images") ||
!annotations.contains("annotations") ||
!annotations.contains("categories")) {
LOGE << "Missing required COCO annotation fields";
return false;
}
// Validate bounding box formats
for (const auto& ann : annotations["annotations"]) {
if (!ann.contains("bbox") || ann["bbox"].size() != 4) {
LOGE << "Invalid bounding box format in annotation ID: " << ann["id"];
return false;
}
}
return true;
}
Evaluation Discrepancies¶
- Coordinate Systems: Ensure consistent coordinate systems (pixel vs normalized)
- Image Scaling: Account for image scaling in detection coordinates
- Category Mapping: Verify correct category ID mappings
- IoU Calculation: Validate IoU calculation accuracy
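A frequent source of discrepancies is mixing normalized (0..1) detection coordinates with pixel-space ground truth; convert one side before evaluating, otherwise IoU matching silently fails. A minimal sketch (assumes {x, y, w, h} boxes):
-- Convert a normalized {x, y, w, h} box to pixel coordinates.
local function toPixelCoords(bbox, imageWidth, imageHeight)
    return {
        x = bbox.x * imageWidth,
        y = bbox.y * imageHeight,
        w = bbox.w * imageWidth,
        h = bbox.h * imageHeight
    }
end

-- Example: a normalized box on a 640x480 image
local pixelBox = toPixelCoords({x = 0.25, y = 0.5, w = 0.1, h = 0.2}, 640, 480)
-- pixelBox = {x = 160, y = 240, w = 64, h = 96}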
Performance Issues¶
- Large Datasets: Optimize memory usage for large evaluation datasets
- Batch Size: Adjust batch size based on available memory
- Parallel Processing: Enable parallel processing for faster evaluation
- Result Storage: Optimize result storage for large-scale evaluations
Metric Interpretation¶
- AP vs AP50: Understand differences between different AP metrics
- Size Categories: Correctly interpret small/medium/large object categories
- Confidence Thresholds: Select appropriate confidence thresholds for evaluation
- Statistical Significance: Ensure sufficient sample sizes for reliable metrics
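For reference, the COCO convention defines the size buckets by object area in pixels²: small is below 32², medium is 32² to 96², and large is above 96². A quick illustrative check using those standard thresholds (compare them with the area_ranges you configure for this plugin):
-- Classify an object by area using the standard COCO size thresholds.
local function cocoSizeCategory(area)
    if area < 32 * 32 then
        return "small"
    elseif area < 96 * 96 then
        return "medium"
    else
        return "large"
    end
end

print(cocoSizeCategory(30 * 30))   -- "small"  (900 px^2 < 1024)
print(cocoSizeCategory(64 * 64))   -- "medium" (4096 px^2)
print(cocoSizeCategory(128 * 96))  -- "large"  (12288 px^2 > 9216)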
Debugging Tools¶
// COCO evaluation diagnostics
void diagnoseCocoEvaluation(CocoEval* cocoEval,
const cvec& detections,
const cvec& groundTruth) {
// Validate input data
LOGI << "Detection count: " << detections.size();
LOGI << "Ground truth count: " << groundTruth.size();
// Check category distribution
std::map<std::string, int> detectionCounts;
std::map<std::string, int> groundTruthCounts;
for (const auto& det : detections) {
detectionCounts[det->get("category").getString()]++;
}
for (const auto& gt : groundTruth) {
groundTruthCounts[gt->get("category").getString()]++;
}
LOGI << "Category Distribution:";
for (const auto& [category, count] : groundTruthCounts) {
int detCount = detectionCounts[category];
LOGI << " " << category << ": GT=" << count << ", Det=" << detCount;
}
// Validate bounding boxes
validateBoundingBoxes(detections, "detections");
validateBoundingBoxes(groundTruth, "ground truth");
}
Integration Examples¶
Model Training Pipeline Integration¶
// Complete model training pipeline with COCO evaluation
class TrainingPipelineWithEvaluation {
public:
void trainWithEvaluation(const TrainingConfig& config) {
// Initialize training
initializeTraining(config);
// Training loop with periodic evaluation
for (int epoch = 0; epoch < config.maxEpochs; ++epoch) {
// Training step
performTrainingEpoch(epoch);
// Periodic evaluation
if (epoch % config.evaluationInterval == 0) {
auto evalResults = evaluateModel(epoch);
// Log evaluation results
logEvaluationResults(epoch, evalResults);
// Model selection based on evaluation
if (evalResults.ap > bestAP_) {
saveModelCheckpoint(epoch, evalResults);
bestAP_ = evalResults.ap;
}
}
}
// Final evaluation
performFinalEvaluation();
}
private:
std::unique_ptr<CocoEval> cocoEval_;
float bestAP_ = 0.0f;
CocoSummary evaluateModel(int epoch) {
// Load validation dataset
auto validationData = loadValidationDataset();
// Run inference on validation set
auto detections = runValidationInference(validationData);
// Perform COCO evaluation
auto results = cocoEval_->evaluate(detections.detections,
validationData.groundTruth,
validationData.categories,
validationData.imageWidth,
validationData.imageHeight);
return results.value_or(CocoSummary{});
}
};
See Also¶
- ClassifierEval Plugin - Classification model evaluation
- Processing Plugins Overview - All processing plugins
- Plugin Overview - Complete plugin ecosystem
- Inference Plugins - AI model integration
- Model Evaluation Guide - Model evaluation best practices