
Metrics API #1366

@LinasKo

Description

This issue aggregates the discussion and near-future plans to introduce metrics to supervision.

The first steps will be taken by the core Roboflow team; after that, we'll open up contributions of specific metrics to the community.

I propose the following:

  • Aim for ease of use and a compact API, sacrificing completeness if required.
  • Provide public classes with aggregation by default (metrics.py), keep implementation in impl.py or equivalent, to be used internally.
  • Expose not in global scope, but in supervision.metrics.
  • I don't think we need to split into metrics.detection, metrics.segmentation, metrics.classification, but I'm on the fence.
  • Focus only on what we can apply to Detections object.
  • This means we only implement metrics that use some of: class_id, confidence, xyxy, mask, xyxyxyxy (in Detections.data).

⚠️ I don't know:

  • How metrics are computed when targets and predictions have different numbers of detections, or are otherwise mismatched.
  • I don't think metrics should fail in that case, but perhaps there's a standard way of addressing this.
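For reference, the standard way COCO-style evaluation handles mismatched counts is greedy IoU matching: each prediction is matched to at most one unmatched target above an IoU threshold; leftover predictions become false positives and leftover targets become false negatives, so the metric never fails. A minimal sketch (assuming plain xyxy lists rather than Detections, and no confidence-ordered matching):

```python
def match_detections(pred_boxes, target_boxes, iou_threshold=0.5):
    """Greedily match predictions to targets by IoU.

    Returns (tp, fp, fn): unmatched predictions count as false
    positives, unmatched targets as false negatives.
    """
    def iou(a, b):
        # Intersection-over-union of two xyxy boxes.
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    matched_targets = set()
    tp = 0
    for pred in pred_boxes:
        best_iou, best_j = 0.0, None
        for j, tgt in enumerate(target_boxes):
            if j in matched_targets:
                continue
            score = iou(pred, tgt)
            if score > best_iou:
                best_iou, best_j = score, j
        if best_j is not None and best_iou >= iou_threshold:
            matched_targets.add(best_j)
            tp += 1
    fp = len(pred_boxes) - tp
    fn = len(target_boxes) - tp
    return tp, fp, fn
```

With this convention, precision/recall/accuracy fall out of (tp, fp, fn) regardless of how many boxes each side has.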

I believe we could start with:

  • Importing current metrics into the new system:
    • IoU
    • mAP
    • Confusion Matrix
  • Detections
    • Accuracy
    • Precision
    • Recall
  • General
    • Mean confidence
    • Median confidence
    • Min confidence
    • Max confidence
    • (Not typical, but I'd find it useful) the number of unique classes detected, plus an aggregate count of detections per class (e.g. N defects / hour).
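The per-class count idea above is cheap to sketch with a Counter. This is a hypothetical helper, not a proposed final API; `class_ids` stands in for Detections.class_id:

```python
from collections import Counter


class ClassCountMetric:
    """Aggregate how many objects of each class were detected (sketch).

    `class_ids` is assumed to be an iterable of ints, as in
    Detections.class_id.
    """

    def __init__(self):
        self.counts = Counter()

    def update(self, class_ids):
        # Accumulate counts across batches.
        self.counts.update(class_ids)

    def compute(self):
        return {
            "unique_classes": len(self.counts),
            "per_class_counts": dict(self.counts),
        }
```

Divide the counts by elapsed time outside the metric to get rates like "N defects / hour".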

I believe the one parameter a Metric needs to accept at construction is queue_size:

  • 1 - don't keep history, only ever give metrics of current batch
  • N - keep up to N metric results in history for computation.
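The queue semantics could lean on collections.deque with maxlen, which drops the oldest batch automatically. A sketch under that assumption, using a hypothetical mean-confidence metric:

```python
from collections import deque
from statistics import mean


class ConfidenceMetric:
    """Mean confidence over the last `queue_size` batches (sketch)."""

    def __init__(self, queue_size=1):
        # maxlen=1 gives "current batch only"; maxlen=N keeps up to
        # N batch results and evicts the oldest when full.
        self.history = deque(maxlen=queue_size)

    def update(self, confidences):
        self.history.append(list(confidences))

    def compute(self):
        all_conf = [c for batch in self.history for c in batch]
        return mean(all_conf) if all_conf else 0.0
```

With queue_size=1 every compute() reflects only the most recent update(), matching the first bullet above.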

Other thoughts:

  • I don't think metrics should know about datasets. Instead of benchmark as in the current API, let's have def benchmark_dataset(dataset, metric) in metrics/utils.py.
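That separation could look roughly like the sketch below. The signature is an assumption (I've added a `model` argument, since something has to produce predictions); the point is that the metric only ever sees (predictions, targets) pairs and stays dataset-agnostic:

```python
def benchmark_dataset(dataset, metric, model):
    """Hypothetical metrics/utils.py helper.

    `dataset` is assumed to yield (image, targets) pairs and
    `model` to map an image to predictions; the metric itself
    never touches the dataset.
    """
    for image, targets in dataset:
        predictions = model(image)
        metric.update(predictions, targets)
    return metric.compute()
```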

API:

class Accuracy(Metric):
    def __init__(self, queue_size: int = 1) -> None

    @override
    def update(self, predictions: Detections, targets: Detections) -> None

    @override
    def compute(self) -> NotSureYet

    # Metric also provides `def detect_and_compute(self, *args, **kwargs)`.

accuracy_metric = Accuracy()
accuracy_metric.update(detections, detections_ground_truth)
accuracy = accuracy_metric.compute()
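The base class this API implies could look roughly as follows. This is a sketch, not a committed design: the abstract `update`/`compute` pair comes from the proposal above, while `detect_and_compute` is read loosely here as an update-then-compute convenience (its exact semantics are still open):

```python
from abc import ABC, abstractmethod


class Metric(ABC):
    """Sketch of the proposed base class; details beyond the
    issue text are placeholders."""

    def __init__(self, queue_size: int = 1) -> None:
        self.queue_size = queue_size

    @abstractmethod
    def update(self, predictions, targets) -> None:
        """Add one batch of (predictions, targets) to the state."""

    @abstractmethod
    def compute(self):
        """Return the metric value over the current history."""

    def detect_and_compute(self, *args, **kwargs):
        # Convenience wrapper: one update followed by compute.
        self.update(*args, **kwargs)
        return self.compute()
```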
