
Metrics API #1366

@LinasKo

Description

This issue aggregates the discussion and near-future plans to introduce metrics to supervision.

The first steps will be taken by the core Roboflow team; after that, we'll open up contributions of specific metrics to the community.

I propose the following:

  • Aim for ease of use and a compact API, sacrificing completeness if required.
  • Provide public classes with aggregation by default (metrics.py), keep implementation in impl.py or equivalent, to be used internally.
  • Expose not in global scope, but in supervision.metrics.
  • I don't think we need to split into metrics.detection, metrics.segmentation, metrics.classification, but I'm on the fence.
  • Focus only on what we can apply to Detections object.
  • This means we only implement metrics that use some of: class_id, confidence, xyxy, mask, xyxyxyxy (in Detections.data).

⚠️ I don't know:

  • How metrics are computed when targets and predictions have different numbers of detections, or are otherwise mismatched.
  • I don't think metrics should fail in that case, but perhaps there's a standard way of addressing this.
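For reference, the standard way COCO-style evaluation handles mismatched counts is greedy IoU matching: each prediction is matched to at most one unmatched target above an IoU threshold; leftover predictions become false positives and leftover targets become false negatives, so the metric never fails. A minimal sketch (assuming plain xyxy lists rather than Detections, and no confidence-ordered matching):

```python
def match_detections(pred_boxes, target_boxes, iou_threshold=0.5):
    """Greedily match predictions to targets by IoU.

    Returns (tp, fp, fn): unmatched predictions count as false
    positives, unmatched targets as false negatives.
    """
    def iou(a, b):
        # Intersection-over-union of two xyxy boxes.
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    matched_targets = set()
    tp = 0
    for pred in pred_boxes:
        best_iou, best_j = 0.0, None
        for j, tgt in enumerate(target_boxes):
            if j in matched_targets:
                continue
            score = iou(pred, tgt)
            if score > best_iou:
                best_iou, best_j = score, j
        if best_j is not None and best_iou >= iou_threshold:
            matched_targets.add(best_j)
            tp += 1
    fp = len(pred_boxes) - tp
    fn = len(target_boxes) - tp
    return tp, fp, fn
```

With this convention, precision/recall/accuracy fall out of (tp, fp, fn) regardless of how many boxes each side has.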

I believe we could start with:

  • Importing current metrics into the new system:
    • IoU
    • mAP
    • Confusion Matrix
  • Detections
    • Accuracy
    • Precision
    • Recall
  • General
    • Mean confidence
    • Median confidence
    • Min confidence
    • Max confidence
    • (Not typical, but I'd find it useful) the number of unique classes detected, plus an aggregate count of detections per class (e.g. N defects / hour).
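The per-class count idea above is cheap to sketch with a Counter. This is a hypothetical helper, not a proposed final API; `class_ids` stands in for Detections.class_id:

```python
from collections import Counter


class ClassCountMetric:
    """Aggregate how many objects of each class were detected (sketch).

    `class_ids` is assumed to be an iterable of ints, as in
    Detections.class_id.
    """

    def __init__(self):
        self.counts = Counter()

    def update(self, class_ids):
        # Accumulate counts across batches.
        self.counts.update(class_ids)

    def compute(self):
        return {
            "unique_classes": len(self.counts),
            "per_class_counts": dict(self.counts),
        }
```

Divide the counts by elapsed time outside the metric to get rates like "N defects / hour".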

I believe the one parameter a Metric needs to accept at construction is queue_size:

  • 1 - don't keep history, only ever give metrics of current batch
  • N - keep up to N metric results in history for computation.
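The queue semantics could lean on collections.deque with maxlen, which drops the oldest batch automatically. A sketch under that assumption, using a hypothetical mean-confidence metric:

```python
from collections import deque
from statistics import mean


class ConfidenceMetric:
    """Mean confidence over the last `queue_size` batches (sketch)."""

    def __init__(self, queue_size=1):
        # maxlen=1 gives "current batch only"; maxlen=N keeps up to
        # N batch results and evicts the oldest when full.
        self.history = deque(maxlen=queue_size)

    def update(self, confidences):
        self.history.append(list(confidences))

    def compute(self):
        all_conf = [c for batch in self.history for c in batch]
        return mean(all_conf) if all_conf else 0.0
```

With queue_size=1 every compute() reflects only the most recent update(), matching the first bullet above.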

Other thoughts:

  • I don't think metrics should know about datasets. Instead of benchmark as in the current API, let's have def benchmark_dataset(dataset, metric) in metrics/utils.py.
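That separation could look roughly like the sketch below. The signature is an assumption (I've added a `model` argument, since something has to produce predictions); the point is that the metric only ever sees (predictions, targets) pairs and stays dataset-agnostic:

```python
def benchmark_dataset(dataset, metric, model):
    """Hypothetical metrics/utils.py helper.

    `dataset` is assumed to yield (image, targets) pairs and
    `model` to map an image to predictions; the metric itself
    never touches the dataset.
    """
    for image, targets in dataset:
        predictions = model(image)
        metric.update(predictions, targets)
    return metric.compute()
```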

API:

class Accuracy(Metric):
    def __init__(self, queue_size: int = 1) -> None

    @override
    def update(self, predictions: Detections, targets: Detections) -> None

    @override
    def compute(self) -> NotSureYet

    # Metric also provides `def detect_and_compute(self, *args, **kwargs)`.

accuracy_metric = Accuracy()
accuracy_metric.update(detections, detections_ground_truth)
accuracy = accuracy_metric.compute()
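The base class this API implies could look roughly as follows. This is a sketch, not a committed design: the abstract `update`/`compute` pair comes from the proposal above, while `detect_and_compute` is read loosely here as an update-then-compute convenience (its exact semantics are still open):

```python
from abc import ABC, abstractmethod


class Metric(ABC):
    """Sketch of the proposed base class; details beyond the
    issue text are placeholders."""

    def __init__(self, queue_size: int = 1) -> None:
        self.queue_size = queue_size

    @abstractmethod
    def update(self, predictions, targets) -> None:
        """Add one batch of (predictions, targets) to the state."""

    @abstractmethod
    def compute(self):
        """Return the metric value over the current history."""

    def detect_and_compute(self, *args, **kwargs):
        # Convenience wrapper: one update followed by compute.
        self.update(*args, **kwargs)
        return self.compute()
```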
