Add ConfusionMatrix to EvaluationAPI #177
Conversation
Hello there, thank you for opening a PR! 🙏🏻 The team was notified and will get back to you asap.
Hi, @kirilllzaitsev 👋🏻! Are you ready for an initial review and tests?
@SkalskiP, hi. I still need to add the benchmark() method and do some cosmetics on plot(). There are no tests in test_detection.py yet. Working on it bit by bit.
@SkalskiP, here are some images coming from the plot() function. Please let me know if there is anything important to refine.
@kirilllzaitsev Could you share the input that you used? I just pasted both implementations into Code Interpreter and got almost the same results for both functions: https://chat.openai.com/share/acd17c82-87f6-499a-9d09-0b675b4ef4ea. If you could share the input numpy array used to create those results, that would be super helpful. We could start to debug.
detections_boxes = np.array([[ 1638.8, 557.36, 1664.2, 634.8],
[ 1005.9, 641.22, 1042.6, 727.3],
[ 162.18, 495.79, 204.39, 571.12],
[ 1158.9, 484.44, 1184, 551.7],
[ 1706.4, 483.9, 1734.4, 557.86],
[ 1416.2, 357.97, 1436, 417.01],
[ 235.93, 369, 265.36, 431.33],
[ 1600.1, 317.87, 1617.4, 373.95],
[ 1249.6, 399.89, 1274.4, 459.96],
[ 414.32, 332.96, 436.2, 383.91],
[ 523.4, 469.03, 555.29, 542.89],
[ 1223.2, 343.4, 1243.2, 394.17],
[ 639.72, 396.12, 661.25, 451.59],
[ 1467, 314.6, 1493.8, 360.99],
[ 1292.2, 387.02, 1314.9, 451.47],
[ 1550.3, 363.9, 1571, 419.81],
[ 811.91, 325.65, 828.43, 366.12],
[ 808.55, 369.75, 828.88, 423.46],
[ 1010.7, 366.89, 1033, 424.42],
[ 1190.5, 319.59, 1208.9, 372.18],
[ 345.53, 300.27, 363.52, 346.77],
[ 1009.3, 366.15, 1031.9, 426.33],
[ 765.76, 305.2, 783.15, 351.31]])
true_detections=np.array([[ 347, 302, 363, 344, 3],
[ 813, 321, 830, 366, 2],
[ 765, 303, 783, 351, 2],
[ 1191, 322, 1211, 372, 2],
[ 810, 371, 830, 424, 2],
[ 1599, 318, 1619, 374, 2],
[ 1225, 344, 1244, 395, 2],
[ 1467, 314, 1495, 361, 2],
[ 416, 331, 436, 384, 2],
[ 641, 396, 662, 452, 2],
[ 233, 369, 267, 432, 2],
[ 1552, 365, 1572, 420, 2],
[ 1418, 359, 1437, 418, 2],
[ 1251, 401, 1274, 460, 2],
[ 1011, 366, 1034, 426, 3],
[ 1291, 388, 1316, 451, 2],
[ 523, 468, 555, 544, 2],
[ 1708, 486, 1735, 557, 2],
[ 1007, 641, 1044, 727, 2],
[ 1641, 556, 1665, 633, 2],
[ 1161, 484, 1187, 552, 2],
[ 161, 495, 207, 572, 2],
[ 236.96, 414, 247.59, 425.24, 0]])
import numpy as np
import torch
from ultralytics.yolo.utils.metrics import box_iou
from supervision.detection.utils import box_iou_batch

true_boxes = true_detections[:, :4]  # drop the trailing class column
iou = box_iou(torch.from_numpy(detections_boxes), torch.from_numpy(true_boxes))
print(iou[iou > 0.45])

# ours (from within _evaluate_detection_batch)
iou_batch = box_iou_batch(boxes_true=true_boxes, boxes_detection=detections_boxes)
print(iou_batch[iou_batch > 0.45])
@kirilllzaitsev I just ran your test, and it looks like I'm getting exactly the same results for both:
import torch
import numpy as np
from ultralytics.yolo.utils.metrics import box_iou
from supervision.detection.utils import box_iou_batch
detection_boxes = np.array([
[ 1638.8, 557.36, 1664.2, 634.8],
[ 1005.9, 641.22, 1042.6, 727.3],
[ 162.18, 495.79, 204.39, 571.12],
[ 1158.9, 484.44, 1184, 551.7],
[ 1706.4, 483.9, 1734.4, 557.86],
[ 1416.2, 357.97, 1436, 417.01],
[ 235.93, 369, 265.36, 431.33],
[ 1600.1, 317.87, 1617.4, 373.95],
[ 1249.6, 399.89, 1274.4, 459.96],
[ 414.32, 332.96, 436.2, 383.91],
[ 523.4, 469.03, 555.29, 542.89],
[ 1223.2, 343.4, 1243.2, 394.17],
[ 639.72, 396.12, 661.25, 451.59],
[ 1467, 314.6, 1493.8, 360.99],
[ 1292.2, 387.02, 1314.9, 451.47],
[ 1550.3, 363.9, 1571, 419.81],
[ 811.91, 325.65, 828.43, 366.12],
[ 808.55, 369.75, 828.88, 423.46],
[ 1010.7, 366.89, 1033, 424.42],
[ 1190.5, 319.59, 1208.9, 372.18],
[ 345.53, 300.27, 363.52, 346.77],
[ 1009.3, 366.15, 1031.9, 426.33],
[ 765.76, 305.2, 783.15, 351.31]
])
true_boxes=np.array([
[ 347, 302, 363, 344, 3],
[ 813, 321, 830, 366, 2],
[ 765, 303, 783, 351, 2],
[ 1191, 322, 1211, 372, 2],
[ 810, 371, 830, 424, 2],
[ 1599, 318, 1619, 374, 2],
[ 1225, 344, 1244, 395, 2],
[ 1467, 314, 1495, 361, 2],
[ 416, 331, 436, 384, 2],
[ 641, 396, 662, 452, 2],
[ 233, 369, 267, 432, 2],
[ 1552, 365, 1572, 420, 2],
[ 1418, 359, 1437, 418, 2],
[ 1251, 401, 1274, 460, 2],
[ 1011, 366, 1034, 426, 3],
[ 1291, 388, 1316, 451, 2],
[ 523, 468, 555, 544, 2],
[ 1708, 486, 1735, 557, 2],
[ 1007, 641, 1044, 727, 2],
[ 1641, 556, 1665, 633, 2],
[ 1161, 484, 1187, 552, 2],
[ 161, 495, 207, 572, 2],
[ 236.96, 414, 247.59, 425.24, 0]
])
true_boxes = true_boxes[:,:4]
yolo = box_iou(torch.from_numpy(true_boxes), torch.from_numpy(detection_boxes)).numpy()
ours = box_iou_batch(boxes_true=true_boxes, boxes_detection=detection_boxes)
(yolo - ours).sum()
# -1.4583973750870172e-09
yolo[yolo>0.45]
# array([ 0.80331, 0.76773, 0.90109, 0.83438, 0.85274, 0.8625, 0.85232, 0.94472, 0.88165, 0.90077, 0.85638, 0.85644, 0.83815, 0.90972, 0.90582, 0.83996, 0.88941, 0.95132, 0.88688, 0.92894, 0.85182, 0.81026, 0.89771])
ours[ours>0.45]
# array([ 0.80331, 0.76773, 0.90109, 0.83438, 0.85274, 0.8625, 0.85232, 0.94472, 0.88165, 0.90077, 0.85638, 0.85644, 0.83815, 0.90972, 0.90582, 0.83996, 0.88941, 0.95132, 0.88688, 0.92894, 0.85182, 0.81026, 0.89771])
Notice the order of arguments. You did: box_iou(torch.from_numpy(detection_boxes), torch.from_numpy(true_boxes)). I did: box_iou(torch.from_numpy(true_boxes), torch.from_numpy(detection_boxes)). Please double-check that my math checks out, but it looks to me like the IoU part is okay.
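As a sanity check on that argument-order question: pairwise IoU is symmetric, so swapping the inputs only transposes the matrix, and the set of values above any threshold is unchanged. A minimal standalone sketch (toy boxes and a plain NumPy reimplementation for illustration, not the library code):

```python
import numpy as np

def iou_matrix(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    # Pairwise IoU for boxes in (x_min, y_min, x_max, y_max) format.
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    top_left = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    bottom_right = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    wh = np.clip(bottom_right - top_left, 0, None)  # zero width/height for disjoint pairs
    intersection = wh[..., 0] * wh[..., 1]
    return intersection / (area_a[:, None] + area_b[None, :] - intersection)

# made-up boxes, two per side
true_boxes = np.array([[0.0, 0.0, 10.0, 10.0], [20.0, 20.0, 30.0, 30.0]])
detection_boxes = np.array([[1.0, 1.0, 11.0, 11.0], [19.0, 19.0, 29.0, 29.0]])

a = iou_matrix(true_boxes, detection_boxes)
b = iou_matrix(detection_boxes, true_boxes)
assert np.allclose(a, b.T)  # argument order only transposes the result
```

So the raw IoU values agree either way, which is consistent with the near-zero difference reported above; the order still matters downstream wherever code indexes rows as ground truth and columns as detections.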
Indeed, thanks! I also checked whether there are some empty predictions for the validation dataset we are using, but there are none.
test/metrics/test_detection.py
Outdated
@@ -0,0 +1,370 @@
from contextlib import ExitStack as DoesNotRaise
from test.utils import dummy_detection_dataset_with_map_img_to_annotation
you accidentally left an unused import
supervision/metrics/detection.py
Outdated
Args:
    predictions: detected objects. Each element of the list describes a single image and has shape = (M, 6), where M is the number of detected objects. Each row is expected to be in (x_min, y_min, x_max, y_max, class, conf) format.
    target: ground-truth objects. Each element of the list describes a single image and has shape = (N, 5), where N is the number of ground-truth objects. Each row is expected to be in (x_min, y_min, x_max, y_max, class) format.
Notice that you have a different name for the argument here. It should be targets.
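For reference, the per-image shapes the docstring above describes can be sketched with toy arrays (all coordinate and class values are made up, purely to show the layout):

```python
import numpy as np

# one image with two detections: (x_min, y_min, x_max, y_max, class, conf)
predictions = [
    np.array([
        [10.0, 10.0, 50.0, 50.0, 0, 0.9],
        [60.0, 60.0, 90.0, 90.0, 1, 0.8],
    ])
]

# the same image with one ground-truth object: (x_min, y_min, x_max, y_max, class)
targets = [
    np.array([
        [12.0, 12.0, 48.0, 48.0, 0],
    ])
]

assert predictions[0].shape == (2, 6)  # (M, 6) with M = 2
assert targets[0].shape == (1, 5)      # (N, 5) with N = 1
```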
supervision/metrics/detection.py
Outdated
@staticmethod
def _drop_extra_matches(matches: np.ndarray) -> np.ndarray:
    """
Let's drop the docstring here. We have a script that processes our code and builds documentation. It is looking for docstrings. Given that this method is private, we wouldn't want it to be exposed.
_evaluate_detection_batch and _drop_extra_matches have docstrings that shouldn't be exposed, yet the methods don't seem straightforward to parse quickly with a human eye. Is there a solution on the documentation builder's side to skip methods that start with an underscore?
I agree with making _evaluate_detection_batch public, but the question from above still holds
You can get around it with a regular Python comment # at the top of the method body.
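That workaround might look like the sketch below. The deduplication logic is a hypothetical stand-in for what _drop_extra_matches could do (keep only the best match per detection and per ground-truth box, in the style common to detection confusion-matrix implementations), not the PR's actual code; the point is the regular comment in place of a docstring:

```python
import numpy as np

class ConfusionMatrix:
    @staticmethod
    def _drop_extra_matches(matches: np.ndarray) -> np.ndarray:
        # Regular comment instead of a docstring: the docs builder only
        # collects docstrings, so this private method stays out of the docs.
        # Rows of `matches` are (true_index, detection_index, iou); keep only
        # the highest-IoU match per detection and per ground-truth box.
        if matches.shape[0] > 0:
            matches = matches[matches[:, 2].argsort()[::-1]]
            matches = matches[np.unique(matches[:, 1], return_index=True)[1]]
            matches = matches[matches[:, 2].argsort()[::-1]]
            matches = matches[np.unique(matches[:, 0], return_index=True)[1]]
        return matches

matches = np.array([[0, 0, 0.9], [0, 1, 0.8], [1, 1, 0.7]])
deduplicated = ConfusionMatrix._drop_extra_matches(matches)
assert deduplicated.shape == (1, 3)  # only the best (true 0, detection 0) pair survives
```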
supervision/metrics/detection.py
Outdated
)

@staticmethod
def _evaluate_detection_batch(
I think we should make this static function public. I think it can be very useful on its own.
supervision/metrics/detection.py
Outdated
@staticmethod
def _evaluate_detection_batch(
    true_detections: np.ndarray,
Also, let's use consistent naming conventions and name the arguments predictions and targets.
supervision/metrics/detection.py
Outdated
Example:
    ```
Please make it ```python
supervision/metrics/detection.py
Outdated
Example:
    ```
    >>> from supervision.metrics.detection import ConfusionMatrix
It is an example of external usage, so let's promote import supervision as sv.
@pytest.mark.parametrize(
    "predictions, targets, classes, conf_threshold, iou_threshold, expected_result, exception",
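For context, the exception column in that parametrization works together with the DoesNotRaise alias imported at the top of the test file: each case supplies either a pytest.raises context or DoesNotRaise(). A minimal illustration of the pattern with a toy function (not the actual test data):

```python
from contextlib import ExitStack as DoesNotRaise

import pytest

def safe_divide(a: float, b: float) -> float:
    return a / b

@pytest.mark.parametrize(
    "a, b, expected_result, exception",
    [
        (6.0, 3.0, 2.0, DoesNotRaise()),                     # happy path
        (1.0, 0.0, None, pytest.raises(ZeroDivisionError)),  # expected failure
    ],
)
def test_safe_divide(a, b, expected_result, exception) -> None:
    with exception:
        # on the failure path the call raises before the assertion runs
        assert safe_divide(a, b) == expected_result
```

This keeps happy-path and error-path cases in a single table instead of splitting them across separate test functions.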
I love it 🔥
test/utils.py
Outdated
return dataset


def dummy_detection_dataset_with_map_img_to_annotation():
That one is no longer in use. Let's drop it.
test/utils.py
Outdated
return DetectionDataset(classes=classes, images=images, annotations=annotations)


def dummy_detection_dataset():
That one is no longer in use. Let's drop it.
test/utils.py
Outdated
)


def mock_detection_dataset(
That one is no longer in use. Let's drop it.
@kirilllzaitsev, I added a few more comments to your PR. Please make those changes, and we will merge tomorrow. As for the inconsistency between YOLOv8 and our ConfusionMatrix, we should probably handle that in a separate PR. I can do that investigation, as it looks to be time-consuming.
@kirilllzaitsev, let me know when you'll be ready for final review.
@SkalskiP, hi, should be fine now
@kirilllzaitsev I'm looking now 👀
@kirilllzaitsev merging! There are still a few tiny things around docs. But I'll do it myself. I don't want to bother you with that. Thanks a lot for the contribution! It was indeed a pleasure! I'm sorry I was a bit less responsive last week. 🙏🏻 I'm curious if you'd like to contribute in the future.
@SkalskiP, thanks for your active involvement and all the feedback. I'd love to continue contributing.
@kirilllzaitsev, awesome! I'd love to add you to our Slack channel for Supervision Contributors. I need your email to do that. Wanna stay in the metrics ecosystem, or do you want to work on something else next release?
Sure, here it is, [email protected].
@kirilllzaitsev, you should get the invite soon. Please look at the
Description
Please include a summary of the change and which issue is fixed or implemented. Please also include relevant motivation and context (e.g. links, docs, tickets etc.).
List any dependencies that are required for this change.
Type of change
Please delete options that are not relevant.
How has this change been tested? Please provide a testcase or example of how you tested the change.
See test/detection/test_*.py
Any specific deployment considerations
For example, documentation changes, usability, usage/costs, secrets, etc.
Docs