
Conversation

kirilllzaitsev
Contributor

Description

Please include a summary of the change and which issue is fixed or implemented. Please also include relevant motivation and context (e.g. links, docs, tickets, etc.).

List any dependencies that are required for this change.

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested? Please provide a test case or an example of how you tested the change.

See test/detection/test_*.py

Any specific deployment considerations

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs

  • Docs updated? What were the changes:

Contributor

@github-actions github-actions bot left a comment


Hello there, thank you for opening a PR! 🙏🏻 The team was notified and will get back to you ASAP.

@SkalskiP
Collaborator

SkalskiP commented Jul 4, 2023

Hi, @kirilllzaitsev 👋🏻! Are you ready for an initial review and tests?

@kirilllzaitsev
Contributor Author

Hi, @kirilllzaitsev 👋🏻! Are you ready for an initial review and tests?

@SkalskiP , hi. I still need to add the benchmark() method and do some cosmetics to the plot(). There are no tests in test_detection.py yet. Working bit by bit.

@kirilllzaitsev
Contributor Author

@SkalskiP, here are some images coming from the plot() function. Please let me know if there is something important to refine.
[attached plots: 20-classes-norm, 20-classes-unnorm, 80-classes, no-classes]

@kirilllzaitsev kirilllzaitsev marked this pull request as ready for review July 7, 2023 17:01
@SkalskiP
Collaborator

@kirilllzaitsev Could you share the input that you used? I just pasted both implementations into Code Interpreter and got almost the same results for both functions: https://chat.openai.com/share/acd17c82-87f6-499a-9d09-0b675b4ef4ea

If you could share the input numpy array used to create those results, that would be super helpful. We could start to debug.

@kirilllzaitsev
Contributor Author

kirilllzaitsev commented Jul 20, 2023

@kirilllzaitsev Could you share the input that you used? I just pasted both implementations into Code Interpreter and got almost the same results for both functions: https://chat.openai.com/share/acd17c82-87f6-499a-9d09-0b675b4ef4ea

If you could share the input numpy array used to create those results, that would be super helpful. We could start to debug.

detection_boxes = np.array([[     1638.8,      557.36,      1664.2,       634.8],
       [     1005.9,      641.22,      1042.6,       727.3],
       [     162.18,      495.79,      204.39,      571.12],
       [     1158.9,      484.44,        1184,       551.7],
       [     1706.4,       483.9,      1734.4,      557.86],
       [     1416.2,      357.97,        1436,      417.01],
       [     235.93,         369,      265.36,      431.33],
       [     1600.1,      317.87,      1617.4,      373.95],
       [     1249.6,      399.89,      1274.4,      459.96],
       [     414.32,      332.96,       436.2,      383.91],
       [      523.4,      469.03,      555.29,      542.89],
       [     1223.2,       343.4,      1243.2,      394.17],
       [     639.72,      396.12,      661.25,      451.59],
       [       1467,       314.6,      1493.8,      360.99],
       [     1292.2,      387.02,      1314.9,      451.47],
       [     1550.3,       363.9,        1571,      419.81],
       [     811.91,      325.65,      828.43,      366.12],
       [     808.55,      369.75,      828.88,      423.46],
       [     1010.7,      366.89,        1033,      424.42],
       [     1190.5,      319.59,      1208.9,      372.18],
       [     345.53,      300.27,      363.52,      346.77],
       [     1009.3,      366.15,      1031.9,      426.33],
       [     765.76,       305.2,      783.15,      351.31]])

true_boxes = np.array([[        347,         302,         363,         344,           3],
       [        813,         321,         830,         366,           2],
       [        765,         303,         783,         351,           2],
       [       1191,         322,        1211,         372,           2],
       [        810,         371,         830,         424,           2],
       [       1599,         318,        1619,         374,           2],
       [       1225,         344,        1244,         395,           2],
       [       1467,         314,        1495,         361,           2],
       [        416,         331,         436,         384,           2],
       [        641,         396,         662,         452,           2],
       [        233,         369,         267,         432,           2],
       [       1552,         365,        1572,         420,           2],
       [       1418,         359,        1437,         418,           2],
       [       1251,         401,        1274,         460,           2],
       [       1011,         366,        1034,         426,           3],
       [       1291,         388,        1316,         451,           2],
       [        523,         468,         555,         544,           2],
       [       1708,         486,        1735,         557,           2],
       [       1007,         641,        1044,         727,           2],
       [       1641,         556,        1665,         633,           2],
       [       1161,         484,        1187,         552,           2],
       [        161,         495,         207,         572,           2],
       [     236.96,         414,      247.59,      425.24,           0]])

import numpy as np
import torch

from ultralytics.yolo.utils.metrics import box_iou
from supervision.detection.utils import box_iou_batch

# box_iou expects (N, 4) boxes, so drop the class column from the targets
true_boxes = true_boxes[:, :4]

iou = box_iou(torch.from_numpy(detection_boxes), torch.from_numpy(true_boxes))
print(iou[iou > 0.45])

# ours (from within _evaluate_detection_batch)
iou_batch = box_iou_batch(
    boxes_true=true_boxes, boxes_detection=detection_boxes
)
print(iou_batch[iou_batch > 0.45])

@SkalskiP
Collaborator

SkalskiP commented Jul 20, 2023

@kirilllzaitsev I just ran your test, and it looks like I'm getting exactly the same results for both `from ultralytics.yolo.utils.metrics import box_iou` and `from supervision.detection.utils import box_iou_batch`. Here is my code:

import torch

import numpy as np

from ultralytics.yolo.utils.metrics import box_iou
from supervision.detection.utils import box_iou_batch

detection_boxes = np.array([
    [     1638.8,      557.36,      1664.2,       634.8],
    [     1005.9,      641.22,      1042.6,       727.3],
    [     162.18,      495.79,      204.39,      571.12],
    [     1158.9,      484.44,        1184,       551.7],
    [     1706.4,       483.9,      1734.4,      557.86],
    [     1416.2,      357.97,        1436,      417.01],
    [     235.93,         369,      265.36,      431.33],
    [     1600.1,      317.87,      1617.4,      373.95],
    [     1249.6,      399.89,      1274.4,      459.96],
    [     414.32,      332.96,       436.2,      383.91],
    [      523.4,      469.03,      555.29,      542.89],
    [     1223.2,       343.4,      1243.2,      394.17],
    [     639.72,      396.12,      661.25,      451.59],
    [       1467,       314.6,      1493.8,      360.99],
    [     1292.2,      387.02,      1314.9,      451.47],
    [     1550.3,       363.9,        1571,      419.81],
    [     811.91,      325.65,      828.43,      366.12],
    [     808.55,      369.75,      828.88,      423.46],
    [     1010.7,      366.89,        1033,      424.42],
    [     1190.5,      319.59,      1208.9,      372.18],
    [     345.53,      300.27,      363.52,      346.77],
    [     1009.3,      366.15,      1031.9,      426.33],
    [     765.76,       305.2,      783.15,      351.31]
])

true_boxes=np.array([
    [        347,         302,         363,         344,           3],
    [        813,         321,         830,         366,           2],
    [        765,         303,         783,         351,           2],
    [       1191,         322,        1211,         372,           2],
    [        810,         371,         830,         424,           2],
    [       1599,         318,        1619,         374,           2],
    [       1225,         344,        1244,         395,           2],
    [       1467,         314,        1495,         361,           2],
    [        416,         331,         436,         384,           2],
    [        641,         396,         662,         452,           2],
    [        233,         369,         267,         432,           2],
    [       1552,         365,        1572,         420,           2],
    [       1418,         359,        1437,         418,           2],
    [       1251,         401,        1274,         460,           2],
    [       1011,         366,        1034,         426,           3],
    [       1291,         388,        1316,         451,           2],
    [        523,         468,         555,         544,           2],
    [       1708,         486,        1735,         557,           2],
    [       1007,         641,        1044,         727,           2],
    [       1641,         556,        1665,         633,           2],
    [       1161,         484,        1187,         552,           2],
    [        161,         495,         207,         572,           2],
    [     236.96,         414,      247.59,      425.24,           0]
])
true_boxes = true_boxes[:,:4]

yolo = box_iou(torch.from_numpy(true_boxes), torch.from_numpy(detection_boxes)).numpy()
ours = box_iou_batch(boxes_true=true_boxes, boxes_detection=detection_boxes)

(yolo - ours).sum()
# -1.4583973750870172e-09

yolo[yolo>0.45]
# array([    0.80331,     0.76773,     0.90109,     0.83438,     0.85274,      0.8625,     0.85232,     0.94472,     0.88165,     0.90077,     0.85638,     0.85644,     0.83815,     0.90972,     0.90582,     0.83996,     0.88941,     0.95132,     0.88688,     0.92894,     0.85182,     0.81026,     0.89771])

ours[ours>0.45]
# array([    0.80331,     0.76773,     0.90109,     0.83438,     0.85274,      0.8625,     0.85232,     0.94472,     0.88165,     0.90077,     0.85638,     0.85644,     0.83815,     0.90972,     0.90582,     0.83996,     0.88941,     0.95132,     0.88688,     0.92894,     0.85182,     0.81026,     0.89771])

Notice the order of arguments. You did:

box_iou(torch.from_numpy(detection_boxes), torch.from_numpy(true_boxes))

I did:

box_iou(torch.from_numpy(true_boxes), torch.from_numpy(detection_boxes))

Please double-check if my math checks out. But it looks to me like the IoU part is okay.
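For reference, swapping the arguments only transposes the IoU matrix, so the set of values above any threshold is identical either way. A minimal NumPy sketch illustrates this; note that `pairwise_iou` below is an independent re-implementation for illustration, not the supervision or ultralytics function:

```python
import numpy as np

def pairwise_iou(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    # boxes are (N, 4) arrays in (x_min, y_min, x_max, y_max) format
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    top_left = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    bottom_right = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    wh = np.clip(bottom_right - top_left, 0, None)
    intersection = wh[..., 0] * wh[..., 1]
    return intersection / (area_a[:, None] + area_b[None, :] - intersection)

a = np.array([[0.0, 0.0, 10.0, 10.0], [20.0, 20.0, 30.0, 30.0]])
b = np.array([[1.0, 1.0, 11.0, 11.0]])

# swapping the arguments transposes the matrix, nothing more
assert np.allclose(pairwise_iou(a, b), pairwise_iou(b, a).T)
```

So, as long as the class columns are stripped first, the argument order cannot change which IoU values clear the threshold.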

@kirilllzaitsev
Contributor Author

> @kirilllzaitsev I just ran your test, and it looks like I'm getting exactly the same results for both `from ultralytics.yolo.utils.metrics import box_iou` and `from supervision.detection.utils import box_iou_batch`. [...] Notice the order of arguments. [...] Please double-check if my math checks out. But it looks to me like the IoU part is okay.

Indeed, thanks! Since ConfusionMatrix._evaluate_detection_batch and ultralytics.yolo.utils.metrics.ConfusionMatrix.process_batch are almost identical in the rest of their logic, one hypothesis is that ConfusionMatrix.benchmark is not the cause of the slight mismatch we observe.

I also checked whether there are any empty predictions in the validation dataset we are using; there are none.
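For anyone following along, the remaining suspect is the match de-duplication step. A minimal NumPy sketch of the YOLO-style logic under discussion (an illustration that mirrors, but is not, the PR's `_drop_extra_matches`):

```python
import numpy as np

def drop_extra_matches(matches: np.ndarray) -> np.ndarray:
    # matches rows are (gt_index, det_index, iou); keep at most one match
    # per ground truth and per detection, preferring the highest IoU
    if matches.shape[0] == 0:
        return matches
    # sort by IoU descending so np.unique keeps the best pair first
    matches = matches[matches[:, 2].argsort()[::-1]]
    matches = matches[np.unique(matches[:, 1], return_index=True)[1]]
    matches = matches[matches[:, 2].argsort()[::-1]]
    matches = matches[np.unique(matches[:, 0], return_index=True)[1]]
    return matches

# ground truth 0 overlaps detections 0 and 1, but detection 1 is claimed
# by ground truth 1 at a higher IoU, so the (0, 1, 0.6) pair is dropped
m = np.array([[0, 0, 0.9], [0, 1, 0.6], [1, 1, 0.8]])
print(drop_extra_matches(m))
```

If the two implementations diverge, differences in this greedy tie-breaking (e.g. sort stability or tie order) would be the place to look.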

@@ -0,0 +1,370 @@
from contextlib import ExitStack as DoesNotRaise
from test.utils import dummy_detection_dataset_with_map_img_to_annotation
Collaborator


You accidentally left an unused import.


Args:
predictions: detected objects. Each element of the list describes a single image and has shape = (M, 6) where M is the number of detected objects. Each row is expected to be in (x_min, y_min, x_max, y_max, class, conf) format.
target: ground-truth objects. Each element of the list describes a single image and has shape = (N, 5) where N is the number of ground-truth objects. Each row is expected to be in (x_min, y_min, x_max, y_max, class) format.
Collaborator


Notice that you have a different name for the argument here. It should be targets.
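To make the documented shapes concrete, an illustrative (made-up) input for a single image would look like this:

```python
import numpy as np

# one image with two detections; rows are (x_min, y_min, x_max, y_max, class, conf)
predictions = [
    np.array([
        [10.0, 10.0, 50.0, 50.0, 0, 0.9],
        [60.0, 60.0, 90.0, 95.0, 1, 0.8],
    ])
]

# the matching ground truth; rows are (x_min, y_min, x_max, y_max, class)
targets = [
    np.array([
        [12.0, 11.0, 49.0, 52.0, 0],
        [61.0, 59.0, 91.0, 94.0, 1],
    ])
]
```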


@staticmethod
def _drop_extra_matches(matches: np.ndarray) -> np.ndarray:
"""
Collaborator


Let's drop the docstring here. We have a script that processes our code and builds documentation. It looks for docstrings, and given that this method is private, we wouldn't want it to be exposed.

Contributor Author


_evaluate_detection_batch and _drop_extra_matches have docstrings that shouldn't be exposed, yet without them the methods aren't easy to follow at a glance. Is there a solution on the documentation builder's side to skip methods that start with an underscore?

Contributor Author


I agree with making _evaluate_detection_batch public, but the question above still stands.

Collaborator


You can get around it with a regular Python comment (#) at the top of the method body.
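The suggested workaround, sketched (the method body here is a placeholder, not the PR's code): the explanation lives in a regular comment, so `__doc__` stays `None` and the docs builder finds nothing to render.

```python
class ConfusionMatrix:
    @staticmethod
    def _drop_extra_matches(matches):
        # Deduplicate matches so each ground truth and each detection
        # appears at most once, keeping the highest-IoU pair.
        # (A regular comment instead of a docstring keeps this private
        # helper out of the generated documentation.)
        ...

# no docstring is attached, so documentation tooling skips this method
assert ConfusionMatrix._drop_extra_matches.__doc__ is None
```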

)

@staticmethod
def _evaluate_detection_batch(
Collaborator


I think we should make this static function public. I think it can be very useful on its own.


@staticmethod
def _evaluate_detection_batch(
true_detections: np.ndarray,
Collaborator


Also let's use consistent naming conventions and name arguments: predictions and targets.



Example:
```
Collaborator


Please make it ```python


Example:
```
>>> from supervision.metrics.detection import ConfusionMatrix
Collaborator


It is an example of external usage, so let's promote `import supervision as sv`.



@pytest.mark.parametrize(
"predictions, targets, classes, conf_threshold, iou_threshold, expected_result, exception",
Collaborator


I love it 🔥

test/utils.py Outdated
return dataset


def dummy_detection_dataset_with_map_img_to_annotation():
Collaborator


That one is no longer in use. Let's drop it.

test/utils.py Outdated
return DetectionDataset(classes=classes, images=images, annotations=annotations)


def dummy_detection_dataset():
Collaborator


That one is no longer in use. Let's drop it.

test/utils.py Outdated
)


def mock_detection_dataset(
Collaborator


That one is no longer in use. Let's drop it.

@SkalskiP
Collaborator

@kirilllzaitsev, I added a few more comments to your PR. Please make those changes, and we will merge tomorrow. As for the inconsistency between YOLOv8 and our ConfusionMatrix, we should probably handle that in a separate PR.

I can do that investigation as it looks to be time-consuming.

@SkalskiP
Collaborator

@kirilllzaitsev, let me know when you're ready for the final review.

@kirilllzaitsev
Contributor Author

@SkalskiP, hi, should be fine now

@SkalskiP
Collaborator

@kirilllzaitsev I'm looking now 👀

@SkalskiP
Collaborator

@kirilllzaitsev merging! There are still a few tiny things around docs. But I'll do it myself. I don't want to bother you with that.

Thanks a lot for the contribution! It was indeed a pleasure! I'm sorry I was a bit less responsive last week. 🙏🏻

I'm curious if you'd like to contribute in the future.

@SkalskiP SkalskiP merged commit ea4638f into roboflow:main Jul 21, 2023
@kirilllzaitsev
Contributor Author

@SkalskiP, thanks for your active involvement and all the feedback. I'd love to continue contributing.

@SkalskiP
Collaborator

SkalskiP commented Jul 21, 2023

@kirilllzaitsev, awesome! I'd love to add you to our Slack channel for Supervision Contributors. I need your email to do that.

Wanna stay in the metrics ecosystem, or do you want to work on something else next release?

@kirilllzaitsev
Contributor Author

Sure, here it is, [email protected].
No objections to working on something else. Metrics is also fine.

@SkalskiP
Collaborator

@kirilllzaitsev, you should get the invite soon. Please look at the Backlog column here and let me know if anything looks interesting.

@LinasKo LinasKo mentioned this pull request Jul 16, 2024
Labels
version: 0.12.0 Feature to be added in `0.12.0` release
Successfully merging this pull request may close these issues.

Object Detection Confusion Matrix - Evaluation API