Add ConfusionMatrix to EvaluationAPI #177
Conversation
Hello there, thank you for opening a PR! 🙏🏻 The team was notified and will get back to you asap.
Hi, @kirilllzaitsev 👋🏻! Are you ready for an initial review and tests?
@SkalskiP, hi. I still need to add the benchmark() method and do some cosmetics on plot(). There are no tests in test_detection.py yet. Working on it bit by bit.
@SkalskiP, here are some images coming from the plot() function. Please let me know if there is anything important to refine.
@kirilllzaitsev Could you share the input that you used? I just pasted both implementations into Code Interpreter and got almost the same results for both functions: https://chat.openai.com/share/acd17c82-87f6-499a-9d09-0b675b4ef4ea. If you could share the input numpy array used to create those results, that would be super helpful. We could start to debug.
detections_boxes = np.array([[ 1638.8, 557.36, 1664.2, 634.8],
[ 1005.9, 641.22, 1042.6, 727.3],
[ 162.18, 495.79, 204.39, 571.12],
[ 1158.9, 484.44, 1184, 551.7],
[ 1706.4, 483.9, 1734.4, 557.86],
[ 1416.2, 357.97, 1436, 417.01],
[ 235.93, 369, 265.36, 431.33],
[ 1600.1, 317.87, 1617.4, 373.95],
[ 1249.6, 399.89, 1274.4, 459.96],
[ 414.32, 332.96, 436.2, 383.91],
[ 523.4, 469.03, 555.29, 542.89],
[ 1223.2, 343.4, 1243.2, 394.17],
[ 639.72, 396.12, 661.25, 451.59],
[ 1467, 314.6, 1493.8, 360.99],
[ 1292.2, 387.02, 1314.9, 451.47],
[ 1550.3, 363.9, 1571, 419.81],
[ 811.91, 325.65, 828.43, 366.12],
[ 808.55, 369.75, 828.88, 423.46],
[ 1010.7, 366.89, 1033, 424.42],
[ 1190.5, 319.59, 1208.9, 372.18],
[ 345.53, 300.27, 363.52, 346.77],
[ 1009.3, 366.15, 1031.9, 426.33],
[ 765.76, 305.2, 783.15, 351.31]])
true_detections=np.array([[ 347, 302, 363, 344, 3],
[ 813, 321, 830, 366, 2],
[ 765, 303, 783, 351, 2],
[ 1191, 322, 1211, 372, 2],
[ 810, 371, 830, 424, 2],
[ 1599, 318, 1619, 374, 2],
[ 1225, 344, 1244, 395, 2],
[ 1467, 314, 1495, 361, 2],
[ 416, 331, 436, 384, 2],
[ 641, 396, 662, 452, 2],
[ 233, 369, 267, 432, 2],
[ 1552, 365, 1572, 420, 2],
[ 1418, 359, 1437, 418, 2],
[ 1251, 401, 1274, 460, 2],
[ 1011, 366, 1034, 426, 3],
[ 1291, 388, 1316, 451, 2],
[ 523, 468, 555, 544, 2],
[ 1708, 486, 1735, 557, 2],
[ 1007, 641, 1044, 727, 2],
[ 1641, 556, 1665, 633, 2],
[ 1161, 484, 1187, 552, 2],
[ 161, 495, 207, 572, 2],
[ 236.96, 414, 247.59, 425.24, 0]])
import numpy as np
import torch
from ultralytics.yolo.utils.metrics import box_iou
from supervision.detection.utils import box_iou_batch

true_boxes = true_detections[:, :4]  # drop the trailing class column
iou = box_iou(torch.from_numpy(detections_boxes), torch.from_numpy(true_boxes))
print(iou[iou > 0.45])

# ours (from within _evaluate_detection_batch)
iou_batch = box_iou_batch(boxes_true=true_boxes, boxes_detection=detections_boxes)
print(iou_batch[iou_batch > 0.45])
@kirilllzaitsev I just ran your test, and it looks like I'm getting exactly the same results for both:
import torch
import numpy as np
from ultralytics.yolo.utils.metrics import box_iou
from supervision.detection.utils import box_iou_batch
detection_boxes = np.array([
[ 1638.8, 557.36, 1664.2, 634.8],
[ 1005.9, 641.22, 1042.6, 727.3],
[ 162.18, 495.79, 204.39, 571.12],
[ 1158.9, 484.44, 1184, 551.7],
[ 1706.4, 483.9, 1734.4, 557.86],
[ 1416.2, 357.97, 1436, 417.01],
[ 235.93, 369, 265.36, 431.33],
[ 1600.1, 317.87, 1617.4, 373.95],
[ 1249.6, 399.89, 1274.4, 459.96],
[ 414.32, 332.96, 436.2, 383.91],
[ 523.4, 469.03, 555.29, 542.89],
[ 1223.2, 343.4, 1243.2, 394.17],
[ 639.72, 396.12, 661.25, 451.59],
[ 1467, 314.6, 1493.8, 360.99],
[ 1292.2, 387.02, 1314.9, 451.47],
[ 1550.3, 363.9, 1571, 419.81],
[ 811.91, 325.65, 828.43, 366.12],
[ 808.55, 369.75, 828.88, 423.46],
[ 1010.7, 366.89, 1033, 424.42],
[ 1190.5, 319.59, 1208.9, 372.18],
[ 345.53, 300.27, 363.52, 346.77],
[ 1009.3, 366.15, 1031.9, 426.33],
[ 765.76, 305.2, 783.15, 351.31]
])
true_boxes=np.array([
[ 347, 302, 363, 344, 3],
[ 813, 321, 830, 366, 2],
[ 765, 303, 783, 351, 2],
[ 1191, 322, 1211, 372, 2],
[ 810, 371, 830, 424, 2],
[ 1599, 318, 1619, 374, 2],
[ 1225, 344, 1244, 395, 2],
[ 1467, 314, 1495, 361, 2],
[ 416, 331, 436, 384, 2],
[ 641, 396, 662, 452, 2],
[ 233, 369, 267, 432, 2],
[ 1552, 365, 1572, 420, 2],
[ 1418, 359, 1437, 418, 2],
[ 1251, 401, 1274, 460, 2],
[ 1011, 366, 1034, 426, 3],
[ 1291, 388, 1316, 451, 2],
[ 523, 468, 555, 544, 2],
[ 1708, 486, 1735, 557, 2],
[ 1007, 641, 1044, 727, 2],
[ 1641, 556, 1665, 633, 2],
[ 1161, 484, 1187, 552, 2],
[ 161, 495, 207, 572, 2],
[ 236.96, 414, 247.59, 425.24, 0]
])
true_boxes = true_boxes[:,:4]
yolo = box_iou(torch.from_numpy(true_boxes), torch.from_numpy(detection_boxes)).numpy()
ours = box_iou_batch(boxes_true=true_boxes, boxes_detection=detection_boxes)
(yolo - ours).sum()
# -1.4583973750870172e-09
yolo[yolo>0.45]
# array([ 0.80331, 0.76773, 0.90109, 0.83438, 0.85274, 0.8625, 0.85232, 0.94472, 0.88165, 0.90077, 0.85638, 0.85644, 0.83815, 0.90972, 0.90582, 0.83996, 0.88941, 0.95132, 0.88688, 0.92894, 0.85182, 0.81026, 0.89771])
ours[ours>0.45]
# array([ 0.80331, 0.76773, 0.90109, 0.83438, 0.85274, 0.8625, 0.85232, 0.94472, 0.88165, 0.90077, 0.85638, 0.85644, 0.83815, 0.90972, 0.90582, 0.83996, 0.88941, 0.95132, 0.88688, 0.92894, 0.85182, 0.81026, 0.89771])
Notice the order of arguments. You did: box_iou(torch.from_numpy(detection_boxes), torch.from_numpy(true_boxes)). I did: box_iou(torch.from_numpy(true_boxes), torch.from_numpy(detection_boxes)). Please double-check that my math checks out, but it looks to me like the IoU part is okay.
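As a sanity check on that argument-order question: pairwise IoU is symmetric, so swapping the inputs only transposes the matrix, and the set of values above any threshold is unchanged. A minimal standalone sketch (toy boxes and a plain NumPy reimplementation for illustration, not the library code):

```python
import numpy as np

def iou_matrix(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    # Pairwise IoU for boxes in (x_min, y_min, x_max, y_max) format.
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    top_left = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    bottom_right = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    wh = np.clip(bottom_right - top_left, 0, None)  # zero width/height for disjoint pairs
    intersection = wh[..., 0] * wh[..., 1]
    return intersection / (area_a[:, None] + area_b[None, :] - intersection)

# made-up boxes, two per side
true_boxes = np.array([[0.0, 0.0, 10.0, 10.0], [20.0, 20.0, 30.0, 30.0]])
detection_boxes = np.array([[1.0, 1.0, 11.0, 11.0], [19.0, 19.0, 29.0, 29.0]])

a = iou_matrix(true_boxes, detection_boxes)
b = iou_matrix(detection_boxes, true_boxes)
assert np.allclose(a, b.T)  # argument order only transposes the result
```

So the raw IoU values agree either way, which is consistent with the near-zero difference reported above; the order still matters downstream wherever code indexes rows as ground truth and columns as detections.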
Indeed, thanks! I also checked whether there are some empty predictions for the validation dataset we are using, but there are none.
test/metrics/test_detection.py
Outdated
@@ -0,0 +1,370 @@
from contextlib import ExitStack as DoesNotRaise
from test.utils import dummy_detection_dataset_with_map_img_to_annotation
you accidentally left an unused import
supervision/metrics/detection.py
Outdated
Args:
    predictions: detected objects. Each element of the list describes a single image and has shape = (M, 6), where M is the number of detected objects. Each row is expected to be in (x_min, y_min, x_max, y_max, class, conf) format.
    target: ground-truth objects. Each element of the list describes a single image and has shape = (N, 5), where N is the number of ground-truth objects. Each row is expected to be in (x_min, y_min, x_max, y_max, class) format.
Notice that you have a different name for the argument here. It should be targets.
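For reference, the per-image shapes the docstring above describes can be sketched with toy arrays (all coordinate and class values are made up, purely to show the layout):

```python
import numpy as np

# one image with two detections: (x_min, y_min, x_max, y_max, class, conf)
predictions = [
    np.array([
        [10.0, 10.0, 50.0, 50.0, 0, 0.9],
        [60.0, 60.0, 90.0, 90.0, 1, 0.8],
    ])
]

# the same image with one ground-truth object: (x_min, y_min, x_max, y_max, class)
targets = [
    np.array([
        [12.0, 12.0, 48.0, 48.0, 0],
    ])
]

assert predictions[0].shape == (2, 6)  # (M, 6) with M = 2
assert targets[0].shape == (1, 5)      # (N, 5) with N = 1
```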
supervision/metrics/detection.py
Outdated
@staticmethod
def _drop_extra_matches(matches: np.ndarray) -> np.ndarray:
    """
Let's drop the docstring here. We have a script that processes our code and builds documentation. It is looking for docstrings. Given that this method is private, we wouldn't want it to be exposed.
_evaluate_detection_batch and _drop_extra_matches have docstrings that shouldn't be exposed, yet the methods don't seem straightforward to parse quickly with a human eye. Is there a solution on the documentation builder's side to skip methods that start with an underscore?
I agree with making _evaluate_detection_batch public, but the question from above still holds
You can get around it with a regular Python comment # at the top of the method body.
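That workaround might look like the sketch below. The deduplication logic is a hypothetical stand-in for what _drop_extra_matches could do (keep only the best match per detection and per ground-truth box, in the style common to detection confusion-matrix implementations), not the PR's actual code; the point is the regular comment in place of a docstring:

```python
import numpy as np

class ConfusionMatrix:
    @staticmethod
    def _drop_extra_matches(matches: np.ndarray) -> np.ndarray:
        # Regular comment instead of a docstring: the docs builder only
        # collects docstrings, so this private method stays out of the docs.
        # Rows of `matches` are (true_index, detection_index, iou); keep only
        # the highest-IoU match per detection and per ground-truth box.
        if matches.shape[0] > 0:
            matches = matches[matches[:, 2].argsort()[::-1]]
            matches = matches[np.unique(matches[:, 1], return_index=True)[1]]
            matches = matches[matches[:, 2].argsort()[::-1]]
            matches = matches[np.unique(matches[:, 0], return_index=True)[1]]
        return matches

matches = np.array([[0, 0, 0.9], [0, 1, 0.8], [1, 1, 0.7]])
deduplicated = ConfusionMatrix._drop_extra_matches(matches)
assert deduplicated.shape == (1, 3)  # only the best (true 0, detection 0) pair survives
```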
supervision/metrics/detection.py
Outdated
)

@staticmethod
def _evaluate_detection_batch(
I think we should make this static function public. I think it can be very useful on its own.
supervision/metrics/detection.py
Outdated
@staticmethod
def _evaluate_detection_batch(
    true_detections: np.ndarray,
Also, let's use consistent naming conventions and name the arguments predictions and targets.
supervision/metrics/detection.py
Outdated
Example:
    ```
Please make it ```python
supervision/metrics/detection.py
Outdated
Example:
    ```
    >>> from supervision.metrics.detection import ConfusionMatrix
It is an example of external usage, so let's promote import supervision as sv.
@pytest.mark.parametrize(
    "predictions, targets, classes, conf_threshold, iou_threshold, expected_result, exception",
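For context, the exception column in that parametrization works together with the DoesNotRaise alias imported at the top of the test file: each case supplies either a pytest.raises context or DoesNotRaise(). A minimal illustration of the pattern with a toy function (not the actual test data):

```python
from contextlib import ExitStack as DoesNotRaise

import pytest

def safe_divide(a: float, b: float) -> float:
    return a / b

@pytest.mark.parametrize(
    "a, b, expected_result, exception",
    [
        (6.0, 3.0, 2.0, DoesNotRaise()),                     # happy path
        (1.0, 0.0, None, pytest.raises(ZeroDivisionError)),  # expected failure
    ],
)
def test_safe_divide(a, b, expected_result, exception) -> None:
    with exception:
        # on the failure path the call raises before the assertion runs
        assert safe_divide(a, b) == expected_result
```

This keeps happy-path and error-path cases in a single table instead of splitting them across separate test functions.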
I love it 🔥
test/utils.py
Outdated
return dataset


def dummy_detection_dataset_with_map_img_to_annotation():
That one is no longer in use. Let's drop it.
test/utils.py
Outdated
return DetectionDataset(classes=classes, images=images, annotations=annotations)


def dummy_detection_dataset():
That one is no longer in use. Let's drop it.
test/utils.py
Outdated
)


def mock_detection_dataset(
That one is no longer in use. Let's drop it.
@kirilllzaitsev, I added a few more comments to your PR. Please make those changes, and we will merge tomorrow. As for the inconsistency between YOLOv8 and our ConfusionMatrix, we should probably handle that in a separate PR. I can do that investigation, as it looks to be time-consuming.
@kirilllzaitsev, let me know when you'll be ready for final review.
@SkalskiP, hi, should be fine now
@kirilllzaitsev I'm looking now 👀
@kirilllzaitsev merging! There are still a few tiny things around docs. But I'll do it myself. I don't want to bother you with that. Thanks a lot for the contribution! It was indeed a pleasure! I'm sorry I was a bit less responsive last week. 🙏🏻 I'm curious if you'd like to contribute in the future.
@SkalskiP, thanks for your active involvement and all the feedback. I'd love to continue contributing.
@kirilllzaitsev, awesome! I'd love to add you to our Slack channel for Supervision Contributors. I need your email to do that. Wanna stay in the metrics ecosystem, or do you want to work on something else next release?
Sure, here it is, [email protected].
@kirilllzaitsev, you should get the invite soon. Please look at the
Description
Please include a summary of the change and which issue is fixed or implemented. Please also include relevant motivation and context (e.g. links, docs, tickets etc.).
List any dependencies that are required for this change.
Type of change
Please delete options that are not relevant.
How has this change been tested? Please provide a testcase or example of how you tested the change.
See test/detection/test_*.py
Any specific deployment considerations
For example, documentation changes, usability, usage/costs, secrets, etc.
Docs