@MhhhxX MhhhxX commented Oct 21, 2025

Motivation and context

Searching freshly uploaded videos, especially long ones, for relevant frames to annotate can be very time-consuming without any kind of hint.
To help the worker who annotates the video, this PR uses the chapter marks stored in the metadata of
some video container formats. Chapter marks are shown as clickable ticks underneath the player slider, and there are new player buttons to jump to the previous/next chapter in the video. A worker can use these navigation features on newly uploaded videos and immediately start annotating relevant frames.
Example:

  1. Record the video you want to annotate
  2. While recording, write chapter marks into the video file (for example, press a button on the recording device when you see a car you want to annotate later)
  3. Upload the video to CVAT, jump to the chapter marks, and start annotating
[Screenshots: CVAT chapters, CVAT chapter tooltip]

How has this been tested?

I extended the REST API test case for TaskMetaData.

I tested the UI myself by trying the new buttons and marks in the browser.

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

for chapter in chapters:
    chapter["start"] = round(
        float(chapter["start"]) / float(chapter["time_base"].denominator) * stream.frames /
        (stream.duration * stream.time_base)
Contributor

@zhiltsov-max zhiltsov-max Oct 21, 2025

  1. There are several big issues with mapping times to frames precisely. The reported values for stream/container .frames and .duration can be missing or guessed (possibly incorrectly) if taken from the metadata without full decoding. Videos with VBR are one example. This is the reason why CVAT optionally extracts the duration by decoding all the frames. Basically, a correct way would be to collect the pts of each frame and then find the frame closest to the requested position. Using seek() can also work. Maybe invalid chapters can be ignored if they can't be read, but it has to be done during task creation.

Some references:

  2. I think this information has to be extracted and stored separately during task creation (e.g. in a manifest file or the DB) for faster access and without always depending on the original video file. It doesn't look good that this is parsed from the original video on every request.
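
To illustrate the pts-based mapping described in this comment, here is a minimal sketch. It is not the PR's code: the function name `chapter_time_to_frame`, its parameters, and the sample values are hypothetical, and it assumes the per-frame pts values were already collected by decoding the video once during task creation.

```python
from collections.abc import Sequence
from fractions import Fraction
from typing import Optional


def chapter_time_to_frame(
    chapter_start: int,
    chapter_time_base: Fraction,
    frame_pts: Sequence[int],
    stream_time_base: Fraction,
) -> Optional[int]:
    """Return the index of the frame whose pts is closest to the chapter start."""
    if not frame_pts:
        return None
    # Rescale the chapter timestamp from the chapter's time base
    # into the units of the video stream's time base.
    target_pts = Fraction(chapter_start) * chapter_time_base / stream_time_base
    # Linear scan for the nearest pts; no assumption about frame spacing.
    return min(range(len(frame_pts)), key=lambda i: abs(frame_pts[i] - target_pts))


# Hypothetical stream with irregular frame spacing, time base 1/1000.
frame_pts = [0, 33, 100, 133, 400]
# Chapter starting at 0.12 s, stored with a 1/100 time base.
index = chapter_time_to_frame(12, Fraction(1, 100), frame_pts, Fraction(1, 1000))
```

Because the lookup works on actual frame timestamps, it does not rely on the reported `.frames` or `.duration` values at all.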

Author

@MhhhxX MhhhxX Oct 22, 2025

In my latest two commits I extended the manifest file with chapter information and updated the REST API accordingly.

And thanks a lot for the insights and sources on that topic!

I still haven't implemented any validation code for chapters. My ideas are to check that start is less than or equal to end and whether chapters overlap. Do you have any additions or corrections? Also, I'm not sure how to handle overlapping chapters. Maybe by splitting or merging them.

Contributor

The changes look good overall, let's polish them a little bit. Regarding validation, I think we don't need to delve too deep here. I'd start with ignoring chapters with the boundaries outside the video range and maybe with limiting the maximum number of chapters to some reasonable value. I don't really see big issues with overlaps here, it could possibly be useful.
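
The lightweight validation suggested here could look roughly like the following sketch. It is only an illustration under stated assumptions: `filter_chapters`, `MAX_CHAPTERS`, and the limit of 500 are hypothetical, and `start` and `end` are assumed to already be inclusive frame indices.

```python
from typing import Any

# Hypothetical upper bound; the thread only asks for "some reasonable value".
MAX_CHAPTERS = 500


def filter_chapters(
    chapters: list[dict[str, Any]],
    frame_count: int,
    max_chapters: int = MAX_CHAPTERS,
) -> list[dict[str, Any]]:
    """Drop chapters with boundaries outside the video and cap their number."""
    in_range = [
        chapter
        for chapter in chapters
        # Both boundaries must point at an existing frame.
        if 0 <= chapter["start"] < frame_count and 0 <= chapter["end"] < frame_count
    ]
    # Overlapping chapters are deliberately kept, as discussed above.
    return in_range[:max_chapters]


sample = [
    {"id": 0, "start": 0, "end": 10},
    {"id": 1, "start": 5, "end": 200},  # end lies outside a 100-frame video
    {"id": 2, "start": 8, "end": 40},   # overlaps chapter 0, still accepted
]
kept = filter_chapters(sample, frame_count=100)
```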

@zhiltsov-max zhiltsov-max requested a review from klakhov October 21, 2025 12:28
Contributor

klakhov commented Oct 22, 2025

It seems the chapter marker can overlap with the frame name:
[image]

Author

MhhhxX commented Oct 23, 2025

> It seems chapter marker can overlap with frame name: [image]

@klakhov I could show the ticks above the slider like this:
[Screenshot taken 2025-10-23 at 10:35:51]
What do you think about this?

Contributor

klakhov commented Oct 23, 2025

@MhhhxX, Yes, I think it would be better.

Contributor

Please add type annotations in the new or updated function signatures, where possible.

Author

@MhhhxX MhhhxX Oct 24, 2025

I added type annotations for the new function in this commit.

Comment on lines 2977 to 2979
start = serializers.IntegerField()
end = serializers.IntegerField()
time_base = FractionSerializer(many=False)
Contributor

Do we actually need to provide time information in the API? I think it could possibly be in the chapter meta, if needed, but the start and end should be just the frame numbers.

class ChapterSerializer(serializers.Serializer):
    id = serializers.IntegerField()
    start = serializers.IntegerField()
    end = serializers.IntegerField()
Contributor

We typically use the stop value (the last included value of the range) instead of end (the first value after the last one, or None) for frame range specification in CVAT. I think it would make sense to use it here as well for consistency.
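
A small sketch of the difference between the two conventions, assuming the definitions given in this comment. The helper `end_to_stop` is hypothetical and only shows the conversion; it is not part of CVAT.

```python
from typing import Optional


def end_to_stop(end: Optional[int], frame_count: int) -> int:
    """Convert an exclusive end (or None) into an inclusive stop frame."""
    if end is None:
        # No explicit end: the range runs to the last frame of the video.
        return frame_count - 1
    # An exclusive end points one past the last included frame.
    return end - 1


start, end = 20, 30
stop = end_to_stop(end, frame_count=100)
# With an inclusive stop, the covered frames are start..stop.
frames = list(range(start, stop + 1))
```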

id = serializers.IntegerField()
start = serializers.IntegerField()
end = serializers.IntegerField()
time_base = FractionSerializer(many=False)
Contributor

Suggested change
- time_base = FractionSerializer(many=False)
+ time_base = FractionSerializer()

from collections import Counter
from collections.abc import Iterable, Sequence
from copy import deepcopy
from fractions import Fraction
Contributor

Suggested change
from fractions import Fraction

from itertools import islice
from json.decoder import JSONDecodeError
from typing import Any, Callable, Optional, Union
from typing import Any, Callable, Optional, Union, List, Tuple
Contributor

Suggested change
- from typing import Any, Callable, Optional, Union, List, Tuple
+ from typing import Any, Callable, Optional, Union

Deprecated in 3.9
https://docs.python.org/3.10/library/typing.html#typing.List
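
For illustration, the built-in generic types available since Python 3.9 can replace `typing.List` and `typing.Tuple` in annotations. The two functions below are hypothetical examples, not code from the PR.

```python
from typing import Any, Optional


def chapter_bounds(chapter: dict[str, Any]) -> tuple[int, int]:
    """Return the (start, stop) frame pair of a chapter."""
    # tuple[int, int] replaces typing.Tuple[int, int].
    return int(chapter["start"]), int(chapter["stop"])


def first_chapter_ids(
    chapters: list[dict[str, Any]], limit: Optional[int] = None
) -> list[int]:
    """Return the ids of the first `limit` chapters (all of them if None)."""
    # list[...] and dict[...] replace typing.List and typing.Dict.
    return [int(chapter["id"]) for chapter in chapters[:limit]]
```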

def _find_closest_pts(pts_list, target_pts):
    if not pts_list:
        return None
    return min(range(len(pts_list)), key=lambda i: abs(pts_list[i] - target_pts))
Contributor

Consider using the bisect module instead, it's binary search.
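
A possible bisect-based variant is sketched below. It assumes `pts_list` is sorted in ascending order, which the linear version does not require; the name `find_closest_pts` and the type annotations are illustrative.

```python
from bisect import bisect_left
from collections.abc import Sequence
from typing import Optional


def find_closest_pts(pts_list: Sequence[int], target_pts: int) -> Optional[int]:
    """Return the index of the value in sorted pts_list closest to target_pts."""
    if not pts_list:
        return None
    # Index of the first element that is >= target_pts (binary search).
    pos = bisect_left(pts_list, target_pts)
    if pos == 0:
        return 0
    if pos == len(pts_list):
        return len(pts_list) - 1
    before, after = pts_list[pos - 1], pts_list[pos]
    # On a tie, prefer the earlier frame, like min() in the linear version.
    return pos - 1 if target_pts - before <= after - target_pts else pos
```

This reduces each lookup from O(n) to O(log n), which matters mostly for long videos with many chapters.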


@staticmethod
def _get_chapters(container):
    chapters = container.chapters()
Contributor

Consider copying just what's actually needed to protect the code from possible API changes in the function output.
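
One way to follow this suggestion is to copy an explicit set of fields into new dictionaries. The sketch below assumes each raw chapter is a mapping with `id`, `start`, `end`, and an optional `metadata` mapping; these keys, and the helper name `extract_chapters`, are assumptions for illustration and are not confirmed by the thread.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def extract_chapters(raw_chapters: Iterable[Mapping[str, Any]]) -> list[dict[str, Any]]:
    """Copy only the fields the application relies on."""
    return [
        {
            "id": int(raw["id"]),
            "start": int(raw["start"]),
            "end": int(raw["end"]),
            # Shallow copy so later changes do not touch the source object.
            "metadata": dict(raw.get("metadata", {})),
        }
        for raw in raw_chapters
    ]


raw = [{"id": 1, "start": 0, "end": 90, "metadata": {"title": "Intro"}, "extra": "x"}]
chapters = extract_chapters(raw)
```

Fields that the library adds or renames later are then either ignored or fail loudly in one place.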

Author

@MhhhxX MhhhxX Oct 24, 2025

I removed the time base field from the returned chapters. I would like to keep the id field because it may be useful to show in the chapter list you suggested for the UI.

Contributor

What do you think about adding a chapter list with chapter names and a selector in the UI? It looks like it could also be useful, and it would put the chapter names reported by the API to actual use.

Author

It's a great idea, I think. I will work on that!

Author

MhhhxX commented Oct 24, 2025

> @MhhhxX, Yes, I think it would be better.

Markers are now above the slider.
