Marks and navigation buttons for chapters of video files in the player navigation #9924

MhhhxX · 2025-10-21T08:56:34Z

Motivation and context

Searching for relevant frames to annotate in freshly uploaded videos, especially long ones, can be very time-consuming without any kind of hint.
To help the worker who annotates the video, this PR uses the chapter marks stored in the metadata of some video container formats. Chapter marks are shown as clickable ticks underneath the player slider, and there are new player buttons to jump to the previous/next chapter in the video. A worker can use these navigation features on newly uploaded videos and immediately start annotating relevant frames.
Example:

1. Record the video you want to annotate.
2. While recording, write chapter marks into the video file (for example, press a button on the recording device when you see a car you want to annotate later).
3. Upload the video to CVAT, jump to the chapter marks, and start annotating.

How has this been tested?

I extended the REST API test case for TaskMetaData.

I tested the UI myself by trying out the new buttons and marks in the browser.

Checklist

I submit my changes into the develop branch
I have created a changelog fragment
I have updated the documentation accordingly
I have added tests to cover my changes
I have linked related issues (see GitHub docs)

License

I submit my code changes under the same MIT License that covers the project.
Feel free to contact the maintainers if that's a concern.

zhiltsov-max · 2025-10-21T11:01:54Z

cvat/apps/engine/media_extractors.py

+    for chapter in chapters:
+        chapter["start"] = round(
+            float(chapter["start"]) / float(chapter["time_base"].denominator) * stream.frames /
+            (stream.duration * stream.time_base)


There are several big issues with mapping times to frames precisely. The reported values of the stream/container .frames and .duration can be missing or guessed (possibly incorrectly) if they are taken from the metadata without full decoding. Videos with VBR are one example. This is why CVAT optionally extracts the duration by decoding all the frames. Basically, a correct approach would be to collect the pts of each frame and then find the frame closest to the requested position. Using seek() can also work. Maybe invalid chapters can be ignored if they can't be read, but this has to be done during task creation.

Some references:

duration, nb_frames https://ffmpeg.org/doxygen/trunk/structAVStream.html#a4e04af7a5a4d8298649850df798dd0bc

I think this information has to be extracted and stored separately during task creation (e.g. in a manifest file or the DB) for faster access and to avoid always depending on the original video file. It doesn't look good that it is parsed from the original video on every request.
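The pts-based mapping suggested in the review comment above could be sketched roughly as follows. This is only an illustration, assuming the pts of every frame has already been collected by decoding the stream once (for example, during task creation). The function name `chapter_start_to_frame` and its parameters are hypothetical and are not part of this PR.

```python
from fractions import Fraction
from typing import Optional


def chapter_start_to_frame(
    chapter_start: int,
    chapter_time_base: Fraction,
    frame_pts: list[int],
    stream_time_base: Fraction,
) -> Optional[int]:
    """Return the index of the frame whose timestamp is closest to the chapter start.

    All names are illustrative. `frame_pts` holds the pts of every decoded frame
    in display order, expressed in `stream_time_base` units.
    """
    if not frame_pts:
        return None

    # Convert both sides to seconds so that different time bases are comparable.
    target = chapter_start * chapter_time_base
    return min(
        range(len(frame_pts)),
        key=lambda i: abs(frame_pts[i] * stream_time_base - target),
    )


# Unevenly spaced frames: the timestamps are 0, 1/3, 2/3, 1, 2 and 3 seconds.
pts = [0, 30000, 60000, 90000, 180000, 270000]  # in 1/90000 s units
# A chapter starting at 2000 in 1/1000 s units (2 seconds).
frame = chapter_start_to_frame(2000, Fraction(1, 1000), pts, Fraction(1, 90000))
```

Working with `Fraction` values keeps the comparison exact and does not rely on the reported `.frames` or `.duration` of the stream.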

In my latest two commits, I extended the manifest file with chapter information and updated the REST API accordingly.

And thanks a lot for the insights and sources around that topic!

I still haven't implemented any validation code for chapters. My ideas are to check whether start is less than or equal to end and whether chapters overlap. Do you have any additions or corrections? Also, I'm not sure how to handle overlapping chapters. Maybe by splitting or merging them.

The changes look good overall, let's polish them a little bit. Regarding validation, I think we don't need to delve too deep here. I'd start with ignoring chapters whose boundaries are outside the video range and maybe with limiting the maximum number of chapters to some reasonable value. I don't really see big issues with overlaps here; they could possibly be useful.
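The lightweight validation described above might look like the sketch below. It assumes that chapters are plain dicts with inclusive frame boundaries under the keys `start` and `stop`. The function name `filter_chapters` and the `MAX_CHAPTERS` limit are hypothetical and are not values taken from CVAT.

```python
from typing import Any

MAX_CHAPTERS = 1000  # hypothetical limit, not a value taken from CVAT


def filter_chapters(
    chapters: list[dict[str, Any]],
    frame_count: int,
    max_chapters: int = MAX_CHAPTERS,
) -> list[dict[str, Any]]:
    """Drop chapters with boundaries outside the video and cap the total count."""
    valid = [
        chapter
        for chapter in chapters
        if 0 <= chapter["start"] < frame_count and 0 <= chapter["stop"] < frame_count
    ]
    # Overlapping chapters are kept on purpose; only the total number is limited.
    return valid[:max_chapters]


chapters = [
    {"id": 0, "start": 0, "stop": 9},
    {"id": 1, "start": 5, "stop": 20},  # stop is outside a 15-frame video
    {"id": 2, "start": -1, "stop": 3},  # start is outside the video
    {"id": 3, "start": 8, "stop": 12},  # overlaps with chapter 0, still kept
]
kept = filter_chapters(chapters, frame_count=15)
```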

cvat-core/src/frames.ts

cvat-ui/src/components/annotation-page/top-bar/top-bar.tsx

klakhov · 2025-10-22T12:14:51Z

It seems the chapter marker can overlap with the frame name:

MhhhxX · 2025-10-23T08:37:54Z

It seems the chapter marker can overlap with the frame name:

@klakhov I could show the ticks above the slider like this:

What do you think about this?

klakhov · 2025-10-23T10:57:42Z

@MhhhxX, Yes, I think it would be better.

zhiltsov-max · 2025-10-23T12:28:52Z

cvat/apps/engine/media_extractors.py

Please add type annotations in the new or updated function signatures, where possible.

I added type annotations for the new function in this commit.

zhiltsov-max · 2025-10-23T12:32:38Z

cvat/apps/engine/serializers.py

+    start = serializers.IntegerField()
+    end = serializers.IntegerField()
+    time_base = FractionSerializer(many=False)


Do we actually need to provide time information in the API? I think it could possibly be in the chapter meta, if needed, but the start and end should be just the frame numbers.

zhiltsov-max · 2025-10-23T12:34:37Z

cvat/apps/engine/serializers.py

+class ChapterSerializer(serializers.Serializer):
+    id = serializers.IntegerField()
+    start = serializers.IntegerField()
+    end = serializers.IntegerField()


We typically use the stop value (the last included value of the range) instead of end (the first value after the last one, or None) for frame range specification in CVAT. I think it would make sense to use it here as well for consistency.
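The difference between the two conventions can be illustrated with a small conversion helper. This is only a sketch: `end_to_stop` is a hypothetical name, and treating `None` as "until the last frame" is an assumption made for the example.

```python
from typing import Optional


def end_to_stop(end: Optional[int], frame_count: int) -> int:
    """Convert an exclusive `end` into an inclusive `stop` frame number.

    Assumption for this sketch: `end=None` means "until the last frame".
    """
    if end is None:
        return frame_count - 1
    return end - 1


# A chapter covering frames 10..19 of a 100-frame video.
start, end = 10, 20
stop = end_to_stop(end, frame_count=100)
frames = list(range(start, stop + 1))
```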

zhiltsov-max · 2025-10-23T12:35:20Z

cvat/apps/engine/serializers.py

+    id = serializers.IntegerField()
+    start = serializers.IntegerField()
+    end = serializers.IntegerField()
+    time_base = FractionSerializer(many=False)


Suggested change

-    time_base = FractionSerializer(many=False)
+    time_base = FractionSerializer()

zhiltsov-max · 2025-10-23T12:37:46Z

tests/python/rest_api/test_task_data.py

 from collections import Counter
 from collections.abc import Iterable, Sequence
 from copy import deepcopy
+from fractions import Fraction


Suggested change

-from fractions import Fraction

zhiltsov-max · 2025-10-23T12:40:00Z

utils/dataset_manifest/core.py

 from itertools import islice
 from json.decoder import JSONDecodeError
-from typing import Any, Callable, Optional, Union
+from typing import Any, Callable, Optional, Union, List, Tuple


Suggested change

-from typing import Any, Callable, Optional, Union, List, Tuple
+from typing import Any, Callable, Optional, Union

typing.List and typing.Tuple are deprecated since Python 3.9; the built-in list and tuple can be used instead.
https://docs.python.org/3.10/library/typing.html#typing.List

zhiltsov-max · 2025-10-23T12:41:57Z

utils/dataset_manifest/core.py

+    def _find_closest_pts(pts_list, target_pts):
+        if not pts_list:
+            return None
+        return min(range(len(pts_list)), key=lambda i: abs(pts_list[i] - target_pts))


Consider using the bisect module instead; it provides binary search.
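A possible bisect-based variant of the helper from the diff above is sketched here. It assumes that `pts_list` is sorted in ascending order, which the linear `min()` scan does not require. On ties it returns the earlier index, like the original. This is an illustration, not the code that was eventually committed.

```python
from bisect import bisect_left
from typing import Optional


def find_closest_pts(pts_list: list[int], target_pts: int) -> Optional[int]:
    """Return the index of the value in sorted `pts_list` closest to `target_pts`."""
    if not pts_list:
        return None

    # Index of the first element that is >= target_pts (binary search).
    pos = bisect_left(pts_list, target_pts)
    if pos == 0:
        return 0
    if pos == len(pts_list):
        return len(pts_list) - 1

    # The closest value is one of the two neighbours of the insertion point.
    before, after = pts_list[pos - 1], pts_list[pos]
    return pos if after - target_pts < target_pts - before else pos - 1
```

The lookup takes O(log n) per chapter instead of O(n), which matters for long videos with many frames.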

utils/dataset_manifest/core.py

zhiltsov-max · 2025-10-23T12:49:38Z

utils/dataset_manifest/core.py


+    @staticmethod
+    def _get_chapters(container):
+        chapters = container.chapters()


Consider copying just what's actually needed to protect the code from possible API changes in the function output.

I removed the time base field from the returned chapters. I would like to keep the id field because it may be useful to show in the chapter list you suggested for the UI.
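Copying only the required fields, as suggested in this thread, could be done along these lines. The sketch works on plain dicts and assumes the key names `id`, `start` and `end` seen in the reviewed diffs. `pick_chapter_fields` and `NEEDED_KEYS` are hypothetical names.

```python
from typing import Any

NEEDED_KEYS = ("id", "start", "end")  # assumed key names, based on the reviewed diffs


def pick_chapter_fields(raw_chapters: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Copy only the fields the caller relies on and ignore everything else."""
    return [{key: chapter[key] for key in NEEDED_KEYS} for chapter in raw_chapters]


# Example input with extra fields that should not leak into the result.
raw = [
    {"id": 0, "start": 0, "end": 5000, "time_base": "1/1000", "extra": {"a": 1}},
    {"id": 1, "start": 5000, "end": 9000, "time_base": "1/1000", "extra": {}},
]
picked = pick_chapter_fields(raw)
```

If the upstream function starts returning additional or renamed extra fields, only `NEEDED_KEYS` has to be reviewed.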

zhiltsov-max · 2025-10-23T12:57:12Z

cvat-core/src/frames.ts

What do you think about adding a chapter list with chapter names and a selector in the UI? It looks like it could also be useful, and it would put the chapter names reported by the API to actual use.

It's a great idea, I think. I will work on that!

MhhhxX · 2025-10-24T08:28:05Z

@MhhhxX, Yes, I think it would be better.

Markers are now above the slider.

sonarqubecloud · 2025-10-24T08:46:40Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

DE-AI added 5 commits October 21, 2025 09:34

Serializers for video chapters and integration of chapters in job and…

e1b39eb

… task metadata rest api endpoint. Extends FramesMetaData class with video chapter information in the frontend.

* Visualize chapter marks in the player slider component

f9def16

* Seek to next/previous chapter mark

Use the start timestamp to check if the chapter is inside the segment

fe46b06

Update openapi schema with chapter class and field.

844c18e

Rest API test for chapters

cfc34f9

MhhhxX requested review from SpecLad, bsekachev and zhiltsov-max as code owners October 21, 2025 08:56

Undo browser change for cypress

266852b

zhiltsov-max reviewed Oct 21, 2025

zhiltsov-max requested a review from klakhov October 21, 2025 12:28

klakhov reviewed Oct 22, 2025

DE-AI added 3 commits October 22, 2025 16:37

Extend manifest file with chapters

fef0935

Use manifest file as source for video chapters in the rest api.

96997f9

Handled Typescript code reviews.

beb51ce

MhhhxX added 2 commits October 23, 2025 13:09

Move chapter marks above the player slider and bring the step dots of…

b1fcd5e

… the marks to the front.

Fix more code style issues.

661bfaf

zhiltsov-max requested changes Oct 23, 2025

zhiltsov-max reviewed Oct 23, 2025

MhhhxX and others added 7 commits October 23, 2025 15:24

Type information for get_video_chapaters method

4b08264

Remove time_base field from the chapter api response as it is not nee…

9400f93

…ded will never be. Renamed end field of Chapter classes to stop to be consistent with the range spec of the CVAT project.

Ensure backwards compatibility of new "chapters" field.

825b909

Co-authored-by: Maxim Zhiltsov <[email protected]>

Remove and replace deprecated imports from the typing module.

5b57a3b

Remove unused import

5487ce7

Refactor find closest pts method with bisect module.

2175684

Don't copy time base field from chapters as it isn't needed in the ap…

ae64ff7

…i. Update api schema.

Remove Fraction from the api and frontend code.

867ae0a
