-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Marks and navigation buttons for chapters of video files in the player navigation #9924
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
… task metadata rest api endpoint. Extends FramesMetaData class with video chapter information in the frontend.
* Seek to next/previous chapter mark
cvat/apps/engine/media_extractors.py
Outdated
| for chapter in chapters: | ||
| chapter["start"] = round( | ||
| float(chapter["start"]) / float(chapter["time_base"].denominator) * stream.frames / | ||
| (stream.duration * stream.time_base) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- There are several big issues with mapping times to frames precisely. The reported values for stream/container
.frames,.durationcan be missing or guessed (possibly, incorrectly), if taken from the metadata without full decoding. One of the examples is videos with VBR. This is the reason why CVAT extracts duration optionally by decoding all the frames. Basically, a correct way would be to collectptsfor each frame and then find the closest frame to the requested position. What can also work is using seek(). Maybe the invalid chapters can be ignored if they can't be read, but it has to be done during task creation.
Some references:
duration,nb_frameshttps://ffmpeg.org/doxygen/trunk/structAVStream.html#a4e04af7a5a4d8298649850df798dd0bc
- I think this information has to be extracted and stored separately during task creation (e.g. in a manifest file of DB) for faster access and without always depending on the original video file. It doesn't look good that this is parsed from the original video on every request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my latest two commits I extended the manifest file with chapter information and I updated the rest api accordingly.
And thanks a lot for the insights and sources around that topic!
I still didn't implement any validation code for chapters. My ideas are checking if start is smaller or equal than end and if chapters are overlapping. Do you have any additions or corrections on that? Also I'm not sure how to handle overlapping chapters. Maybe splitting or merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good overall, let's polish them a little bit. Regarding validation, I think we don't need to delve too deep here. I'd start with ignoring chapters with the boundaries outside the video range and maybe with limiting the maximum number of chapters to some reasonable value. I don't really see big issues with overlaps here, it could possibly be useful.
|
@klakhov I could show the ticks above the slider like this: |
|
@MhhhxX, Yes, I think it would be better. |
… the marks to the front.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add type annotations in the new or updated function signatures, where possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added type annotations for the new function in this commit.
cvat/apps/engine/serializers.py
Outdated
| start = serializers.IntegerField() | ||
| end = serializers.IntegerField() | ||
| time_base = FractionSerializer(many=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need to provide time information in the API? I think it could possibly be in the chapter meta, if needed, but the start and end should be just the frame numbers.
cvat/apps/engine/serializers.py
Outdated
| class ChapterSerializer(serializers.Serializer): | ||
| id = serializers.IntegerField() | ||
| start = serializers.IntegerField() | ||
| end = serializers.IntegerField() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We typically use the stop value (the last included value of the range) instead end (the first after the last or None) for frame range specification in CVAT. I think it would make sense to use it here as well for consistency.
cvat/apps/engine/serializers.py
Outdated
| id = serializers.IntegerField() | ||
| start = serializers.IntegerField() | ||
| end = serializers.IntegerField() | ||
| time_base = FractionSerializer(many=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| time_base = FractionSerializer(many=False) | |
| time_base = FractionSerializer() |
| from collections import Counter | ||
| from collections.abc import Iterable, Sequence | ||
| from copy import deepcopy | ||
| from fractions import Fraction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| from fractions import Fraction |
utils/dataset_manifest/core.py
Outdated
| from itertools import islice | ||
| from json.decoder import JSONDecodeError | ||
| from typing import Any, Callable, Optional, Union | ||
| from typing import Any, Callable, Optional, Union, List, Tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| from typing import Any, Callable, Optional, Union, List, Tuple | |
| from typing import Any, Callable, Optional, Union |
Deprecated in 3.9
https://docs.python.org/3.10/library/typing.html#typing.List
utils/dataset_manifest/core.py
Outdated
| def _find_closest_pts(pts_list, target_pts): | ||
| if not pts_list: | ||
| return None | ||
| return min(range(len(pts_list)), key=lambda i: abs(pts_list[i] - target_pts)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using the bisect module instead, it's binary search.
|
|
||
| @staticmethod | ||
| def _get_chapters(container): | ||
| chapters = container.chapters() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider copying just what's actually needed to protect the code from possible API changes in the function output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the time base field in the returned chapters. I would like to keep the id field because it's maybe useful to show in your suggested chapter list for the UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about adding a chapter list with chapter names and a selector in UI? It looks like it could also be useful, and it would put the chapter names reported from the API to the actual use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a great idea, I think. I will work on that!
…ded will never be. Renamed end field of Chapter classes to stop to be consistent with the range spec of the CVAT project.
Co-authored-by: Maxim Zhiltsov <[email protected]>
…i. Update api schema.
Markers are now above the slider. |
|





Motivation and context
Searching for relevant frames to annotate in freshly uploaded and especially long videos can be very time consuming without any kind of hint.
To give a help for the worker who annotates the video this PR uses the chapter marks stored inside the metadata of
some video container formats. Chapter marks are shown as clickable ticks underneath the player slider and there are new player buttons to jump the previous/next chapter in the video. A worker can use these navigation features on newly uploaded videos and (immediately) start annotating on relevant frames.
Example:
How has this been tested?
I extended the rest api test case for TaskMetaData
I tested the ui myself by testing the new buttons and marks in the browser
Checklist
developbranchLicense
Feel free to contact the maintainers if that's a concern.