Decoder-native resize public implementation #1003
@@ -0,0 +1,17 @@
.. _samplers:

===================
torchcodec.transforms
===================
Comment on lines +3 to +5

Contributor: We're getting warnings when the lengths don't match. Suggested change: make the over/underlines the same length as the title.
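For context, Sphinx emits a "Title underline too short" warning when a section over/underline is shorter than the title text; a warning-free version of the header above would be:

```rst
=====================
torchcodec.transforms
=====================
```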
.. currentmodule:: torchcodec.transforms

For a tutorial, see: TODO_DECODER_TRANSFORMS_TUTORIAL.

.. autosummary::
    :toctree: generated/
    :nosignatures:
    :template: dataclass.rst

    DecoderTransform
    Resize
@@ -125,3 +125,4 @@ Encoding
    api_ref_decoders
    api_ref_encoders
    api_ref_samplers
+   api_ref_transforms
@@ -4,3 +4,4 @@ files = src/torchcodec
 show_error_codes = True
 pretty = True
 allow_redefinition = True
+follow_untyped_imports = True
Contributor (Author): I was getting linting errors like https://github.com/meta-pytorch/torchcodec/actions/runs/19157614790/job/54761644331, which point to the mypy docs that recommend the change above: https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
@@ -8,17 +8,18 @@
 import json
 import numbers
 from pathlib import Path
-from typing import Literal, Optional, Tuple, Union
+from typing import List, Literal, Optional, Sequence, Tuple, Union

 import torch
-from torch import device as torch_device, Tensor
+from torch import device as torch_device, nn, Tensor

 from torchcodec import _core as core, Frame, FrameBatch
 from torchcodec.decoders._decoder_utils import (
     _get_cuda_backend,
     create_decoder,
     ERROR_REPORTING_INSTRUCTIONS,
 )
+from torchcodec.transforms import DecoderTransform, Resize


 class VideoDecoder:
@@ -66,6 +67,13 @@ class VideoDecoder:
             probably is. Default: "exact".
             Read more about this parameter in:
             :ref:`sphx_glr_generated_examples_decoding_approximate_mode.py`
+        transforms (sequence of transform objects, optional): Sequence of transforms to be
+            applied to the decoded frames by the decoder itself, in order. Accepts both
+            :class:`~torchcodec.transforms.DecoderTransform` and
+            `torchvision.transforms.v2.Transform <https://docs.pytorch.org/vision/stable/transforms.html#v2-api-reference-recommended>`_
+            objects. All transforms are applied
Contributor (Author): This and all other references to TorchVision transforms use hard links. I don't think we can get proper Sphinx references when the target is in a different project.

Contributor: We can, we'll just need to add a torchvision entry in docs/source/conf.py (lines 209 to 215 at b35005d). Feel free to leave that as a follow-up / open an issue.
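For reference, an intersphinx entry along these lines would let Sphinx resolve cross-project references into the TorchVision docs. This is a hypothetical conf.py fragment, not taken from the repo: the inventory URL and any sibling entries are assumptions to be checked against the published docs.

```python
# Hypothetical docs/source/conf.py fragment. The "torchvision" inventory URL
# is an assumption; Sphinx fetches objects.inv from it at build time.
intersphinx_mapping = {
    "torch": ("https://docs.pytorch.org/docs/stable/", None),
    "torchvision": ("https://docs.pytorch.org/vision/stable/", None),
}
```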
+            in the output pixel format and colorspace. Read more about this parameter in:
Contributor: Following up on #1003 (comment). Agreed, we should document and claim that TV is our reference. I think we have slightly different understandings of what we mean by "TV is our ref"; your definition is slightly stricter than mine (see below).

I don't think it adds a lot of value to document, as I don't know if that's a question users are even asking themselves. But I could be wrong and I don't feel strongly about it. What I'm slightly more concerned about is that the comment reads like a contract, and I suspect we may want to relax that behavior in the future. E.g. for crop, we might want to apply it in YUV space instead of RGB if it's faster and if models can't notice the difference.

To me, when we say "TV is our ref", it means "this transform has the same behavior as the TV transform as far as models are concerned". It's not strictly about bitwise equality (we'll never have that); it's only about whether the models can tell the difference. We know they can tell the difference for resize's interpolation mode. But if they can't tell the difference for (e.g.) crop being applied before or after color conversion, I think we could allow ourselves to make that change of behavior. That allows us more freedom to potentially enable higher perf gains in the future.

None of my comments above are blocking. We can go ahead as-is. I'm happy that for once, I am not the one insisting on strictness :D
+            TODO_DECODER_TRANSFORMS_TUTORIAL.
         custom_frame_mappings (str, bytes, or file-like object, optional):
             Mapping of frames to their metadata, typically generated via ffprobe.
             This enables accurate frame seeking without requiring a full video scan.

@@ -104,6 +112,7 @@ def __init__(
         num_ffmpeg_threads: int = 1,
         device: Optional[Union[str, torch_device]] = "cpu",
         seek_mode: Literal["exact", "approximate"] = "exact",
+        transforms: Optional[Sequence[Union[DecoderTransform, nn.Module]]] = None,
         custom_frame_mappings: Optional[
             Union[str, bytes, io.RawIOBase, io.BufferedReader]
         ] = None,
@@ -148,13 +157,16 @@ def __init__(

         device_variant = _get_cuda_backend()

+        transform_specs = _make_transform_specs(transforms)
+
         core.add_video_stream(
             self._decoder,
             stream_index=stream_index,
             dimension_order=dimension_order,
             num_threads=num_ffmpeg_threads,
             device=device,
             device_variant=device_variant,
+            transform_specs=transform_specs,
             custom_frame_mappings=custom_frame_mappings_data,
         )
@@ -432,6 +444,78 @@ def _get_and_validate_stream_metadata(
    )


def _convert_to_decoder_transforms(
    transforms: Sequence[Union[DecoderTransform, nn.Module]],
) -> List[DecoderTransform]:
    """Convert a sequence of transforms that may contain TorchVision transform
    objects into a list of only TorchCodec transform objects.

    Args:
        transforms: Sequence of transform objects. The objects can be one of
            two types:
            1. torchcodec.transforms.DecoderTransform
            2. torchvision.transforms.v2.Transform, but our type annotation
               only mentions its base, nn.Module. We don't want to take a
               hard dependency on TorchVision.

    Returns:
        List of DecoderTransform objects.
    """
    try:
        from torchvision.transforms import v2

        tv_available = True
    except ImportError:
        tv_available = False

    converted_transforms: list[DecoderTransform] = []
    for transform in transforms:
        if not isinstance(transform, DecoderTransform):
            if not tv_available:
                raise ValueError(
                    f"The supplied transform, {transform}, is not a TorchCodec "
                    "DecoderTransform. TorchCodec also accepts TorchVision "
                    "v2 transforms, but TorchVision is not installed."
                )
            if isinstance(transform, v2.Resize):
Contributor: I think this fails if TorchVision isn't installed. EDIT: ah no, that's probably fine because of the `tv_available` check.

Contributor: Makes me think we should have a dummy job where we don't install TV, to ensure TC still works fine...

Contributor (Author): On a job which doesn't have TorchVision installed: I agree we need to do something here, but I'd like to punt on this for now. The current testing file imports TorchVision unconditionally. I think we'll want to separate out the tests that require TorchVision from those that don't so that we can test both behaviors, but that will require a different test setup. I actually started to add a step in the current linux wheel test that did not install TorchVision when I realized this.

Contributor: Yes, we can punt on this. I'm hoping we can do something very simple regarding testing: keep all but one test job using torchvision, and just have one small CI job that doesn't install TV and just runs a few tests, basically just ensuring TV is an optional dependency. I'd like to avoid separating tests into different files just for that; we may have more than one optional dependency, and that quickly becomes intractable.

Contributor: Nit, I think I would have been less surprised by a different name here.
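The import-guard pattern discussed in this thread can be sketched standalone. A minimal sketch, assuming nothing beyond the thread above: `require_torchvision` is a hypothetical helper name, not part of the PR.

```python
# Probe for TorchVision once; only raise when a transform actually needs it.
try:
    from torchvision.transforms import v2  # noqa: F401

    TV_AVAILABLE = True
except ImportError:
    TV_AVAILABLE = False


def require_torchvision() -> None:
    # Mirrors the kind of error raised when a TorchVision transform is
    # supplied but TorchVision is not installed.
    if not TV_AVAILABLE:
        raise ValueError(
            "A TorchVision v2 transform was supplied, "
            "but TorchVision is not installed."
        )
```

Keeping the probe at module/function top level (rather than importing inside the loop) makes "TV is optional" explicit and gives the failure a single, clear error path.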
                converted_transforms.append(Resize._from_torchvision(transform))
            else:
                raise ValueError(
                    f"Unsupported transform: {transform}. Transforms must be "
                    "either a TorchCodec DecoderTransform or a TorchVision "
                    "v2 transform."
                )
        else:
            converted_transforms.append(transform)

    return converted_transforms


def _make_transform_specs(
    transforms: Optional[Sequence[Union[DecoderTransform, nn.Module]]],
) -> str:
    """Given a sequence of transforms, turn those into the specification string
    the core API expects.

    Args:
        transforms: Optional sequence of transform objects. The objects can be
            one of two types:
            1. torchcodec.transforms.DecoderTransform
            2. torchvision.transforms.v2.Transform, but our type annotation
               only mentions its base, nn.Module. We don't want to take a
               hard dependency on TorchVision.

    Returns:
        String of transforms in the format the core API expects: transform
        specifications separated by semicolons.
    """
    if transforms is None:
        return ""

    transforms = _convert_to_decoder_transforms(transforms)
    return ";".join([t._make_params() for t in transforms])


def _read_custom_frame_mappings(
    custom_frame_mappings: Union[str, bytes, io.RawIOBase, io.BufferedReader]
) -> tuple[Tensor, Tensor, Tensor]:
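As a rough illustration of the spec string `_make_transform_specs` produces, here is a standalone re-implementation with a stripped-down stand-in for `Resize`. The `"resize, H, W"` parameter shape and the semicolon join mirror the code above; everything else is simplified for the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Resize:
    # Stand-in for torchcodec.transforms.Resize; size is (height, width).
    size: Sequence[int]

    def _make_params(self) -> str:
        assert len(self.size) == 2
        return f"resize, {self.size[0]}, {self.size[1]}"


def make_transform_specs(transforms: Optional[Sequence[Resize]]) -> str:
    # Join the per-transform specs with semicolons, as the core API expects.
    if transforms is None:
        return ""
    return ";".join(t._make_params() for t in transforms)


print(make_transform_specs([Resize(size=(270, 480)), Resize(size=(135, 240))]))
# resize, 270, 480;resize, 135, 240
```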
@@ -0,0 +1,7 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

from ._decoder_transforms import DecoderTransform, Resize  # noqa
@@ -0,0 +1,85 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Sequence

from torch import nn


@dataclass
class DecoderTransform(ABC):
    """Base class for all decoder transforms.

    A *decoder transform* is a transform that is applied by the decoder before
    returning the decoded frame. Applying decoder transforms to frames
    should be both faster and more memory efficient than receiving normally
    decoded frames and applying the same kind of transform.

    Most `DecoderTransform` objects have a complementary transform in TorchVision,
Contributor: Annoyingly, single backticks in RST aren't rendered as code literals the way they are in Markdown. Suggested change: use double backticks.
    specifically in
    `torchvision.transforms.v2 <https://docs.pytorch.org/vision/stable/transforms.html#v2-api-reference-recommended>`_.
    For such transforms, we ensure that:

    1. The names are the same.
    2. Default behaviors are the same.
    3. The parameters for the `DecoderTransform` object are a subset of those
       of the TorchVision transform object.
    4. Parameters with the same name control the same behavior and accept a
       subset of the same types.
    5. The differences between the frames returned by a decoder transform and
       the complementary TorchVision transform are small.

    All decoder transforms are applied in the output pixel format and colorspace.
    """
    @abstractmethod
    def _make_params(self) -> str:
        pass


@dataclass
class Resize(DecoderTransform):
    """Resize the decoded frame to a given size.

    Complementary TorchVision transform:
    `torchvision.transforms.v2.Resize <https://docs.pytorch.org/vision/stable/generated/torchvision.transforms.v2.Resize.html#torchvision.transforms.v2.Resize>`_.
    Interpolation is always bilinear. Anti-aliasing is always on.

    Args:
        size (sequence of int): Desired output size. Must be a sequence of
            the form (height, width).
    """

    size: Sequence[int]

    def _make_params(self) -> str:
        assert len(self.size) == 2
        return f"resize, {self.size[0]}, {self.size[1]}"
Comment on lines +60 to +62

Contributor: Can we call it something other than `_make_params`?
Contributor (Author): Note this class method below is new, because I'm trying to exhaustively catch all of the TorchVision `Resize` configurations we don't support.
    @classmethod
    def _from_torchvision(cls, resize_tv: nn.Module):
        from torchvision.transforms import v2
Contributor: I'd suggest the following:

    try:
        from torchvision.transforms import v2
    except ImportError as e:
        raise RuntimeError(
            "Couldn't find TorchVision - this should never happen, please report a bug"
        ) from e

This should probably be in a helper function, reused across classes. My goal here is mainly to help the reader (us, the devs) understand that this code path is only expected to run when TV is already available. Otherwise, the plain import makes it look like `_from_torchvision` can legitimately be reached without TorchVision installed.
|
|
||||||
| assert isinstance(resize_tv, v2.Resize) | ||||||
|
|
||||||
| if resize_tv.interpolation is not v2.InterpolationMode.BILINEAR: | ||||||
| raise ValueError( | ||||||
| "TorchVision Resize transform must use bilinear interpolation." | ||||||
| ) | ||||||
| if resize_tv.antialias is False: | ||||||
| raise ValueError( | ||||||
| "TorchVision Resize transform must have antialias enabled." | ||||||
| ) | ||||||
| if resize_tv.size is None: | ||||||
| raise ValueError("TorchVision Resize transform must have a size specified.") | ||||||
| if len(resize_tv.size) != 2: | ||||||
| raise ValueError( | ||||||
| "TorchVision Resize transform must have a (height, width) " | ||||||
| f"pair for the size, got {resize_tv.size}." | ||||||
| ) | ||||||
| return cls(size=resize_tv.size) | ||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
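The validation in `_from_torchvision` can be exercised standalone with a stand-in object. In this sketch, `FakeTVResize` and `validate_tv_resize_size` are illustrative names, not from the PR; only the attribute names inspected (`size`, `interpolation`, `antialias`) and the checks themselves mirror the code above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FakeTVResize:
    # Stand-in exposing the same attributes Resize._from_torchvision inspects.
    size: Optional[Sequence[int]]
    interpolation: str = "bilinear"
    antialias: bool = True


def validate_tv_resize_size(resize_tv: FakeTVResize) -> Sequence[int]:
    # Same rejection logic as _from_torchvision, decoupled from torchvision.
    if resize_tv.interpolation != "bilinear":
        raise ValueError("TorchVision Resize transform must use bilinear interpolation.")
    if resize_tv.antialias is False:
        raise ValueError("TorchVision Resize transform must have antialias enabled.")
    if resize_tv.size is None:
        raise ValueError("TorchVision Resize transform must have a size specified.")
    if len(resize_tv.size) != 2:
        raise ValueError(
            "TorchVision Resize transform must have a (height, width) "
            f"pair for the size, got {resize_tv.size}."
        )
    return resize_tv.size


print(validate_tv_resize_size(FakeTVResize(size=(270, 480))))
# (270, 480)
```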