Changes since v0.8.0:
CUDA 12.x compatibility
- #711 : Added preliminary information regarding Blackwell cards and micro-architecture
- #701 : The `--version-ident` compilation option to NVRTC was dropped in CUDA 12.2; this is now respected by the wrappers and the option is not exposed for 12.2 and newer versions of CUDA.
- #702 : Fixed handling of `--version-ident` (we had a spacing issue)
- #635, #701 : Added support for the `--fdevice_syntax_only` and `--minimal` options for NVRTC compilation
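To illustrate the kind of version gating described for #701, here is a minimal sketch. It is not the wrappers' actual code: `toy_nvrtc_options`, `version_ident_is_available()` and `render_options()` are hypothetical names, and the rendered option string is simply copied from these notes. The only outside fact relied on is CUDA's integer version encoding (1000 * major + 10 * minor).

```cpp
#include <string>
#include <vector>

// CUDA encodes versions as 1000 * major + 10 * minor, e.g. 12020 for CUDA 12.2.
constexpr int make_cuda_version(int major, int minor)
{
    return 1000 * major + 10 * minor;
}

// Hypothetical predicate: per the notes above, the option is gone from 12.2 onwards.
constexpr bool version_ident_is_available(int cuda_version)
{
    return cuda_version < make_cuda_version(12, 2);
}

// Hypothetical, heavily reduced stand-in for a set of compilation options.
struct toy_nvrtc_options {
    bool version_ident;
};

// Renders the options as command-line style arguments, silently skipping
// any option that the given CUDA version no longer accepts.
std::vector<std::string> render_options(const toy_nvrtc_options& opts, int cuda_version)
{
    std::vector<std::string> args;
    if (opts.version_ident && version_ident_is_available(cuda_version)) {
        args.emplace_back("--version-ident"); // spelling taken from the notes above
    }
    return args;
}
```

A real implementation would more likely remove the option at compile time (for example with a preprocessor check on the toolkit version) so that it cannot be set at all; the runtime check here just keeps the sketch testable.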
Changes to the `unique_span` & `unique_region` classes
- #703 : `unique_span<T>::swap()` now correctly swaps the deleters as well
- #713 : Move constructor and assignment operator of `unique_region_t`
- #702 : Fixed a typo when passing the `--no-source-include` option to NVRTC
- #719: Removed redundant cast operations from `unique_span<T>`
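The following sketch shows why a swap of owning spans has to exchange the deleters too, as in the #703 fix. `toy_unique_span` is a hypothetical, simplified stand-in written for this illustration and is not the library's `unique_span`; the function-pointer deleter and the helper names are assumptions as well.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical, simplified owning span: a pointer, a size, and the function
// that knows how to release the pointed-to memory.
template <typename T>
class toy_unique_span {
public:
    using deleter_type = void (*)(T*);

    toy_unique_span(T* data, std::size_t size, deleter_type deleter) noexcept
        : data_(data), size_(size), deleter_(deleter) {}

    toy_unique_span(const toy_unique_span&) = delete;
    toy_unique_span& operator=(const toy_unique_span&) = delete;

    ~toy_unique_span() { if (data_ != nullptr) { deleter_(data_); } }

    void swap(toy_unique_span& other) noexcept
    {
        using std::swap;
        swap(data_, other.data_);
        swap(size_, other.size_);
        // Without this line, each buffer would later be released by the
        // deleter that belongs to the other buffer.
        swap(deleter_, other.deleter_);
    }

    T* data() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }
    deleter_type deleter() const noexcept { return deleter_; }

private:
    T* data_;
    std::size_t size_;
    deleter_type deleter_;
};

int array_frees = 0; // counts calls to delete_array()
int no_op_frees = 0; // counts calls to no_op()

void delete_array(int* p) { delete[] p; ++array_frees; }
void no_op(int*) { ++no_op_frees; }

// Returns true if, after a swap, every buffer is still paired with its own deleter.
bool swap_keeps_deleters_paired()
{
    static int borrowed_storage[2] = {1, 2};
    array_frees = 0;
    no_op_frees = 0;
    bool paired = false;
    {
        toy_unique_span<int> heap(new int[4], 4, delete_array);
        toy_unique_span<int> borrowed(borrowed_storage, 2, no_op);
        heap.swap(borrowed);
        paired = heap.data() == borrowed_storage && heap.deleter() == no_op
              && borrowed.size() == 4 && borrowed.deleter() == delete_array;
    } // both spans are destroyed here, each invoking its own deleter once
    return paired && array_frees == 1 && no_op_frees == 1;
}
```

If the deleter swap were left out of this toy class, `delete[]` would be applied to the static array and the heap buffer would leak, which is the category of bug the fix guards against.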
Bug fixes
- #706 : Made `context_t::flags()` non-virtual
- #710 : Fixed the comparison operators for launch configurations
- #709 : Span-to-C-array copy no longer ignoring the designated stream
- #708 : Avoiding infinite recursion in `link_t::add_file()`
Build & installation
- #717: Creating possibly-missing CUDAToolkit targets in installed config files, so that library targets can rely on them: `nvfatbin`, `nvfatbin_static` and `cufilt`.
Other changes
- #704 : Limited the clang warning flags (no `-pedantic`) to avoid warnings we can't resolve
- #705 : Made some methods of `library_t` be `const`
- #721 : `device::properties_t::max_in_flight_threads_on_device()` now returns an `unsigned` (rather than `unsigned long long`)
Example programs
- #720 : Avoiding suspicious numeric conversions in the example programs (mostly inherited from NVIDIA, tsk tsk tsk)
- #722: In simpleCudaGraphs, when using stream capture, now enqueueing the correct, existing event rather than an anonymous transient event
- Now compiling the example programs with more warning flags on.