Releases · bitfaster/BitFaster.Caching
v2.5.4
What's changed
- Eagerly purge deleted items from the internal `ConcurrentLru` queues. Previously, deleted items remained in the internal queues until fully cycled out of cold; now they are purged as items transition from queue to queue (e.g. from hot to warm) as part of the cycle.
- Fix `UnobservedTaskException` for value creation when using `AsAsyncCache()`/`AsyncAtomicFactory`/`ScopedAsyncAtomicFactory`. If the value factory delegate throws, the internal `TaskCompletionSource` has an exception set that was previously not observed unless another thread concurrently evaluated the result (see the sketch below). Fixed by @advdotnet.
- In `ConcurrentLru.Trim()`, avoid incorrectly trimming an extra warm item when values are trimmed from cold but not warm. `ConcurrentLru.TrimExpired()` is now thread safe.
- Minor code cleanups by @Joy-less.
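A minimal sketch of the failure mode (not the library's exact fix): when a value factory throws, the exception stored in a `TaskCompletionSource` counts as unobserved unless some code awaits the task or reads `Task.Exception`; an unobserved fault later surfaces via `TaskScheduler.UnobservedTaskException` when the task is finalized.

```csharp
using System;
using System.Threading.Tasks;

// Illustrative only: reading Task.Exception marks a stored fault as observed,
// even if no other thread ever awaits the TaskCompletionSource's task.
static class ObservedFaultSketch
{
    public static async Task<T> CreateAsync<T>(
        TaskCompletionSource<T> tcs, Func<Task<T>> valueFactory)
    {
        try
        {
            T value = await valueFactory().ConfigureAwait(false);
            tcs.SetResult(value);
            return value;
        }
        catch (Exception ex)
        {
            // Store the fault for any concurrent waiters of tcs.Task.
            tcs.SetException(ex);

            // Touch Exception so the fault is observed even when nobody
            // else ever evaluates the task's result.
            _ = tcs.Task.Exception;
            throw;
        }
    }
}
```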
Full changelog: v2.5.3...v2.5.4
v2.5.3
What's changed
- Eliminate volatile writes in `ConcurrentLru` internal bookkeeping code for pure reads, improving concurrent read throughput by 175%.
- Vectorize the hot methods in `CmSketch` using Neon intrinsics for ARM CPUs. This results in slightly better `ConcurrentLfu` cache throughput measured on Apple M series and Azure Cobalt 100 CPUs.
- Unroll loops in the hot methods in `CmSketch`. This results in slightly better `ConcurrentLfu` throughput on CPUs without vector support (i.e. neither x86 AVX2 nor ARM Neon).
- On vectorized code paths (AVX2 and Neon), `CmSketch` allocates the internal buffer using the pinned object heap on .NET 6 or newer. Use of the `fixed` statement is removed, eliminating a very small overhead. Sketch block pointers are then aligned to 64 bytes, guaranteeing each block is always on the same CPU cache line (see the sketch below). This provides a small speedup for the `ConcurrentLfu` maintenance thread by reducing CPU cache misses.
- Minor improvements to the AVX2 JITted code via `MethodImpl(MethodImplOptions.AggressiveInlining)` and removal of local variables, improving performance on .NET 8/9 with dynamic PGO.
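A minimal sketch of the allocation strategy described above (not the `CmSketch` source): allocate the array on the pinned object heap so it never moves, then round the base address up to the next 64-byte boundary so each 8-long block sits on a single cache line.

```csharp
using System;

// Illustrative only. Requires compiling with unsafe enabled. The extra 7
// longs (56 bytes) of slack guarantee a 64-byte aligned start exists, since
// long[] elements are always at least 8-byte aligned.
public unsafe class AlignedSketchBuffer
{
    private readonly long[] table;
    private readonly int alignedOffset; // index of the first 64-byte aligned element

    public AlignedSketchBuffer(int blocks)
    {
        // 8 longs (64 bytes) per block, plus slack for alignment. The pinned
        // object heap (pinned: true) means the array address is stable.
        table = GC.AllocateArray<long>(blocks * 8 + 7, pinned: true);

        fixed (long* ptr = table)
        {
            long misalignment = (long)ptr & 63;
            alignedOffset = misalignment == 0 ? 0 : (int)((64 - misalignment) / 8);
        }
    }

    // Each block starts on its own cache line.
    public ref long Block(int i) => ref table[alignedOffset + i * 8];
}
```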
Full changelog: v2.5.2...v2.5.3
v2.5.2
What's changed
- Fix a race between update and `TryRemove(KeyValuePair)` for both `ConcurrentLru` and `ConcurrentLfu`. Prior to this fix, values could be deleted if the value was updated to no longer match the `TryRemove` input argument while `TryRemove` was executing.
- Fix `ConcurrentLfu` torn writes for large structs using SeqLock (sketched below).
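For context, a minimal SeqLock sketch illustrating the technique named above (not the library's implementation; names here are hypothetical): a version counter lets readers detect a torn read of a struct too large for an atomic write, and retry.

```csharp
using System.Threading;

// Illustrative only. Assumes writers are externally serialized (one writer
// at a time); readers are lock-free and retry on conflict.
public class SeqLockBox<T> where T : struct
{
    private int sequence; // even = stable, odd = write in progress
    private T value;

    public void Write(T newValue)
    {
        Interlocked.Increment(ref sequence); // becomes odd: write started
        value = newValue;                    // non-atomic; may be seen torn
        Interlocked.Increment(ref sequence); // becomes even: write finished
    }

    public T Read()
    {
        while (true)
        {
            int start = Volatile.Read(ref sequence);
            T copy = value;
            int end = Volatile.Read(ref sequence);

            // Accept only if no write was in progress (even) or completed
            // in between (sequence unchanged); otherwise retry.
            if (start == end && (start & 1) == 0)
                return copy;
        }
    }
}
```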
Full changelog: v2.5.1...v2.5.2
v2.5.1
What's changed
- Fix `ConcurrentLfu` time-based expiry policy failing to update the entry's expiry on read. Prior to this fix, expiry was only updated when the read buffer was processed (following a cache write, or when the read buffer was full).
- Fix `ConcurrentLru` torn writes for large structs using SeqLock.
- Fix torn writes for the 64-bit current time on 32-bit platforms for `ConcurrentLru` `AfterAccessPolicy` and `DiscretePolicy`.
- P/Invoke `TickCount64` to evaluate the current time for .NET Standard on Windows. `Duration.SinceEpoch` is 5x faster, resulting in lower latency lookups for `ConcurrentTLru`/`ConcurrentTLfu` (see the sketch below).
- Use `Stopwatch.GetTimestamp` to evaluate the current time on macOS. `Duration.SinceEpoch` is about 20% faster, resulting in slightly lower latency lookups for `ConcurrentTLru`/`ConcurrentTLfu`.
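A sketch of the Windows fallback described above (not the library's exact code): .NET Standard lacks `Environment.TickCount64`, so a P/Invoke to the kernel32 function `GetTickCount64` provides a cheap 64-bit monotonic millisecond counter.

```csharp
using System.Runtime.InteropServices;

// Illustrative only: 64-bit milliseconds since system start, which does not
// wrap after ~49 days the way the 32-bit Environment.TickCount does.
internal static class TickCount
{
    [DllImport("kernel32.dll")]
    private static extern ulong GetTickCount64();

    public static long Milliseconds() => (long)GetTickCount64();
}
```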
Full changelog: v2.5.0...v2.5.1
v2.5.0
What's changed
- Provide time-based expiry for `ConcurrentLfu`, matching `ConcurrentLru`. This closely follows the implementation in Java's Caffeine, using a port of Caffeine's hierarchical timer wheel to perform all operations in O(1) time. Expire after write, expire after access, and expire after using `IExpiryCalculator` can be configured via `ConcurrentLfuBuilder` extension methods (see the sketch below).
- Provide `ICacheExt` and `IAsyncCacheExt` to enable client code compiled against .NET Standard to use the builder APIs and cache methods added since v2.0. These new methods are excluded from the base interfaces for .NET Standard, since adding them would be a breaking change.
- Provide the `Duration` convenience methods `FromHours` and `FromDays`.
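A minimal sketch of configuring the new `ConcurrentLfu` expiry modes via the builder, assuming the extension method names mirror `ConcurrentLruBuilder` (`WithExpireAfterWrite`/`WithExpireAfterAccess`); treat the exact shapes as assumptions.

```csharp
using System;
using BitFaster.Caching;
using BitFaster.Caching.Lfu;

class ExpirySetup
{
    static void Main()
    {
        // Evict entries a fixed duration after they were written.
        ICache<int, string> afterWrite = new ConcurrentLfuBuilder<int, string>()
            .WithCapacity(1024)
            .WithExpireAfterWrite(TimeSpan.FromMinutes(5))
            .Build();

        // Evict entries a fixed duration after the most recent read or write.
        ICache<int, string> afterAccess = new ConcurrentLfuBuilder<int, string>()
            .WithCapacity(1024)
            .WithExpireAfterAccess(TimeSpan.FromMinutes(5))
            .Build();
    }
}
```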
Full changelog: v2.4.1...v2.5.0
v2.4.1
What's changed
- Fixed a race condition in `ConcurrentLfu` for add-remove-add of the same key.
- `MpscBoundedBuffer.Clear()` is now thread safe, fixing a race in `ConcurrentLfu` clear.
- Fixed `ConcurrentLru` `Count` and `IEnumerable<KeyValuePair<K,V>>` to filter out expired items when used with time-based expiry.
- BitFaster.Caching is now compiled with `<nullable>enable</nullable>`, and APIs are annotated to support nullable reference type static analysis.
Full changelog: v2.4.0...v2.4.1
v2.4.0
What's changed
- Provide two new time-based expiry schemes for `ConcurrentLru`:
  - Expire after access: evict after a fixed duration since an entry's most recent read or write. This is equivalent to MemoryCache's sliding expiry, and is useful for data bound to a session that expires due to inactivity.
  - Per-item expiry time: evict after a duration calculated for each item using the specified `IExpiryCalculator`. Expiry time may be set independently at creation, after a read, and after a write (sketched after this list).
- Align `TryRemove` overloads with `ConcurrentDictionary` for `IAsyncCache` and `AsyncAtomicFactory`, matching the implementation for `ICache` added in v2.3.0. This adds two new overloads (sketched after this list):
  - `bool TryRemove(K key, out V value)` - enables getting the value that was removed.
  - `bool TryRemove(KeyValuePair<K, V> item)` - enables removing an item only when the key and value are the same.
- Add extension methods to make it more convenient to use `AsyncAtomicFactory` with a plain `ConcurrentDictionary`. This is similar to storing an `AsyncLazy<T>` instead of `T`, but with the same exception propagation semantics and API as `ConcurrentDictionary.GetOrAdd`.
- The BitFaster.Caching assembly is now marked as trim compatible to enable trimming when used in native AOT applications.
- `AtomicFactory` value initialization logic modified to mitigate lock convoys, based on the approach given here.
- Fixed `ConcurrentLru.Clear` to correctly handle removed items present in the internal bookkeeping data structures.
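A sketch of per-item expiry and the new `TryRemove` overloads. The `IExpiryCalculator` member signatures shown here follow the project documentation; treat the details as assumptions rather than verbatim API.

```csharp
using System.Collections.Generic;
using BitFaster.Caching;
using BitFaster.Caching.Lru;

// Sliding one-minute expiry: reads reset the remaining lifetime.
class SlidingOneMinuteExpiry : IExpiryCalculator<string, string>
{
    // New entries live for one minute.
    public Duration GetExpireAfterCreate(string key, string value)
        => Duration.FromMinutes(1);

    // Reads reset the remaining lifetime to one minute.
    public Duration GetExpireAfterRead(string key, string value, Duration current)
        => Duration.FromMinutes(1);

    // Updates keep whatever lifetime remained.
    public Duration GetExpireAfterUpdate(string key, string value, Duration current)
        => current;
}

class Example
{
    static void Main()
    {
        ICache<string, string> cache = new ConcurrentLruBuilder<string, string>()
            .WithCapacity(128)
            .WithExpireAfter(new SlidingOneMinuteExpiry())
            .Build();

        cache.GetOrAdd("a", k => "value");

        // Get the value that was removed.
        cache.TryRemove("a", out string removed);

        // Remove only when both key and value match.
        cache.TryRemove(new KeyValuePair<string, string>("a", "value"));
    }
}
```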
Full changelog: v2.3.3...v2.4.0
v2.3.3
What's changed
- Eliminated all races in `ConcurrentLru` eviction logic, and in the transition between the cold cache and warm cache eviction routines. This prevents a variety of rare 'off by one item count' situations that could needlessly evict items when the cache is within bounds.
- Fix `ConcurrentLru.Clear()` to always clear the cache when items in the warm queue are marked as accessed.
- Optimize `ConcurrentLfu` drain buffers logic to give ~5% better throughput (measured by the eviction throughput test).
- Cache the `ConcurrentLfu` drain buffers delegate to prevent allocating a closure when scheduling maintenance (see the sketch below).
- `BackgroundThreadScheduler` and `ThreadPoolScheduler` now use `TaskScheduler.Default` instead of implicitly using `TaskScheduler.Current` (fixes CA2008).
- `ScopedAsyncCache` now internally calls `ConfigureAwait(false)` when awaiting tasks (fixes CA2007).
- Fix `ConcurrentLru` debugger display on .NET Standard.
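An illustrative sketch of the cached-delegate pattern and the explicit scheduler from the notes above (names here are hypothetical, not the library's internals): hoisting the delegate into a field means scheduling maintenance allocates no new delegate per call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class MaintenanceScheduler
{
    private readonly Action drainBuffersDelegate;

    public MaintenanceScheduler()
    {
        // Allocated once; reused for every maintenance run. Passing the
        // method group DrainBuffers at each call site would allocate a
        // fresh Action every time.
        drainBuffersDelegate = DrainBuffers;
    }

    public void Schedule()
    {
        // TaskScheduler.Default is passed explicitly so the work never runs
        // on an ambient TaskScheduler.Current (the CA2008 fix).
        Task.Factory.StartNew(drainBuffersDelegate, CancellationToken.None,
            TaskCreationOptions.DenyChildAttach, TaskScheduler.Default);
    }

    private void DrainBuffers() { /* process read/write buffers */ }
}
```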
Full changelog: v2.3.2...v2.3.3
v2.3.2
What's changed
- Fix `ConcurrentLru` `NullReferenceException` when expiring and disposing null values (i.e. the cached value is a reference type, and the caller cached a null value).
- Fix `ConcurrentLfu` handling of updates to detached nodes, caused by concurrent reads and writes. Detached nodes could be re-attached to the probation LRU, pushing out fresh items prematurely, but would eventually expire since they can no longer be accessed.
Full changelog: v2.3.1...v2.3.2
v2.3.1
What's changed
- Introduce a simple heuristic to estimate the optimal `ConcurrentDictionary` bucket count for `ConcurrentLru`/`ConcurrentLfu`/`ClassicLru` based on the `capacity` constructor arg. When the cache is at capacity, the `ConcurrentDictionary` will have a prime number bucket count and a load factor of 0.75 (see the sketch below).
  - When capacity is less than 150 elements, start with a `ConcurrentDictionary` capacity that is a prime number 33% larger than cache capacity. The initial size is large enough to avoid resizing.
  - For larger caches, pick the `ConcurrentDictionary` initial size using a lookup table. The initial size is approximately 10% of the cache capacity, such that 4 `ConcurrentDictionary` grow operations will arrive at a hash table size that is a prime number approximately 33% larger than cache capacity.
- `SingletonCache` sets the internal `ConcurrentDictionary` capacity to the next prime number greater than the capacity constructor argument.
- Fix an ABA concurrency bug in `Scoped` by changing `ReferenceCount` to use reference equality (via `object.ReferenceEquals`).
- The .NET 6 target is now compiled with `SkipLocalsInit`, for minor performance gains.
- Simplified `AtomicFactory`/`AsyncAtomicFactory`/`ScopedAtomicFactory`/`ScopedAsyncAtomicFactory` by removing redundant reads, reducing code size.
- `ConcurrentLfu.Count` now does not lock the underlying `ConcurrentDictionary`, matching `ConcurrentLru.Count`.
- Use `CollectionsMarshal.AsSpan` to enumerate candidates within `ConcurrentLfu.Trim` on .NET 6.
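A sketch of the small-cache sizing arithmetic described above (the library's actual lookup table for larger caches differs): a prime bucket count ~33% larger than capacity yields a 0.75 load factor when the cache is full, e.g. capacity 100 gives a target of 133 and the next prime 137.

```csharp
// Illustrative only: compute a prime number roughly 33% larger than the
// cache capacity, so capacity / buckets ~= 0.75 at full occupancy.
static class DictionarySizer
{
    public static int InitialCapacity(int cacheCapacity)
    {
        int target = cacheCapacity + cacheCapacity / 3; // ~33% headroom
        return NextPrime(target);
    }

    private static int NextPrime(int n)
    {
        while (!IsPrime(n)) n++;
        return n;
    }

    private static bool IsPrime(int n)
    {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int i = 3; i * i <= n; i += 2)
            if (n % i == 0) return false;
        return true;
    }
}
```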
Full changelog: v2.3.0...v2.3.1