- Extend `Base.Broadcast` by macros:
  - `@tab`: Tuple of Array Broadcast. A broadcast with multiple outputs is stored as a tuple of arrays (instead of an array of tuples).
  - `@mtb`: MultiThread Broadcast. Performs the broadcast with multiple threads.
  - `@mtab`: `@mtb` + `@tab`.
  - `@stb`: force STructArray Broadcast. It only works if the user loads `StructArrays.jl`.

`@tab`: supports `CuArray`, `OffsetArray`, `Tuple`, `StructArray`, `StaticArray`

julia> a = randn(4000,4000);
julia> @tab b, c = sincos.(a);
julia> @tab b, c = broadcast(sincos,a);
julia> @tab b, c = broadcast(a) do x
           sincos(x)
       end;
julia> @tab b, c .= sincos.(a);
julia> broadcast!(sincos,(b,c),a);

- For `outputs <: AbstractArray`:
  - Only the default `copy` method, which uses `similar(bc, T)`, is implemented, so inputs like `StaticArray` are not allowed for non-in-place calculation by default. There is an extension for `@tab` with `StaticArrays`.
  - `@tab` is not optimized for `BitArray`. The default return type is `Array{Bool}` for non-in-place broadcast.
- For `outputs <: Tuple`, `@tab` first generates all results and then separates them.
- `@tab` is not designed for too many outputs.

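
To make the layout difference concrete, here is a minimal Base-Julia sketch that does not use ExBroadcast. It shows what a plain broadcast returns and how a tuple-of-arrays result could be built by hand. The helper name `unzip_sincos` is hypothetical and exists only for illustration; the actual `@tab` implementation may work quite differently.

```julia
# Plain Base broadcast: multiple outputs come back as an Array of Tuples.
a = [0.0, 0.5, 1.0]
aot = sincos.(a)                      # Vector{Tuple{Float64, Float64}}

# Hypothetical helper (not part of ExBroadcast): write each output into its
# own preallocated array, giving the tuple-of-arrays layout described above.
function unzip_sincos(x::AbstractArray{<:Real})
    T = float(eltype(x))
    s = similar(x, T)
    c = similar(x, T)
    for i in eachindex(x, s, c)
        s[i], c[i] = sincos(x[i])
    end
    return s, c
end

b, c = unzip_sincos(a)                # two Vector{Float64}

# For Tuple inputs, a "generate everything, then separate" pattern:
t = sincos.((0.0, 0.5))               # Tuple of Tuples
ts, tc = map(first, t), map(last, t)  # two Tuples of Float64

# Base broadcast stores Bool results in a BitArray, which is why the
# Array{Bool} default noted above is a visible difference.
mask = a .> 0.25                      # BitVector
```

The hand-written version avoids the intermediate array of tuples, which is presumably the motivation for the tuple-of-arrays layout on large inputs.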
`@mtb`: CPU multi-threaded broadcast

julia> a = randn(4000,4000); b = similar(a);
julia> @btime @mtb @. $b = sin(a);
47.756 ms (22 allocations: 2.97 KiB)
julia> @btime @. $b = sin(a);
167.985 ms (2 allocations: 32 bytes)
julia> Threads.nthreads()
4

- `@mtb` uses `CartesianPartition` to split the work when the dimension is > 1.
- `@mtb` is turned off automatically for `CuArray` and `Tuple`.
- `@mtb` assumes all elements in the destination array(s) are separate in memory; there is no thread-safety check.
- `@mtb` is not tuned for small arrays (it won't fall back to the single-threaded version automatically).
- Users can change the number of threads in two ways:
  - Call `ExBroadcast.set_num_threads(n)` for a global change.
  - Use the two-argument form `@mtb n [...]` for a local change (thread safe).

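
The memory-separation assumption for `@mtb` can be illustrated with a minimal Base-Julia sketch of a chunked, multithreaded in-place map. This is not ExBroadcast's implementation: the function name `threaded_map!`, the `nchunks` keyword, and the linear chunking scheme are all assumptions made for illustration.

```julia
# Hypothetical helper (not ExBroadcast's code): apply `f` elementwise into
# `dest`, splitting the linear index range into contiguous chunks.
function threaded_map!(f, dest::Array, src::Array; nchunks::Int = Threads.nthreads())
    length(dest) == length(src) ||
        throw(DimensionMismatch("dest and src differ in length"))
    n = length(dest)
    nc = clamp(nchunks, 1, max(n, 1))   # never more chunks than elements
    Threads.@threads for k in 1:nc
        # Chunk k covers lo:hi. Chunks never overlap, so each element of
        # `dest` is written by exactly one task.
        lo = div((k - 1) * n, nc) + 1
        hi = div(k * n, nc)
        @inbounds for i in lo:hi
            dest[i] = f(src[i])
        end
    end
    return dest
end

a = randn(400, 400)
b = similar(a)
threaded_map!(sin, b, a)
```

Because the chunks are disjoint ranges of a dense `Array`, no two tasks write to the same element. A destination whose indices alias the same memory would break that property, and a scheme like this has no way to detect it. For small inputs the task-spawning overhead can outweigh the gain, which matches the note that small arrays are not a tuned case.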
`@mtab` only saves some compile cost.