@brianpopow
Collaborator

@brianpopow brianpopow commented Feb 23, 2022

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This PR adds an SSE2 version of the average filter (4 bytes per pixel only), which is used when decoding PNGs.
I also tried an SSE2 version for 3 bytes per pixel, but benchmarks did not show good results, probably due to misaligned reads/writes.
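
For readers unfamiliar with the filter, here is a minimal scalar sketch of what reversing the PNG Average filter involves, following the PNG specification rather than this PR's vectorized code. The function name `unfilter_average` and the choice of Python are assumptions made purely for illustration. With 4 bytes per pixel, the byte at offset `i - bpp` is the same channel of the pixel to the left, so the four channel bytes of one pixel can be handled together, which is presumably what makes this case a good fit for SSE2.

```python
def unfilter_average(scanline: bytes, previous: bytes, bpp: int = 4) -> bytes:
    """Reverse the PNG Average filter for one scanline.

    `scanline` holds the filtered bytes (filter-type byte already removed),
    `previous` holds the reconstructed bytes of the row above,
    `bpp` is the number of bytes per pixel.
    """
    out = bytearray(len(scanline))
    for i, filtered in enumerate(scanline):
        # Reconstructed byte of the pixel to the left (0 for the first pixel).
        left = out[i - bpp] if i >= bpp else 0
        above = previous[i]
        # Add the floor of the average, wrapping modulo 256.
        out[i] = (filtered + ((left + above) >> 1)) & 0xFF
    return bytes(out)


row = unfilter_average(bytes([1, 2, 3, 4, 5, 6, 7, 8]),
                       bytes([10, 20, 30, 40, 50, 60, 70, 80]))
print(list(row))  # [6, 12, 18, 24, 33, 42, 51, 60]
```

Note that each pixel depends on the reconstructed pixel to its left, so pixels still have to be processed in order even when the channels of a pixel are computed in parallel.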

Edit:
The PR now also includes SSE/AVX versions of the Paeth, Sub and Up filters.
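
As a reference for what these three filters compute, the following is a scalar sketch based on the PNG specification, not the SSE/AVX code in this PR. All function names are hypothetical and Python is used only for illustration.

```python
def paeth_predictor(a: int, b: int, c: int) -> int:
    """Pick whichever of left (a), above (b), upper-left (c) is closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def unfilter_sub(scanline: bytes, bpp: int = 4) -> bytes:
    """Sub: add the reconstructed byte of the pixel to the left."""
    out = bytearray(scanline)
    for i in range(bpp, len(out)):
        out[i] = (out[i] + out[i - bpp]) & 0xFF
    return bytes(out)


def unfilter_up(scanline: bytes, previous: bytes) -> bytes:
    """Up: add the reconstructed byte directly above."""
    return bytes((x + above) & 0xFF for x, above in zip(scanline, previous))


def unfilter_paeth(scanline: bytes, previous: bytes, bpp: int = 4) -> bytes:
    """Paeth: add the Paeth prediction from left, above and upper-left."""
    out = bytearray(len(scanline))
    for i, filtered in enumerate(scanline):
        a = out[i - bpp] if i >= bpp else 0       # left
        b = previous[i]                           # above
        c = previous[i - bpp] if i >= bpp else 0  # upper-left
        out[i] = (filtered + paeth_predictor(a, b, c)) & 0xFF
    return bytes(out)
```

Up has no dependency between bytes of the same row, so whole vectors of bytes can be added at once. Sub and Paeth depend on the reconstructed pixel to the left, which limits how much can be done in parallel.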

Benchmark results:

master:

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19043.1526 (21H1/May2021Update)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.200
  [Host]     : .NET 5.0.14 (5.0.1422.5710), X64 RyuJIT
  Job-OGQEMG : .NET 5.0.14 (5.0.1422.5710), X64 RyuJIT
|                      Method |     Mean |     Error |    StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------- |---------:|----------:|----------:|------:|------:|------:|----------:|
| 'Average-filtered PNG file' | 2.626 ms | 0.0179 ms | 0.0046 ms |     - |     - |     - |      3 KB |

PR:

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19043.1526 (21H1/May2021Update)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.200
  [Host]     : .NET 5.0.14 (5.0.1422.5710), X64 RyuJIT
  Job-OGQEMG : .NET 5.0.14 (5.0.1422.5710), X64 RyuJIT

Runtime=.NET 5.0  Arguments=/p:DebugType=portable  IterationCount=5
LaunchCount=1  WarmupCount=3

|                             Method |        Mean |     Error |   StdDev | Ratio | RatioSD |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------------------------------- |------------:|----------:|---------:|------:|--------:|-------:|------:|------:|----------:|
| 'Average-filtered PNG file (4bpp)' | 2,212.37 us | 14.979 us | 3.890 us | 55.63 |    5.27 |      - |     - |     - |      3 KB |
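
The two results use different units (ms on master, us in the PR), so a quick conversion helps when comparing them. The snippet below is only a rough sketch of that arithmetic using the two Mean values above. The variable names are made up for illustration, and the Ratio column is ignored because the thread does not say what baseline it refers to.

```python
master_mean_us = 2.626 * 1000  # master: 2.626 ms, converted to microseconds
pr_mean_us = 2212.37           # PR: 2,212.37 us

speedup = master_mean_us / pr_mean_us
reduction = 1 - pr_mean_us / master_mean_us

print(f"{speedup:.2f}x faster, {reduction:.1%} less time")
# 1.19x faster, 15.8% less time
```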

The test image was:
AverageFilter4Bpp

@brianpopow brianpopow changed the title PNG: Add SSE2 version of average filter WIP: PNG: Add SSE2 version of average filter Feb 23, 2022
@brianpopow brianpopow changed the title WIP: PNG: Add SSE2 version of average filter PNG: Add SSE2 version of average filter Feb 23, 2022
@brianpopow
Collaborator Author

@gfoidl thanks for the suggestions!

@brianpopow
Collaborator Author

I have added SSE/AVX versions of the other filters, too.

Benchmark with random data of size 1024:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1526 (21H1/May2021Update)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.200
  [Host]     : .NET 5.0.14 (5.0.1422.5710), X64 RyuJIT
  DefaultJob : .NET 5.0.14 (5.0.1422.5710), X64 RyuJIT


|      Method |        Mean |     Error |    StdDev |
|------------ |------------:|----------:|----------:|
|    UpScalar |   525.50 ns |  2.733 ns |  2.557 ns |
|      UpSse2 |    36.66 ns |  0.160 ns |  0.150 ns |
|      UpAvx2 |    19.11 ns |  0.074 ns |  0.069 ns |
|   SubScalar |   654.48 ns |  3.637 ns |  3.224 ns |
|     SubSse2 |   172.30 ns |  2.738 ns |  2.561 ns |
| PaethScalar | 6,643.09 ns | 23.076 ns | 21.585 ns |
|   PaethSse2 |   674.23 ns |  2.468 ns |  2.309 ns |
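
To make the table easier to read, the speedup of each vectorized variant over its scalar counterpart can be derived from the Mean column. This is a small sketch of that calculation. The helper name `speedups` and the dictionary layout are assumptions for illustration, while the numbers are copied from the table above.

```python
# Mean times in nanoseconds, copied from the benchmark table.
means_ns = {
    "UpScalar": 525.50, "UpSse2": 36.66, "UpAvx2": 19.11,
    "SubScalar": 654.48, "SubSse2": 172.30,
    "PaethScalar": 6643.09, "PaethSse2": 674.23,
}


def speedups(means: dict[str, float]) -> dict[str, float]:
    """Return scalar_mean / simd_mean for every non-scalar method."""
    result = {}
    for name, mean in means.items():
        if name.endswith("Scalar"):
            continue
        # "UpSse2" -> "Up", "PaethSse2" -> "Paeth"
        family = name.removesuffix("Sse2").removesuffix("Avx2")
        result[name] = means[family + "Scalar"] / mean
    return result


for name, factor in speedups(means_ns).items():
    print(f"{name}: {factor:.1f}x")
# UpSse2: 14.3x
# UpAvx2: 27.5x
# SubSse2: 3.8x
# PaethSse2: 9.9x
```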
