Comparative Tests on KME, BKM, and VWF
Edmond Chow edited this page Oct 18, 2020
The performance of H2Pack strongly depends on the performance of the kernel evaluation functions, i.e., the Kernel Matrix Evaluation (KME) function and Bi-Kernel Matvec (BKM) function. The KME and BKM functions can be either vectorized manually using the provided Vector Wrapper Functions (VWF) or vectorized automatically by the C compiler.
The following numerical results demonstrate how these different techniques affect the performance of $\mathcal{H}^2$-construction and $\mathcal{H}^2$-matvec.
Hardware and software configuration
- 2 * Intel Xeon Gold 6226 CPU @ 2.7GHz (2 * 12 cores, 2 * 12 * 2 threads, hyperthreading disabled)
- 6 * 32 GB DDR4 memory
- Red Hat Enterprise Linux 7.6 (kernel 3.10.0-957.12.1.el7)
- Intel Parallel Studio Cluster version 2019.5
- ICC optimization flags: -O3 -xHost
- OpenMP environment variables
  - `OMP_NUM_THREADS=24`
  - `OMP_PLACES=cores`
  - `OMP_PROC_BIND=close`
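Thread count and placement can change timings noticeably, so it may help to confirm the environment before running a benchmark. The following is a small sketch of such a check. The helper names `env_is` and `check_omp_env` are hypothetical and not part of H2Pack. It only inspects environment variables with the standard `getenv` and does not query the OpenMP runtime.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if environment variable `name` is set to exactly `expected`. */
int env_is(const char *name, const char *expected)
{
    const char *val = getenv(name);
    return val != NULL && strcmp(val, expected) == 0;
}

/* Count how many of the three OpenMP settings listed on this page are in
 * effect in the current environment; report each mismatch on stderr. */
int check_omp_env(void)
{
    static const char *settings[3][2] = {
        {"OMP_NUM_THREADS", "24"},
        {"OMP_PLACES",      "cores"},
        {"OMP_PROC_BIND",   "close"},
    };
    int matched = 0;
    for (int i = 0; i < 3; i++) {
        if (env_is(settings[i][0], settings[i][1])) {
            matched++;
        } else {
            const char *val = getenv(settings[i][0]);
            fprintf(stderr, "%s is %s, expected %s\n",
                    settings[i][0], val ? val : "(unset)", settings[i][1]);
        }
    }
    return matched;
}
```

A benchmark driver could call `check_omp_env()` at startup and warn when the result is less than 3.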
Test settings
- Point sets: uniformly and randomly distributed points in a 3D unit ball
- Running mode: JIT
- Relative error threshold: 1e-6
- Kernel: 3D Gaussian
with
- Comparison of kernel implementations:
  - no vectorization ("no-vec")
  - ICC automatic vectorization ("auto-vec")
  - manual vectorization by VWF ("wrap-vec")
| Number of Points | 100,000 | 400,000 | 1,600,000 |
|---|---|---|---|
| **$\mathcal{H}^2$-construction** | | | |
| KME no-vec | 0.022 | 0.083 | 0.440 |
| KME auto-vec | 0.020 | 0.092 | 0.448 |
| KME wrap-vec | 0.023 | 0.084 | 0.442 |
| **$\mathcal{H}^2$-matvec** | | | |
| KME no-vec | 0.120 | 0.313 | 0.745 |
| KME auto-vec | 0.038 | 0.101 | 0.260 |
| KME wrap-vec | 0.028 | 0.081 | 0.233 |
| BKM no-vec | 0.161 | 0.369 | 0.908 |
| BKM auto-vec | 0.031 | 0.091 | 0.265 |
| BKM wrap-vec | 0.020 | 0.056 | 0.156 |
Notes:
- Computation in $\mathcal{H}^2$-construction is dominated by the column-pivoted QR factorization, so it gains only a minor performance improvement from vectorizing the KME functions.
- Both automatic and manual vectorization of the KME and BKM functions can lead to a 300% - 400% speedup in $\mathcal{H}^2$-matvec, and manual vectorization is 20% - 50% faster than automatic vectorization.
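The speedups mentioned in the notes can be recomputed from the table. The sketch below copies the six KME and BKM rows from the second group of the table, taken here to be the matvec measurements and assumed to be timings where smaller is better. It prints the ratio of each pair of variants for every problem size. The names `speedup` and `print_matvec_speedups` are illustrative only.

```c
#include <stdio.h>

#define N_SIZES 3

/* Values copied from the second group of rows in the table
 * (assumed to be matvec timings; smaller is better). */
static const int    n_points[N_SIZES]     = {100000, 400000, 1600000};
static const double kme_no_vec[N_SIZES]   = {0.120, 0.313, 0.745};
static const double kme_auto_vec[N_SIZES] = {0.038, 0.101, 0.260};
static const double kme_wrap_vec[N_SIZES] = {0.028, 0.081, 0.233};
static const double bkm_no_vec[N_SIZES]   = {0.161, 0.369, 0.908};
static const double bkm_auto_vec[N_SIZES] = {0.031, 0.091, 0.265};
static const double bkm_wrap_vec[N_SIZES] = {0.020, 0.056, 0.156};

/* A ratio above 1 means variant `t` is faster than `baseline`. */
double speedup(double baseline, double t)
{
    return baseline / t;
}

/* Print no-vec/auto-vec, no-vec/wrap-vec, and auto-vec/wrap-vec ratios
 * for the KME and BKM rows at each problem size. */
void print_matvec_speedups(void)
{
    printf("%-4s %9s %10s %10s %10s\n",
           "", "points", "no/auto", "no/wrap", "auto/wrap");
    for (int s = 0; s < N_SIZES; s++) {
        printf("%-4s %9d %10.2f %10.2f %10.2f\n", "KME", n_points[s],
               speedup(kme_no_vec[s], kme_auto_vec[s]),
               speedup(kme_no_vec[s], kme_wrap_vec[s]),
               speedup(kme_auto_vec[s], kme_wrap_vec[s]));
        printf("%-4s %9d %10.2f %10.2f %10.2f\n", "BKM", n_points[s],
               speedup(bkm_no_vec[s], bkm_auto_vec[s]),
               speedup(bkm_no_vec[s], bkm_wrap_vec[s]),
               speedup(bkm_auto_vec[s], bkm_wrap_vec[s]));
    }
}
```

Calling `print_matvec_speedups()` from a `main` function prints one KME line and one BKM line per problem size, which makes it easy to compare the two vectorization strategies against the scalar baseline and against each other.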