Home
Tensile is a tool for creating a benchmark-driven backend library for GEMMs, GEMM-like problems (such as batched GEMM), N-dimensional tensor contractions, and anything else that multiplies two multi-dimensional objects together on AMD GPUs.
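For readers new to the terminology, the problem classes named above can be illustrated with a minimal pure-Python reference. This is only a sketch of the math, assuming the standard BLAS convention `C = alpha * A * B + beta * C`; the function names `gemm` and `batched_gemm` are hypothetical and are not part of Tensile's API.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a x b) + beta * c for matrices stored as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(n)]
        for i in range(m)
    ]


def batched_gemm(alpha, a_batch, b_batch, beta, c_batch):
    """Apply gemm independently to each (a, b, c) triple in the batch."""
    return [gemm(alpha, a, b, beta, c)
            for a, b, c in zip(a_batch, b_batch, c_batch)]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]

print(gemm(1.0, a, b, 0.0, c))                   # plain matrix product
print(batched_gemm(2.0, [a, a], [b, b], 1.0, [c, c]))  # two independent GEMMs
```

A tuned library replaces these loops with GPU kernels chosen per problem size, which is what the benchmarking steps below are for.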
Overview of creating a custom TensileLib backend library for your application:
- Install the PyYAML dependency (mandatory), then git clone Tensile and cd into it.
- Create a benchmark config.yaml file in ./Tensile/Configs/
- Run the benchmark. When it finishes, Tensile writes four directories: 1 and 2 hold the benchmarking artifacts, while 3 and 4 hold the summarized results from the viewpoint of your library (such as rocBLAS).
1_BenchmarkProblems: contains all the problem descriptions and the executables generated during benchmarking; you can re-launch an executable to reproduce its results.
2_BenchmarkData: contains the raw performance results.
3_LibraryLogic: contains the YAML files with the optimal kernel configurations, plus winner.csv. rocBLAS usually takes its YAML files from this folder.
4_LibraryClient: contains a client executable, so you can launch problems from a library viewpoint.
- Add the Tensile library to your application's CMake target. The library is written, compiled, and linked into your application at application compile time, and it consists of:
- GPU kernels, written in HIP, OpenCL, or assembly.
- Solution classes which enqueue the kernels.
- APIs which call the fastest solution for a problem.
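Once the benchmark step in the overview has finished, a quick sanity check is to confirm that all four output directories exist. The following is a minimal sketch that relies only on the directory names listed above; the helper name `missing_output_dirs` is hypothetical and is not part of Tensile.

```python
import os
import tempfile

# Directory names as listed in the overview.
EXPECTED_DIRS = [
    "1_BenchmarkProblems",
    "2_BenchmarkData",
    "3_LibraryLogic",
    "4_LibraryClient",
]


def missing_output_dirs(build_dir):
    """Return the expected output directories that are absent from build_dir."""
    return [
        name for name in EXPECTED_DIRS
        if not os.path.isdir(os.path.join(build_dir, name))
    ]


# Demonstration on a temporary directory standing in for a real build directory.
with tempfile.TemporaryDirectory() as build_dir:
    os.mkdir(os.path.join(build_dir, "1_BenchmarkProblems"))
    os.mkdir(os.path.join(build_dir, "2_BenchmarkData"))
    print(missing_output_dirs(build_dir))
```

In practice you would pass the directory given as the last argument to Tensile.py; an empty list means every directory is present.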
 
sudo apt-get install python-yaml
mkdir Tensile
cd Tensile
git clone https://github.com/RadeonOpenCompute/Tensile.git repo
mkdir build
cd build
python ../repo/Tensile/Tensile.py ../repo/Tensile/Configs/test_sgemm.yaml ./
After benchmarking for a while, Tensile prints the path to the client executable, which you can then run:
./4_LibraryClient/build/client -h
./4_LibraryClient/build/client --sizes 5760 5760 5760
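To run the client over several problem sizes, the invocation above can be scripted. This is a hedged sketch: it uses only the `--sizes` flag shown above and passes the three numbers through unchanged, without assuming what each one means. The helper names `client_command` and `run_sweep` and the list of sizes are hypothetical.

```python
import subprocess

CLIENT = "./4_LibraryClient/build/client"  # path from the example above


def client_command(client_path, sizes):
    """Build the argument list for one client run with the given sizes."""
    return [client_path, "--sizes"] + [str(s) for s in sizes]


def run_sweep(client_path, size_list):
    """Run the client once per entry in size_list, stopping on the first failure."""
    for sizes in size_list:
        subprocess.run(client_command(client_path, sizes), check=True)


# Hypothetical sweep; only the first entry appears in the example above.
sweep = [(5760, 5760, 5760), (1024, 1024, 1024)]
for sizes in sweep:
    print(" ".join(client_command(CLIENT, sizes)))
```

Call `run_sweep(CLIENT, sweep)` from the build directory once the client has been built; the loop above only prints the commands, so it is safe to run anywhere.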