Performance of 2D integration vs 1D integration

The performance penalty of 2D integration relative to 1D integration depends on:

  • Number of azimuthal bins

  • Pixel splitting

  • Algorithm

  • Implementation (i.e. programming language)

  • Hardware used
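
As a rough illustration of why the number of azimuthal bins matters, here is a toy, pure-Python sketch of histogram integration without pixel splitting. It is not pyFAI's implementation: the names `make_pixels`, `integrate1d_toy` and `integrate2d_toy` are invented for this example, the pixel coordinates are random, and intensities are summed rather than averaged.

```python
import random
import timeit


def make_pixels(n, seed=0):
    """Return n fake pixels as (radius in [0, 1), azimuth in degrees, intensity)."""
    rng = random.Random(seed)
    return [(rng.random(), rng.uniform(-180.0, 180.0), rng.random())
            for _ in range(n)]


def integrate1d_toy(pixels, npt_rad):
    """Sum intensities into npt_rad radial bins (no pixel splitting)."""
    out = [0.0] * npt_rad
    for r, _chi, signal in pixels:
        i = min(int(r * npt_rad), npt_rad - 1)
        out[i] += signal
    return out


def integrate2d_toy(pixels, npt_rad, npt_azim):
    """Sum intensities into an npt_azim x npt_rad grid (no pixel splitting)."""
    out = [[0.0] * npt_rad for _ in range(npt_azim)]
    for r, chi, signal in pixels:
        i = min(int(r * npt_rad), npt_rad - 1)
        # One extra bin lookup per pixel compared with the 1D case
        j = min(int((chi + 180.0) / 360.0 * npt_azim), npt_azim - 1)
        out[j][i] += signal
    return out


if __name__ == "__main__":
    pixels = make_pixels(20000)
    t1 = min(timeit.repeat(lambda: integrate1d_toy(pixels, 1000), repeat=3, number=1))
    t2 = min(timeit.repeat(lambda: integrate2d_toy(pixels, 1000, 360), repeat=3, number=1))
    print(f"1d: {t1 * 1000:.1f} ms, 2d: {t2 * 1000:.1f} ms, ratio: {t2 / t1:.1f}")
```

In this toy, the 2D variant does one more bin lookup per pixel and fills an output `npt_azim` times larger; how much that costs in practice still depends on the other factors in the list.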

There is therefore no general answer, but the quick benchmark below measures the performance penalty on a given machine:


import sys
import os
import time
import numpy

os.environ["PYOPENCL_COMPILER_OUTPUT"] = "0"
start_time = time.perf_counter()
import fabio
import pyFAI
from pyFAI.test.utilstest import UtilsTest
import pyFAI.method_registry
import pyFAI.integrator.azimuthal
print(f"Python version: {sys.version}")
print(f"PyFAI version: {pyFAI.version}")
Python version: 3.14.0 | packaged by conda-forge | (main, Oct 22 2025, 23:24:08) [GCC 14.3.0]
PyFAI version: 2026.6.0-dev0
print("Number of ways to perform integration:", len(pyFAI.method_registry.IntegrationMethod.list_available()))
Number of ways to perform integration: 95
ai = pyFAI.load(UtilsTest.getimage("Pilatus1M.poni"))
img = fabio.open(UtilsTest.getimage("Pilatus1M.edf")).data
ai
Detector Pilatus 1M	 PixelSize= 172µm, 172µm	 BottomRight (3)
Wavelength= 1.000000 Å
SampleDetDist= 1.583231e+00 m	PONI= 3.341702e-02, 4.122778e-02 m	rot1=0.006487  rot2=0.007558  rot3=0.000000 rad
DirectBeamDist= 1583.310 mm	Center: x=179.981, y=263.859 pix	Tilt= 0.571° tiltPlanRotation= 130.640° λ= 1.000Å
%%time
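# Tune these parameters to match your needs:
kw1 = {"data": img, "npt": 1000}      # arguments for integrate1d
kw2 = {"data": img, "npt_rad": 1000}  # arguments for integrate2d (npt_azim left at its default)
# Actual benchmark: time every registered method; `%timeit -o` returns the timing result
res = {}
for k, v in pyFAI.method_registry.IntegrationMethod._registry.items():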
    print(k)
    if k.dim == 1:
        res[k] = %timeit -o ai.integrate1d(method=v, **kw1)
    else:
        res[k] = %timeit -o ai.integrate2d(method=v, **kw2)
Method(dim=1, split='no', algo='histogram', impl='python', target=None)
31.2 ms ± 336 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=2, split='no', algo='histogram', impl='python', target=None)
116 ms ± 237 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=1, split='no', algo='histogram', impl='cython', target=None)
11.2 ms ± 22.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='histogram', impl='cython', target=None)
16.6 ms ± 29 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='histogram', impl='cython', target=None)
26.3 ms ± 52.4 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=2, split='bbox', algo='histogram', impl='cython', target=None)
33 ms ± 93.3 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=1, split='full', algo='histogram', impl='cython', target=None)
172 ms ± 332 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=2, split='full', algo='histogram', impl='cython', target=None)
282 ms ± 2.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='pseudo', algo='histogram', impl='cython', target=None)
371 ms ± 1.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='csr', impl='cython', target=None)
16.7 ms ± 2.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=2, split='no', algo='csr', impl='cython', target=None)
17.2 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='cython', target=None)
16.5 ms ± 2.76 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='csr', impl='cython', target=None)
18 ms ± 4.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='csr', impl='python', target=None)
10.2 ms ± 49 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='csr', impl='python', target=None)
14.9 ms ± 46.9 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='python', target=None)
13.4 ms ± 50.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='csr', impl='python', target=None)
17.9 ms ± 200 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='csc', impl='cython', target=None)
8.62 ms ± 8.36 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='csc', impl='cython', target=None)
11.2 ms ± 26.5 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csc', impl='cython', target=None)
10.8 ms ± 14.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='csc', impl='cython', target=None)
14.2 ms ± 26.3 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='csc', impl='python', target=None)
11.4 ms ± 21.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='csc', impl='python', target=None)
14.4 ms ± 18.4 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csc', impl='python', target=None)
14.8 ms ± 20.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='csc', impl='python', target=None)
22.1 ms ± 46.6 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='bbox', algo='lut', impl='cython', target=None)
11.3 ms ± 3.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=2, split='bbox', algo='lut', impl='cython', target=None)
23.4 ms ± 3.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Compiler time: 0.20 s
Method(dim=1, split='no', algo='lut', impl='cython', target=None)
17.4 ms ± 1.68 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='lut', impl='cython', target=None)
17.4 ms ± 1.68 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='full', algo='lut', impl='cython', target=None)
18 ms ± 1.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=2, split='full', algo='lut', impl='cython', target=None)
18.8 ms ± 3.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='cython', target=None)
19.8 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Method(dim=2, split='full', algo='csr', impl='cython', target=None)
16 ms ± 585 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='python', target=None)
13 ms ± 36.5 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='csr', impl='python', target=None)
17 ms ± 91.6 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csc', impl='cython', target=None)
10.7 ms ± 7.03 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='csc', impl='cython', target=None)
14.4 ms ± 35.2 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csc', impl='python', target=None)
14.8 ms ± 33.4 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='csc', impl='python', target=None)
21.8 ms ± 55.1 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='histogram', impl='opencl', target=(0, 0))
9.1 ms ± 12.5 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='histogram', impl='opencl', target=(0, 0))
2.58 ms ± 6.55 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='no', algo='histogram', impl='opencl', target=(0, 1))
8.4 ms ± 6.14 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='histogram', impl='opencl', target=(0, 1))
4.19 ms ± 12.4 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='no', algo='histogram', impl='opencl', target=(1, 0))
1 error generated.
WARNING:pyFAI.opencl.azim_hist:Your OpenCL compiler wrongly claims it support 64-bit atomics. Degrading to 32 bits atomics!
15.6 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='no', algo='histogram', impl='opencl', target=(1, 0))
1 error generated.
WARNING:pyFAI.opencl.azim_hist:Your OpenCL compiler wrongly claims it support 64-bit atomics. Degrading to 32 bits atomics!
/users/kieffer/.venv/py314/lib/python3.14/site-packages/pyopencl/cache.py:496: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more.
  _create_built_program_from_source_cached(
/users/kieffer/.venv/py314/lib/python3.14/site-packages/pyopencl/cache.py:500: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more.
  prg.build(options_bytes, devices)
WARNING:pyFAI.opencl.azim_hist:Your OpenCL compiler wrongly claims it support 64-bit atomics. Degrading to 32 bits atomics!
9.9 ms ± 708 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='histogram', impl='opencl', target=(2, 0))
WARNING:pyFAI.opencl.azim_hist:Your OpenCL compiler wrongly claims it support 64-bit atomics. Degrading to 32 bits atomics!
11.9 ms ± 706 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='histogram', impl='opencl', target=(2, 0))
6.5 ms ± 157 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='opencl', target=(0, 0))
727 μs ± 2.52 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='bbox', algo='csr', impl='opencl', target=(0, 0))
2.56 ms ± 100 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='no', algo='csr', impl='opencl', target=(0, 0))
680 μs ± 3.09 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='no', algo='csr', impl='opencl', target=(0, 0))
2.58 ms ± 7.91 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='opencl', target=(0, 1))
1.23 ms ± 2.1 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='bbox', algo='csr', impl='opencl', target=(0, 1))
6.13 ms ± 32.8 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='csr', impl='opencl', target=(0, 1))
1.09 ms ± 1.56 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='no', algo='csr', impl='opencl', target=(0, 1))
6.07 ms ± 15.6 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='opencl', target=(1, 0))
4.31 ms ± 21 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='csr', impl='opencl', target=(1, 0))
9.08 ms ± 451 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='csr', impl='opencl', target=(1, 0))
2.84 ms ± 24.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='csr', impl='opencl', target=(1, 0))
6.18 ms ± 27.1 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=1, split='bbox', algo='csr', impl='opencl', target=(2, 0))
2.9 ms ± 125 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='bbox', algo='csr', impl='opencl', target=(2, 0))
81.7 ms ± 50.9 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='csr', impl='opencl', target=(2, 0))
2.65 ms ± 72.6 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='no', algo='csr', impl='opencl', target=(2, 0))
93.1 ms ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='opencl', target=(0, 0))
737 μs ± 1.36 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='full', algo='csr', impl='opencl', target=(0, 0))
2.66 ms ± 69.1 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='opencl', target=(0, 1))
1.23 ms ± 599 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='full', algo='csr', impl='opencl', target=(0, 1))
6.16 ms ± 29.8 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='opencl', target=(1, 0))
4.29 ms ± 43.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='csr', impl='opencl', target=(1, 0))
9.05 ms ± 637 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='csr', impl='opencl', target=(2, 0))
2.85 ms ± 112 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=2, split='full', algo='csr', impl='opencl', target=(2, 0))
81.9 ms ± 111 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='bbox', algo='lut', impl='opencl', target=(0, 0))
3.19 ms ± 6.47 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='lut', impl='opencl', target=(0, 0))
335 ms ± 33.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='lut', impl='opencl', target=(0, 0))
1.62 ms ± 940 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='no', algo='lut', impl='opencl', target=(0, 0))
198 ms ± 5.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='bbox', algo='lut', impl='opencl', target=(0, 1))
3.15 ms ± 8.23 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='lut', impl='opencl', target=(0, 1))
318 ms ± 3.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='lut', impl='opencl', target=(0, 1))
1.82 ms ± 1.12 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Method(dim=2, split='no', algo='lut', impl='opencl', target=(0, 1))
200 ms ± 3.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='bbox', algo='lut', impl='opencl', target=(1, 0))
4.71 ms ± 92.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='lut', impl='opencl', target=(1, 0))
192 ms ± 3.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='lut', impl='opencl', target=(1, 0))
3.69 ms ± 61.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='lut', impl='opencl', target=(1, 0))
156 ms ± 2.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='bbox', algo='lut', impl='opencl', target=(2, 0))
3.37 ms ± 48.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='bbox', algo='lut', impl='opencl', target=(2, 0))
267 ms ± 7.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='no', algo='lut', impl='opencl', target=(2, 0))
2.75 ms ± 40.1 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='no', algo='lut', impl='opencl', target=(2, 0))
224 ms ± 2.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='lut', impl='opencl', target=(0, 0))
2.62 ms ± 5.82 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='lut', impl='opencl', target=(0, 0))
316 ms ± 2.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='lut', impl='opencl', target=(0, 1))
2.77 ms ± 102 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='lut', impl='opencl', target=(0, 1))
320 ms ± 4.66 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='lut', impl='opencl', target=(1, 0))
4.47 ms ± 34 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='lut', impl='opencl', target=(1, 0))
191 ms ± 7.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Method(dim=1, split='full', algo='lut', impl='opencl', target=(2, 0))
3.73 ms ± 170 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Method(dim=2, split='full', algo='lut', impl='opencl', target=(2, 0))
271 ms ± 11.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
CPU times: user 1h 34min 46s, sys: 6min 57s, total: 1h 41min 44s
Wall time: 7min 22s
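The `%timeit -o` magic used in the loop above only works inside IPython or Jupyter. To run the same benchmark from a plain Python script, a helper along the following lines could stand in for it. This is a minimal sketch: `time_call` and `Timing` are names made up for this example, and only the `best` and `average` attributes of IPython's timing result are mimicked.

```python
import timeit
from collections import namedtuple

# Hypothetical stand-in for the object returned by `%timeit -o`
Timing = namedtuple("Timing", "best average")


def time_call(func, repeat=7, number=10):
    """Call func() `number` times per run, over `repeat` runs.

    Returns the best and the average time per call, in seconds.
    """
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_loop = [total / number for total in totals]
    return Timing(best=min(per_loop), average=sum(per_loop) / len(per_loop))


# Intended use in the benchmark loop (assumes ai, v and kw1 from the cells above):
#     res[k] = time_call(lambda: ai.integrate1d(method=v, **kw1))
```

Unlike `%timeit`, this helper does not choose the number of loops automatically, so `number` may need adjusting for the slowest methods.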
print("-"*80)
print(f"{'Split':5s} | {'Algo':9s} | {'Impl':6s}| {'1d (ms)':8s} | {'2d (ms)':8s} | {'ratio':6s} | Device")
print("-"*80)
for k1 in res:
    # Only 1D methods that have a 2D counterpart can be compared
    if k1.dim != 1:
        continue
    k2 = k1._replace(dim=2)
    if k2 not in res:
        continue
    # Device name for OpenCL targets, empty string otherwise
    device = pyFAI.method_registry.IntegrationMethod._registry.get(k1).target_name if k1.target else ""
    # Times are the best of all runs, converted from seconds to milliseconds
    print(f"{k1.split:5s} | {k1.algo:9s} | {k1.impl:6s}| {res[k1].best*1000:8.3f} | {res[k2].best*1000:8.3f} | {res[k2].best/res[k1].best:6.1f} | {device}")
print("-"*80)
--------------------------------------------------------------------------------
Split | Algo      | Impl  | 1d (ms)  | 2d (ms)  | ratio  | Device
--------------------------------------------------------------------------------
no    | histogram | python|   30.644 |  115.437 |    3.8 | 
no    | histogram | cython|   11.207 |   16.610 |    1.5 | 
bbox  | histogram | cython|   26.195 |   32.932 |    1.3 | 
full  | histogram | cython|  171.495 |  279.153 |    1.6 | 
no    | csr       | cython|   12.878 |   14.776 |    1.1 | 
bbox  | csr       | cython|   12.423 |    9.908 |    0.8 | 
no    | csr       | python|   10.168 |   14.889 |    1.5 | 
bbox  | csr       | python|   13.378 |   17.716 |    1.3 | 
no    | csc       | cython|    8.602 |   11.165 |    1.3 | 
bbox  | csc       | cython|   10.764 |   14.203 |    1.3 | 
no    | csc       | python|   11.323 |   14.375 |    1.3 | 
bbox  | csc       | python|   14.743 |   22.062 |    1.5 | 
bbox  | lut       | cython|    7.743 |   16.002 |    2.1 | 
no    | lut       | cython|   15.040 |   14.484 |    1.0 | 
full  | lut       | cython|   15.599 |   15.961 |    1.0 | 
full  | csr       | cython|   18.060 |   14.901 |    0.8 | 
full  | csr       | python|   13.001 |   16.892 |    1.3 | 
full  | csc       | cython|   10.724 |   14.390 |    1.3 | 
full  | csc       | python|   14.738 |   21.705 |    1.5 | 
no    | histogram | opencl|    9.077 |    2.579 |    0.3 | NVIDIA CUDA / NVIDIA RTX A5000
no    | histogram | opencl|    8.396 |    4.185 |    0.5 | NVIDIA CUDA / Quadro P2200
no    | histogram | opencl|   13.861 |    8.942 |    0.6 | Portable Computing Language / cpu-haswell-AMD Ryzen Threadripper PRO 3975WX 32-Cores
no    | histogram | opencl|   11.089 |    6.323 |    0.6 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
bbox  | csr       | opencl|    0.724 |    2.327 |    3.2 | NVIDIA CUDA / NVIDIA RTX A5000
no    | csr       | opencl|    0.674 |    2.566 |    3.8 | NVIDIA CUDA / NVIDIA RTX A5000
bbox  | csr       | opencl|    1.224 |    6.103 |    5.0 | NVIDIA CUDA / Quadro P2200
no    | csr       | opencl|    1.090 |    6.053 |    5.6 | NVIDIA CUDA / Quadro P2200
bbox  | csr       | opencl|    4.280 |    8.437 |    2.0 | Portable Computing Language / cpu-haswell-AMD Ryzen Threadripper PRO 3975WX 32-Cores
no    | csr       | opencl|    2.802 |    6.147 |    2.2 | Portable Computing Language / cpu-haswell-AMD Ryzen Threadripper PRO 3975WX 32-Cores
bbox  | csr       | opencl|    2.729 |   81.681 |   29.9 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
no    | csr       | opencl|    2.536 |   91.036 |   35.9 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
full  | csr       | opencl|    0.734 |    2.618 |    3.6 | NVIDIA CUDA / NVIDIA RTX A5000
full  | csr       | opencl|    1.233 |    6.121 |    5.0 | NVIDIA CUDA / Quadro P2200
full  | csr       | opencl|    4.218 |    8.185 |    1.9 | Portable Computing Language / cpu-haswell-AMD Ryzen Threadripper PRO 3975WX 32-Cores
full  | csr       | opencl|    2.720 |   81.736 |   30.0 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
bbox  | lut       | opencl|    3.181 |  312.469 |   98.2 | NVIDIA CUDA / NVIDIA RTX A5000
no    | lut       | opencl|    1.616 |  193.421 |  119.7 | NVIDIA CUDA / NVIDIA RTX A5000
bbox  | lut       | opencl|    3.148 |  313.568 |   99.6 | NVIDIA CUDA / Quadro P2200
no    | lut       | opencl|    1.821 |  197.633 |  108.5 | NVIDIA CUDA / Quadro P2200
bbox  | lut       | opencl|    4.642 |  186.442 |   40.2 | Portable Computing Language / cpu-haswell-AMD Ryzen Threadripper PRO 3975WX 32-Cores
no    | lut       | opencl|    3.608 |  152.630 |   42.3 | Portable Computing Language / cpu-haswell-AMD Ryzen Threadripper PRO 3975WX 32-Cores
bbox  | lut       | opencl|    3.315 |  258.649 |   78.0 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
no    | lut       | opencl|    2.713 |  221.932 |   81.8 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
full  | lut       | opencl|    2.613 |  313.425 |  120.0 | NVIDIA CUDA / NVIDIA RTX A5000
full  | lut       | opencl|    2.717 |  315.627 |  116.2 | NVIDIA CUDA / Quadro P2200
full  | lut       | opencl|    4.417 |  181.263 |   41.0 | Portable Computing Language / cpu-haswell-AMD Ryzen Threadripper PRO 3975WX 32-Cores
full  | lut       | opencl|    3.320 |  262.840 |   79.2 | Intel(R) OpenCL / AMD Ryzen Threadripper PRO 3975WX 32-Cores
--------------------------------------------------------------------------------
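To pick a method from such results, the timings can be queried directly. The sketch below is self-contained and uses a few best times copied from the table above; the `Method` namedtuple is a stand-in with the same fields as the keys printed during the benchmark, and `fastest` and `penalty` are hypothetical helpers, not part of pyFAI.

```python
from collections import namedtuple

# Stand-in for the method keys printed during the benchmark
Method = namedtuple("Method", "dim split algo impl target")

# Best times in seconds, copied from a few rows of the table above
# (target (0, 0) is the NVIDIA RTX A5000 in that run)
best = {
    Method(1, "no", "histogram", "cython", None): 11.207e-3,
    Method(2, "no", "histogram", "cython", None): 16.610e-3,
    Method(1, "no", "csc", "cython", None): 8.602e-3,
    Method(2, "no", "csc", "cython", None): 11.165e-3,
    Method(1, "no", "csr", "opencl", (0, 0)): 0.674e-3,
    Method(2, "no", "csr", "opencl", (0, 0)): 2.566e-3,
    Method(1, "no", "lut", "opencl", (0, 0)): 1.616e-3,
    Method(2, "no", "lut", "opencl", (0, 0)): 193.421e-3,
}


def fastest(timings, dim, impl=None):
    """Return the fastest method for a dimension, optionally for one implementation."""
    candidates = {m: t for m, t in timings.items()
                  if m.dim == dim and (impl is None or m.impl == impl)}
    return min(candidates, key=candidates.get)


def penalty(timings, method_1d):
    """Ratio of the 2D time to the 1D time for the same split/algo/impl/target."""
    return timings[method_1d._replace(dim=2)] / timings[method_1d]


if __name__ == "__main__":
    print("Fastest 1D:", fastest(best, 1))
    print("Fastest 2D on CPU (cython):", fastest(best, 2, impl="cython"))
    print("LUT/OpenCL penalty:", round(penalty(best, Method(1, "no", "lut", "opencl", (0, 0))), 1))
```

With the full `res` dictionary from the benchmark, the same helpers would apply after mapping each key to `res[key].best`.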
print(f"Total runtime: {time.perf_counter()-start_time:.3f}s")
Total runtime: 443.482s