Architecture and design decisions

This page gives a brief overview of the nabu architecture and design decisions.

Architecture overview

Nabu consists of a series of modules, each with a defined processing scope: pre-processing, reconstruction, I/O, parameter estimation, pipelining.

Importantly, these modules are as decoupled as possible: it is generally possible to use the features of one module without using the others.

Modules breakdown

  • preproc: processing happening before reconstruction

  • reconstruction: sinogram denoising, filtering, and FBP

  • io: read and write data

  • misc: miscellaneous image processing functions (convolution, histogram, filtering)

  • pipeline: reconstruction pipeline, for now full-field

  • app: command-line tools

  • cuda: CUDA-specific utilities

  • resources: mostly dataset parsing and logger. May be removed in the future.

Each module has a tests submodule containing unit tests. These tests can be run either with pytest tests/file.py, or with the nabu-test CLI tool.

Backends and API

Each processing function/class is first implemented in python/numpy so that it can be tested easily. Additionally, some functions/classes can have other backends for performance (e.g. CUDA, OpenCL). In this case, the API must be the same (possibly with additional backend-specific keyword arguments).
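As an illustration (a hypothetical example, not actual nabu classes), a numpy reference implementation and a CUDA backend would expose the same public API, the GPU variant only taking extra backend-specific keyword arguments:

import numpy as np

class RadiosBinning:
    """Numpy reference implementation: 2x2 binning of a radio (assumes even dimensions)."""
    def __init__(self, radios_shape):
        self.radios_shape = radios_shape

    def bin_radio(self, radio):
        return 0.25 * (
            radio[::2, ::2] + radio[1::2, ::2] + radio[::2, 1::2] + radio[1::2, 1::2]
        )


class CudaRadiosBinning(RadiosBinning):
    """Same public API, but 'radio' is expected to be a GPU array."""
    def __init__(self, radios_shape, cuda_options=None):
        # backend-specific keyword argument; this is where CUDA kernels
        # would be compiled and device buffers allocated
        super().__init__(radios_shape)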

See also

Processing classes, pipelines and reconstructors

Design decisions

Nabu aims at being simple and versatile while offering high-performance processing capabilities. These goals have an impact on the overall design. More generally, we try to avoid pitfalls commonly found in scientific software.

The design decisions below are listed as a series of aphorisms.

Decouple I/O code from processing code

Many scientific codes mix reading/writing data with the processing itself. Such codes are usually not re-usable, as they make assumptions about file paths and formats.

More generally, decouple functions as much as possible, so that they can be used and tested separately.
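For example (a hypothetical sketch, not actual nabu code), a processing function only sees numpy arrays; reading and writing are left to the caller:

import numpy as np

def remove_rings(sinogram):
    # Processing only: takes an array, returns an array.
    # No file path, no assumption on the data format.
    column_profile = sinogram.mean(axis=0)
    return sinogram - column_profile  # simplistic de-striping, for illustration only

# I/O is handled separately by the caller, e.g.:
#   sinogram = read_sinogram("/path/to/scan.h5")     # hypothetical reader
#   result = remove_rings(sinogram)
#   write_image("/path/to/result.tiff", result)      # hypothetical writer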

Allocate resources once, use them many times (stateful computations)

It is common in scientific software to write functions which are “fire-and-forget”: called once on the data, then discarded. It is also the default approach of most workflow engines: build a computational graph where each node is a stateless function.

However, when performance matters, this approach is not viable. Usually, memory has to be allocated, and some pre-computations have to be done. For example, Fast Fourier Transform (FFT) libraries internally rely on a “plan”: a data structure that pre-computes many things for the kind of data they will process. Re-doing these allocations and pre-computations on every call dramatically hampers performance, especially if the function is used many times; allocating a large chunk of memory for each function call is costly, especially in GPU programming.
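As a contrived illustration (not nabu code), compare a stateless function that rebuilds its filter on every call with a class that builds it once and reuses it:

import numpy as np

def filter_image_stateless(img):
    # Re-done on every call: filter construction and temporary allocations
    ramp = np.abs(np.fft.rfftfreq(img.shape[-1]))
    return np.fft.irfft(np.fft.rfft(img, axis=-1) * ramp, n=img.shape[-1], axis=-1)

class ImageFilter:
    def __init__(self, shape):
        # Done once, at instantiation: pre-compute the filter for this data shape
        self.shape = shape
        self.ramp = np.abs(np.fft.rfftfreq(shape[-1]))

    def filter_image(self, img):
        return np.fft.irfft(np.fft.rfft(img, axis=-1) * self.ramp, n=self.shape[-1], axis=-1)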

In nabu, the default approach is to

  • Instantiate a class with some data description

  • Use it many times

For example:

from nabu.preproc.phase import PaganinPhaseRetrieval

# Instantiate once: radio_shape, distance_m, energy_kev and delta_beta
# describe the dataset and are defined elsewhere
phase_retriever = PaganinPhaseRetrieval(
    radio_shape,
    distance=distance_m,
    energy=energy_kev,
    delta_beta=delta_beta,
    pixel_size=1e-6
)

# Re-use the same instance (and its pre-computed internal state) on every radio
for radio in radios:
    phase_retriever.apply_filter(radio, output=radio)

Minimize data transfers

A high-performance tomography pipeline should avoid memory transfers (CPU<->GPU, node<->node) whenever possible. Note that our stateful approach simplifies this issue, as we have more control over memory (it is bound to the current class instance).

Synchrotron X-rays have the nice property of forming a parallel beam, so let’s use this multi-million euro investment: each horizontal slice/slab can be reconstructed independently, without exchanging any data. For cone-beam geometry, excellent reconstruction software is available.
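As a minimal sketch (example dimensions; reconstruct_slab is a stand-in, not a nabu function), the volume can be split into horizontal slabs that are processed with no communication between them:

import numpy as np

# Example dimensions, kept small for illustration
n_z, n_angles, n_x = 256, 500, 512   # slices, projection angles, detector width
slab_size = 64                        # horizontal slices per slab

def reconstruct_slab(sinos):
    # Stand-in for filtered back-projection of a stack of sinograms:
    # each horizontal slice only needs its own sinogram.
    return np.zeros((sinos.shape[0], n_x, n_x), dtype=np.float32)

for z_start in range(0, n_z, slab_size):
    z_end = min(z_start + slab_size, n_z)
    # In a real pipeline, the sinograms of this slab would be read from disk here
    sinos = np.zeros((z_end - z_start, n_angles, n_x), dtype=np.float32)
    slab = reconstruct_slab(sinos)
    # each slab can be computed and written independently,
    # possibly on a different GPU or a different node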

In its current state, when reconstructing a volume on a single machine, nabu spends almost half of the total reconstruction time reading/writing data, even on GPFS or fast SSDs. This means that any optimization of the processing code can bring at most a factor-of-two speed-up (although this becomes less true as detectors grow in pixel count: FBP then becomes a bottleneck).

Generally speaking, much high-performance scientific software is I/O-bound rather than compute-bound. Even in compute-critical parts, most of the work is about optimizing memory access patterns on the GPU or minimizing cache misses on the CPU.

Don’t reinvent a generic processing pipeline, focus on what matters

Off-the-shelf solutions for distributing computations over many nodes are available, for example distributed/dask_jobqueue. Therefore, we focus on the added value of scientific software: the data processing/analysis algorithms. Writing yet another generic pipeline is a liability in the codebase, as it has to be maintained in addition to the processing part.
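As a sketch (assuming access to a SLURM cluster; reconstruct_slab is a hypothetical user function that reads, reconstructs and writes one slab), distributing independent slab reconstructions with dask_jobqueue could look like this:

from dask_jobqueue import SLURMCluster
from dask.distributed import Client

# One SLURM job per dask worker; resources are example values, not nabu defaults
cluster = SLURMCluster(cores=8, memory="64GB", processes=1)
cluster.scale(jobs=4)
client = Client(cluster)

# Each slab of horizontal slices is an independent task
slabs = [(z, min(z + 128, 2048)) for z in range(0, 2048, 128)]
futures = [client.submit(reconstruct_slab, z_start, z_end) for z_start, z_end in slabs]
client.gather(futures)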

Our first goal is to provide a collection of building blocks for tomography (processing functions and classes), as tomopy does. These blocks then have to be assembled into a complete processing pipeline which can be used from the command line. The “assembling” part should be kept as simple as possible.

Admittedly, when writing a processing pipeline, the trade-off between simplicity/maintainability and versatility/complexity is difficult to find. In nabu, we use a submodule nabu.pipeline which tries to be as small as possible (and is probably already too complicated).

Minimize the barrier to entry for users and developers

The code should be accessible to “scientists who can write some code”, not only to professional developers. Software that can be extended by many people has a higher life expectancy.

  • Use native data structures whenever possible (dict, list, etc.). No Enum/namedtuple or other constructs that are abstruse for non-developers (see the sketch after this list).

  • Use a simple design: functions/classes as building blocks, and write a pipeline on top of them. No scheduler/core system and plugins all over the place. Most of the code should be about tomography processing.
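For instance (hypothetical keys, not nabu's actual configuration), processing options are passed around as plain dictionaries, which anyone can read, edit and serialize to a configuration file:

phase_options = {
    "method": "paganin",
    "delta_beta": 100.0,
    "padding_type": "edge",
}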

Simplify code distribution

Prefer just-in-time compilation (pyopencl, pycuda, numba?) to ahead-of-time compilation (e.g. Cython extensions). “Pure-python” packages are much easier to distribute on many platforms. By contrast, packages with native extensions require extensive efforts to work on many platforms.
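As an illustration (a minimal pycuda example, not taken from nabu), the GPU code is compiled just-in-time on the user's machine, so no pre-built binary extension has to be shipped:

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as garray
from pycuda.compiler import SourceModule

# The CUDA source below is compiled at runtime, for the locally installed toolkit
mod = SourceModule("""
__global__ void scale(float *arr, float factor, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) arr[i] *= factor;
}
""")
scale_kernel = mod.get_function("scale")

d_arr = garray.to_gpu(np.ones(1024, dtype=np.float32))
scale_kernel(d_arr, np.float32(2.0), np.int32(1024), block=(256, 1, 1), grid=(4, 1))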

“Explicit is better than implicit”

We take the opposite approach of “it’s a GPU array with the exact same interface as a numpy.ndarray, please pretend it is one!”. This approach is used by cupy or reikna/cluda.

Although duck typing has been a factor in Python’s success, it does have limitations; the rising trend of type annotations in python codebases is an indication of that. Using objects interchangeably is powerful, but very difficult to debug when it goes wrong, especially when doing GPU programming.

In nabu, a GPU array is a GPU array, not a numpy array, and it has to be handled as such.
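For example (a minimal sketch using pycuda's gpuarray, not specific to nabu), transfers between host and device are written explicitly instead of being hidden behind a numpy-like facade:

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as garray

radio = np.random.rand(2048, 2048).astype(np.float32)

d_radio = garray.to_gpu(radio)   # explicit host -> device transfer
d_radio *= 2.0                   # runs on the GPU, on a GPU array
result = d_radio.get()           # explicit device -> host transfer

# d_radio is a pycuda.gpuarray.GPUArray, not a numpy.ndarray:
# passing it to code that expects a numpy array is an error, by design.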