# Performing distributed AI on a dataset

This page explains how to distribute azimuthal integration of a dataset.

You first have to [create a configuration file](config_file) which describes how the azimuthal integration should be performed.

Then, Integrator provides **two commands: `integrate-slurm` and `integrate-mp`**.
Both are used the same way: they take the configuration file as their argument.
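
As a minimal sketch, assuming the configuration file is named `conf_file.conf` (the name used further down this page) and is passed as the only argument:

```bash
# Both commands take the configuration file as their argument
integrate-slurm conf_file.conf
integrate-mp conf_file.conf
```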


## Run distributed AI on the local machine

The azimuthal integration can be distributed over a single, powerful local machine.

Modify the configuration file to set `partition = local`:


```ini
[computations distribution]

partition = local
n_workers = 4
cores_per_worker = 4
```

Then run `integrate-mp conf_file.conf`.


## Run distributed AI using several powerful (GPU) machines

Use the `integrate-mp` command with `partition = gpu` or `partition = p9gpu`.

```{warning}
In this case, `n_workers` has a different meaning: it will actually spawn `8 * n_workers` workers (8 workers per SLURM job).
```
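
As a sketch, the `[computations distribution]` section of the configuration file could look like the following for this mode; the values are purely illustrative, not recommendations:

```ini
[computations distribution]

partition = gpu
; Per the warning above, this spawns 8 * 2 = 16 workers (8 per SLURM job)
n_workers = 2
; Illustrative value, reused from the local example above
cores_per_worker = 4
```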


## Run distributed AI using many CPU machines

Use the `integrate-slurm` command with `partition = nice` or `partition = p9gpu`.

In this mode, you will likely need many workers to achieve decent speed (especially with `partition = nice`).
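
For example, a configuration sketch for this mode; the worker count is a hypothetical illustration, chosen only to reflect the "many workers" advice above:

```ini
[computations distribution]

partition = nice
; Hypothetical value: this mode typically needs many workers for decent speed
n_workers = 32
; Illustrative value, reused from the local example above
cores_per_worker = 4
```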


## How many datasets can I expect to process?


Generally, one worker is able to process a certain number of datasets per hour.
This number is approximately:

```python
dataset_per_hour_per_worker = ((frac_hour * 3600) * fps) / frames_per_dataset
```

Where:
  - `frames_per_dataset` is the total number of frames in each dataset (e.g. `28000`)
  - `fps` is the number of frames integrated per second by one worker
  - `frac_hour` is the fraction of each hour actually spent on azimuthal integration, the rest being taken by "administrative" tasks (browsing datasets, saving files, etc.)
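
As a quick sanity check, the formula can be evaluated with illustrative numbers; the `fps` and `frac_hour` values below are assumptions made for the sake of the example, not measured figures:

```python
# All values below are illustrative assumptions, not benchmarks
frames_per_dataset = 28000  # total frames per dataset (as in the example below)
fps = 500                   # assumed frames integrated per second by one worker
frac_hour = 0.8             # assumed fraction of each hour spent on actual integration

dataset_per_hour_per_worker = ((frac_hour * 3600) * fps) / frames_per_dataset
print(dataset_per_hour_per_worker)  # ~51 datasets per hour for a single worker
```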

### Example for `frames_per_dataset = 28000`:

`gpu` partition:
  - `n_workers = 1`: 50 datasets/hour
  - `n_workers = 4`: 200 datasets/hour

`p9gpu-long` partition:
  - `n_workers = 1`: 40 datasets/hour
  - `n_workers = 4`: 160 datasets/hour