Correlation function computation

0. Definitions and notations

An XPCS experiment collects a series of frames \(I_0, I_1, I_2, \ldots, I_{N_t - 1}\) where \(N_t\) is the total number of acquired frames. Each frame has \(N_x \times N_y\) pixels, where \(N_y\) and \(N_x\) are the number of rows and columns respectively.

Once stacked, the acquired frames form a volume of shape (Nt, Ny, Nx).

0.1 - Intensity correlation function

The aim is to compute the intensity correlation function \(g_2(q, \tau)\) defined as

\[g_2(q, \tau) = \frac{ \langle \langle I(t,\, p) I(t + \tau,\, p) \rangle_p \rangle_t } { \langle \langle I(t,\, p) \rangle_p \langle I(t + \tau,\, p) \rangle_p \rangle_t } = \frac{n_2(q, \tau)}{d_2(q, \tau)}\]

In the above definition, there are two types of averaging. The first, denoted \(\langle \cdot \rangle_p\), is the ensemble average over the pixels \(p\) belonging to the current scattering vector \(q\); the second, \(\langle \cdot \rangle_t\), is the time average. Both are defined in the sections below.

0.2 - Ensemble averaging

The averaging of pixels \(p\) belonging to the set of pixels \(Q\) of current scattering vector \(q\) is

\[\langle I(t, p) \rangle_p = \frac{1}{N_q} \sum_{p \in Q} I(t, p)\]

where \(N_q\) is the cardinality of \(Q\), i.e. the number of pixels belonging to the scattering vector \(q\).

0.3 - Time averaging

The time averaging of a time-dependent function \(r(t)\) is defined as

\[\langle r(t) \rangle_t = \frac{1}{\Delta\! t} \int_0^{\Delta\! t} r(t) \operatorname{d}\! t\]

Therefore, the time average of the quantity \(r(t, t+\tau, q) = \langle I(t,\, p) I(t + \tau,\, p) \rangle_p\) is

\[\langle r (t, t+ \tau, q) \rangle_t = \frac{1}{\Delta\! t} \int_0^{\Delta\! t} r(t, t+\tau, q) \operatorname{d}\! t\]

Note that in \(g_2(q, \tau)\), a time average appears in both the numerator and the denominator. Therefore, the factor \(1/\Delta\! t\) cancels out and can be omitted.
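
As a reference, the definitions above can be transcribed directly (and inefficiently) in NumPy. In the sketch below, frames is the stack of section 0 and Q is a boolean mask of shape (Ny, Nx) selecting the pixels of the current scattering vector; the function name is illustrative.

import numpy as np

def g2_naive(frames, Q):
    # Pixel values belonging to the current q vector: shape (Nt, Nq)
    A = frames[:, Q].astype(np.float64)
    Nt = A.shape[0]
    means = A.mean(axis=1)  # <I(t, p)>_p for each t
    g2 = np.zeros(Nt)
    for tau in range(Nt):
        num = (A[:Nt - tau] * A[tau:]).mean()          # <<I(t) I(t+tau)>_p>_t
        den = (means[:Nt - tau] * means[tau:]).mean()  # <<I(t)>_p <I(t+tau)>_p>_t
        g2[tau] = num / den
    return g2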

1 - On the computation of discrete autocorrelation

1.1 - Direct approach

The discrete auto-correlation of an \(N\)-sample array \(\vec{a} = [a_0, a_1, \ldots, a_{N-1}]\) is an \(N\)-sample array \(\vec{c}\), where the \(i\)-th sample \(c_i\) is given by

\[c_i = \sum_{j=i}^{N-1} a_j a_{j-i}\]

Notably, \(c_0 = \displaystyle\sum_{j=0}^{N-1} a_j^2\) is the sum of the squared samples in \(\vec{a}\).

Computing \(\vec{c}\) takes one sum over \(N\) elements for \(c_0\), one sum over \(N-1\) elements for \(c_1\), and so on. Therefore, the total number of operations to compute \(\vec{c}\) is \(N (N+1)/2\).
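
A minimal NumPy sketch of this direct approach (the function name is illustrative):

import numpy as np

def autocorrelation_direct(a):
    # c[i] = sum_{j=i}^{N-1} a[j] * a[j-i]
    N = a.size
    c = np.zeros(N, dtype=a.dtype)
    for i in range(N):
        c[i] = np.sum(a[i:] * a[:N - i])
    return c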

1.2 - Linear algebra approach

Computing the correlation can be done in a linear algebra fashion. If \(\vec{a}\) is seen as a column vector (an \(N \times 1\) matrix), then its outer product with itself is

\[\begin{split}\vec{a} \cdot \vec{a}^T = \begin{bmatrix} a_0 a_0 & a_0 a_1 & \ldots & a_0 a_{N-1} \\ a_1 a_0 & a_1 a_1 & \ldots & a_1 a_{N-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N-1} a_0 & a_{N-1} a_1 & \ldots & a_{N-1} a_{N-1} \end{bmatrix}\end{split}\]

Summing the main diagonal gives \(c_0\); summing the rank-\(k\) diagonal gives \(c_k\). This method generates an \((N, N)\) matrix and performs \(N^2\) operations for the outer product. It therefore does twice as many operations as needed (only the upper or lower triangular part of \(\vec{a} \cdot \vec{a}^T\) is used), but the matrix multiplication is typically accelerated by off-the-shelf optimized BLAS implementations.
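
A sketch of this approach in NumPy, where the rank-\(k\) diagonal sums are obtained with np.trace:

import numpy as np

def autocorrelation_outer(a):
    # Outer product: M[j, k] = a[j] * a[k], shape (N, N)
    M = np.outer(a, a)
    # c[k] is the sum of the rank-k diagonal of M
    return np.array([np.trace(M, offset=k) for k in range(a.size)])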

1.3 - Fourier approach

\[\newcommand{\F}[1]{\mathcal{F}\left[#1\right]} \newcommand{\Fi}[1]{\mathcal{F}^{-1}\left[#1\right]} \newcommand{\v}[1]{\textbf{#1}}\]

From properties of the Fourier transform, the autocorrelation of a function \(a(t)\) is given by

\[\Fi{\F{a} \cdot \F{\tilde{a}}}\]

where \(\F{}\) denotes the continuous Fourier transform, and \(\tilde{a}\) is the time-reversed function: \(\tilde{a}(t) = a(-t)\).

The same holds in the discrete setting, where \(\mathcal{F}\) is replaced with the Fast Fourier Transform, provided that the number of points is chosen accordingly (i.e. at least twice the number of samples in \(\vec a\)). The discrete Fourier-based approach for computing the autocorrelation of \(\vec a\) is

\[\vec c = \v F^{-1} \left( \v F \rho (\vec{a}) \; \v F \rho(\tilde{\vec a}) \right)\]

where \(\rho(\vec a)\) is the operation of padding the vector \(\vec a\) with zeros, up to at least \(2 N\) samples.
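
A sketch with numpy.fft; for a real-valued array, the transform of the reversed (and padded) series reduces to the complex conjugate, which is what the correlation step below uses:

import numpy as np

def autocorrelation_fft(a):
    N = a.size
    Nf = 2 * N                     # padded FFT size, at least 2N
    af = np.fft.rfft(a, n=Nf)      # F[rho(a)], zero-padded to Nf
    # For real input, multiplying by the conjugate implements the correlation
    c = np.fft.irfft(af * af.conj(), n=Nf)
    return c[:N]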

2. Dense correlator

2.1 - Matrix multiplication dense correlator

Let \(q\) be the current scattering vector, \(Q\) the set of pixels belonging to this scattering vector, and \(N_q\) the cardinality of \(Q\). frames is a stack of \(N_t\) frames, each of them having \(N_x \times N_y\) pixels.

In a frame \(I_t\), we concatenate all pixels belonging to the current \(q\) vector. This gives a one-dimensional array with \(N_q\) elements.

# frames.shape : (Nt, Ny, Nx)
# Q.shape : (Ny, Nx)
frame = frames[t] # t = 0 ... Nt-1
pixels_values = frame[Q] # Q is a mask

We do this for all frames and stack vertically each 1D array, giving a 2D array \(\vec A\) of shape (Nt, Nq):

\[\begin{split}\vec{A} = \begin{bmatrix} I_{0, p_0} & I_{0, p_1} & \ldots & I_{0, p_{N_q-1}} \\ I_{1, p_0} & I_{1, p_1} & \ldots & I_{1, p_{N_q-1}} \\ \vdots & \vdots & \ddots & \vdots \\ I_{N_t-1, p_0} & I_{N_t-1, p_1} & \ldots & I_{N_t-1, p_{N_q-1}} \end{bmatrix}\end{split}\]

where \((p_0, p_1, \ldots, p_{N_q - 1})\) are the pixel indices corresponding to mask \(Q\). Each element of the matrix \(\vec A\) is a scalar.
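
In NumPy, this stacking is a single mask-indexing operation:

# Q is a boolean mask of shape (Ny, Nx)
A = frames[:, Q]  # shape (Nt, Nq): one row per frame, one column per pixel of Q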

The outer product of \(\vec A\) is:

\[\begin{split}\vec{A} \vec{A}^T = \begin{bmatrix} \displaystyle\sum_p I_{0, p}^2 & \displaystyle\sum_p I_{0, p} I_{1, p} & \ldots & \displaystyle\sum_p I_{0, p} I_{N_t-1, p} \\ \displaystyle\sum_p I_{1, p} I_{0, p} & \displaystyle\sum_p I_{1, p}^2 & \ldots & \displaystyle\sum_p I_{1, p} I_{N_t-1, p} \\ \vdots & \vdots & \ddots & \vdots \\ \displaystyle\sum_p I_{N_t-1, p} I_{0, p} & \displaystyle\sum_p I_{N_t-1, p} I_{1, p} & \ldots & \displaystyle\sum_p I_{N_t-1, p}^2 \end{bmatrix}\end{split}\]

where \(\sum_p\) denotes the summation over the pixel indices \((p_0, \ldots, p_{N_q-1})\); in the operation counts below, \(N_p = N_q\) denotes this number of pixels. This outer product has shape (Nt, Nt) and takes \(N_{\text{ops}}^{\text{matmul 1}} = N_p \times N_t^2\) operations to compute.

The time averaging is then done by summing along the diagonals: \(g_2(q, 0)\) is obtained by summing the main diagonal (\(N_t\) elements), \(g_2(q, 1)\) by summing the diagonal of rank \(1\) (\(N_t - 1\) elements), and so on. This takes \(N_{\text{ops}}^{\text{matmul 2}} = N_t ( N_t + 1)/2\) operations.

In total, the number of operations needed to compute the correlation with the matrix multiplication method is

\[N_{\text{ops}}^{\text{matmul}} = N_{\text{ops}}^{\text{matmul 1}} + N_{\text{ops}}^{\text{matmul 2}} \simeq N_t^2 (N_p + 1/2)\]
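
A sketch of this correlator in NumPy, with frames and Q as before; the numerator is read off the diagonals of \(\vec A \vec{A}^T\), and the normalization follows the definition of section 0:

import numpy as np

def dense_correlator_matmul(frames, Q):
    A = frames[:, Q].astype(np.float32)   # shape (Nt, Nq)
    Nt, Nq = A.shape
    AAt = A @ A.T                         # shape (Nt, Nt), computed by BLAS
    # Numerator: rank-tau diagonal sum, averaged over its Nt - tau elements
    num = np.array([np.trace(AAt, offset=tau) / (Nt - tau)
                    for tau in range(Nt)]) / Nq
    # Denominator: <I(t)>_p <I(t + tau)>_p averaged over t
    means = A.mean(axis=1)
    den = np.array([(means[:Nt - tau] * means[tau:]).mean()
                    for tau in range(Nt)])
    return num / den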

2.2 - Fourier dense correlator

The sum of the main diagonal of \(\vec A \vec{A}^T\) in the previous section is equal to

\[n_2(q, 0) = \sum_{t=0}^{N_t - 1} \sum_p I_{t, p}^2\]

In the same way, the sum of the rank-\(k\) diagonal of \(\vec A \vec{A}^T\) is equal to

\[n_2(q, k) = \sum_{t=k}^{N_t - 1} \sum_p I_{t,\, p} \, I_{t-k,\, p}\]

The summations over \(t\) and \(p\) can be exchanged. Let us define the matrix \(\vec B\) as

\[\begin{split}\vec B = \begin{bmatrix} I_{0, p}^2 & I_{0, p} I_{1, p} & \ldots & I_{0, p} I_{N_t-1, p} \\ I_{1, p} I_{0, p} & I_{1, p}^2 & \ldots & I_{1, p} I_{N_t-1, p} \\ \vdots & \vdots & \ddots & \vdots \\ I_{N_t-1, p} I_{0, p} & I_{N_t-1, p} I_{1, p} & \ldots & I_{N_t-1, p}^2 \end{bmatrix}\end{split}\]

where \(I_{t, p}\) is the 1D array of \(N_q\) elements made of the pixel values, in frame \(t\), belonging to the current \(q\) vector. Matrix \(\vec B\) therefore has shape \((N_t, N_t \times N_q)\) once flattened. Summing the main diagonal of \(\vec B\), and then summing over the pixels \(p \in Q\), gives the numerator of \(g_2(q, 0)\); and so on for the rank-\(k\) diagonals.

Building \(\vec B\) explicitly is of course prohibitive in terms of memory. By analogy with sections 1.2 and 1.3, the Fourier transform of the series

\[\left[ I_{0, p}, \, I_{1, p}, \, \ldots , \, I_{N_t - 1, p} \right]\]

yields, when multiplied by the Fourier transform of the reversed series and inverse transformed, the quantity

\[\sum_{t=k}^{N_t - 1} \, I_{t, \, p}\, I_{t-k,\, p}\]

which, once summed over \(p \in Q\), gives the numerator of \(g_2(q, \tau)\). Of course this series is no longer a series of scalars but a series of arrays; however, the FFT can be computed along a chosen axis.

In short, the Fourier dense correlator consists in computing the Fourier transform of the frame stack along the time axis (axis=0), the Fourier transform of the stack in reversed time order, multiplying both, inverse Fourier transforming the result, and summing over \(p \in Q\).

The important implementation detail is the zero-padding required for the FFT to fulfil the convolution theorem, i.e. to compute a linear rather than circular correlation: the FFT has to be computed on a size \(N_2 \geq 2 N_t\).
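
A sketch of the numerator computation, with frames and Q as before and the normalization left out (it is identical to section 2.1). For real input, the reversed-series transform again reduces to a complex conjugate:

import numpy as np

def dense_correlator_fourier_numerator(frames, Q):
    A = frames[:, Q].astype(np.float32)   # shape (Nt, Nq)
    Nt = A.shape[0]
    # Pad to the next power of two >= 2 * Nt for an efficient FFT size
    Nf = 1 << (2 * Nt - 1).bit_length()
    Af = np.fft.rfft(A, n=Nf, axis=0)     # batch of Nq FFTs along time
    corr = np.fft.irfft(Af * Af.conj(), n=Nf, axis=0)[:Nt]
    return corr.sum(axis=1)               # sum over pixels p in Q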

The FFT of the frame stack along the time axis is a batch of \(N_p\) FFTs of size \(2 N_t\), with a theoretical cost of \(N_p \times 2 N_t \log (2 N_t)\) operations. The same holds for the FFT of the stack in reversed time order, and for the inverse FFT of the product. In total, the two forward FFTs and the inverse FFT take at least \(N_{\text{ops}}^{\text{Fourier 1}} = 3 \times N_p \times 2 N_t \log (2 N_t)\) operations. The product in the Fourier domain takes \(N_{\text{ops}}^{\text{Fourier 2}} = N_p \times N_t\) operations (a real-to-complex FFT halves the size of the transformed array). In total, the number of operations for the Fourier method is at least

\[N_{\text{ops}}^{\text{Fourier}} = N_{\text{ops}}^{\text{Fourier 1}} + N_{\text{ops}}^{\text{Fourier 2}} = N_p N_t ( 1 + 6 \log (2 N_t))\]

This number is actually higher when \(2 N_t\) is not a power of two. In this case, the minimum transform size (\(2 N_t\)) has to be padded up to the next power of two. Nevertheless, the next power of two of \(2 N_t\) is at most \(4 N_t\), so the number of operations remains of the order of \(N_p N_t \log N_t\).

The Fourier method is therefore much more computationally efficient than the algebraic method, at least in theory. However, observations on datasets of various sizes suggest that, in practice, the Fourier method falls short of highly optimized BLAS implementations of the algebraic method.

The Fourier method is also much more memory demanding. The Fourier transform of the stack of frames needs to store \(N_p \times ((2 N_t) /2 +1)\) complex scalars, so at least \(8 N_p (N_t + 1)\) bytes. Two such arrays are needed, plus the real-valued inverse transform of their product (\(8 N_p N_t\) bytes in float32). In total, the Fourier method needs \(24 N_p N_t + 16 N_p\) bytes of memory. The algebraic method, on the other hand, takes \(4 N_t (N_p + N_t)\) bytes of memory: the stack \(\vec A\) in float32 plus the \((N_t, N_t)\) product.
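
As an illustrative order of magnitude, for \(N_t = 1000\) frames and \(N_p = 10^5\) pixels, the Fourier method needs about \(24 \times 10^8\) bytes (2.4 GB), while the algebraic method needs \(4 \times 10^3 \times (10^5 + 10^3) \approx 4 \times 10^8\) bytes (0.4 GB).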

3. Event correlator

When the XPCS data is sparse, i.e. has relatively few non-zero values, the classical approaches to computing the correlation are inefficient, as many zeros are multiplied and summed. The term “events” is borrowed from legacy XPCS software; in this context, it means “compacted data”.

For sparse datasets, a more compact data structure is used for the computations. This data structure consists in three arrays (events, times, offsets):

  • events is simply the concatenation of all non-zero values of the dataset. This concatenation is done in a specific order described below.

  • times contains the corresponding “times” (indices along axis 0) in the “frames” dataset.

  • offsets is an array such that, for a pixel location (x, y) with flattened index k (typically k = y*Nx + x in row-major order), events[offsets[k]:offsets[k+1]] contains all the non-zero values of frames[:, y, x]

This data structure is therefore very similar to the Compressed Sparse Row (CSR) format used for two-dimensional sparse matrices.
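
A minimal sketch of how such a structure could be built from a dense stack (the function name is illustrative, and a real implementation would avoid the per-pixel Python loop):

import numpy as np

def compact_frames(frames):
    Nt, Ny, Nx = frames.shape
    data = frames.reshape(Nt, Ny * Nx)     # one column per pixel
    events, times = [], []
    offsets = np.zeros(Ny * Nx + 1, dtype=np.int64)
    for k in range(Ny * Nx):               # k = y * Nx + x
        t_nz = np.nonzero(data[:, k])[0]   # frame indices of non-zero values
        events.append(data[t_nz, k])
        times.append(t_nz)
        offsets[k + 1] = offsets[k] + t_nz.size
    return np.concatenate(events), np.concatenate(times), offsets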