HDF5 XRF Stack

There is a recurrent cuestion concerning how one should write the spectra associated to a raster experiment to be compatible with PyMca.

The solution is not unique, because PyMca can deal with (too) many data formats. For instance, if you have a map of 20000 spectra corresponding to a map of 100 rows per 200 columns, 20000 single-column ASCII files containing the measured counts would do the job. You would be “compatible” with PyMca but you would be missing relevant information known at acquisition time like live_time and calibration parameters.

The most versatile file format supported by PyMca is without doubt HDF5. You can find information about it at the HDF Group web site

Let’s assume data is a 3-dimensional array or 20000 spectra corresponding to a raster scan of 100 rows per 200 columns. If each spectrum has 2048 channels, the shape of that array will be (following C-convention) (100, 200, 2048). The simplest HDF5 file compatible with PyMca would contain a single 3-dimensional dataset and it could be written using the code snipset shown below.

import h5py
h5 = h5py.File("myfile.h5", "w")
h5["data"] = data

Obviously, besides a faster readout of the data by PyMca, one would not gain any information compared to the use of single-column ASCII files.

PyMca will automatically look for information associated to a dataset provided that information is stored within the same container group in the file.

If live_time is a one dimensional dataset with 20000 values corresponding to the actual measuring time associated to each spectrum, the simplest way to allow PyMca to use that information is to put it at the same level within the same group.

If the channels associated to the data are different from 0,1,2,3, …, 2046, 2047, they can be specified by a one dimensional dataset named channels.

The calibration can be specified as a dataset containing three values (corresponding to a, b and c in the expression energy = a + b * ch + c * ch^2 and named calibration.

import h5py
h5 = h5py.File("myfile.h5", "w")
h5["/mca_0/data"] = data
h5["/mca_0/channels"] = channels
h5["/mca_0/calibration"] = calibration
h5["/mca_0/live_time"] = live_time

Additional conventions can be applied to improve the user experience when using the PyMca graphical user interface.

The code below writes an HDF5 following NeXus conventions. Those conventions are attribute based, therefore the actual names of the different groups are free. You can download a script generating a file using these conventions.

h5File = "myfile.h5"
if os.path.exists(h5File):
h5 = h5py.File(h5File, "w")
h5["/entry/instrument/detector/calibration"] = calibration
h5["/entry/instrument/detector/channels"] = channels
h5["/entry/instrument/detector/data"] = data
h5["/entry/instrument/detector/live_time"] = live_time

# add nexus conventions (not needed)
h5["/entry/title"] = u"Dummy generated map"
h5["/entry"].attrs["NX_class"] = u"NXentry"
h5["/entry/instrument"].attrs["NX_class"] = u"NXinstrument"
h5["/entry/instrument/detector/"].attrs["NX_class"] = u"NXdetector"
h5["/entry/instrument/detector/data"].attrs["interpretation"] = u"spectrum"
# implement a default plot named measurement (not needed)
h5["/entry/measurement/data"] = h5py.SoftLink("/entry/instrument/detector/data")
h5["/entry/measurement"].attrs["NX_class"] = u"NXdata"
h5["/entry/measurement"].attrs["signal"] = u"data"
h5["/entry"].attrs["default"] = u"measurement"