spech5: h5py-like API to SpecFile

This module provides an h5py-like API to access SpecFile data.

API description

SpecFile data structure exposed by this API:

/
    1.1/
        title = "…"
        start_time = "…"
        instrument/
            specfile/
                file_header = "…"
                scan_header = "…"
            positioners/
                motor_name = value
                …
            mca_0/
                data = …
                calibration = …
                channels = …
                preset_time = …
                elapsed_time = …
                live_time = …

            mca_1/
                …
            …
        measurement/
            colname0 = …
            colname1 = …
            …
            mca_0/
                data -> /1.1/instrument/mca_0/data
                info -> /1.1/instrument/mca_0/
            …
        sample/
            ub_matrix = …
            unit_cell = …
            unit_cell_abc = …
            unit_cell_alphabetagamma = …
    2.1/
        …

file_header and scan_header are the raw headers as they appear in the original file, as a string of lines separated by newline (\n) characters.

The title is the content of the #S scan header line without the leading #S and without the scan number (e.g. "ascan  ss1vo -4.55687 -0.556875  40 0.2").
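The title rule above can be sketched in a few lines of Python. title_from_s_line is a hypothetical helper written for illustration, not a silx function, and it assumes the scan number is the first whitespace-separated token after #S.

```python
# Hypothetical helper, not part of silx: derive a scan title from a raw
# "#S" header line by dropping the "#S" marker and the scan number.
def title_from_s_line(s_line: str) -> str:
    if not s_line.startswith("#S"):
        raise ValueError("not a #S header line: %r" % s_line)
    rest = s_line[2:].lstrip()
    # Assumption: the first whitespace-delimited token is the scan number.
    parts = rest.split(None, 1)
    # Everything after the scan number is kept verbatim (inner spacing too).
    return parts[1].strip() if len(parts) == 2 else ""


print(title_from_s_line("#S 1  ascan  ss1vo -4.55687 -0.556875  40 0.2"))
# ascan  ss1vo -4.55687 -0.556875  40 0.2
```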

The start time is converted to ISO8601 format ("2016-02-23T22:49:05Z"), if the original date format is standard.
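A minimal sketch of such a conversion, assuming the "standard" format is the ctime-like string found on SPEC #D lines (for example "Tue Feb 23 22:49:05 2016"). spec_date_to_iso8601 is hypothetical; it emits no timezone suffix, so it does not claim to reproduce the exact silx output (such as a trailing Z), and returning unparsable text unchanged is likewise an assumption.

```python
from datetime import datetime


# Hypothetical helper, not the silx implementation.
def spec_date_to_iso8601(date_str: str) -> str:
    try:
        # Assumed "standard" SPEC layout: weekday, month, day, time, year.
        parsed = datetime.strptime(date_str.strip(), "%a %b %d %H:%M:%S %Y")
    except ValueError:
        # Non-standard format: keep the original text (assumption).
        return date_str
    # No timezone information is available, so none is appended.
    return parsed.isoformat()


print(spec_date_to_iso8601("Tue Feb 23 22:49:05 2016"))  # 2016-02-23T22:49:05
```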

Numeric datasets are stored in float32 format, except for scalar integers which are stored as int64.

Motor positions (e.g. /1.1/instrument/positioners/motor_name) can be 1D numpy arrays if they are measured as scan data, or else scalars as defined on #P scan header lines. A motor is treated as measured when its name is also a data column label defined on the #L scan header line.
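The decision rule for positioners can be illustrated with plain Python containers. motor_position and its arguments are hypothetical stand-ins for the parsed #P values, #L labels and data columns; silx itself exposes numpy arrays and scalars.

```python
# Hypothetical sketch of the rule described above, not silx code.
def motor_position(motor_name, p_values, l_labels, columns):
    """
    p_values: dict motor name -> scalar position read from "#P" lines
    l_labels: list of column labels from the "#L" line
    columns:  list of per-column data lists, in the same order as l_labels
    """
    if motor_name in l_labels:
        # Measured during the scan: expose the whole data column.
        return columns[l_labels.index(motor_name)]
    # Not scanned: fall back to the single value from the "#P" lines.
    return p_values[motor_name]
```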

Scan data (e.g. /1.1/measurement/colname0) is accessed by column, the dataset name colname0 being the column label as defined in the #L scan header line.

If a / character is present in a column label or in a motor name in the original SPEC file, it will be substituted with a % character in the corresponding dataset name.
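The naming rule for columns and motors can be sketched as follows. dataset_name and column_path are hypothetical helpers, and the label "I0/Imon" is made up for illustration.

```python
# Hypothetical helpers, not part of silx.
def dataset_name(label: str) -> str:
    # "/" separates path components, so it is replaced with "%".
    return label.replace("/", "%")


def column_path(scan_key: str, label: str) -> str:
    # Full path of a scan data column under the measurement group.
    return "/%s/measurement/%s" % (scan_key, dataset_name(label))


print(column_path("1.1", "I0/Imon"))  # /1.1/measurement/I0%Imon
```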

MCA data is exposed as a 2D numpy array containing all spectra for a given analyser. The number of analysers is calculated as the number of MCA spectra per scan data line. Demultiplexing is then performed to assign the correct spectra to a given analyser.
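A minimal demultiplexing sketch, assuming spectra are listed in file order and every scan data line is followed by the same number of spectra, one per analyser. demultiplex is a hypothetical helper; each returned list corresponds to the rows of one mca_N/data array.

```python
# Hypothetical sketch, not silx code.
def demultiplex(spectra, n_data_lines):
    if n_data_lines <= 0 or len(spectra) % n_data_lines:
        raise ValueError("spectra count is not a multiple of data lines")
    # Number of analysers = number of MCA spectra per scan data line.
    n_analysers = len(spectra) // n_data_lines
    # Analyser k receives spectra k, k + n, k + 2n, ... (one per data line).
    return [spectra[k::n_analysers] for k in range(n_analysers)]


spectra = ["l0a0", "l0a1", "l1a0", "l1a1", "l2a0", "l2a1"]
print(demultiplex(spectra, 3))
# [['l0a0', 'l1a0', 'l2a0'], ['l0a1', 'l1a1', 'l2a1']]
```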

MCA calibration is an array of 3 scalars, from the #@CALIB header line. It is identical for all MCA analysers, as there can be only one #@CALIB line per scan.

MCA channels is an array containing all channel numbers. It is computed from the #@CHANN scan header line if present, or otherwise from the length of the first spectrum in the scan ([0, len(first_spectrum) - 1]).
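Both cases can be sketched in Python. mca_channels is hypothetical, and the #@CHANN field layout it assumes ("#@CHANN <n_channels> <first> <last> <step>") is an assumption rather than something this page specifies.

```python
# Hypothetical sketch, not silx code.
def mca_channels(chann_line=None, first_spectrum=None):
    if chann_line is not None:
        # Assumed layout: "#@CHANN <n_channels> <first> <last> <step>".
        fields = chann_line.split()[1:]
        first, last, step = int(fields[1]), int(fields[2]), int(fields[3])
        return list(range(first, last + 1, step))
    # Fallback: number the channels 0 .. len(first_spectrum) - 1.
    return list(range(len(first_spectrum)))
```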

Accessing data

Data and groups are accessed in h5py fashion:

from silx.io.spech5 import SpecH5

# Open a SpecFile
sfh5 = SpecH5("test.dat")

# using SpecH5 as a regular group to access scans
scan1group = sfh5["1.1"]
instrument_group = scan1group["instrument"]

# alternative: full path access
measurement_group = sfh5["/1.1/measurement"]

# accessing a scan data column by name as a 1D numpy array
data_array = measurement_group["Pslit HGap"]

# accessing all mca-spectra for one MCA device
mca_0_spectra = measurement_group["mca_0/data"]

SpecH5 files and groups provide a keys() method:

>>> sfh5.keys()
['96.1', '97.1', '98.1']
>>> sfh5['96.1'].keys()
['title', 'start_time', 'instrument', 'measurement']

They are also iterable. Iterating over a group yields the names of its members; use values() to iterate over the child nodes themselves:

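from silx.io import is_dataset
from silx.io.spech5 import SpecH5

with SpecH5("test.dat") as sfh5:
    for scan_group in sfh5.values():
        # keep datasets only (MCA entries in measurement are groups)
        dataset_names = [item.basename for item in
                         scan_group["measurement"].values()
                         if is_dataset(item)]
        print("Found data columns in scan " + scan_group.name)
        print(", ".join(dataset_names))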

You can test for the existence of data or groups:

>>> "/1.1/measurement/Pslit HGap" in sfh5
True
>>> "positioners" in sfh5["/2.1/instrument"]
True
>>> "spam" in sfh5["1.1"]
False

Classes

class SpecH5(filename)

Bases: File, SpecH5Group

This class opens a SPEC file and exposes it as an h5py.File.

It inherits silx.io.commonh5.Group (via commonh5.File), which implements most of its API.

close()

Close the object, and free up associated resources.

__contains__(name)

Returns True if name is an existing child of this group.

Return type:

bool

__enter__()
__exit__(exc_type, exc_val, exc_tb)
__getitem__(name)

Return a child from its name.

Parameters:

name (str) – Name of a member, or a path through members using ‘/’ as a separator. A leading ‘/’ starts the lookup from the root item of the tree.

Return type:

Node
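The path semantics of name can be illustrated with a toy model in which groups are nested dicts and datasets are leaf values. resolve is hypothetical and is not how silx implements __getitem__; it only shows how relative and absolute (leading ‘/’) paths are walked. This toy raises KeyError for a missing member.

```python
# Toy model, not silx code: groups are nested dicts, datasets are leaves.
def resolve(root: dict, current_path: str, name: str):
    # A leading "/" starts from the root; otherwise start from current_path.
    if name.startswith("/"):
        path = name
    else:
        path = current_path.rstrip("/") + "/" + name
    node = root
    for part in path.split("/"):
        if part:  # skip empty parts produced by leading or doubled "/"
            node = node[part]  # KeyError if the member does not exist
    return node
```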

__iter__()

Iterate over member names

__len__()

Returns the number of children contained in this group.

Return type:

int

property attrs

Returns HDF5 attributes of this node.

Return type:

dict

property basename

Returns the HDF5 basename of this node.

create_dataset(name, shape=None, dtype=None, data=None, **kwds)

Create and return a sub dataset.

Parameters:
  • name (str) – Name of the dataset.

  • shape – Dataset shape. Use “()” for scalar datasets. Required if “data” isn’t provided.

  • dtype – Numpy dtype or string. If omitted, dtype(‘f’) will be used. Required if “data” isn’t provided; otherwise, overrides data array’s dtype.

  • data (numpy.ndarray) – Provide data to initialize the dataset. If used, you can omit shape and dtype arguments.

  • kwds – Extra arguments. Not yet supported.

create_group(name)

Create and return a new subgroup.

Name may be absolute or relative. Fails if the target name already exists.

Parameters:

name (str) – Name of the new group

property file

Returns the file node of this node.

Return type:

Node

property filename
get(name, default=None, getclass=False, getlink=False)

Retrieve an item or other information.

If getlink is true and getclass is false, the returned value is always an h5py.HardLink, because this implementation does not use links. This matches the behavior of the original h5py implementation for hard-linked objects.

Parameters:
  • name (str) – name of the item

  • default (object) – default value returned if the name is not found

  • getclass (bool) – if true, the returned object is the class of the object found

  • getlink (bool) – if true, link objects are returned instead of their targets

Returns:

The object found, or default if name is not found

Return type:

object

property h5_class

Returns the h5py.File class

property h5py_class

Returns the h5py class mimicked by this class. It can be one of h5py.File, h5py.Group or h5py.Dataset.

This should not be used anymore. Prefer h5_class.

Return type:

Class

items()

Returns items iterator containing name-node mapping.

Return type:

iterator

keys()

Returns an iterator over the children’s names in a group.

property mode
property name

Returns the HDF5 name of this node.

property parent

Returns the parent of the node.

Return type:

Node

values()

Returns an iterator over the children nodes (groups and datasets) in a group.

New in version 0.6.

visit(func, visit_links=False)

Recursively visit all names in this group and subgroups. See the documentation for h5py.Group.visit for more help.

Parameters:

func (callable) – Callable (function, method or callable object)

visititems(func, visit_links=False)

Recursively visit names and objects in this group. See the documentation for h5py.Group.visititems for more help.

Parameters:
  • func (callable) – Callable (function, method or callable object)

  • visit_links (bool) – If False, ignore links. If True, call func(name) for links and recurse into target groups.
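The h5py convention that visititems follows can be illustrated with a toy recursive walk: func(name, obj) is called for every member with a name relative to the starting group, and returning anything other than None stops the walk and returns that value. This stand-alone visititems over nested dicts is hypothetical, not silx code, and visits members in dict insertion order rather than the order silx or h5py would use.

```python
# Toy model, not silx code: groups are nested dicts, datasets are leaves.
def visititems(group: dict, func, _prefix: str = ""):
    for key, child in group.items():
        name = _prefix + key  # path relative to the starting group
        result = func(name, child)
        if result is not None:
            return result  # a non-None return value stops the walk
        if isinstance(child, dict):  # recurse into subgroups
            result = visititems(child, func, name + "/")
            if result is not None:
                return result
    return None


tree = {"1.1": {"measurement": {"a": [1], "b": [2]}, "title": "x"}}
names = []
visititems(tree, lambda name, obj: names.append(name))
print(names)
# ['1.1', '1.1/measurement', '1.1/measurement/a', '1.1/measurement/b', '1.1/title']
```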

class SpecH5Group

Bases: object

This convenience class is to be inherited by all groups, for compatibility purposes with code that tests for isinstance(obj, SpecH5Group).

This legacy behavior is deprecated. The correct way to test if an object is a group is to use silx.io.utils.is_group().

Groups must also inherit silx.io.commonh5.Group, which actually implements all the methods and attributes.

class SpecH5Dataset

Bases: object

This convenience class is to be inherited by all datasets, for compatibility purposes with code that tests for isinstance(obj, SpecH5Dataset).

This legacy behavior is deprecated. The correct way to test if an object is a dataset is to use silx.io.utils.is_dataset().

Datasets must also inherit SpecH5NodeDataset or SpecH5LazyNodeDataset which actually implement all the API.

class SpecH5NodeDataset(name, data, parent=None, attrs=None)

Bases: Dataset, SpecH5Dataset

This class inherits commonh5.Dataset, to which it adds little extra functionality. The main addition is a proxy behavior that lets it mimic the numpy array it stores.

__getattr__(item)

Proxy to underlying numpy array methods.
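The proxy idea can be sketched with Python's __getattr__ hook, which is only called when normal attribute lookup fails. ArrayProxy is a hypothetical toy that wraps a list instead of a numpy array; it is not the silx class.

```python
class ArrayProxy:
    """Toy illustration, not silx code: forward unknown attributes to _data."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, item):
        # Only reached when normal lookup fails. Guard "_data" so a
        # half-initialised instance cannot recurse forever.
        if item == "_data":
            raise AttributeError(item)
        return getattr(self._data, item)


proxy = ArrayProxy([3, 1, 3])
print(proxy.count(3), proxy.index(1))  # 2 1
```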

__getitem__(item)

Returns the slice of the data exposed by this dataset.

Return type:

numpy.ndarray

__iter__()

Iterate over the first axis. Raises TypeError if the dataset is scalar.

__len__()

Returns the size of the data exposed by this dataset.

Return type:

int

property attrs

Returns HDF5 attributes of this node.

Return type:

dict

property basename

Returns the HDF5 basename of this node.

property chunks

Returns chunks as provided by h5py.Dataset.

There are no chunks.

property compression

Returns compression as provided by h5py.Dataset.

There is no compression.

property compression_opts

Returns compression options as provided by h5py.Dataset.

There is no compression.

property dtype

Returns the numpy datatype exposed by this dataset.

Return type:

numpy.dtype

property external

Returns external sources as provided by h5py.Dataset.

Return type:

list or None

property file

Returns the file node of this node.

Return type:

Node

property h5_class

Returns the HDF5 class which is mimicked by this class.

Return type:

H5Type

property h5py_class

Returns the h5py class mimicked by this class. It can be one of h5py.File, h5py.Group or h5py.Dataset.

This should not be used anymore. Prefer h5_class.

Return type:

Class

property is_virtual

Checks virtual data as provided by h5py.Dataset

property name

Returns the HDF5 name of this node.

property parent

Returns the parent of the node.

Return type:

Node

property shape

Returns the shape of the data exposed by this dataset.

Return type:

tuple

property size

Returns the size of the data exposed by this dataset.

Return type:

int

property value

Returns the data exposed by this dataset.

Deprecated by h5py. Prefer indexing with [()].

Return type:

numpy.ndarray

virtual_sources()

Returns virtual dataset sources as provided by h5py.Dataset.

Return type:

list