spech5: h5py-like API to SpecFile

This module provides a h5py-like API to access SpecFile data.

API description

Specfile data structure exposed by this API:

/
    1.1/
        title = "…"
        start_time = "…"
        instrument/
            specfile/
                file_header = "…"
                scan_header = "…"
            positioners/
                motor_name = value
                …
            mca_0/
                data = …
                calibration = …
                channels = …
                preset_time = …
                elapsed_time = …
                live_time = …

            mca_1/
                …
            …
        measurement/
            colname0 = …
            colname1 = …
            …
            mca_0/
                 data -> /1.1/instrument/mca_0/data
                 info -> /1.1/instrument/mca_0/
            …
        sample/
            ub_matrix = …
            unit_cell = …
            unit_cell_abc = …
            unit_cell_alphabetagamma = …
    2.1/
        …

file_header and scan_header are the raw headers as they appear in the original file, as a string of lines separated by newline (\n) characters.

The title is the content of the #S scan header line without the leading #S and without the scan number (e.g. "ascan  ss1vo -4.55687 -0.556875  40 0.2").

The start time is converted to ISO8601 format ("2016-02-23T22:49:05Z"), if the original date format is standard.
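As an illustration of this conversion (a sketch, not the module's actual parsing code), a SPEC #D date in the common ctime-like format can be turned into ISO8601 with the standard library; the real conversion may also append a timezone designator, as in the example above:

```python
from datetime import datetime

def spec_date_to_iso8601(spec_date):
    # Parse a ctime-style date as found on #D header lines (assumed format);
    # the real module handles several date formats, this sketch only one.
    parsed = datetime.strptime(spec_date, "%a %b %d %H:%M:%S %Y")
    return parsed.isoformat()

print(spec_date_to_iso8601("Thu Feb 11 09:54:35 2016"))  # -> 2016-02-11T09:54:35
```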

Numeric datasets are stored in float32 format, except for scalar integers which are stored as int64.

Motor positions (e.g. /1.1/instrument/positioners/motor_name) can be 1D numpy arrays if they are measured as scan data, or else scalars as defined on #P scan header lines. A simple test is done to check if the motor name is also a data column header defined in the #L scan header line.
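The test described above can be sketched as follows (illustrative logic with hypothetical names, not the module's actual code):

```python
def motor_position(motor_name, column_labels, data_columns, p_header_positions):
    # If the motor name is also a data column label (#L header line),
    # expose the full 1D column of measured values; otherwise expose
    # the scalar position from the #P header lines.
    if motor_name in column_labels:
        return data_columns[motor_name]       # 1D array of values
    return p_header_positions[motor_name]     # scalar

# Motor measured as scan data -> 1D array
print(motor_position("tx3", ["tx3", "I0"], {"tx3": [1.0, 2.0]}, {}))
# Motor only listed in #P -> scalar
print(motor_position("ty", ["tx3"], {}, {"ty": 0.5}))
```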

Scan data (e.g. /1.1/measurement/colname0) is accessed by column, the dataset name colname0 being the column label as defined in the #L scan header line.

If a / character is present in a column label or in a motor name in the original SPEC file, it will be substituted with a % character in the corresponding dataset name.
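The substitution amounts to a simple character replacement, since a / would otherwise be interpreted as a path separator in h5py-like names. A minimal sketch, using a hypothetical column label:

```python
def dataset_name(spec_label):
    # '/' in a SPEC column label or motor name becomes '%'
    # in the corresponding dataset name (illustrative helper).
    return spec_label.replace("/", "%")

print(dataset_name("arr/I0"))  # -> arr%I0
```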

MCA data is exposed as a 2D numpy array containing all spectra for a given analyser. The number of analysers is calculated as the number of MCA spectra per scan data line. Demultiplexing is then performed to assign the correct spectra to a given analyser.
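A minimal sketch of this demultiplexing, assuming spectra appear in the file interleaved by analyser (analyser 0, 1, …, n-1 for one scan data line, then analyser 0 again for the next line); this is illustrative, not the module's actual code:

```python
def spectra_for_analyser(all_spectra, n_analysers, analyser_index):
    # Every n-th spectrum, starting at the analyser's own index,
    # belongs to that analyser.
    return all_spectra[analyser_index::n_analysers]

# 2 analysers, 3 scan data lines -> 6 spectra in file order
spectra = ["a0", "b0", "a1", "b1", "a2", "b2"]
print(spectra_for_analyser(spectra, 2, 0))  # -> ['a0', 'a1', 'a2']
```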

MCA calibration is an array of 3 scalars, from the #@CALIB header line. It is identical for all MCA analysers, as there can be only one #@CALIB line per scan.

MCA channels is an array containing all channel numbers. This information is read from the #@CHANN scan header line (if present), or computed from the shape of the first spectrum in a scan ([0, len(first_spectrum) - 1]).
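When the #@CHANN line is absent, the fallback amounts to the following (an illustrative sketch using a plain list in place of a numpy array):

```python
def default_channels(first_spectrum):
    # Without a #@CHANN header line, channel numbers default
    # to 0 .. len(first_spectrum) - 1.
    return list(range(len(first_spectrum)))

print(default_channels([120, 98, 77]))  # -> [0, 1, 2]
```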

Accessing data

Data and groups are accessed in h5py fashion:

from silx.io.spech5 import SpecH5

# Open a SpecFile
sfh5 = SpecH5("test.dat")

# using SpecH5 as a regular group to access scans
scan1group = sfh5["1.1"]
instrument_group = scan1group["instrument"]

# alternative: full path access
measurement_group = sfh5["/1.1/measurement"]

# accessing a scan data column by name as a 1D numpy array
data_array = measurement_group["Pslit HGap"]

# accessing all mca-spectra for one MCA device
mca_0_spectra = measurement_group["mca_0/data"]

SpecH5 files and groups provide a keys() method:

>>> sfh5.keys()
['96.1', '97.1', '98.1']
>>> sfh5['96.1'].keys()
['title', 'start_time', 'instrument', 'measurement']

They can also be treated as iterators:

from silx.io import is_dataset

sfh5 = SpecH5("test.dat")
for scan_group in sfh5.values():
    dataset_names = [item.name for item in scan_group["measurement"].values()
                     if is_dataset(item)]
    print("Found data columns in scan " + scan_group.name)
    print(", ".join(dataset_names))

You can test for existence of data or groups:

>>> "/1.1/measurement/Pslit HGap" in sfh5
True
>>> "positioners" in sfh5["/2.1/instrument"]
True
>>> "spam" in sfh5["1.1"]
False

Note

Text used to be stored with a dtype numpy.string_ in silx versions prior to 0.7.0. The type numpy.string_ is a byte-string format. The consequence of this is that you had to decode strings before using them in Python 3:

>>> from silx.io.spech5 import SpecH5
>>> sfh5 = SpecH5("31oct98.dat")
>>> sfh5["/68.1/title"]
b'68  ascan  tx3 -28.5 -24.5  20 0.5'
>>> sfh5["/68.1/title"].decode()
'68  ascan  tx3 -28.5 -24.5  20 0.5'

From silx version 0.7.0 onwards, text is stored as unicode. This corresponds to the default text type in Python 3, and to the unicode type in Python 2.

To be on the safe side, you can test for the presence of a decode attribute, to ensure that you always work with unicode text:

>>> title = sfh5["/68.1/title"]
>>> if hasattr(title, "decode"):
...     title = title.decode()

Classes

class SpecH5(filename)[source]

Bases: silx.io.commonh5.File, silx.io.spech5.SpecH5Group

This class opens a SPEC file and exposes it as a h5py.File.

It inherits silx.io.commonh5.Group (via commonh5.File), which implements most of its API.

close()[source]

Close the object, and free up associated resources.

__contains__(name)

Returns true if name is an existing child of this group.

Return type:bool
__enter__()
__exit__(exc_type, exc_val, exc_tb)
__getitem__(name)

Return a child from its name.

Parameters:name (str) – name of a member, or a path through members using the ‘/’ separator. A leading ‘/’ accesses the root item of the tree.
Return type:Node
__iter__()

Iterate over member names

__len__()

Returns the number of children contained in this group.

Return type:int
attrs

Returns HDF5 attributes of this node.

Return type:dict
basename

Returns the HDF5 basename of this node.

create_dataset(name, shape=None, dtype=None, data=None, **kwds)

Create and return a sub dataset.

Parameters:
  • name (str) – Name of the dataset.
  • shape – Dataset shape. Use “()” for scalar datasets. Required if “data” isn’t provided.
  • dtype – Numpy dtype or string. If omitted, dtype(‘f’) will be used. Required if “data” isn’t provided; otherwise, overrides data array’s dtype.
  • data (numpy.ndarray) – Provide data to initialize the dataset. If used, you can omit shape and dtype arguments.
  • kwds – Extra arguments. Nothing yet supported.
create_group(name)

Create and return a new subgroup.

Name may be absolute or relative. Fails if the target name already exists.

Parameters:name (str) – Name of the new group
file

Returns the file node of this node.

Return type:Node
filename
get(name, default=None, getclass=False, getlink=False)

Retrieve an item or other information.

If only getlink is true, the returned value is always h5py.HardLink, because this implementation does not use links, like the original implementation.

Parameters:
  • name (str) – name of the item
  • default (object) – default value returned if the name is not found
  • getclass (bool) – if true, the returned object is the class of the object found
  • getlink (bool) – if true, link objects are returned instead of the target
Returns:

An object, else None

Return type:

object

h5_class

Returns the h5py.File class

h5py_class

Returns the h5py class which is mimicked by this class. It can be one of h5py.File, h5py.Group or h5py.Dataset

This should not be used anymore. Prefer using h5_class

Return type:Class
items()

Returns items iterator containing name-node mapping.

Return type:iterator
keys()

Returns an iterator over the children’s names in a group.

mode
name

Returns the HDF5 name of this node.

parent

Returns the parent of the node.

Return type:Node
values()

Returns an iterator over the children nodes (groups and datasets) in a group.

New in version 0.6.

visit(func, visit_links=False)

Recursively visit all names in this group and subgroups. See the documentation for h5py.Group.visit for more help.

Parameters:func (callable) – Callable (function, method or callable object)
visititems(func, visit_links=False)

Recursively visit names and objects in this group. See the documentation for h5py.Group.visititems for more help.

Parameters:
  • func (callable) – Callable (function, method or callable object)
  • visit_links (bool) – If False, ignore links. If True, call func(name) for links and recurse into target groups.
class SpecH5Group[source]

Bases: object

This convenience class is to be inherited by all groups, for compatibility purposes with code that tests for isinstance(obj, SpecH5Group).

This legacy behavior is deprecated. The correct way to test if an object is a group is to use silx.io.utils.is_group().

Groups must also inherit silx.io.commonh5.Group, which actually implements all the methods and attributes.

class Group(name, parent=None, attrs=None)[source]

Bases: silx.io.commonh5.Node

This class mimics a h5py.Group.

get(name, default=None, getclass=False, getlink=False)[source]

Retrieve an item or other information.

If only getlink is true, the returned value is always h5py.HardLink, because this implementation does not use links, like the original implementation.

Parameters:
  • name (str) – name of the item
  • default (object) – default value returned if the name is not found
  • getclass (bool) – if true, the returned object is the class of the object found
  • getlink (bool) – if true, link objects are returned instead of the target
Returns:

An object, else None

Return type:

object

__getitem__(name)[source]

Return a child from its name.

Parameters:name (str) – name of a member, or a path through members using the ‘/’ separator. A leading ‘/’ accesses the root item of the tree.
Return type:Node
__contains__(name)[source]

Returns true if name is an existing child of this group.

Return type:bool
__len__()[source]

Returns the number of children contained in this group.

Return type:int
__iter__()[source]

Iterate over member names

keys()[source]

Returns an iterator over the children’s names in a group.

values()[source]

Returns an iterator over the children nodes (groups and datasets) in a group.

New in version 0.6.

items()[source]

Returns items iterator containing name-node mapping.

Return type:iterator
visit(func, visit_links=False)[source]

Recursively visit all names in this group and subgroups. See the documentation for h5py.Group.visit for more help.

Parameters:func (callable) – Callable (function, method or callable object)
visititems(func, visit_links=False)[source]

Recursively visit names and objects in this group. See the documentation for h5py.Group.visititems for more help.

Parameters:
  • func (callable) – Callable (function, method or callable object)
  • visit_links (bool) – If False, ignore links. If True, call func(name) for links and recurse into target groups.
attrs

Returns HDF5 attributes of this node.

Return type:dict
basename

Returns the HDF5 basename of this node.

file

Returns the file node of this node.

Return type:Node
h5py_class

Returns the h5py class which is mimicked by this class. It can be one of h5py.File, h5py.Group or h5py.Dataset

This should not be used anymore. Prefer using h5_class

Return type:Class
name

Returns the HDF5 name of this node.

parent

Returns the parent of the node.

Return type:Node
class SpecH5Dataset[source]

Bases: object

This convenience class is to be inherited by all datasets, for compatibility purposes with code that tests for isinstance(obj, SpecH5Dataset).

This legacy behavior is deprecated. The correct way to test if an object is a dataset is to use silx.io.utils.is_dataset().

Datasets must also inherit SpecH5NodeDataset or SpecH5LazyNodeDataset which actually implement all the API.

class SpecH5NodeDataset(name, data, parent=None, attrs=None)[source]

Bases: silx.io.commonh5.Dataset, silx.io.spech5.SpecH5Dataset

This class inherits commonh5.Dataset, to which it adds a little extra functionality. The main addition is the proxy behavior that allows it to mimic the numpy array stored in this class.

__getattr__(item)[source]

Proxy to underlying numpy array methods.

__getitem__(item)

Returns the slice of the data exposed by this dataset.

Return type:numpy.ndarray
__iter__()

Iterate over the first axis. TypeError if scalar.

__len__()

Returns the size of the data exposed by this dataset.

Return type:int
attrs

Returns HDF5 attributes of this node.

Return type:dict
basename

Returns the HDF5 basename of this node.

chunks

Returns chunks as provided by h5py.Dataset.

There are no chunks.

compression

Returns compression as provided by h5py.Dataset.

There is no compression.

compression_opts

Returns compression options as provided by h5py.Dataset.

There is no compression.

dtype

Returns the numpy datatype exposed by this dataset.

Return type:numpy.dtype
external

Returns external sources as provided by h5py.Dataset.

Return type:list or None
file

Returns the file node of this node.

Return type:Node
h5_class

Returns the HDF5 class which is mimicked by this class.

Return type:H5Type
h5py_class

Returns the h5py class which is mimicked by this class. It can be one of h5py.File, h5py.Group or h5py.Dataset

This should not be used anymore. Prefer using h5_class

Return type:Class
is_virtual

Checks virtual data as provided by h5py.Dataset

name

Returns the HDF5 name of this node.

parent

Returns the parent of the node.

Return type:Node
shape

Returns the shape of the data exposed by this dataset.

Return type:tuple
size

Returns the size of the data exposed by this dataset.

Return type:int
value

Returns the data exposed by this dataset.

Deprecated by h5py. It is preferred to use indexing with [()].

Return type:numpy.ndarray
virtual_sources()

Returns virtual dataset sources as provided by h5py.Dataset.

Return type:list