spech5: h5py-like API to SpecFile

This module provides a h5py-like API to access SpecFile data.

API description

Specfile data structure exposed by this API:

/
    1.1/
        title = "…"
        start_time = "…"
        instrument/
            specfile/
                file_header = "…"
                scan_header = "…"
            positioners/
                motor_name = value
                …
            mca_0/
                data = …
                calibration = …
                channels = …
                preset_time = …
                elapsed_time = …
                live_time = …

            mca_1/
                …
            …
        measurement/
            colname0 = …
            colname1 = …
            …
            mca_0/
                 data -> /1.1/instrument/mca_0/data
                 info -> /1.1/instrument/mca_0/
            …
        sample/
            ub_matrix = …
            unit_cell = …
            unit_cell_abc = …
            unit_cell_alphabetagamma = …
    2.1/
        …

file_header and scan_header are the raw headers as they appear in the original file, as a string of lines separated by newline (\n) characters.

The title is the content of the #S scan header line without the leading #S and without the scan number (e.g. "ascan  ss1vo -4.55687 -0.556875  40 0.2").

The start time is converted to ISO8601 format ("2016-02-23T22:49:05Z"), if the original date format is standard.
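As an illustration of this conversion (a sketch, not the module's actual parsing code), a SPEC #D date in the common ctime-like format can be turned into ISO8601 with the standard library; the real conversion may also append a timezone designator, as in the example above:

```python
from datetime import datetime

def spec_date_to_iso8601(spec_date):
    # Parse a ctime-style date as found on #D header lines (assumed format);
    # the real module handles several date formats, this sketch only one.
    parsed = datetime.strptime(spec_date, "%a %b %d %H:%M:%S %Y")
    return parsed.isoformat()

print(spec_date_to_iso8601("Thu Feb 11 09:54:35 2016"))  # -> 2016-02-11T09:54:35
```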

Numeric datasets are stored in float32 format, except for scalar integers which are stored as int64.

Motor positions (e.g. /1.1/instrument/positioners/motor_name) can be 1D numpy arrays if they are measured as scan data, or else scalars as defined on #P scan header lines. A simple test is done to check if the motor name is also a data column header defined in the #L scan header line.
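The test described above can be sketched as follows (illustrative logic with hypothetical names, not the module's actual code):

```python
def motor_position(motor_name, column_labels, data_columns, p_header_positions):
    # If the motor name is also a data column label (#L header line),
    # expose the full 1D column of measured values; otherwise expose
    # the scalar position from the #P header lines.
    if motor_name in column_labels:
        return data_columns[motor_name]       # 1D array of values
    return p_header_positions[motor_name]     # scalar

# Motor measured as scan data -> 1D array
print(motor_position("tx3", ["tx3", "I0"], {"tx3": [1.0, 2.0]}, {}))
# Motor only listed in #P -> scalar
print(motor_position("ty", ["tx3"], {}, {"ty": 0.5}))
```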

Scan data (e.g. /1.1/measurement/colname0) is accessed by column, the dataset name colname0 being the column label as defined in the #L scan header line.

If a / character is present in a column label or in a motor name in the original SPEC file, it will be substituted with a % character in the corresponding dataset name.
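The substitution amounts to a simple character replacement, since a / would otherwise be interpreted as a path separator in h5py-like names. A minimal sketch, using a hypothetical column label:

```python
def dataset_name(spec_label):
    # '/' in a SPEC column label or motor name becomes '%'
    # in the corresponding dataset name (illustrative helper).
    return spec_label.replace("/", "%")

print(dataset_name("arr/I0"))  # -> arr%I0
```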

MCA data is exposed as a 2D numpy array containing all spectra for a given analyser. The number of analysers is calculated as the number of MCA spectra per scan data line. Demultiplexing is then performed to assign the correct spectra to a given analyser.
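A minimal sketch of this demultiplexing, assuming spectra appear in the file interleaved by analyser (analyser 0, 1, …, n-1 for one scan data line, then analyser 0 again for the next line); this is illustrative, not the module's actual code:

```python
def spectra_for_analyser(all_spectra, n_analysers, analyser_index):
    # Every n-th spectrum, starting at the analyser's own index,
    # belongs to that analyser.
    return all_spectra[analyser_index::n_analysers]

# 2 analysers, 3 scan data lines -> 6 spectra in file order
spectra = ["a0", "b0", "a1", "b1", "a2", "b2"]
print(spectra_for_analyser(spectra, 2, 0))  # -> ['a0', 'a1', 'a2']
```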

MCA calibration is an array of 3 scalars, from the #@CALIB header line. It is identical for all MCA analysers, as there can be only one #@CALIB line per scan.

MCA channels is an array containing all channel numbers. This information is read from the #@CHANN scan header line (if present), or computed from the shape of the first spectrum in a scan ([0, len(first_spectrum) - 1]).
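When the #@CHANN line is absent, the fallback amounts to the following (an illustrative sketch using a plain list in place of a numpy array):

```python
def default_channels(first_spectrum):
    # Without a #@CHANN header line, channel numbers default
    # to 0 .. len(first_spectrum) - 1.
    return list(range(len(first_spectrum)))

print(default_channels([120, 98, 77]))  # -> [0, 1, 2]
```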

Accessing data

Data and groups are accessed in h5py fashion:

from silx.io.spech5 import SpecH5

# Open a SpecFile
sfh5 = SpecH5("test.dat")

# using SpecH5 as a regular group to access scans
scan1group = sfh5["1.1"]
instrument_group = scan1group["instrument"]

# alternative: full path access
measurement_group = sfh5["/1.1/measurement"]

# accessing a scan data column by name as a 1D numpy array
data_array = measurement_group["Pslit HGap"]

# accessing all mca-spectra for one MCA device
mca_0_spectra = measurement_group["mca_0/data"]

SpecH5 files and groups provide a keys() method:

>>> sfh5.keys()
['96.1', '97.1', '98.1']
>>> sfh5['96.1'].keys()
['title', 'start_time', 'instrument', 'measurement']

They can also be treated as iterators:

from silx.io import is_dataset

sfh5 = SpecH5("test.dat")
for scan_group in sfh5.values():
    dataset_names = [item.name for item in scan_group["measurement"].values()
                     if is_dataset(item)]
    print("Found data columns in scan " + scan_group.name)
    print(", ".join(dataset_names))

You can test for existence of data or groups:

>>> "/1.1/measurement/Pslit HGap" in sfh5
True
>>> "positioners" in sfh5["/2.1/instrument"]
True
>>> "spam" in sfh5["1.1"]
False

Note

Text used to be stored with a dtype numpy.string_ in silx versions prior to 0.7.0. The type numpy.string_ is a byte-string format. The consequence of this is that you had to decode strings before using them in Python 3:

>>> from silx.io.spech5 import SpecH5
>>> sfh5 = SpecH5("31oct98.dat")
>>> sfh5["/68.1/title"]
b'68  ascan  tx3 -28.5 -24.5  20 0.5'
>>> sfh5["/68.1/title"].decode()
'68  ascan  tx3 -28.5 -24.5  20 0.5'

From silx version 0.7.0 onwards, text is stored as unicode. This corresponds to the default text type in Python 3, and to the unicode type in Python 2.

To be on the safe side, you can test for the presence of a decode attribute, to ensure that you always work with unicode text:

>>> title = sfh5["/68.1/title"]
>>> if hasattr(title, "decode"):
...     title = title.decode()

Classes

class SpecH5(filename)[source]

Bases: silx.io.commonh5.File, silx.io.spech5.SpecH5Group

This class opens a SPEC file and exposes it as a h5py.File.

It inherits silx.io.commonh5.Group (via commonh5.File), which implements most of its API.

close()[source]

Close the object, and free up associated resources.

__contains__(name)

Returns true if name is an existing child of this group.

Return type:bool
__enter__()
__exit__(exc_type, exc_val, exc_tb)
__getitem__(name)

Return a child from its name.

Parameters:name (str) – name of a member, or a path through members using the ‘/’ separator. A leading ‘/’ accesses the root item of the tree.
Return type:Node
__iter__()

Iterate over member names

__len__()

Returns the number of children contained in this group.

Return type:int
attrs

Returns HDF5 attributes of this node.

Return type:dict
basename

Returns the HDF5 basename of this node.

create_dataset(name, shape=None, dtype=None, data=None, **kwds)

Create and return a sub dataset.

Parameters:
  • name (str) – Name of the dataset.
  • shape – Dataset shape. Use “()” for scalar datasets. Required if “data” isn’t provided.
  • dtype – Numpy dtype or string. If omitted, dtype(‘f’) will be used. Required if “data” isn’t provided; otherwise, overrides data array’s dtype.
  • data (numpy.ndarray) – Provide data to initialize the dataset. If used, you can omit shape and dtype arguments.
  • kwds – Extra arguments. Nothing yet supported.
create_group(name)

Create and return a new subgroup.

Name may be absolute or relative. Fails if the target name already exists.

Parameters:name (str) – Name of the new group
file

Returns the file node of this node.

Return type:Node
filename
get(name, default=None, getclass=False, getlink=False)

Retrieve an item or other information.

If only getlink is true, the returned value is always h5py.HardLink, because this implementation does not use links, like the original implementation.

Parameters:
  • name (str) – name of the item
  • default (object) – default value returned if the name is not found
  • getclass (bool) – if true, the returned object is the class of the object found
  • getlink (bool) – if true, link objects are returned instead of the target
Returns:

An object, else None

Return type:

object

h5_class

Returns the h5py.File class

h5py_class

Returns the h5py class which is mimicked by this class. It can be one of h5py.File, h5py.Group or h5py.Dataset

This should not be used anymore. Prefer using h5_class

Return type:Class
items()

Returns items iterator containing name-node mapping.

Return type:iterator
keys()

Returns an iterator over the children’s names in a group.

mode
name

Returns the HDF5 name of this node.

parent

Returns the parent of the node.

Return type:Node
values()

Returns an iterator over the children nodes (groups and datasets) in a group.

New in version 0.6.

visit(func, visit_links=False)

Recursively visit all names in this group and subgroups. See the documentation for h5py.Group.visit for more help.

Parameters:func (callable) – Callable (function, method or callable object)
visititems(func, visit_links=False)

Recursively visit names and objects in this group. See the documentation for h5py.Group.visititems for more help.

Parameters:
  • func (callable) – Callable (function, method or callable object)
  • visit_links (bool) – If False, ignore links. If True, call func(name) for links and recurse into target groups.
class SpecH5Group[source]

Bases: object

This convenience class is to be inherited by all groups, for compatibility purposes with code that tests for isinstance(obj, SpecH5Group).

This legacy behavior is deprecated. The correct way to test if an object is a group is to use silx.io.utils.is_group().

Groups must also inherit silx.io.commonh5.Group, which actually implements all the methods and attributes.

class Group(name, parent=None, attrs=None)[source]

Bases: silx.io.commonh5.Node

This class mimics a h5py.Group.

get(name, default=None, getclass=False, getlink=False)[source]

Retrieve an item or other information.

If only getlink is true, the returned value is always h5py.HardLink, because this implementation does not use links, like the original implementation.

Parameters:
  • name (str) – name of the item
  • default (object) – default value returned if the name is not found
  • getclass (bool) – if true, the returned object is the class of the object found
  • getlink (bool) – if true, link objects are returned instead of the target
Returns:

An object, else None

Return type:

object

__getitem__(name)[source]

Return a child from its name.

Parameters:name (str) – name of a member, or a path through members using the ‘/’ separator. A leading ‘/’ accesses the root item of the tree.
Return type:Node
__contains__(name)[source]

Returns true if name is an existing child of this group.

Return type:bool
__len__()[source]

Returns the number of children contained in this group.

Return type:int
__iter__()[source]

Iterate over member names

keys()[source]

Returns an iterator over the children’s names in a group.

values()[source]

Returns an iterator over the children nodes (groups and datasets) in a group.

New in version 0.6.

items()[source]

Returns items iterator containing name-node mapping.

Return type:iterator
visit(func, visit_links=False)[source]

Recursively visit all names in this group and subgroups. See the documentation for h5py.Group.visit for more help.

Parameters:func (callable) – Callable (function, method or callable object)
visititems(func, visit_links=False)[source]

Recursively visit names and objects in this group. See the documentation for h5py.Group.visititems for more help.

Parameters:
  • func (callable) – Callable (function, method or callable object)
  • visit_links (bool) – If False, ignore links. If True, call func(name) for links and recurse into target groups.
attrs

Returns HDF5 attributes of this node.

Return type:dict
basename

Returns the HDF5 basename of this node.

file

Returns the file node of this node.

Return type:Node
h5py_class

Returns the h5py class which is mimicked by this class. It can be one of h5py.File, h5py.Group or h5py.Dataset

This should not be used anymore. Prefer using h5_class

Return type:Class
name

Returns the HDF5 name of this node.

parent

Returns the parent of the node.

Return type:Node
class SpecH5Dataset[source]

Bases: object

This convenience class is to be inherited by all datasets, for compatibility purposes with code that tests for isinstance(obj, SpecH5Dataset).

This legacy behavior is deprecated. The correct way to test if an object is a dataset is to use silx.io.utils.is_dataset().

Datasets must also inherit SpecH5NodeDataset or SpecH5LazyNodeDataset which actually implement all the API.

class SpecH5NodeDataset(name, data, parent=None, attrs=None)[source]

Bases: silx.io.commonh5.Dataset, silx.io.spech5.SpecH5Dataset

This class inherits commonh5.Dataset, to which it adds a little extra functionality. The main addition is the proxy behavior that allows it to mimic the numpy array stored in this class.

__getattr__(item)[source]

Proxy to underlying numpy array methods.

__getitem__(item)

Returns the slice of the data exposed by this dataset.

Return type:numpy.ndarray
__iter__()

Iterate over the first axis. TypeError if scalar.

__len__()

Returns the size of the data exposed by this dataset.

Return type:int
attrs

Returns HDF5 attributes of this node.

Return type:dict
basename

Returns the HDF5 basename of this node.

chunks

Returns chunks as provided by h5py.Dataset.

There are no chunks.

compression

Returns compression as provided by h5py.Dataset.

There is no compression.

compression_opts

Returns compression options as provided by h5py.Dataset.

There is no compression.

dtype

Returns the numpy datatype exposed by this dataset.

Return type:numpy.dtype
external

Returns external sources as provided by h5py.Dataset.

Return type:list or None
file

Returns the file node of this node.

Return type:Node
h5_class

Returns the HDF5 class which is mimicked by this class.

Return type:H5Type
h5py_class

Returns the h5py class which is mimicked by this class. It can be one of h5py.File, h5py.Group or h5py.Dataset

This should not be used anymore. Prefer using h5_class

Return type:Class
is_virtual

Checks virtual data as provided by h5py.Dataset

name

Returns the HDF5 name of this node.

parent

Returns the parent of the node.

Return type:Node
shape

Returns the shape of the data exposed by this dataset.

Return type:tuple
size

Returns the size of the data exposed by this dataset.

Return type:int
value

Returns the data exposed by this dataset.

Deprecated by h5py. It is preferred to use indexing with [()].

Return type:numpy.ndarray
virtual_sources()

Returns virtual dataset sources as provided by h5py.Dataset.

Return type:list