utils: I/O utilities#

I/O utility functions

NEXUS_HDF5_EXT = ['.h5', '.nx5', '.nxs', '.hdf', '.hdf5', '.cxi']#

List of possible extensions for HDF5 file formats.
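As a minimal sketch, the list can be used to filter candidate HDF5 files by extension. The list below is copied from the documented value above; the helper name is illustrative, not part of silx:

```python
import os.path

# Extension list documented above (copied from silx.io.utils.NEXUS_HDF5_EXT)
NEXUS_HDF5_EXT = ['.h5', '.nx5', '.nxs', '.hdf', '.hdf5', '.cxi']

def looks_like_hdf5(filename):
    """Return True if the file extension suggests an HDF5 container."""
    return os.path.splitext(filename)[1].lower() in NEXUS_HDF5_EXT

print(looks_like_hdf5("scan_0001.nxs"))   # True
print(looks_like_hdf5("image.edf"))       # False
```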

class H5Type(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Identify a set of HDF5 concepts

supported_extensions(flat_formats=True)[source]#

Returns the list of file extensions supported by silx.open.

The result filters out formats whose expected module is not available.

Parameters:

flat_formats (bool) – If true, also include flat formats like npy or edf (when the expected module is available)

Returns:

A dictionary indexed by file description and containing a set of extensions (an extension is a string like “*.ext”).

Return type:

Dict[str, Set[str]]

save1D(fname, x, y, xlabel=None, ylabels=None, filetype=None, fmt='%.7g', csvdelim=';', newline='\n', header='', footer='', comments='#', autoheader=False)[source]#

Saves any number of curves to various formats: Specfile, CSV, txt or npy. All curves must have the same number of points and share the same x values.

Parameters:
  • fname – Output file path, or file handle open in write mode. If fname is a path, the file is opened in w mode and any existing file with the same name is overwritten.

  • x – 1D-Array (or list) of abscissa values.

  • y – 2D-array (or list of lists) of ordinates values. First index is the curve index, second index is the sample index. The length of the second dimension (number of samples) must be equal to len(x). y can be a 1D-array in case there is only one curve to be saved.

  • filetype – Filetype: "spec", "csv", "txt", "ndarray". If None, filetype is detected from file name extension (.dat, .csv, .txt, .npy).

  • xlabel – Abscissa label

  • ylabels – List of y labels

  • fmt – Format string for data. You can specify a short format string that defines a single format for both x and y values, or a list of two different format strings (e.g. ["%d", "%.7g"]). Default is "%.7g". This parameter does not apply to the npy format.

  • csvdelim – String or character separating columns in txt and CSV formats. The user is responsible for ensuring that this delimiter is not used in data labels when writing a CSV file.

  • newline – String or character separating lines/records in txt format (default is line break character \n).

  • header – String that will be written at the beginning of the file in txt format.

  • footer – String that will be written at the end of the file in txt format.

  • comments – String that will be prepended to the header and footer strings, to mark them as comments. Default: #.

  • autoheader – In CSV or txt, True causes the first header line to be written as a standard CSV header line with column labels separated by the specified CSV delimiter.

When saving to Specfile format, each curve is saved as a separate scan with two data columns (x and y).

CSV and txt formats are similar, except that the txt format allows user-defined header and footer text blocks, whereas the CSV format has only a single header line with column labels separated by field delimiters and no footer. The txt format also allows defining a record separator different from a line break.

The npy format is written with numpy.save and can be read back with numpy.load. If xlabel and ylabels are undefined, data is saved as a regular 2D numpy.ndarray (concatenation of x and y). If both xlabel and ylabels are defined, the data is saved as a numpy.recarray after being transposed and having labels assigned to columns.
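To make the column layout concrete, here is a stdlib-only sketch of the CSV shape described above, with one x column followed by one column per curve and an autoheader-style label line. This illustrates the format only; it is not silx code, and the labels x, y0, y1 are invented for the example:

```python
import csv
import io

x = [0.0, 1.0, 2.0]
y = [[0.0, 1.0, 4.0],   # first curve
     [0.0, 2.0, 8.0]]   # second curve

buf = io.StringIO()
writer = csv.writer(buf, delimiter=';', lineterminator='\n')
writer.writerow(['x', 'y0', 'y1'])                  # header line (autoheader)
for i, xi in enumerate(x):
    # one row per sample: x value, then one ordinate per curve
    writer.writerow([xi] + [curve[i] for curve in y])

print(buf.getvalue())
```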

savetxt(fname, X, fmt='%.7g', delimiter=';', newline='\n', header='', footer='', comments='#')[source]#

Backport of numpy.savetxt adding the header and footer arguments introduced in numpy 1.7.0.

See numpy.savetxt help: http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.savetxt.html

savespec(specfile, x, y, xlabel='X', ylabel='Y', fmt='%.7g', scan_number=1, mode='w', write_file_header=True, close_file=False)[source]#

Saves one curve to a SpecFile.

The curve is saved as a scan with two data columns. To save multiple curves to a single SpecFile, call this function for each curve by providing the same file handle each time.

Parameters:
  • specfile – Output SpecFile name, or file handle open in write or append mode. If a file name is provided, a new file is opened in write mode (an existing file with the same name will be lost)

  • x – 1D-Array (or list) of abscissa values

  • y – 1D-array (or list) of ordinate values, or a list of such arrays. Each dataset must have the same length as x

  • xlabel – Abscissa label (default "X")

  • ylabel – Ordinate label, may be a list of labels when multiple curves are to be saved together.

  • fmt – Format string for data. You can specify a short format string that defines a single format for both x and y values, or a list of two different format strings (e.g. ["%d", "%.7g"]). Default is "%.7g".

  • scan_number – Scan number (default 1).

  • mode – Mode for opening file: w (default), a, r+, w+, a+. This parameter is only relevant if specfile is a path.

  • write_file_header – If True, write a file header before writing the scan (#F and #D lines).

  • close_file – If True, close the file after saving curve.

Returns:

None if close_file is True, else return the file handle.
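A rough, stdlib-only sketch of the scan layout a SpecFile ends up with. The #F/#D/#S/#N/#L header-line names are standard SPEC conventions; the exact output of savespec may differ in detail, and this helper is illustrative, not silx code:

```python
import time

def sketch_spec_scan(x, y, xlabel="X", ylabel="Y", scan_number=1):
    """Build a plausible SPEC-style scan block as a string (illustration only)."""
    lines = [
        "#F example.dat",                    # file header (write_file_header)
        "#D %s" % time.ctime(),              # date line
        "",
        "#S %d %s" % (scan_number, ylabel),  # scan header with scan number
        "#N 2",                              # number of data columns
        "#L %s  %s" % (xlabel, ylabel),      # column labels, two-space separated
    ]
    lines += ["%.7g  %.7g" % (xi, yi) for xi, yi in zip(x, y)]
    return "\n".join(lines)

print(sketch_spec_scan([1, 2, 3], [10, 20, 30]))
```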

h5ls(h5group, lvl=0)[source]#

Return a simple string representation of an HDF5 tree structure.

Parameters:
  • h5group – Any h5py.Group or h5py.File instance, or an HDF5 file name

  • lvl – Number of tabulations added to the group. lvl is incremented as we recursively process sub-groups.

Returns:

String representation of an HDF5 tree structure

Group names and dataset representation are printed preceded by a number of tabulations corresponding to their depth in the tree structure. Datasets are represented as h5py.Dataset objects.

Example:

>>> print(h5ls("Downloads/sample.h5"))
+fields
    +fieldB
        <HDF5 dataset "z": shape (256, 256), type "<f4">
    +fieldE
        <HDF5 dataset "x": shape (256, 256), type "<f4">
        <HDF5 dataset "y": shape (256, 256), type "<f4">

Note

This function requires h5py to be installed.
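The indentation scheme above can be mimicked without h5py by walking nested dicts, with dicts standing in for groups and leaves for datasets. This is a toy stand-in for the traversal idea, not silx code:

```python
def dict_ls(node, lvl=0):
    """Render a nested dict as an h5ls-style indented tree (list of lines)."""
    lines = []
    for name, child in node.items():
        if isinstance(child, dict):          # "group": recurse one level deeper
            lines.append("\t" * lvl + "+" + name)
            lines.extend(dict_ls(child, lvl + 1))
        else:                                # "dataset": show its repr
            lines.append("\t" * lvl + repr(child))
    return lines

tree = {"fields": {"fieldB": {"z": "<dataset z>"},
                   "fieldE": {"x": "<dataset x>", "y": "<dataset y>"}}}
print("\n".join(dict_ls(tree)))
```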

open(filename)[source]#

Open a file as an h5py-like object.

Supported formats:

  • HDF5 files, if the h5py module is installed

  • SPEC files exposed as a NeXus layout

  • raster files exposed as a NeXus layout (if fabio is installed)

  • fio files exposed as a NeXus layout

  • Numpy files (npy and npz files)

The filename can be followed by an HDF5 data path using the :: separator. In this case, the returned object is a proxy to the target node; it implements the close function and supports the with context manager.

The file is opened in read-only mode.

Parameters:

filename (str) – A filename, which can contain an HDF5 path by using the :: separator.

Raises:

IOError – If the file can't be loaded or the path can't be found

Return type:

h5py-like node
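The "filename::/data/path" convention can be illustrated with plain string handling. silx.io.open does this parsing internally; the helper below is only a sketch of the convention:

```python
def split_h5_url(filename):
    """Split 'file.h5::/data/path' into (file part, data path or None)."""
    if "::" in filename:
        file_part, data_path = filename.split("::", 1)
    else:
        file_part, data_path = filename, None
    return file_part, data_path

print(split_h5_url("/tmp/sample.h5::/scan_0/detector"))
# -> ('/tmp/sample.h5', '/scan_0/detector')
print(split_h5_url("/tmp/sample.h5"))
# -> ('/tmp/sample.h5', None)
```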

get_h5_class(obj=None, class_=None)[source]#

Returns the HDF5 type relative to the object or to the class.

Parameters:
  • obj – Instance of an object

  • class_ – A class

Return type:

H5Type

h5type_to_h5py_class(type_)[source]#

Returns an h5py class from an H5Type, or None if no match is found.

Parameters:

type_ (H5Type)

Return type:

h5py class

get_h5py_class(obj)[source]#

Returns the h5py class from an object.

If it is an h5py object or an h5py-like object, an h5py class is returned. If the object is not an h5py-like object, None is returned.

Parameters:

obj – An object

Returns:

An h5py class, or None

is_file(obj)[source]#

True if the object is an h5py.File-like object.

Parameters:

obj – An object

is_group(obj)[source]#

True if the object is an h5py.Group-like object. A file is a group.

Parameters:

obj – An object

is_dataset(obj)[source]#

True if the object is an h5py.Dataset-like object.

Parameters:

obj – An object

True if the object is an h5py.SoftLink-like object.

Parameters:

obj – An object

True if the object is an h5py.ExternalLink-like object.

Parameters:

obj – An object

True if the object is an h5py link-like object.

Parameters:

obj – An object

visitall(item)[source]#

Visit an entity recursively, including links.

Links are yielded but not followed. This is a generator yielding (relative path, object) for each visited item.

Parameters:

item – The item to visit.
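The (relative path, object) generator behaviour can be sketched with nested dicts standing in for h5py groups. This is a stdlib-only illustration of the traversal, not the silx implementation:

```python
def visit_dict(node, prefix=""):
    """Yield (relative path, object) for every entry of a nested dict."""
    for name, child in node.items():
        path = f"{prefix}/{name}" if prefix else name
        yield path, child                 # yield the item itself first
        if isinstance(child, dict):       # then recurse into "groups"
            yield from visit_dict(child, path)

tree = {"entry": {"data": {"signal": 42}}}
print(list(visit_dict(tree)))
```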

iter_groups(group, _root=None)[source]#

Pythonic implementation of h5py.Group.visit().

match(group, path_pattern)[source]#

Generator of paths inside given h5py-like group matching path_pattern

Parameters:

path_pattern (str)

Return type:

Generator[str, None, None]
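The spirit of glob-style path matching can be shown with fnmatch over a plain list of paths. The real function operates on a live h5py-like group rather than a path list; this is only a sketch:

```python
import fnmatch

def match_paths(paths, pattern):
    """Return the paths matching a glob-style pattern (illustration only)."""
    return [p for p in paths if fnmatch.fnmatch(p, pattern)]

paths = ["entry_0000/measurement/data",
         "entry_0001/measurement/data",
         "entry_0001/instrument/detector"]

print(match_paths(paths, "entry_*/measurement/data"))
```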

get_data(url)[source]#

Returns numpy data from a URL.

Examples:

>>> # 1st frame from an EDF using silx.io.open
>>> data = silx.io.get_data("silx:/users/foo/image.edf::/scan_0/instrument/detector_0/data[0]")
>>> # 1st frame from an EDF using fabio
>>> data = silx.io.get_data("fabio:/users/foo/image.edf::[0]")

Two schemes are supported by this function:

  • If the silx scheme is used, the file is opened using silx.io.open() and the data is reached using NeXus-style paths.

  • If the fabio scheme is used, the file is opened using fabio.open() from the FabIO library. No data path has to be specified, but each frame can be accessed using data slicing. This shortcut of silx.io.open() allows faster access to the data.

Parameters:

url (Union[str, DataUrl]) – A data URL

Return type:

Union[numpy.ndarray, numpy.generic]

Raises:
  • ImportError – If the mandatory library to read the file is not available.

  • ValueError – If the URL is not valid or does not match the data

  • IOError – If the file is not found, or in case of an internal error of fabio.open() or silx.io.open(). In the latter case, more information is displayed in debug mode.

rawfile_to_h5_external_dataset(bin_file, output_url, shape, dtype, overwrite=False)[source]#

Create an HDF5 dataset at output_url pointing to the given bin_file.

Parameters:
  • bin_file (str) – Path to the .vol file

  • output_url (DataUrl) – HDF5 URL where to save the external dataset

  • shape (tuple) – Shape of the volume

  • dtype (numpy.dtype) – Data type of the volume elements

  • overwrite (bool) – True to allow overwriting (default: False).

vol_to_h5_external_dataset(vol_file, output_url, info_file=None, vol_dtype=<class 'numpy.float32'>, overwrite=False)[source]#

Create an HDF5 dataset at output_url pointing to the given vol_file.

If the vol_file.info file containing the shape is not in the same folder as the .vol file, its location must be specified via info_file.

Parameters:
  • vol_file (str) – Path to the .vol file

  • output_url (DataUrl) – HDF5 URL where to save the external dataset

  • info_file (Optional[str]) – .vol.info file name written by pyhst and containing the shape information

  • vol_dtype (numpy.dtype) – Data type of the volume elements (default: float32)

  • overwrite (bool) – True to allow overwriting (default: False).

Raises:

ValueError – If fails to read shape from the .vol.info file
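A hedged sketch of reading the volume shape from a pyhst ".vol.info" text file. The NUM_X/NUM_Y/NUM_Z keys and the (z, y, x) ordering are assumptions about that format; this is not silx code:

```python
def parse_vol_info(text):
    """Parse 'KEY = value' lines and return the assumed (z, y, x) shape."""
    values = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    # Assumed key names and axis ordering for a pyhst .vol.info file
    return (int(values["NUM_Z"]), int(values["NUM_Y"]), int(values["NUM_X"]))

info = "NUM_X = 2048\nNUM_Y = 2048\nNUM_Z = 1024\n"
print(parse_vol_info(info))   # (1024, 2048, 2048)
```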

hdf5_to_python_type(value, decode_ascii, encoding)[source]#

Convert HDF5 type to proper python type.

Parameters:
  • value

  • decode_ascii (bool)

  • encoding (str)

h5py_decode_value(value, encoding='utf-8', errors='surrogateescape')[source]#

Keep bytes when value cannot be decoded

Parameters:
  • value – bytes or array of bytes

  • encoding (str)

  • errors (str)
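The decode-or-keep idea can be illustrated in pure Python: try to decode bytes and fall back to the original value when that fails or when the value is already a str. This is a sketch of the behaviour described above, not the silx implementation:

```python
def decode_or_keep(value, encoding="utf-8", errors="surrogateescape"):
    """Decode bytes to str if possible; otherwise return the value unchanged."""
    try:
        return value.decode(encoding, errors)
    except (UnicodeDecodeError, AttributeError):
        # AttributeError: value was already a str (no .decode method)
        return value

print(decode_or_keep(b"hello"))        # 'hello'
print(decode_or_keep("already str"))   # unchanged
```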

h5py_encode_value(value, encoding='utf-8', errors='surrogateescape')[source]#

Keep the string when the value cannot be encoded

Parameters:
  • value – string or array of strings

  • encoding (str)

  • errors (str)

h5py_value_isinstance(value, vtype)[source]#

Check whether value is an instance of vtype, taking h5py bytes/string handling into account.

Parameters:
  • value – string or array of strings

  • vtype

Return type:

bool

class H5pyDatasetReadWrapper(dset, decode_ascii=False)[source]#

Wrapper to handle H5T_STRING decoding on-the-fly when reading a dataset. Uniform behaviour for h5py 2.x and h5py 3.x

h5py abuses H5T_STRING with the ASCII character set to store bytes: dset[()] = b"...". Therefore an H5T_STRING with ASCII encoding is not decoded by default.

class H5pyAttributesReadWrapper(attrs, decode_ascii=False)[source]#

Wrapper to handle H5T_STRING decoding on-the-fly when reading an attribute. Uniform behaviour for h5py 2.x and h5py 3.x

h5py abuses H5T_STRING with the ASCII character set to store bytes: dset[()] = b"...". Therefore an H5T_STRING with ASCII encoding is not decoded by default.

h5py_read_dataset(dset, index=(), decode_ascii=False)[source]#

Read data from dataset object. UTF-8 strings will be decoded while ASCII strings will only be decoded when decode_ascii=True.

Parameters:
  • dset (h5py.Dataset)

  • index – slicing (all by default)

  • decode_ascii (bool)

h5py_read_attribute(attrs, name, decode_ascii=False)[source]#

Read data from attributes. UTF-8 strings will be decoded while ASCII strings will only be decoded when decode_ascii=True.

Parameters:
  • attrs (h5py.AttributeManager)

  • name (str) – attribute name

  • decode_ascii (bool)

h5py_read_attributes(attrs, decode_ascii=False)[source]#

Read data from attributes. UTF-8 strings will be decoded while ASCII strings will only be decoded when decode_ascii=True.

Parameters:
  • attrs (h5py.AttributeManager)

  • decode_ascii (bool)