Usage

hdf5plugin allows using additional HDF5 compression filters with h5py for reading and writing compressed datasets.

Read compressed datasets

To read compressed datasets with h5py, import hdf5plugin:

import hdf5plugin

This registers the compression filters supported by hdf5plugin with the HDF5 library used by h5py. Compressed HDF5 datasets can then be read like any other dataset (see h5py documentation).

Note

HDF5 datasets compressed with Blosc2 can require additional plugins to enable decompression, such as blosc2-grok or blosc2-openhtj2k. See list of Blosc2 filters and codecs.

Write compressed datasets

As with reading compressed datasets, import hdf5plugin is required to enable the supported compression filters.

To create a compressed dataset use h5py.Group.create_dataset and set the compression and compression_opts arguments.

hdf5plugin provides helpers to prepare those compression options: Bitshuffle, Blosc, Blosc2, BZip2, FciDecomp, LZ4, Sperr, SZ, SZ3, Zfp, Zstd.

Sample code:

import numpy
import h5py
import hdf5plugin

# Compression
f = h5py.File('test.h5', 'w')
f.create_dataset('data', data=numpy.arange(100), compression=hdf5plugin.LZ4())
f.close()

# Decompression
f = h5py.File('test.h5', 'r')
data = f['data'][()]
f.close()

Relevant h5py documentation: Filter pipeline and Chunked Storage.

Bitshuffle

class hdf5plugin.Bitshuffle(nelems=0, cname=None, clevel=3, lz4=None)

h5py.Group.create_dataset’s compression arguments for using bitshuffle filter.

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'bitshuffle_with_lz4',
    data=numpy.arange(100),
    compression=hdf5plugin.Bitshuffle(nelems=0, lz4=True))
f.close()
Parameters:
  • nelems (int) – The number of elements per block. It needs to be divisible by eight. Default: 0 (for about 8 kilobytes per block).

  • cname (Optional[Literal['none', 'lz4', 'zstd']]) – Compressor name.

  • clevel (int) – Compression level, used only for “zstd” compression. Must be between 0 (default level) and 22 (maximum compression). Default: 3.

filter_name = 'bshuf'
filter_id = 32008
property nelems: int

Number of elements per block

property cname: Literal['none', 'lz4', 'zstd']

Compressor name

property clevel: int | None

Compression level, only for zstd compressor, None for others

get_config()

Returns filter configuration

Return type:

dict[str, int | float | bool | str]

Blosc

class hdf5plugin.Blosc(cname='lz4', clevel=5, shuffle=1)

h5py.Group.create_dataset’s compression arguments for using blosc filter.

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'blosc_byte_shuffle_blosclz',
    data=numpy.arange(100),
    compression=hdf5plugin.Blosc(cname='blosclz', clevel=9, shuffle=hdf5plugin.Blosc.SHUFFLE))
f.close()
Parameters:
  • cname (Literal['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib', 'zstd']) – Compressor name. snappy availability depends on compilation (requires C++11).

  • clevel (int) – Compression level from 0 (no compression) to 9 (maximum compression). Default: 5.

  • shuffle (int) –

    One of:

    • Blosc.NOSHUFFLE (0): No shuffle

    • Blosc.SHUFFLE (1): byte-wise shuffle (default)

    • Blosc.BITSHUFFLE (2): bit-wise shuffle

NOSHUFFLE = 0

Flag to disable data shuffle pre-compression filter

SHUFFLE = 1

Flag to enable byte-wise shuffle pre-compression filter

BITSHUFFLE = 2

Flag to enable bit-wise shuffle pre-compression filter

filter_name = 'blosc'
filter_id = 32001
property cname: Literal['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib', 'zstd']

Compressor name

property clevel: int

Compression level from 0 (no compression) to 9 (maximum compression)

property shuffle: int

Shuffle mode one of: NOSHUFFLE, SHUFFLE, BITSHUFFLE

get_config()

Returns filter configuration

Return type:

dict[str, int | float | bool | str]

Blosc2

class hdf5plugin.Blosc2(cname='blosclz', clevel=5, filters=1)

h5py.Group.create_dataset’s compression arguments for using blosc2 filter.

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'blosc2_byte_shuffle_blosclz',
    data=numpy.arange(100),
    compression=hdf5plugin.Blosc2(cname='blosclz', clevel=9, filters=hdf5plugin.Blosc2.SHUFFLE))
f.close()
Parameters:
  • cname (Literal['blosclz', 'lz4', 'lz4hc', 'zlib', 'zstd']) – Compressor name.

  • clevel (int) – Compression level from 0 (no compression) to 9 (maximum compression). Default: 5.

  • filters (int) –

    One of:

    • Blosc2.NOFILTER (0): No pre-compression filter

    • Blosc2.SHUFFLE (1): Byte-wise shuffle (default)

    • Blosc2.BITSHUFFLE (2): Bit-wise shuffle

    • Blosc2.DELTA (3): Stores diff’ed blocks

    • Blosc2.TRUNC_PREC (4): Zeroes the least significant bits of the mantissa

NOFILTER = 0

Flag to disable pre-compression filter

SHUFFLE = 1

Flag to enable byte-wise shuffle pre-compression filter

BITSHUFFLE = 2

Flag to enable bit-wise shuffle pre-compression filter

DELTA = 3

Flag to store blocks inside a chunk diff’ed with respect to first block in the chunk

TRUNC_PREC = 4

Flag to zero the least significant bits of the mantissa of float32 and float64 types

filter_id = 32026
filter_name = 'blosc2'
property cname: Literal['blosclz', 'lz4', 'lz4hc', 'zlib', 'zstd']

Compressor name

property clevel: int

Compression level from 0 (no compression) to 9 (maximum compression)

property filters: int

Pre-compression filter, one of: NOFILTER, SHUFFLE, BITSHUFFLE, DELTA, TRUNC_PREC

get_config()

Returns filter configuration

Return type:

dict[str, int | float | bool | str]

BZip2

class hdf5plugin.BZip2(blocksize=9)

h5py.Group.create_dataset’s compression arguments for using BZip2 filter.

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'bzip2',
    data=numpy.arange(100),
    compression=hdf5plugin.BZip2(blocksize=5))
f.close()
Parameters:

blocksize (int) – Size of the blocks as a multiple of 100k, in the range [1, 9]. Default: 9.

filter_name: str = 'bzip2'
filter_id: int = 307
property blocksize: int

Size of the blocks as a multiple of 100k in [1, 9]

get_config()

Returns filter configuration

Return type:

dict[str, int | float | bool | str]

FciDecomp

class hdf5plugin.FciDecomp

h5py.Group.create_dataset’s compression arguments for using FciDecomp filter.

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'fcidecomp',
    data=numpy.arange(100),
    compression=hdf5plugin.FciDecomp())
f.close()
filter_name: str = 'fcidecomp'
filter_id: int = 32018
get_config()

Returns filter configuration

Return type:

dict[str, int | float | bool | str]

LZ4

class hdf5plugin.LZ4(nbytes=0)

h5py.Group.create_dataset’s compression arguments for using lz4 filter.

f = h5py.File('test.h5', 'w')
f.create_dataset('lz4', data=numpy.arange(100),
    compression=hdf5plugin.LZ4(nbytes=0))
f.close()
Parameters:

nbytes (int) – The number of bytes per block. It must be either 0 or in the range 0 < nbytes < 2113929216 (1.9 GB). Default: 0 (for 1 GB per block).

filter_name: str = 'lz4'
filter_id: int = 32004
property nbytes: int

The number of bytes per block.

If 0, block size is 1GB.

get_config()

Returns filter configuration

Return type:

dict[str, int | float | bool | str]

Sperr

class hdf5plugin.Sperr(rate=None, peak_signal_to_noise_ratio=None, absolute=None, swap=False, missing_value_mode=0)

h5py.Group.create_dataset’s compression arguments for using SPERR filter.

It can be passed as keyword arguments:

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'sperr',
    data=numpy.random.random(1000).reshape(100, 10),
    **hdf5plugin.Sperr(rate=16))
f.close()

This filter provides 3 modes:

  • Fixed bit-per-pixel with the rate argument: its value sets the target bitrate (range: 0.0 < rate < 64.0)

    f.create_dataset(
        'sperr_fixed_bit-per-pixel',
        data=numpy.random.random(1000).reshape(100, 10),
        **hdf5plugin.Sperr(rate=10))
    
  • Fixed peak signal-to-noise ratio (PSNR) with the peak_signal_to_noise_ratio argument: its value sets the target PSNR (range: 0.0 < peak_signal_to_noise_ratio)

    f.create_dataset(
        'sperr_fixed_peak_signal-to-noise_ratio',
        data=numpy.random.random(1000).reshape(100, 10),
        **hdf5plugin.Sperr(peak_signal_to_noise_ratio=1e-6))
    
  • Fixed point-wise error (PWE) with the absolute argument: its value sets the PWE tolerance (range: 0.0 < absolute)

    f.create_dataset(
        'sperr_fixed_point-wise_error',
        data=numpy.random.random(1000).reshape(100, 10),
        **hdf5plugin.Sperr(absolute=1e-4))
    

If the swap argument is True (False by default) a “rank order swap” pre-filtering is performed.

The missing_value_mode argument indicates which value is used to indicate missing data.

For more details, see H5Z-SPERR.

filter_name: str = 'sperr'
filter_id: int = 32028
NO_MISSING = 0

No missing value.

MISSING_NAN = 1

Any NAN is a missing value.

MISSING_1E35 = 2

Any value where abs(value) >= 1e35 is a missing value.

The first occurrence of a value with a magnitude larger than 1e35 will be used to fill in all missing value locations.

get_config()

Returns filter configuration

Return type:

dict[str, int | float | bool | str]

SZ

class hdf5plugin.SZ(absolute=None, relative=None, pointwise_relative=None)

h5py.Group.create_dataset’s compression arguments for using SZ2 filter.

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'sz',
    data=numpy.random.random(100),
    compression=hdf5plugin.SZ())
f.close()

This filter provides different modes:

  • Absolute mode: To use, set the absolute argument. It ensures that the resulting values will be within the provided absolute tolerance.

    f.create_dataset(
        'sz_absolute',
        data=numpy.random.random(100),
        compression=hdf5plugin.SZ(absolute=0.1))
    
  • Relative mode: To use, set the relative argument. It ensures that the resulting values will be within the provided relative tolerance. The tolerance will be computed by multiplying the provided argument by the range of the data values.

    f.create_dataset(
        'sz_relative',
        data=numpy.random.random(100),
        compression=hdf5plugin.SZ(relative=0.01))
    
  • Point-wise relative mode: To use, set the pointwise_relative argument. It ensures that each grid point of the resulting values will be within the provided relative tolerance.

    f.create_dataset(
        'sz_pointwise_relative',
        data=numpy.random.random(100),
        compression=hdf5plugin.SZ(pointwise_relative=0.01))
    

For more details about the compressor, see SZ2 compressor.

Warning

The SZ2 compressor is deprecated, see SZ repository.

get_config()

Returns filter configuration

Return type:

dict[str, int | float | bool | str]

filter_name: str = 'sz'
filter_id: int = 32017

SZ3

class hdf5plugin.SZ3(absolute=None, relative=None, norm2=None, peak_signal_to_noise_ratio=None)

h5py.Group.create_dataset’s compression arguments for using SZ3 filter.

  • Absolute mode: To use, set the absolute argument. It ensures that the resulting values will be within the provided absolute tolerance.

    f.create_dataset(
        'sz3_absolute',
        data=numpy.random.random(100),
        compression=hdf5plugin.SZ3(absolute=0.1))
    

For more details about the compressor, see SZ3 compressor.

Warning

Backward compatibility is currently not guaranteed: See this discussion.

get_config()

Returns filter configuration

Return type:

dict[str, int | float | bool | str]

filter_name: str = 'sz3'
filter_id: int = 32024

Zfp

class hdf5plugin.Zfp(rate=None, precision=None, accuracy=None, reversible=False, minbits=None, maxbits=None, maxprec=None, minexp=None)

h5py.Group.create_dataset’s compression arguments for using ZFP filter.

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'zfp',
    data=numpy.random.random(100),
    compression=hdf5plugin.Zfp())
f.close()

This filter provides different modes:

  • Fixed-rate mode: To use, set the rate argument. For details, see zfp fixed-rate mode.

    f.create_dataset(
        'zfp_fixed_rate',
        data=numpy.random.random(100),
        compression=hdf5plugin.Zfp(rate=10.0))
    
  • Fixed-precision mode: To use, set the precision argument. For details, see zfp fixed-precision mode.

    f.create_dataset(
        'zfp_fixed_precision',
        data=numpy.random.random(100),
        compression=hdf5plugin.Zfp(precision=10))
    
  • Fixed-accuracy mode: To use, set the accuracy argument. For details, see zfp fixed-accuracy mode.

    f.create_dataset(
        'zfp_fixed_accuracy',
        data=numpy.random.random(100),
        compression=hdf5plugin.Zfp(accuracy=0.001))
    
  • Reversible (i.e., lossless) mode: To use, set the reversible argument to True. For details, see zfp reversible mode.

    f.create_dataset(
        'zfp_reversible',
        data=numpy.random.random(100),
        compression=hdf5plugin.Zfp(reversible=True))
    
  • Expert mode: To use, set the minbits, maxbits, maxprec and minexp arguments. For details, see zfp expert mode.

    f.create_dataset(
        'zfp_expert',
        data=numpy.random.random(100),
        compression=hdf5plugin.Zfp(minbits=1, maxbits=16657, maxprec=64, minexp=-1074))
    
Parameters:
  • rate (float) – Use fixed-rate mode and set the number of compressed bits per value.

  • precision (float) – Use fixed-precision mode and set the number of uncompressed bits per value.

  • accuracy (float) – Use fixed-accuracy mode and set the absolute error tolerance.

  • reversible (bool | None) – If True, it uses the reversible (i.e., lossless) mode.

  • minbits (int) – Minimum number of compressed bits used to represent a block.

  • maxbits (int) – Maximum number of bits used to represent a block.

  • maxprec (int) – Maximum number of bit planes encoded. It controls the relative error.

  • minexp (int) – Smallest absolute bit plane number encoded. It controls the absolute error.

filter_name: str = 'zfp'
filter_id: int = 32013
get_config()

Returns filter configuration

Return type:

dict[str, int | float | bool | str]

Zstd

class hdf5plugin.Zstd(clevel=3)

h5py.Group.create_dataset’s compression arguments for using Zstd filter.

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'zstd',
    data=numpy.arange(100),
    compression=hdf5plugin.Zstd(clevel=22))
f.close()
Parameters:

clevel (int) – Compression level from -131072 (lowest compression) to 22 (maximum compression). Negative compression levels offer faster compression and decompression speed at the cost of compression ratio. Compression levels from 20 to 22 offer better compression ratio at the expense of requiring more memory. Default: 3.

get_config()

Returns filter configuration

Return type:

dict[str, int | float | bool | str]

filter_name: str = 'zstd'
filter_id: int = 32015
property clevel: int

Compression level from -131072 (lowest compression) to 22 (maximum compression)

Get information about hdf5plugin

Constants:

hdf5plugin.PLUGIN_PATH

Directory where the provided HDF5 filter plugins are stored.

Functions:

hdf5plugin.get_filters(filters=('bshuf', 'blosc', 'blosc2', 'bzip2', 'fcidecomp', 'lz4', 'sperr', 'sz', 'sz3', 'zfp', 'zstd'))

Returns selected filter classes.

By default it returns all filter classes.

Parameters:

filters (int | str | tuple[int | str, ...]) – Filter name or ID or sequence of filter names or IDs (default: all filters). It also supports the value “registered” which selects currently available filters.

Return type:

tuple[type[FilterRefBase], ...]

Returns:

Tuple of filter classes

hdf5plugin.get_config()

Provides information about build configuration and filters registered by hdf5plugin.

Return type:

HDF5PluginConfig

Manage registered filters

When imported, hdf5plugin initialises and registers the filters it embeds, unless a filter is already registered for the corresponding filter ID.

h5py gives access to HDF5 functions handling registered filters in h5py.h5z. This module allows checking the filter availability and registering/unregistering filters.

hdf5plugin provides an extra register function to register the filters it provides, e.g., to override already loaded filters. Registering with this function is required to perform additional initialisation and enable writing compressed data with the given filter.

hdf5plugin.register(filters=('bshuf', 'blosc', 'blosc2', 'bzip2', 'fcidecomp', 'lz4', 'sperr', 'sz', 'sz3', 'zfp', 'zstd'), force=True)

Initialise and register hdf5plugin embedded filters given their names or IDs.

Parameters:
  • filters (int | str | tuple[int | str, ...]) – Filter name or ID or sequence of filter names or IDs.

  • force (bool) – True to register the filter even if a corresponding one is already available. False to skip already available filters.

Return type:

bool

Returns:

True if all filters were registered successfully, False otherwise.

Get dataset compression

For built-in compression filters (i.e., GZIP, LZF, SZIP), dataset compression configuration can be retrieved with h5py.Dataset’s compression and compression_opts properties.

For third-party compression filters such as those supported by hdf5plugin, the dataset compression configuration is stored in the HDF5 filter pipeline. This filter pipeline configuration can be retrieved with the h5py.Dataset “low level” API. For a given h5py.Dataset, dataset:

create_plist = dataset.id.get_create_plist()

for index in range(create_plist.get_nfilters()):
    filter_id, _, filter_options, _ = create_plist.get_filter(index)
    print(filter_id, filter_options)

For compression filters supported by hdf5plugin, hdf5plugin.from_filter_options() instantiates the filter configuration from the filter id and options.

hdf5plugin.from_filter_options(filter_id, filter_options)

Returns corresponding compression filter configuration instance.

create_plist = dataset.id.get_create_plist()

compression_filters = []

for index in range(create_plist.get_nfilters()):
    filter_id, _, filter_options, _ = create_plist.get_filter(index)
    if filter_id in hdf5plugin.FILTERS.values():
        compression_filters.append(hdf5plugin.from_filter_options(filter_id, filter_options))
Parameters:
  • filter_id (int | str) – HDF5 compression filter ID

  • filter_options (tuple[int, ...]) – Compression filter configuration as stored in HDF5 datasets

Raises:
  • ValueError – Unsupported or invalid filter_id, filter_options combination

  • NotImplementedError – Given filter or version of the filter is not supported

Return type:

FilterRefBase

Use HDF5 filters in other applications

Users of applications other than h5py, including non-Python ones, can also use the supplied HDF5 compression filters to read compressed datasets by setting the HDF5_PLUGIN_PATH environment variable to the value of hdf5plugin.PLUGIN_PATH, which can be retrieved from the command line with:

python -c "import hdf5plugin; print(hdf5plugin.PLUGIN_PATH)"

For instance:

export HDF5_PLUGIN_PATH=$(python -c "import hdf5plugin; print(hdf5plugin.PLUGIN_PATH)")

should allow MATLAB or IDL users to read data compressed using the supported plugins.

Setting the HDF5_PLUGIN_PATH environment variable allows already existing programs or Python code to read compressed data without any modification.