nabu.cuda.utils
source module nabu.cuda.utils
Functions
-
get_cuda_stream — Notes
In cupy, context, device, and stream management is implicit: cupy appears to hide these details from the user and handle them automatically. This raises several issues:
- cuda.get_current_stream() always returns device_id = -1 (even when called with device_id != -1)
- A Device object does not have a list of attached streams
- A Stream object does not record its device, so we have to keep references to the various objects
-
detect_cuda_gpus — Detect the available Nvidia CUDA GPUs on the current host.
-
collect_cuda_gpus — Return a dictionary of GPU ids and brief description of each CUDA-compatible GPU with a few fields.
-
to_int2 — Convert a 1D array of length 2 into an "int2" data type. Beware: the first coordinate is x (as opposed to the usual Python/numpy/C convention).
-
create_texture — Create a texture object.
-
copy_to_cuarray_with_offset — Copy a cupy.ndarray into a CUDA Array (a matrix-like data structure that backs textures).
source get_cuda_stream(device_id=None, force_create=False)
Notes
In cupy, context, device, and stream management is implicit: cupy appears to hide these details from the user and handle them automatically. This raises several issues:
- cuda.get_current_stream() always returns device_id = -1 (even when called with device_id != -1)
- A Device object does not have a list of attached streams
- A Stream object does not record its device, so we have to keep references to the various objects
Solution so far
- Call Device(id).use() at program startup
- Use cuda.runtime.getDevice()
- Use cuda.stream.Stream()
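The workaround above amounts to keeping an explicit device-to-stream mapping on the caller's side, since a stream cannot be asked for its device. The following is a minimal sketch of that bookkeeping, not nabu's implementation. `StreamRegistry`, `make_stream` and `current_device` are hypothetical names, the meaning given to `force_create` is an assumption, and a plain `object` stands in for the stream so the snippet runs without a GPU. With cupy installed, the two callables would presumably be `cupy.cuda.Stream` and `cupy.cuda.runtime.getDevice`.

```python
class StreamRegistry:
    """Keep one stream per device, because a stream does not record its device."""

    def __init__(self, make_stream, current_device):
        self._make_stream = make_stream        # hypothetical factory, e.g. cupy.cuda.Stream
        self._current_device = current_device  # hypothetical callable, e.g. cupy.cuda.runtime.getDevice
        self._streams = {}                     # device_id -> stream (the reference keeps it alive)

    def get(self, device_id=None, force_create=False):
        """Return the stream registered for device_id, creating it if needed."""
        if device_id is None:
            device_id = self._current_device()
        # Assumption: force_create replaces any previously registered stream.
        if force_create or device_id not in self._streams:
            self._streams[device_id] = self._make_stream()
        return self._streams[device_id]


# Stand-ins so the sketch runs without a GPU.
registry = StreamRegistry(make_stream=object, current_device=lambda: 0)
stream = registry.get()            # created for the "current" device 0
assert registry.get(0) is stream   # later calls reuse the same object
```

Holding the streams in a dictionary is what makes the device lookup possible at all: the mapping lives in the registry rather than in the stream objects.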
Detect the available Nvidia CUDA GPUs on the current host.
Returns
-
gpus : dict — Dictionary where the key is the GPU ID, and the value is a cupy.cuda.Device object.
-
error_msg : str — If an error occurs, the message is returned in this item. Otherwise, it is None.
source collect_cuda_gpus(on_error='warn')
Return a dictionary of GPU ids and brief description of each CUDA-compatible GPU with a few fields.
Parameters
-
on_error : str, optional — What to do on error. Possible values: - "warn": print a warning - "raise": throw an error
Raises
-
RuntimeError
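The `on_error` policy described above can be illustrated with a small dispatch helper. This is a sketch under assumptions, not the code used by `collect_cuda_gpus`: `handle_error` is a hypothetical name, and the standard `warnings` module is only one possible way of "printing a warning".

```python
import warnings


def handle_error(message, on_error="warn"):
    """Report an error message according to the on_error policy."""
    if on_error == "warn":
        warnings.warn(message, RuntimeWarning)   # execution continues
    elif on_error == "raise":
        raise RuntimeError(message)              # execution stops here
    else:
        raise ValueError(f"Unknown on_error value: {on_error!r}")


# "warn" lets the caller carry on, "raise" must be caught explicitly.
try:
    handle_error("No CUDA GPU found", on_error="raise")
except RuntimeError as exc:
    print(f"Caught: {exc}")
```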
source dtype_to_ctype(dtype)
to_int2 — Convert a 1D array of length 2 into an "int2" data type. Beware: the first coordinate is x (as opposed to the usual Python/numpy/C convention).
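The x-first convention of to_int2 is easy to get wrong when starting from a numpy-style shape, which is ordered (rows, columns), that is (y, x). The sketch below shows the layout only, using the standard `struct` module. `pack_int2` is a hypothetical stand-in and not nabu's function; it assumes that an "int2" is two consecutive 32-bit integers with x stored first.

```python
import struct


def pack_int2(xy):
    """Pack a length-2 sequence as two 32-bit integers, x first."""
    x, y = xy
    return struct.pack("=ii", x, y)


# A numpy-style shape is (rows, cols) = (y, x): reverse it before packing.
shape = (480, 640)                     # 480 rows, 640 columns
packed = pack_int2(shape[::-1])        # x = 640 (width), y = 480 (height)
assert struct.unpack("=ii", packed) == (640, 480)
```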
source create_texture(shape, dtype, from_image=None, address_mode='border', filter_mode='linear', normalized_coords=False)
Create a texture object.
Parameters
-
shape : tuple of int — Image shape
-
dtype : numpy.dtype — Data type
-
from_image : numpy.ndarray, optional — Image to be used as a texture. If provided, above parameters are replaced with image properties.
-
address_mode : str, optional — Which address mode to use. Can be: - "border" (default): extension with zeros - "clamp": extension with edges - "mirror": extension with mirroring - "wrap": periodic extension
-
filter_mode : str, optional — Which filter mode to use when accessing the texture at non-integer coordinates. Can be "linear" or "nearest". Default is (bi)linear filtering.
-
normalized_coords : bool, optional — Whether to use normalized coordinates. Default is False.
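The four address modes differ only in how an out-of-range coordinate is resolved. The following pure-Python sketch illustrates this on a 1D list with integer indices. It is a simplified model, not the hardware behavior: real textures are addressed with floating-point coordinates in two dimensions, and `fetch_1d` is a hypothetical helper.

```python
def fetch_1d(texels, i, address_mode="border"):
    """Read texel i from a 1D list, resolving out-of-range indices."""
    n = len(texels)
    if address_mode == "border":
        return texels[i] if 0 <= i < n else 0       # zeros outside the image
    if address_mode == "clamp":
        return texels[min(max(i, 0), n - 1)]         # repeat the edge value
    if address_mode == "wrap":
        return texels[i % n]                         # periodic extension
    if address_mode == "mirror":
        j = i % (2 * n)                              # mirrored signal has period 2n
        return texels[j if j < n else 2 * n - 1 - j]
    raise ValueError(f"Unknown address mode: {address_mode!r}")


row = [10, 20, 30]
assert [fetch_1d(row, i, "border") for i in (-1, 0, 3)] == [0, 10, 0]
assert [fetch_1d(row, i, "clamp") for i in (-2, 3)] == [10, 30]
assert [fetch_1d(row, i, "wrap") for i in (-1, 3, 4)] == [30, 10, 20]
assert [fetch_1d(row, i, "mirror") for i in (-1, 3, 4)] == [10, 30, 20]
```

Note that the CUDA documentation describes the wrap and mirror modes as supported only with normalized coordinates, which matters here because normalized_coords defaults to False.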
Returns
-
tex_obj : cupy.cuda.texture.TextureObject — Texture object that can be passed to kernels (accessed in source code with 'cudaTextureObject_t' type)
-
cu_arr : cupy.cuda.texture.CUDAarray — CUDA array that backs the texture
Notes
- Two-dimensional images default to single-channel textures.
- It will allocate memory for the underlying "cuda array" object.
Raises
-
NotImplementedError
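To make the difference between the "linear" and "nearest" filter modes concrete, here is a 1D pure-Python sketch of both lookups with unnormalized coordinates and "border" addressing. It assumes the texel-center convention described in the CUDA programming guide (texel i is centered at i + 0.5) and ignores the reduced precision of the hardware interpolation weights. `tex1d_linear` and `tex1d_nearest` are hypothetical helpers; bilinear filtering applies the same interpolation along each axis.

```python
import math


def _texel(texels, i):
    """Texel i with 'border' addressing (zeros outside the array)."""
    return texels[i] if 0 <= i < len(texels) else 0.0


def tex1d_nearest(texels, x):
    """Return the texel whose cell contains coordinate x."""
    return _texel(texels, math.floor(x))


def tex1d_linear(texels, x):
    """Linearly interpolate between the two texels surrounding x."""
    xb = x - 0.5              # shift so that texel centers fall on integers
    i = math.floor(xb)
    alpha = xb - i            # interpolation weight in [0, 1)
    return (1 - alpha) * _texel(texels, i) + alpha * _texel(texels, i + 1)


row = [10.0, 20.0, 30.0]
assert tex1d_linear(row, 0.5) == 10.0    # exactly at the center of texel 0
assert tex1d_linear(row, 1.0) == 15.0    # halfway between texels 0 and 1
assert tex1d_nearest(row, 1.0) == 20.0   # x = 1.0 falls inside texel 1
```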
source cupy_array_from_ptr(ptr, shape, dtype, owner_reference)
source copy_to_cuarray_with_offset(dst_cuarray, src_array, offset_x=0, offset_y=0)
Copy a cupy.ndarray into a CUDA Array (a matrix-like data structure that backs textures).
Copies a matrix ('height' rows of 'width' bytes each) from the memory area pointed to by 'src' to the CUDA array 'dst' starting at the upper left corner (wOffset, hOffset) [...] 'spitch' is the width in memory in bytes of the 2D array pointed to by 'src', including any padding added to the end of each row.
- 'wOffset' + 'width' must not exceed the width of the CUDA array 'dst'.
- 'width' must not exceed 'spitch'.
NB: for copies without offset, we can just use dst_cuarray.copy_from(src_cupyarray)
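The two constraints quoted above can be checked before issuing the copy. The sketch below does so in pure Python. `check_copy_2d` is a hypothetical helper that is not part of nabu or cupy, and it assumes that the horizontal offset is expressed in bytes like the widths.

```python
def check_copy_2d(dst_width, width, spitch, w_offset=0):
    """Check the two stated constraints of a 2D copy into a CUDA array.

    All quantities are in bytes. Raises ValueError if a constraint is violated.
    """
    if width > spitch:
        raise ValueError(f"'width' ({width}) exceeds 'spitch' ({spitch})")
    if w_offset + width > dst_width:
        raise ValueError(
            f"'wOffset' + 'width' ({w_offset + width}) exceeds the "
            f"destination width ({dst_width})"
        )


itemsize = 4                       # bytes per float32 element
dst_cols, src_cols = 1024, 256     # hypothetical array widths, in elements
offset_cols = 512                  # hypothetical horizontal offset, in elements

# Contiguous source rows have no padding, so the pitch equals the row width.
width = spitch = src_cols * itemsize
check_copy_2d(dst_cols * itemsize, width, spitch, offset_cols * itemsize)
```

Converting element counts to bytes in one place, as done here with `itemsize`, avoids mixing the two units when computing the offset.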