pynx.processing_unit: detecting, initializing and using computing or graphical processing units

exception pynx.processing_unit.ProcessingUnitException
exception pynx.processing_unit.ProcessingUnitWarning
pynx.processing_unit.opencl_device.available_gpu_speed(cl_platform=None, fft_shape=(16, 256, 256), axes=(-1, -2), min_gpu_mem=None, verbose=False, gpu_name=None, only_gpu=True, return_dict=False, ranking='fft')

Get a list of all available GPUs, sorted by FFT speed (Gflop/s) or bandwidth (Gbytes/s).

Parameters:
  • cl_platform – the OpenCL platform (default=None, all platforms are tested)

  • fft_shape – the FFT shape against which the FFT speed is calculated. If None, no benchmark is performed and the speed for all devices is reported as 0.

  • axes – the FFT axes

  • min_gpu_mem – the minimum amount of gpu memory desired (bytes). Devices with less are ignored.

  • verbose – if True, print the FFT speed and memory for the GPUs found

  • gpu_name – if given, only GPUs whose name includes this substring will be tested and reported. This can also be a list of acceptable strings

  • only_gpu – if True (the default), will skip non-GPU OpenCL devices

  • return_dict – if True, a dictionary will be returned instead of a list, with both timing and gflops listed

  • ranking – either ‘fft’ or ‘bandwidth’.

Returns:

a list of tuples (GPU device, speed (Gflop/s)), ordered by decreasing speed. If return_dict is True, a dictionary is returned instead, in which each entry is itself a dictionary holding the gflops and dt results.
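As an illustration of how the returned list might be consumed, here is a minimal sketch. The helpers `gib_to_bytes`, `pick_fastest` and `rank_opencl_gpus` are hypothetical names introduced for this example and are not part of PyNX; only `available_gpu_speed` and its `min_gpu_mem`, `gpu_name` and `ranking` parameters come from the signature above. The import is guarded so that the sketch degrades to an empty list when PyNX is not installed.

```python
def gib_to_bytes(gib):
    """Convert GiB to bytes, since min_gpu_mem is expressed in bytes."""
    return int(gib * 1024 ** 3)


def pick_fastest(results):
    """Return the (device, speed) tuple with the highest speed, or None.

    `results` is assumed to be the list of (GPU device, speed) tuples
    described above. max() is used so the helper does not depend on
    the list already being sorted.
    """
    if not results:
        return None
    return max(results, key=lambda entry: entry[1])


def rank_opencl_gpus(min_gib=None, gpu_name=None):
    """Call available_gpu_speed if PyNX is importable, else return []."""
    try:
        from pynx.processing_unit.opencl_device import available_gpu_speed
    except ImportError:
        # PyNX (or its OpenCL dependencies) is not available here.
        return []
    min_gpu_mem = None if min_gib is None else gib_to_bytes(min_gib)
    return available_gpu_speed(min_gpu_mem=min_gpu_mem, gpu_name=gpu_name,
                               ranking='fft')


if __name__ == "__main__":
    best = pick_fastest(rank_opencl_gpus(min_gib=8))
    print("No usable GPU found" if best is None else best)
```

Keeping the selection logic separate from the PyNX call makes it possible to exercise it with plain tuples, without any GPU present.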

pynx.processing_unit.opencl_device.cl_device_fft_speed(d=None, fft_shape=(16, 256, 256), axes=(-1, -2), verbose=False, nb_test=4, nb_cycle=1, timing=False, shuffle_axes=False)

Compute the FFT calculation speed for a given OpenCL device.

Parameters:
  • d – the pyopencl.Device. If not supplied, pyopencl.create_some_context() will be called, and a device can be chosen interactively. This will result in a new context created for each call, and is not efficient (the context memory cannot be freed).

  • fft_shape – (nz,ny,nx) the shape of the complex fft transform, treated as a stack of nz 2D transforms of size nx * ny, or as a single 3D FFT, depending on the value of ‘axes’

  • axes – (1,2) the axes for the FFT. Default value is (-1,-2), which will perform a stacked 2d fft. Using None will perform a 3d fft.

  • verbose – if True, print the speed and timing for the given transform

  • nb_test – number of times the calculations will be repeated; the best result is returned

  • nb_cycle – each test consists of nb_cycle forward and backward FFTs.

  • timing – if True, also return the time needed for a single FFT (dt)

  • shuffle_axes – if True, the order of axes for the transform will be shuffled to find the fastest combination, and the optimal axes order will be returned. Only useful for gpyfft, ignored when pyvkfft is used.

Returns:

The computed speed in Gflop/s (if timing is False) or a tuple (flops, dt), and also with the axes if shuffle_axes is True.
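The roles of nb_test and nb_cycle can be pictured with a generic best-of-N timer. This is only a sketch of the pattern described above, not the PyNX implementation: `best_time` is a hypothetical helper, `func` stands in for one forward-plus-backward FFT pair, and a real GPU benchmark would also need to synchronize the device before reading the clock.

```python
import time


def best_time(func, nb_test=4, nb_cycle=1):
    """Return the best (smallest) time per cycle over nb_test repetitions.

    Each repetition calls func() nb_cycle times and divides the elapsed
    wall-clock time by nb_cycle.
    """
    best = float("inf")
    for _ in range(nb_test):
        t0 = time.perf_counter()
        for _ in range(nb_cycle):
            func()
        dt = (time.perf_counter() - t0) / nb_cycle
        best = min(best, dt)
    return best


if __name__ == "__main__":
    # Time a small CPU-side workload as a stand-in for an FFT pair.
    print(best_time(lambda: sum(range(10000)), nb_test=4, nb_cycle=2))
```

Taking the minimum rather than the mean reduces the influence of one-off delays such as kernel compilation or other processes competing for the device.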

pynx.processing_unit.opencl_device.cl_device_global_mem_bandwidth(d)

Get the OpenCL device global memory bandwidth.

Parameters:
  • d – the OpenCL device.

Returns:

the memory bandwidth in Gbytes/s

pynx.processing_unit.cuda_device.available_gpu_speed(fft_shape=(16, 256, 256), batch=True, min_gpu_mem=None, verbose=False, gpu_name=None, return_dict=False, ranking='fft')

Get a list of all available GPUs, sorted by FFT speed (Gflop/s) or memory bandwidth (Gbytes/s).

Parameters:
  • fft_shape – the FFT shape against which the fft speed is calculated

  • batch – if True, perform a batch 2D FFT rather than a 3D one

  • min_gpu_mem – the minimum amount of gpu memory desired (bytes). Devices with less are ignored.

  • verbose – if True, print the speed and memory for the GPUs found

  • gpu_name – if given, only GPUs whose name includes this substring will be tested and reported. This can also be a list of acceptable strings

  • return_dict – if True, a dictionary will be returned instead of a list, with both timing and gflops listed

  • ranking – either ‘fft’ or ‘bandwidth’.

Returns:

a list of tuples (GPU device, speed (Gflop/s) or memory bandwidth), ordered by decreasing values. If return_dict is True, a dictionary is returned instead, in which each entry is itself a dictionary holding the gflops and dt results.
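When return_dict is True, the result may need to be ranked by the caller. The sketch below assumes, without confirmation from the text above, that the outer dictionary is keyed by a device name and that each inner dictionary uses the keys 'gflops' and 'dt'; `rank_from_dict` is a hypothetical helper and the sample values are made up purely to show the shape of the data.

```python
def rank_from_dict(results):
    """Return (name, gflops, dt) tuples ordered by decreasing gflops.

    Assumes results maps a device name to {'gflops': ..., 'dt': ...};
    the exact key names are an assumption, check them on a real result.
    """
    rows = [(name, entry["gflops"], entry["dt"])
            for name, entry in results.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Illustrative values only, not measured on any real device.
    sample = {
        "gpu_a": {"gflops": 250.0, "dt": 3.4e-4},
        "gpu_b": {"gflops": 900.0, "dt": 9.3e-5},
    }
    for name, gflops, dt in rank_from_dict(sample):
        print(f"{name}: {gflops:.1f} Gflop/s, dt={dt:.2e} s")
```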

pynx.processing_unit.cuda_device.cuda_device_fft_speed(d=None, fft_shape=(16, 256, 256), batch=True, verbose=False, nb_test=4, nb_cycle=1, timing=False)

Compute the FFT calculation speed for a given CUDA device.

Parameters:
  • d – the pycuda.driver.Device. If not given, the default context will be used.

  • fft_shape – (nz,ny,nx) the shape of the complex FFT, treated as a stack of nz 2D transforms of size nx * ny, or as a single 3D FFT, depending on the value of ‘batch’

  • batch – if True, will perform a batch 2D FFT. Otherwise, will perform a 3D FFT.

  • verbose – if True, print the speed and timing for the given transform

  • nb_test – number of times the calculations will be repeated; the best result is returned

  • timing – if True, also return the time needed for a single FFT (dt)

Returns:

The computed speed in Gflop/s (if timing is False) or a tuple (flops, dt)
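To relate the returned Gflop/s figure to fft_shape and dt, the following sketch uses the conventional 5·N·log2(N) operation count for a complex FFT. Whether PyNX uses exactly this estimate is not stated above, so treat the formula and the hypothetical helper `estimate_gflops` as assumptions; the point is only to show how the batch flag changes the size of each individual transform.

```python
import math


def estimate_gflops(fft_shape, dt, batch=True):
    """Estimate Gflop/s for one complex FFT of shape (nz, ny, nx).

    Assumes the conventional 5 * N * log2(N) flop count, which may differ
    from what PyNX reports. dt is the time for one transform, in seconds.
    """
    nz, ny, nx = fft_shape
    n_total = nz * ny * nx
    # batch=True: nz independent 2D transforms of ny * nx points each;
    # batch=False: one 3D transform over all nz * ny * nx points.
    n_per_transform = ny * nx if batch else n_total
    flops = 5 * n_total * math.log2(n_per_transform)
    return flops / dt / 1e9


if __name__ == "__main__":
    print(estimate_gflops((16, 256, 256), dt=1e-3, batch=True))
    print(estimate_gflops((16, 256, 256), dt=1e-3, batch=False))
```

Under this estimate, the same shape and timing yield a higher Gflop/s value for the 3D transform than for the stacked 2D case, because each point takes part in a longer transform.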

pynx.processing_unit.cuda_device.cuda_device_global_mem_bandwidth(d, measured=False)

Get the CUDA device global memory bandwidth.

Parameters:
  • d – the CUDA device.

  • measured – if True, measure the bandwidth

Returns:

the memory bandwidth in Gbytes/s