Preprocessing

class dictlearn.preprocess.Patches(image, size, stride=1, max_patches=None, random=None, order='C')
REMOVE_MEAN = 'remove_mean'

Generate and reconstruct image patches

Parameters:
  • image – ndarray, 2D or 3D
  • size – Patch size. All patches are square (or cubic), so this is just the size of the first dimension, e.g. 8 for (8, 8) patches
  • stride – Stride/distance between patches in the image. Can be an int or a list. If int, the stride is the same in every dimension; if a list, stride[i] is the stride along axis i. Patches cannot be reconstructed if the stride in one dimension is larger than the patch size in the same dimension, i.e. if stride[i] > size[i] for any i
  • max_patches – Maximum number of patches
  • random – True to take patches from random locations in the image. Ignored if max_patches=None
  • order – 'C' or 'F' for C or Fortran order of the underlying data
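
A minimal usage sketch based on the signature and attributes documented on this page (the random image is a stand-in for real data):

>>> import numpy as np
>>> from dictlearn.preprocess import Patches
>>> image = np.random.rand(512, 512)       # any 2D or 3D ndarray
>>> patches = Patches(image, 8, stride=4)  # (8, 8) patches, 4 pixels apart
>>> data = patches.patches                 # shape (8*8, n_patches)
>>> assert data.shape[0] == 64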
check_batch_size_or_raise(batch_size)

Check if there’s enough memory to store ‘batch_size’ patches. Raise MemoryError if not
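
Continuing the sketch above, one hedged way to use this check (the fallback value is arbitrary):

>>> batch_size = 100000
>>> try:
...     patches.check_batch_size_or_raise(batch_size)
... except MemoryError:
...     batch_size = 1000  # fall back to a smaller batch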

generator(batch_size, callback=False)

Create and reconstruct a batch iteratively.

Use this if Patches.patches is too large to keep in memory. Only 'batch_size' patches are generated per iteration, so memory use is proportional to the batch size rather than to the total number of patches: if the full patch matrix needs 100 units of memory and each batch holds 1/100 of the patches, one iteration needs only a single unit.

>>> import numpy as np
>>> from dictlearn.preprocess import Patches
>>> volume = np.load('some_image.npy')
>>> size, stride = [10, 10, 10], [1, 1, 1]
>>> patches = Patches(volume, size, stride)
>>> for batch in patches.generator(100):
...     # Handle batch
...     assert batch.shape[0] == 1000  # patch size 10*10*10
...     assert batch.shape[1] == 100, 'Can fail at the last batch, see stride'

One matrix of shape (patch_size, batch_size) is created per iteration. With callback=True this generator yields (batch, callback), where batch is a numpy array of shape (patch_size, batch_size) and callback(batch) reconstructs the part of the volume the given batch was taken from. The argument passed to callback must have the same shape as the batch that was yielded.

This can be used if building the full patch matrix (Patches.patches) requires too much memory. The amount of memory required by this method is batch_size*size[0]*size[1]*size[2]*volume.dtype.itemsize bytes
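
As a concrete illustration of this formula, assuming a float64 volume and the 10x10x10 patches used in the examples below:

>>> batch_size, size = 100, [10, 10, 10]
>>> itemsize = 8  # np.dtype('float64').itemsize
>>> batch_size * size[0] * size[1] * size[2] * itemsize  # bytes per batch
800000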

>>> import numpy as np
>>> from dictlearn.preprocess import Patches
>>> volume = np.load('some_image_volume.npy')
>>> size, stride = [10, 10, 10], [1, 1, 1]
>>> patches = Patches(volume, size, stride)
>>> for batch, reconstruct in patches.generator(100, callback=True):
...     # Handle batch, here we do nothing
...     reconstruct(batch)
>>> assert np.array_equal(volume, patches.reconstructed)
Parameters:
  • batch_size – Size of batches. The last batch can be smaller if n_patches % batch_size != 0
  • callback – If True, a callback function 'callback(batch)' is also yielded such that the image can be partially reconstructed
Returns:

Generator

n_patches
Returns: Number of patches
patches
Returns: Image patches, shape (size[0]*size[1]*…, n_patches)
reconstruct(new_patches, save=False)

Reconstruct the image with new_patches. Overlapping regions are averaged. The reconstructed patches are not saved by default

self.patches is the same object before and after this method is called, as long as save=False

Parameters:
  • new_patches – ndarray (patch_size, n_patches). Same format as the patches returned from Patches.patches
  • save – Overwrite current patches with new_patches
Returns:

Reconstructed image
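
A sketch of the intended round trip. The copy stands in for real processing such as sparse coding; when the stride tiles the image completely, averaging unchanged patches should return the original image:

>>> import numpy as np
>>> from dictlearn.preprocess import Patches
>>> image = np.random.rand(64, 64)
>>> patches = Patches(image, 8, stride=4)    # stride divides 64 - 8
>>> new_patches = patches.patches.copy()     # stand-in for real processing
>>> reconstructed = patches.reconstruct(new_patches)
>>> assert np.allclose(image, reconstructed)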

remove_mean(add_back=True)

Remove the mean from every patch. By default the mean is automatically added back when the image is reconstructed

Parameters: add_back – Automatically add back the mean to the patches on reconstruction
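
A sketch of the centering round trip, assuming remove_mean modifies Patches.patches in place (the tolerance check is illustrative):

>>> patches = Patches(image, 8, stride=4)
>>> patches.remove_mean()                    # add_back=True by default
>>> assert np.allclose(patches.patches.mean(axis=0), 0)
>>> restored = patches.reconstruct(patches.patches)  # means added back here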
shape

Shape of patch matrix, (patch_size, n_patches)

size

Size of patches

class dictlearn.preprocess.Patches3D(volume, size, stride)

Create and reconstruct image patches from 3D volume.

Parameters:
  • volume – 3D ndarray
  • size – Size of image patches, (x, y, z)
  • stride – Stride between each patch, (i, j, k). ‘volume’ cannot be reconstructed if i > x, j > y or k > z
create_batch_and_reconstruct(batch_size)

Create and reconstruct a batch iteratively.

One matrix of shape (n, batch_size), with n = size[0]*size[1]*size[2], is created per iteration. This generator yields (batch, callback), where batch is a numpy array of shape (n, batch_size) and callback(batch) reconstructs the part of the volume which contains the given batch.

This can be used if Patches3D.create() requires too much memory. The amount of memory required by this method is batch_size*size[0]*size[1]*size[2]*volume.dtype.itemsize bytes

>>> import numpy as np
>>> import dictlearn as dl
>>> from dictlearn.preprocess import Patches3D
>>> dictionary = np.load('some_dictionary.npy')
>>> volume = np.load('some_image_volume.npy')
>>> size, stride = [1, 1, 1], [1, 1, 1]
>>> patches = Patches3D(volume, size, stride)
>>> for batch, reconstruct in patches.create_batch_and_reconstruct(100):
...     new_batch = dl.omp_batch(batch, dictionary)
...     reconstruct(new_batch)
>>> reconstructed_volume = patches.reconstructed
Parameters: batch_size – Number of patches per batch.
Returns: Generator, next() returns (batch, reconstruct), where reconstruct(new_batch) rebuilds the part of the volume the batch was taken from
next_batch(batch_size)
Parameters: batch_size – Number of image patches per batch
Returns: Generator, next() returns an ndarray of shape (n, batch_size)
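
A sketch of plain batch iteration when the volume does not need to be rebuilt (shapes follow the Returns description above):

>>> from dictlearn.preprocess import Patches3D
>>> patches = Patches3D(volume, [10, 10, 10], [5, 5, 5])
>>> for batch in patches.next_batch(100):
...     assert batch.shape[1] <= 100  # the last batch can be smaller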
dictlearn.preprocess.center(data, dim=0, retmean=False, inplace=False)

Remove the mean along dim from every patch

Parameters:
  • data – ndarray, data to center
  • dim – Dimension along which to compute the mean, default 0 (columns)
  • retmean – Return the mean if True
  • inplace – If True, modify data directly and return only the mean
Returns:

Centered patches, and the mean if retmean is True. If inplace is True, only the mean is returned
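
A small sketch of the default behaviour, removing column means and returning them with retmean=True:

>>> import numpy as np
>>> from dictlearn.preprocess import center
>>> data = np.arange(12, dtype=float).reshape(3, 4)
>>> centered, mean = center(data, retmean=True)
>>> assert np.allclose(centered.mean(axis=0), 0)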

dictlearn.preprocess.normalize(patches, lim=0.2)

L2 normalization. If the l2 norm of a patch is smaller than lim, the patch is instead divided element-wise by lim

Parameters:
  • patches – ndarray, (size, n_patches)
  • lim – Threshold for low intensity patches
Returns: Normalized patches
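
A sketch of the thresholding behaviour described above, with one patch placed below lim on purpose:

>>> import numpy as np
>>> from dictlearn.preprocess import normalize
>>> patches = np.array([[3.0, 0.1],
...                     [4.0, 0.1]])  # column norms: 5.0 and ~0.14
>>> normalized = normalize(patches, lim=0.2)
>>> assert np.isclose(np.linalg.norm(normalized[:, 0]), 1.0)
>>> assert np.allclose(normalized[:, 1], patches[:, 1] / 0.2)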