Optimization methods for learning dictionaries.

Standard algorithms

Algorithms for training on complete data (ie. when you don’t need to mask your data). These are the algorithms needed for most use cases.

dictlearn.optimize.ksvd(signals, dictionary, iters, n_nonzero=0, omp_tol=0, tol=0, verbose=False, n_threads=1, retcodes=False)

Iterative batch algorithm [1, 2] for fitting a dictionary D to a set of signals X. Each iteration consists of two stages:

  1. Fix D. Find sparse codes A such that X is approx equal DA
  2. Fix the sparse codes. Find D_new such that norm(X - D_new*A) < norm(X - DA)

Need one of (or both) n_nonzero and omp_tol different from zero. If n_nonzero > 0 and omp_tol == 0 then KSVD finds an approximate solution to:

\[\underset{\mathbf{D, A}}{min}\quad \left \| \mathbf{X - DA} \right \|_F^2 \text{ such that } \left \| \mathbf{A} \right \|_0 \leq n\_nonzero\]

If omp_tol is not None then KSVD finds an approximate solution to:

\[\underset{\mathbf{D, A}}{argmin}\quad\left \| \mathbf{A} \right \|_0 \text{ such that } \left \| \mathbf{X - DA} \right \|_F^2 \leq omp\_tol\]
>>> import dictlearn as dl
>>> from numpy import linalg as LA
>>> image = dl.Patches(dl.imread('some-image'), 8).patches
>>> dictionary = dl.random_dictionary(8*8, 128)
>>> sparse_1 = dl.omp_batch(image, dictionary, 10)
>>> new_dict, _ = dl.ksvd(image, dictionary, 20, 10)
>>> err_initial = LA.norm(image - dictionary.dot(sparse_1))
>>> sparse_2 = dl.omp_batch(image, new_dict, 10)
>>> err_trained = LA.norm(image - new_dict.dot(sparse_2))
>>> assert err_trained < err_initial
  • signals – Training signals. One signal per column numpy.ndarray of shape (signal_size, n_signals)
  • dictionary – Initial dictionary, shape (signal_size, n_atoms)
  • iters – Max number of iterations
  • n_nonzero – Default 0. Max nonzero coefficients in sparse decomposition
  • omp_tol – Default 0. Tolerance of sparse approximation. Overrides n_nonzero
  • tol – Default 0. Stop learning if norm(signals - dictionary.dot(sparse_codes) < tol
  • verbose – Print progress
  • n_threads – Default 1. Number of threads to use for sparse coding.
  • retcodes – Return sparse codes from last iteration

dictionary[, sparse decomposition if retcodes = True]

dictlearn.optimize.odl(signals, dictionary, iters=1000, n_nonzero=10, tol=0, verbose=False, batch_size=1, n_threads=1, seed=None)

Online dictionary learning algorithm

This algorithm sparsely encode one training signal at the time and updates the dictionary given this signal. The number if iterations also determines how many of the training signals are used. If the number of iterations is less than the number of signals, then iters signals is drawn at random. If iters is equal to the number of signals all signals are used in a random order.

The dictionary atoms are updated using block-coordinate descent. See [4] for details

  • signals – Training signals. One signal per column numpy.ndarray of shape (signal_size, n_signals)
  • dictionary – Initial dictionary, shape (signal_size, n_atoms)
  • iters – Default 1000. Number of training iterations to use. This is also equal to the number of signals used in training.
  • n_nonzero – Default 10. Max nonzero coefficients in sparse decomposition
  • tol – Default 0. Tolerance of sparse approximation. Overrides n_nonzero
  • verbose – Print progress
  • batch_size – The number of signals to use for each dictionary update
  • seed – Seed the drawing of random signals

Trained and improved dictionary

dictlearn.optimize.mod(signals, dictionary, n_nonzero, iters, n_threads=1)

Method of optimal directions

The first alternate minimization algorithm [3] for dictionary learning.

  1. Find sparse codes A given signals X and dictionary D
  2. Update D given the new A by approximately solving for D in X = DA. That is D = X*pinv(A), with pinv(A) = A.T*(A*A.T)^-1
  • signals – Training signals
  • dictionary – Initial dictionary
  • n_nonzero – Sparsity target for signal approximation
  • iters – Number of dictionary update iterations
  • n_threads – Default 1. Number of threads to use for sparse coding step

New dictionary

Masked data

Use these algorithms when you need to explicitly mark which data points to use and which to discard/ignore. All masks should have the same shape as the training data, with values [0, 1]. A data point is ignored if 0.

dictlearn.optimize.itkrmm(signals, masks, dictionary, n_nonzero, iters, low_rank=None, verbose=False)

Train a dictionary from corrupted image patches.

Need signals and masks of same shape. Data point signals[i, j] is used if the corresponding point in the mask, masks[i, j] == True. All points signals[i, j] with masks[i, j] == False are ignored.

See [5] for details.

  • signals – Corrupted image patches, shape (patch_size, n_patches)
  • masks – Binary mask for signals, same shape as signal.
  • dictionary – Initial dictionary (patch_size, n_atoms)
  • n_nonzero – Number of nonzero coeffs to use for training
  • iters – Max number of iterations
  • low_rank – Matrix of low rank components, shape (patch_size, n_low_rank)
  • verbose – Print progress

Dictionary. Shape (patch_size, n_atoms + n_low_rank)

dictlearn.optimize.reconstruct_low_rank(signals, masks, n_low_rank, initial=None, iters=10)

Reconstruct low rank components from image patches, by ITKrMM.

Low rank components or atoms capture low rank signal features. In the case where signals are image patches low rank atoms can capture average intensities and low variance features in the image. When these are included in a dictionary most of the signals will use at least one of the low rank atoms leaving the normal atoms to represent more specific image features

  • signals – Image patches
  • masks – Masks for image patches
  • n_low_rank – Number of low rank components to reconstruct
  • initial – Initial low rank dictionary, shape (signals.shape[0], n_low_rank)
  • iters – Number of iterations for each component

Low rank dictionary, shape (signals.shape[0], n_low_rank)


[1] M. Aharon, M. Elad, and A. Bruckstein, “The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse representation,” IEEE Trans. on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.

[2] Rubinstein, Ron, Michael Zibulevsky, and Michael Elad. “Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit.” Cs Technion 40.8 (2008): 1-15.

[3] Engan, Kjersti, Sven Ole Aase, and J. Hakon Husoy. “Method of optimal directions for frame design.” Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on. Vol. 5. IEEE, 1999.

[4] Mairal, Julien, et al. “Online dictionary learning for sparse coding.” Proceedings of the 26th annual international conference on machine learning. ACM, 2009.

[5] Naumova, Valeriya, and Karin Schnass. “Dictionary learning from incomplete data.” arXiv preprint arXiv:1701.03655 (2017).