WIP: Bayes network using cooled Gibbs parameter estimation
An abstract class with shared code for Clustering Models
Basic DNN class.
Factorization Machine Model.
Factorization Machine Model. This class computes a factorization machine model a la
Steffen Rendle (2012): Factorization Machines with libFM, in ACM Trans. Intell. Syst. Technol., 3(3), May.
We depart slightly from the original FM formulation by including both positive definite and negative definite factors. While the positive definite factor can approximate any matrix in the limit, using both positive and negative definite factors should give better performance for a fixed number of factors. This is what we observed on several datasets. With both positive definite and negative definite factors, there should also be no need to remove diagonal terms, since the positive and negative factorizations already form a conventional eigendecomposition (a best least-squares fit for a given number of factors) of the matrix of second-order interactions.
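As a hedged sketch of that formulation (the symbols below, including V_+ and V_-, are illustrative names and not identifiers from the code), the prediction takes the form

\[
\hat{y}(x) \;=\; w_0 \;+\; w^\top x \;+\; x^\top \left( V_+ V_+^\top \;-\; V_- V_-^\top \right) x,
\qquad V_+ \in \mathbb{R}^{n \times k_+},\; V_- \in \mathbb{R}^{n \times k_-},
\]

so the matrix of second-order interactions is approximated by a difference of two Gram matrices, which can carry both positive and negative eigenvalues.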
The types of model are given by the values of opts.links (IMat) and are the same as for GLM models. They are:
Options are:
Inherited from Regression Model:
Some convenience functions for training:
val (mm, opts) = FM.learner(a, d)
    // On an input matrix a including targets (set opts.targets to specify them), learns an FM model of type d.
    // Returns the model (mm) and the options class (opts).
val (mm, opts) = FM.learner(a, c, d)
    // On an input matrix a and target matrix c, learns an FM model of type d.
    // Returns the model (mm) and the options class (opts).
val (nn, nopts) = FM.predictor(model, ta, pc, d)
    // Constructs a prediction learner from an existing model. Returns the learner and options.
    // pc should have the same dims as the test label matrix, and will contain results after nn.predict.
val (mm, mopts, nn, nopts) = FM.learner(a, c, ta, pc, d)
    // a = training data, c = training labels, ta = test data, pc = prediction matrix, d = type.
    // Returns a training learner mm with options mopts, and a prediction learner nn with its own options nopts.
    // Typically set options, then do mm.train; nn.predict with results in pc.
val (mm, opts) = FM.learner(ds)
    // Build a learner for a general datasource ds (e.g. a files data source).
An Abstract class with shared code for Factor Models
Train a GLM model.
Train a GLM model. The types of model are given by the values of opts.links (IMat). They are:
Options are:
Inherited from Regression Model:
Some convenience functions for training:
val (mm, opts) = GLM.learner(a, d)
    // On an input matrix a including targets (set opts.targets to specify them), learns a GLM model of type d.
    // Returns the model (mm) and the options class (opts).
val (mm, opts) = GLM.learner(a, c, d)
    // On an input matrix a and target matrix c, learns a GLM model of type d.
    // Returns the model (mm) and the options class (opts).
val (nn, nopts) = GLM.predictor(model, ta, pc, d)
    // Constructs a prediction learner from an existing model. Returns the learner and options.
    // pc should have the same dims as the test label matrix, and will contain results after nn.predict.
val (mm, mopts, nn, nopts) = GLM.learner(a, c, ta, pc, d)
    // a = training data, c = training labels, ta = test data, pc = prediction matrix, d = type.
    // Returns a training learner mm with options mopts, and a prediction learner nn with its own options nopts.
    // Typically set options, then do mm.train; nn.predict with results in pc.
val (mm, opts) = GLM.learner(ds)
    // Build a learner for a general datasource ds (e.g. a files data source).
Independent Component Analysis, using FastICA.
Independent Component Analysis, using FastICA. It has the ability to center and whiten data. It is based on the method presented in:
A. Hyvärinen and E. Oja. Independent Component Analysis: Algorithms and Applications. Neural Networks, 13(4-5):411-430, 2000.
In particular, we provide the logcosh, exponential, and kurtosis "G" functions.
This algorithm computes the following modelmats array:
> modelmats(0) stores the inverse of the mixing matrix. If X = A*S represents the data, then it is the estimated A^{-1}, which we assume is square and invertible for now.
> modelmats(1) stores the mean vector of the data, which is computed entirely on the first pass. This means that once we estimate A^{-1} in modelmats(0), we need to first shift the data by this amount, and then multiply to recover the (centered) sources. Example:
modelmats(0) * (data - modelmats(1))
Here, data is an n x N matrix, whereas modelmats(1) is an n x 1 matrix. For efficiency, we assume a constant batch size for each block of data and take the mean across all batches. This holds for every batch except (usually) the last one, which is rarely large enough to make a difference.
Thus, modelmats(1) helps to center the data. The whitening in this algorithm happens during the updates to W in both the orthogonalization and the fixed point steps. The former uses the computed covariance matrix and the latter relies on an approximation of W^T*W to the inverse covariance matrix. It is fine if the data is already pre-whitened before being passed to BIDMach.
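As a usage sketch of the recovery step above (assuming nn is a trained ICA learner and data an FMat with one sample per column; treat the exact BIDMat calls here as assumptions rather than guaranteed API):

import BIDMat.{FMat, Mat}
import BIDMat.MatFunctions._

// Hedged sketch: recover (centered) source estimates from the two model matrices described above.
val W  = nn.modelmats(0).asInstanceOf[FMat]        // estimated A^{-1} (unmixing matrix)
val mu = nn.modelmats(1).asInstanceOf[FMat]        // estimated n x 1 mean of the data
val centered = data - mu * ones(1, data.ncols)     // subtract the mean from every column
val sources  = W * centered                        // modelmats(0) * (data - modelmats(1))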
Currently, we are thinking about the following extensions:
> Allowing ICA to handle non-square mixing matrices. Most research about ICA assumes that A is n x n.
> Improving the way we handle the computation of the mean, so it doesn't rely on the last batch being of similar size to all prior batches. Again, this is minor, especially for large data sets.
> Thinking of ways to make this scale better to a large variety of datasets.
For additional references, see Aapo Hyvärinen's other papers, and visit: http://research.ics.aalto.fi/ica/fastica/
KMeans
KMeans
val (nn, opts) = KMeans.learner(a)
opts.what                 // prints the available options
opts.dim=200              // customize options
nn.train                  // train the learner
nn.modelmat               // get the final model

val (nn, opts) = KMeans.learnPar(a)   // Build a parallel learner
opts.nthreads=2           // number of threads (defaults to number of GPUs)
nn.train                  // train the learner
nn.modelmat               // get the final model
KMeans with an optional weight matrix (KMeansw)
KMeans with an optional weight matrix (KMeansw)
val (nn, opts) = KMeansw.learner(a,w)
val (nn, opts) = KMeansw.learner(a)
a                         // input matrix
w                         // optional weight matrix
opts.what                 // prints the available options
opts.dim=200              // customize options
nn.train                  // train the learner
nn.modelmat               // get the final model

val (nn, opts) = KMeansw.learnPar(a,w)   // Build a parallel learner
val (nn, opts) = KMeansw.learnPar(a)
opts.nthreads=2           // number of threads (defaults to number of GPUs)
nn.train                  // train the learner
nn.modelmat               // get the final model
LDA model using online Variational Bayes (Hoffman, Blei and Bach, 2010)
LDA model using online Variational Bayes (Hoffman, Blei and Bach, 2010)
Parameters
Other key parameters inherited from the learner, datasource and updater:
Example:
a is a sparse word x document matrix
val (nn, opts) = LDA.learner(a)
opts.what                 // prints the available options
opts.uiter=2              // customize options
nn.train                  // train the model
nn.modelmat               // get the final model
nn.datamat                // get the other factor (requires opts.putBack=1)

val (nn, opts) = LDA.learnPar(a)   // Build a parallel learner
opts.nthreads=2           // number of threads (defaults to number of GPUs)
nn.train                  // train the model
nn.modelmat               // get the final model
nn.datamat                // get the other factor
Latent Dirichlet Model using repeated Gibbs sampling.
Latent Dirichlet Model using repeated Gibbs sampling.
Extends Factor Model Options with:
Other key parameters inherited from the learner, datasource and updater:
Example:
a is a sparse word x document matrix
val (nn, opts) = LDAgibbs.learn(a)
opts.what                 // prints the available options
opts.uiter=2              // customize options
nn.run                    // run the learner
nn.modelmat               // get the final model
nn.datamat                // get the other factor (requires opts.putBack=1)

val (nn, opts) = LDAgibbs.learnPar(a)   // Build a parallel learner
opts.nthreads=2           // number of threads (defaults to number of GPUs)
nn.run                    // run the learner
nn.modelmat               // get the final model
nn.datamat                // get the other factor
Latent Dirichlet Model using repeated Gibbs sampling.
Latent Dirichlet Model using repeated Gibbs sampling.
This version (v) supports per-model-element sample counts, e.g. for local heating or cooling of particular model coefficients.
Extends Factor Model Options with:
- dim(256): Model dimension
- uiter(5): Number of iterations on one block of data
- alpha(0.001f): Dirichlet prior on document-topic weights
- beta(0.0001f): Dirichlet prior on word-topic weights
- nsamps(row(100)): matrix with the number of repeated samples to take
Other key parameters inherited from the learner, datasource and updater:
- blockSize: the number of samples processed in a block
- power(0.3f): the exponent of the moving average model' = a*dmodel + (1-a)*model, with a = 1/nblocks^power (see the sketch after this list)
- npasses(10): number of complete passes over the dataset
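As a rough per-coefficient illustration of that moving average (plain Scala with hypothetical names, not the updater's actual code):

// Hypothetical illustration of model' = a*dmodel + (1-a)*model, with a = 1/nblocks^power.
def blend(dmodel: Float, model: Float, nblocks: Int, power: Float): Float = {
  val a = (1.0 / math.pow(nblocks.toDouble, power.toDouble)).toFloat  // mixing weight shrinks as more blocks are seen
  a * dmodel + (1 - a) * model
}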
Example:
a is a sparse word x document matrix
val (nn, opts) = LDAgibbs.learn(a)
opts.what                 // prints the available options
opts.uiter=2              // customize options
nn.run                    // run the learner
nn.modelmat               // get the final model
nn.datamat                // get the other factor (requires opts.putBack=1)

val (nn, opts) = LDAgibbs.learnPar(a)   // Build a parallel learner
opts.nthreads=2           // number of threads (defaults to number of GPUs)
nn.run                    // run the learner
nn.modelmat               // get the final model
nn.datamat                // get the other factor
Abstract class with shared code for all models
Non-negative Matrix Factorization (NMF) with L2 loss
Non-negative Matrix Factorization (NMF) with L2 loss
Parameters
Other key parameters inherited from the learner, datasource and updater:
Example:
a is a sparse word x document matrix
val (nn, opts) = NMF.learner(a)
opts.what                 // prints the available options
opts.uiter=2              // customize options
nn.train                  // train the model
nn.modelmat               // get the final model
nn.datamat                // get the other factor (requires opts.putBack=1)

val (nn, opts) = NMF.learnPar(a)   // Build a parallel learner
opts.nthreads=2           // number of threads (defaults to number of GPUs)
nn.train                  // train the model
nn.modelmat               // get the final model
nn.datamat                // get the other factor
Random Forests.
Random Forests. Given a datasource of data and labels, computes a random classification or regression forest.
Options:
NOTE: The algorithm uses a packed representation of the dataset statistics with fixed-precision fields. Setting nbits selects how many bits to keep from each input value. For integer data, the lower nbits are used. For floating point data, the leading nbits are used, so e.g. 16 float bits give the sign, 8 bits of exponent, and 7 bits of mantissa (with an implicit leading 1). A sketch of this bit selection appears below.
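The following is a minimal sketch of that bit selection for a single value, assuming standard IEEE-754 floats; leadingBits and lowBits are illustrative names, not BIDMach functions.

// Illustrative only: how the leading/low bits of an input value could be selected.
def leadingBits(x: Float, nbits: Int): Int = {
  val raw = java.lang.Float.floatToRawIntBits(x)   // 1 sign bit, 8 exponent bits, 23 mantissa bits
  raw >>> (32 - nbits)                             // keep the leading nbits, e.g. nbits=16 -> sign + exponent + 7 mantissa bits
}

def lowBits(x: Int, nbits: Int): Int =
  x & ((1 << nbits) - 1)                           // integer data: keep the lower nbits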
For regression, discrete (integer) target values should be used in the training data. The output will be continuous values interpolated from them.
Other key parameters inherited from the learner, datasource and updater:
Example:
a is an nfeats x ninstances data matrix, c is a 1 x ninstances vector of labels
val (nn, opts) = RandomForest.learner(a,c)
opts.what                 // prints the available options
opts.depth=25             // set depth - something like log2(ninstances / 10) is good
opts.ntrees=20            // good starting value. Increasing this usually increases accuracy.
opts.nsamps=30            // typically sqrt(nfeats) is good. Larger values may work better.
opts.nnodes               // bounded by 2^depth, but usually smaller than this.
opts.ncats=10             // it's a good idea to set this - the learner will try to guess it, but may get it wrong
opts.nbits=10             // number of bits to use from input data
nn.train                  // train the learner
nn.modelmats              // get the final model (4 matrices)
Abstract class with shared code for Regression Models
Sparse Matrix Factorization with L2 loss (similar to ALS).
Sparse Matrix Factorization with L2 loss (similar to ALS).
Parameters
Other key parameters inherited from the learner, datasource and updater:
Example:
a is a sparse word x document matrix
val (nn, opts) = SFA.learner(a)
opts.what                 // prints the available options
opts.uiter=2              // customize options
nn.train                  // train the model
nn.modelmat               // get the final model
nn.datamat                // get the other factor (requires opts.putBack=1)
Basic DNN class. Learns a supervised map from input blocks to output (target) data blocks. There are currently 4 layer types:
The network topology is specified by opts.layers, which is a sequence of "LayerSpec" objects. There is a LayerSpec class for each Layer class, which holds the parameters for defining that layer. Currently only two LayerSpec types need params: