package models


Type Members

  1. class BayesNet extends Model

    WIP: Bayesian network using cooled Gibbs sampling for parameter estimation

  2. abstract class ClusteringModel extends Model

    An abstract class with shared code for Clustering Models

  3. class DNN extends Model

    Basic DNN class. Learns a supervised map from input blocks to output (target) data blocks. There are currently 4 layer types:

    • InputLayer: just a placeholder for the first layer which is loaded with input data blocks. No learnable params.
    • FCLayer: Fully-Connected Linear layer. Has a matrix of learnable params which is the input-output map.
    • RectLayer: Rectifying one-to-one layer. No params.
    • GLMLayer: a one-to-one layer with GLM mappings (linear, logistic, abs-logistic and SVM). No learnable params.

    The network topology is specified by opts.layers, which is a sequence of "LayerSpec" objects. There is a LayerSpec class for each Layer class, which holds the params for defining that layer. Currently only two LayerSpec types need params (a configuration sketch follows the list below):

    • FC: holds the output dimensions of the FClayer (input dimension set by previous layer).
    • GLM: holds the links matrix (integer specs for loss types, see GLM), for the output of that layer. Its size should match the number of targets.
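
    A minimal topology sketch in the spirit of the learner examples elsewhere on this page. The builder and LayerSpec constructor names below (DNN.learner, InputLayerSpec, FCLayerSpec, RectLayerSpec, GLMLayerSpec) are illustrative assumptions only; the actual LayerSpec classes and builders live in the DNN companion object and should be checked there:

    val (nn, opts) = DNN.learner(a, c)                 // hypothetical builder: input data a, target data c
    opts.layers = Array(new DNN.InputLayerSpec,        // placeholder for the input block (no params)
                        new DNN.FCLayerSpec(256),      // fully-connected layer with 256 outputs
                        new DNN.RectLayerSpec,         // rectifying one-to-one layer (no params)
                        new DNN.FCLayerSpec(1),        // map down to one output per target
                        new DNN.GLMLayerSpec(1))       // GLM output layer with a logistic link
    nn.train                                           // train the network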
  4. class FM extends RegressionModel

    Factorization Machine Model. This class computes a factorization machine model a la

    Steffen Rendle (2012): Factorization Machines with libFM, in ACM Trans. Intell. Syst. Technol., 3(3), May.

    We depart slightly from the original FM formulation by including both positive definite and negative definite factors. While the positive definite factor can approximate any matrix in the limit, using both positive and negative definite factors should give better performance for a fixed number of factors. This is what we observed on several datasets. With both positive definite and negative definite factors, there should also be no need to remove diagonal terms, since the positive and negative factorizations already form a conventional eigendecomposition (a best least-squares fit for a given number of factors) of the matrix of second-order interactions.

    The types of model are given by the values of opts.links (IMat) and are the same as for GLM models. They are:

    • 0 = linear model (squared loss)
    • 1 = logistic model (logistic loss)
    • 2 = absolute logistic (hinge loss on logistic prediction)
    • 3 = SVM model (hinge loss)

    Options are:

    • links: an IMat whose nrows should equal the number of targets. Values as above. Can be different for different targets.
    • iweight: an FMat typically used to select a weight row from the input, e.g. iweight = 0,1,0,0,0 uses the second row of input data as weights to be applied to input samples. The iweight field should be 0 in mask.
    • dim1: Dimension of the positive definite factor
    • dim2: Dimension of the negative definite factor
    • strictFM: the exact FM model zeros the diagonal terms of the factorization. As mentioned above, this probably isn't needed in our version of the model, but it's available.

    Inherited from Regression Model:

    • rmask: FMat, optional, 0-1-valued. Used to ignore certain input rows (which are targets or weights). Zero value in an element will ignore the corresponding row.
    • targets: FMat, optional, 0-1-valued. ntargs x nfeats. Used to specify which input features correspond to targets.
    • targmap: FMat, optional, 0-1-valued. nntargs x ntargs. Used to replicate actual targets, e.g. to train multiple models (usually with different parameters) for the same target.

    Some convenience functions for training:

    val (mm, opts) = FM.learner(a, d)     // On an input matrix a including targets (set opts.targets to specify them),
                                          // learns an FM model of type d.
                                          // returns the learner (mm) and the options class (opts).
    val (mm, opts) = FM.learner(a, c, d)  // On an input matrix a and target matrix c, learns an FM model of type d.
                                          // returns the learner (mm) and the options class (opts).
    val (nn, nopts) = FM.predictor(model, ta, pc, d) // constructs a prediction learner from an existing model. returns the learner and options.
                                          // pc should be the same dims as the test label matrix, and will contain results after nn.predict
    val (mm, mopts, nn, nopts) = FM.learner(a, c, ta, pc, d) // a = training data, c = training labels, ta = test data, pc = prediction matrix, d = type.
                                          // returns a training learner mm, with options mopts. Also returns a prediction learner nn with its own options.
                                          // typically set options, then do mm.train; nn.predict with results in pc.
    val (mm, opts) = FM.learner(ds)       // Build a learner for a general datasource ds (e.g. a files data source).
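
    Putting the options and builders above together, a short configuration sketch (the factor dimensions and model type are illustrative):

    val (mm, opts) = FM.learner(a, c, 1)  // logistic FM (type 1) on input a with target matrix c
    opts.dim1 = 32                        // dimension of the positive definite factor
    opts.dim2 = 32                        // dimension of the negative definite factor
    mm.train                              // train the model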
  5. abstract class FactorModel extends Model

    An abstract class with shared code for Factor Models

  6. class GLM extends RegressionModel

    Train a GLM model. The types of model are given by the values of opts.links (IMat). They are:

    • 0 = linear model (squared loss)
    • 1 = logistic model (logistic loss)
    • 2 = hinge logistic (hinge loss on logistic prediction)
    • 3 = SVM model (hinge loss)

    Options are:

    • links: an IMat whose nrows should equal the number of targets. Values as above. Can be different for different targets.
    • iweight: an FMat typically used to select a weight row from the input, e.g. iweight = 0,1,0,0,0 uses the second row of input data as weights to be applied to input samples. The iweight field should be 0 in mask.

    Inherited from Regression Model:

    • rmask: FMat, optional, 0-1-valued. Used to ignore certain input rows (which are targets or weights). Zero value in an element will ignore the corresponding row.
    • targets: FMat, optional, 0-1-valued. ntargs x nfeats. Used to specify which input features correspond to targets.
    • targmap: FMat, optional, 0-1-valued. nntargs x ntargs. Used to replicate actual targets, e.g. to train multiple models (usually with different parameters) for the same target.

    Some convenience functions for training:

    val (mm, opts) = GLM.learner(a, d)    // On an input matrix a including targets (set opts.targets to specify them),
                                          // learns a GLM model of type d.
                                          // returns the learner (mm) and the options class (opts).
    val (mm, opts) = GLM.learner(a, c, d) // On an input matrix a and target matrix c, learns a GLM model of type d.
                                          // returns the learner (mm) and the options class (opts).
    val (nn, nopts) = GLM.predictor(model, ta, pc, d) // constructs a prediction learner from an existing model. returns the learner and options.
                                          // pc should be the same dims as the test label matrix, and will contain results after nn.predict
    val (mm, mopts, nn, nopts) = GLM.learner(a, c, ta, pc, d) // a = training data, c = training labels, ta = test data, pc = prediction matrix, d = type.
                                          // returns a training learner mm, with options mopts. Also returns a prediction learner nn with its own options.
                                          // typically set options, then do mm.train; nn.predict with results in pc.
    val (mm, opts) = GLM.learner(ds)      // Build a learner for a general datasource ds (e.g. a files data source).
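
    A per-target configuration sketch built on the builders above (assumes BIDMat's icol constructor for integer column vectors, and a target matrix c with three rows):

    val (mm, opts) = GLM.learner(a, c, 1) // start with logistic links (type 1) for every target
    opts.links = icol(1, 1, 3)            // per-target links: logistic, logistic, SVM (one value per target row)
    mm.train                              // train with the customized links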
  7. class Graph extends AnyRef

  8. class ICA extends FactorModel

    Independent Component Analysis, using FastICA. It has the ability to center and whiten data. It is based on the method presented in:

    A. Hyvärinen and E. Oja. Independent Component Analysis: Algorithms and Applications. Neural Networks, 13(4-5):411-430, 2000.

    In particular, we provide the logcosh, exponential, and kurtosis "G" functions.

    This algorithm computes the following modelmats array:

    • modelmats(0) stores the inverse of the mixing matrix. If X = A*S represents the data, then it is the estimated A^{-1}, which we assume is square and invertible for now.
    • modelmats(1) stores the mean vector of the data, which is computed entirely on the first pass. This means that once we estimate A^{-1} in modelmats(0), we need to first shift the data by this amount, and then multiply to recover the (centered) sources. Example:

    modelmats(0) * (data - modelmats(1))

    Here, data is an n x N matrix, whereas modelmats(1) is an n x 1 matrix. For efficiency reasons, we assume a constant batch size for each block of data, so we take the mean across all batches. This holds for every batch except (usually) the last one, which is almost never enough to make a difference.

    Thus, modelmats(1) helps to center the data. The whitening in this algorithm happens during the updates to W in both the orthogonalization and the fixed point steps. The former uses the computed covariance matrix and the latter relies on an approximation of W^T*W to the inverse covariance matrix. It is fine if the data is already pre-whitened before being passed to BIDMach.
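
    A minimal sketch of recovering the centered sources from a trained model, using the two model matrices described above (data and modelmats are as in the discussion above):

    val W  = modelmats(0)                 // estimated unmixing matrix, i.e. A^{-1} (n x n)
    val mu = modelmats(1)                 // estimated data mean (n x 1)
    val S  = W * (data - mu)              // centered, unmixed sources (n x N)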

    Currently, we are thinking about the following extensions:

    • Allowing ICA to handle non-square mixing matrices. Most research about ICA assumes that A is n x n.
    • Improving the way we handle the computation of the mean, so it doesn't rely on the last batch being of similar size to all prior batches. Again, this is minor, especially for large data sets.
    • Thinking of ways to make this scale better to a large variety of datasets.

    For additional references, see Aapo Hyvärinen's other papers, and visit: http://research.ics.aalto.fi/ica/fastica/

  9. class KMeans extends ClusteringModel

    KMeans clustering.

    val (nn, opts) = KMeans.learner(a)
    opts.what             // prints the available options
    opts.dim=200          // customize options
    nn.train              // train the learner
    nn.modelmat           // get the final model
    
    val (nn, opts) = KMeans.learnPar(a) // Build a parallel learner
    opts.nthreads=2       // number of threads (defaults to number of GPUs)
    nn.train              // train the learner
    nn.modelmat           // get the final model
  10. class KMeansw extends Model

    KMeans clustering with optional per-sample weights.

    val (nn, opts) = KMeansw.learner(a,w)
    val (nn, opts) = KMeansw.learner(a)
    a                     // input matrix
    w                     // optional weight matrix
    opts.what             // prints the available options
    opts.dim=200          // customize options
    nn.train              // train the learner
    nn.modelmat           // get the final model
    
    val (nn, opts) = KMeansw.learnPar(a,w) // Build a parallel learner
    val (nn, opts) = KMeansw.learnPar(a)
    opts.nthreads=2       // number of threads (defaults to number of GPUs)
    nn.train              // train the learner
    nn.modelmat           // get the final model
  11. class LDA extends FactorModel

    LDA model using online Variational Bayes (Hoffman, Blei and Bach, 2010)

    Parameters

    • dim(256): Model dimension
    • uiter(5): Number of iterations on one block of data
    • alpha(0.001f): Dirichlet document-topic prior
    • beta(0.0001f): Dirichlet word-topic prior
    • exppsi(true): Apply exp(psi(X)) if true, otherwise just use X
    • LDAeps(1e-9): A safety floor constant

    Other key parameters inherited from the learner, datasource and updater:

    • blockSize: the number of samples processed in a block
    • power(0.3f): the exponent of the moving average model' = a*dmodel + (1-a)*model, a = 1/nblocks^power (see the sketch after this list)
    • npasses(10): number of complete passes over the dataset
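
    A quick illustration of how the moving-average weight a behaves (plain Scala, not BIDMach code; the numbers are for illustration only):

    val power   = 0.3                     // the power option above
    val nblocks = 1000.0                  // number of blocks processed so far
    val a = math.pow(nblocks, -power)     // a = 1/nblocks^power, here about 0.126
    // so the update is model' = a*dmodel + (1-a)*model: each new block counts for less as nblocks grows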

    Example:

    a is a sparse word x document matrix

    val (nn, opts) = LDA.learner(a)
    opts.what             // prints the available options
    opts.uiter=2          // customize options
    nn.train              // train the model
    nn.modelmat           // get the final model
    nn.datamat            // get the other factor (requires opts.putBack=1)
    
    val (nn, opts) = LDA.learnPar(a) // Build a parallel learner
    opts.nthreads=2       // number of threads (defaults to number of GPUs)
    nn.train              // train the model
    nn.modelmat           // get the final model
    nn.datamat            // get the other factor
  12. class LDAgibbs extends FactorModel

    Latent Dirichlet Model using repeated Gibbs sampling.

    Extends Factor Model Options with:

    • dim(256): Model dimension
    • uiter(5): Number of iterations on one block of data
    • alpha(0.001f) Dirichlet prior on document-topic weights
    • beta(0.0001f) Dirichlet prior on word-topic weights
    • nsamps(100) the number of repeated samples to take
    • useBino(false): use Poisson sampling (default) or binomial sampling (if true)

    Other key parameters inherited from the learner, datasource and updater:

    • batchSize: the number of samples processed in a block
    • power(0.3f): the exponent of the moving average model' = a*dmodel + (1-a)*model, a = 1/nblocks^power
    • npasses(10): number of complete passes over the dataset

    Example:

    a is a sparse word x document matrix

    val (nn, opts) = LDAgibbs.learn(a)
    opts.what             // prints the available options
    opts.uiter=2          // customize options
    nn.run                // run the learner
    nn.modelmat           // get the final model
    nn.datamat            // get the other factor (requires opts.putBack=1)
    
    val (nn, opts) = LDAgibbs.learnPar(a) // Build a parallel learner
    opts.nthreads = 2     // number of threads (defaults to number of GPUs)
    nn.run                // run the learner
    nn.modelmat           // get the final model
    nn.datamat            // get the other factor
  13. class LDAgibbsv extends FactorModel

    Latent Dirichlet Model using repeated Gibbs sampling.

    This version (v) supports per-model-element sample counts, e.g. for local heating or cooling of particular model coefficients.

    Extends Factor Model Options with:

    • dim(256): Model dimension
    • uiter(5): Number of iterations on one block of data
    • alpha(0.001f): Dirichlet prior on document-topic weights
    • beta(0.0001f): Dirichlet prior on word-topic weights
    • nsamps(row(100)): matrix with the number of repeated samples to take

    Other key parameters inherited from the learner, datasource and updater:

    • blockSize: the number of samples processed in a block
    • power(0.3f): the exponent of the moving average model' = a*dmodel + (1-a)*model, a = 1/nblocks^power
    • npasses(10): number of complete passes over the dataset

    Example:

    a is a sparse word x document matrix

    val (nn, opts) = LDAgibbs.learn(a)
    opts.what             // prints the available options
    opts.uiter=2          // customize options
    nn.run                // run the learner
    nn.modelmat           // get the final model
    nn.datamat            // get the other factor (requires opts.putBack=1)
    
    val (nn, opts) = LDAgibbs.learnPar(a) // Build a parallel learner
    opts.nthreads = 2     // number of threads (defaults to number of GPUs)
    nn.run                // run the learner
    nn.modelmat           // get the final model
    nn.datamat            // get the other factor
  14. abstract class Model extends AnyRef

    Abstract class with shared code for all models

  15. class NMF extends FactorModel

    Non-negative Matrix Factorization (NMF) with L2 loss

    Parameters

    • dim(256): Model dimension
    • uiter(5): Number of iterations on one block of data
    • uprior: Prior on the user (data) factor
    • mprior: Prior on the model
    • NMFeps(1e-9): A safety floor constant

    Other key parameters inherited from the learner, datasource and updater:

    • batchSize: the number of samples processed in a block
    • power(0.3f): the exponent of the moving average model' = a*dmodel + (1-a)*model, a = 1/nblocks^power
    • npasses(2): number of complete passes over the dataset

    Example:

    a is a sparse word x document matrix

    val (nn, opts) = NMF.learner(a)
    opts.what             // prints the available options
    opts.uiter=2          // customize options
    nn.train              // train the model
    nn.modelmat           // get the final model
    nn.datamat            // get the other factor (requires opts.putBack=1)
    
    val (nn, opts) = NMF.learnPar(a) // Build a parallel learner
    opts.nthreads=2       // number of threads (defaults to number of GPUs)
    nn.train              // train the model
    nn.modelmat           // get the final model
    nn.datamat            // get the other factor
  16. class RandomForest extends Model

    Random Forests. Given a datasource of data and labels, compute a random classification or regression Forest.

    Options:

    • depth(20): Bound on the tree depth, also the number of passes over the dataset.
    • ntrees(20): Number of trees in the Forest.
    • nsamps(32): Number of random features to try to split each node.
    • nnodes(200000): Bound on the size of each tree (number of nodes).
    • nbits(16): Number of bits to use for feature values.
    • gain(0.01f): Lower bound on impurity gain in order to split a node.
    • catsPerSample(1f): Number of cats per sample for multilabel classification.
    • ncats(0): Number of cats or regression values. 0 means guess from datasource.
    • training(true): Run for training (true) or prediction (false)
    • impurity(0): Impurity type, 0=entropy, 1=Gini
    • regression(false): Build a regression Forest (true) or classification Forest (false).
    • seed(1): Random seed for selecting features. Use this to train distinct Forests in multiple runs.
    • useIfeats(false): An internal flag; when true, use explicit feature indices rather than computing them.
    • MAE(true): if true, report Mean Absolute Error for performance; if false, report Mean Squared Error.
    • trace(0): level of debugging information to print (0,1,2).

    NOTE: The algorithm uses a packed representation of the dataset statistics with fixed precision fields. Setting nbits selects how many bits to use from each input value. For integer data, the lower nbits are used. For floating point data, the leading nbits are used. So e.g. 16 float bits gives the sign, 8 bits of exponent, and 7 bits of mantissa (plus the implicit leading 1).
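
    A small illustration in plain Scala (not BIDMach internals) of keeping the leading 16 bits of a float value:

    val bits   = java.lang.Float.floatToRawIntBits(3.14159f) // IEEE-754 bit pattern: sign | 8-bit exponent | 23-bit mantissa
    val lead16 = bits >>> 16                                  // top 16 bits: sign + exponent + leading 7 mantissa bits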

    For regression, discrete (integer) target values should be used in the training data. The output will be continuous values interpolated from them.

    Other key parameters inherited from the learner, datasource and updater:

    • batchSize(10000): The number of samples processed in a block
    • putBack(-1): Whether to put predictions back into the datasource target. Should be 1 for prediction.
    • useGPU(true): Use GPU acceleration if available

    Example:

    a is an nfeats x ninstances data matrix, c is a 1 x ninstances vector of labels

    val (nn, opts) = RandomForest.learner(a,c)
    opts.what             // prints the available options
    opts.depth=25         // Set depth - something like log2(ninstances / 10) is good
    opts.ntrees=20        // Good starting value. Increasing this usually increases accuracy.
    opts.nsamps=30        // Typically sqrt(nfeats) is good. Larger values may work better.
    opts.nnodes           // Bounded by 2^depth, but usually smaller than this.
    opts.ncats=10         // It's a good idea to set this - the learner will try to guess it, but may get it wrong
    opts.nbits=10         // Number of bits to use from input data.
    nn.train              // train the learner.
    nn.modelmats          // get the final model (4 matrices)
  17. abstract class RegressionModel extends Model

    Abstract class with shared code for Regression Models

  18. class SFA extends FactorModel

    Sparse Matrix Factorization with L2 loss (similar to ALS).

    Parameters

    • dim(256): Model dimension
    • uiter(5): Number of iterations on one block of data
    • miter(5): Number of CG iterations for model updates - not currently used in the SGD implementation.
    • lambdau(5f): Prior on the user (data) factor
    • lambdam(5f): Prior on model
    • regumean(0f): prior on instance mean
    • regmmean(0f): Prior on feature mean
    • startup(1): Skip CG for this many iterations
    • traceConvergence(false): Print out trace info for convergence of the u iterations.
    • doUser(false): Apply the per-instance mean estimate.
    • weightByUser(false): Weight loss equally by users, rather than their number of choices.
    • ueps(1e-10f): A safety floor constant
    • uconvg(1e-3f): Stop u iteration if error smaller than this.

    Other key parameters inherited from the learner, datasource and updater:

    • batchSize: the number of samples processed in a block
    • npasses(2): number of complete passes over the dataset
    • useGPU(true): Use GPU acceleration if available.

    Example:

    a is a sparse word x document matrix

    val (nn, opts) = SFA.learner(a)
    opts.what             // prints the available options
    opts.uiter=2          // customize options
    nn.train              // train the model
    nn.modelmat           // get the final model
    nn.datamat            // get the other factor (requires opts.putBack=1)
  19. class SVTree extends AnyRef

  20. class SVec extends AnyRef

Value Members

  1. object BayesNet

  2. object ClusteringModel

  3. object DNN

  4. object FM

  5. object FactorModel

  6. object GLM

  7. object ICA

  8. object KMeans

  9. object KMeansw

  10. object LDA

  11. object LDAgibbs

  12. object LDAgibbsv

  13. object Model

  14. object NMF

  15. object RandomForest

  16. object RegressionModel

  17. object SFA

  18. object SVec
