zensols.model.weka

Wraps the Weka Java API. This is probably not the library you want for most uses. Instead, take a look at zensols.model.eval-classifier and zensols.model.execute-classifier.

*classifiers*

dynamic

An (incomplete) set of Weka classifiers keyed by their speed, by their type, or (for singletons) by name.

  • fast train quickly
  • slow train slowly
  • really-slow train very very slowly
  • lazy lazy category
  • meta meta classifiers (e.g. boosting)
  • tree tree based classifiers (typically train quickly)

Each singleton entry is a list like the others but contains only a single classifier. The singletons are: zeror, svm, j48, random-forest, naivebays, logit, logitboost, smo, kstar.

*cross-fold-info*

dynamic

When two-pass cross fold validations are used this is bound to the following map during the validation (see clone-instances):

  • :train? true if folds are being created for the training phase, false if for the test phase
  • :fold the number of the fold
  • :state state shared between training and testing (i.e. context)

*missing-values-ok*

dynamic

Whether the classifier can handle missing values. If not, an exception is thrown when a missing value is encountered.
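Because this var is dynamic, it can be rebound for the duration of a call with Clojure's binding form. The following is a minimal sketch, assuming the namespace is aliased as weka; with-missing-values-allowed is a hypothetical helper introduced here for illustration and is not part of this namespace.

```clojure
(require '[zensols.model.weka :as weka])

(defn with-missing-values-allowed
  "Call the zero-argument function f with *missing-values-ok* bound to true.
  Hypothetical helper for illustration; not part of zensols.model.weka."
  [f]
  (binding [weka/*missing-values-ok* true]
    (f)))

;; The rebinding is visible only while f runs on the calling thread.
(with-missing-values-allowed (fn [] weka/*missing-values-ok*))
```

As with any dynamic binding, a lazy sequence returned from f and realized later will no longer see the rebound value, so realize results (for example with doall) inside f.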

append-instances

(append-instances src dst)

Merge two instances row-wise by adding dst to src.

attribute-by-name

(attribute-by-name instances name)

Return a weka.core.Attribute instance by name from a weka.core.Instances.

attributes-for-instances

(attributes-for-instances insts & {:keys [sort?], :or {sort? true}})

Return a map with :name and :type for each attribute in a weka.core.Instances.

clone-classifier

(clone-classifier classifier)

clone-instances

(clone-instances inst & {:keys [train-fn test-fn randomize-fn], :as opts})

Return a deep clone of inst, optionally with a specific training and test set. See *cross-fold-info* to get information during the validation for debugging and analysis.

  • inst an (object) instance of weka.core.Instances (the whole dataset)

Keys

  • train-fn a function that takes the following arguments: a weka.core.Instances created for the training set, the number of folds, the fold number, and a java.util.Random to pass to the Weka layer to shuffle the dataset

  • test-fn just like train-fn but used to create the test data set and it doesn’t take the java.util.Random instance
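The argument lists described above line up with Weka's own Instances.trainCV and Instances.testCV methods, so one way to supply the two functions is to delegate to them. This is only a sketch: clone-with-weka-folds is a hypothetical helper, and it assumes each function is expected to return the weka.core.Instances for the requested fold, which the description above does not state.

```clojure
(require '[zensols.model.weka :as weka])

(defn clone-with-weka-folds
  "Hypothetical helper: clone insts and let Weka create each fold.
  Assumes train-fn and test-fn should return the fold's Instances."
  [^weka.core.Instances insts]
  (weka/clone-instances
   insts
   ;; receives: training Instances, number of folds, fold number, Random
   :train-fn (fn [^weka.core.Instances train folds fold ^java.util.Random rnd]
               (.trainCV train (int folds) (int fold) rnd))
   ;; same as train-fn, but without the Random
   :test-fn (fn [^weka.core.Instances test folds fold]
              (.testCV test (int folds) (int fold)))))
```

Inside either function, *cross-fold-info* can be dereferenced to log the current fold while debugging.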

create-attrib

(create-attrib att-name type)

Create a Weka Attribute instance with att-name.

type is the type of attribute, which can be string, boolean, numeric, or a sequence of strings representing possible enumeration values (nominals in Weka speak).
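As an illustration of the nominal case, the sketch below passes the allowed values as the type. It assumes att-name is given as a string and that a vector is accepted as the "sequence of strings"; neither detail is confirmed above.

```clojure
(require '[zensols.model.weka :as weka])

;; A nominal attribute: the type is the sequence of allowed values.
;; Assumption: a string name and a vector of strings are accepted.
(def lang-attr (weka/create-attrib "lang" ["en" "es" "fr"]))

;; weka.core.Attribute can report what kind of attribute was created
;; and how many nominal values it holds.
(.isNominal lang-attr)
(.numValues lang-attr)
```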

instances

(instances inst-name feature-sets feature-metas)
(instances inst-name feature-sets feature-metas class-feature-meta & {:keys [clone?], :or {clone? true}})

Create a new weka.core.Instances instance.

  • inst-name used to identify the model data set
  • feature-sets a sequence of maps with each map having key/value pairs of the features of the model to be populated in the returned weka.core.Instances
  • feature-metas a map of key/value pairs describing the features (they become weka.core.Attributes) where the values are string, boolean, numeric, or a sequence of strings representing possible enumeration values (nominals in Weka speak)
  • class-feature-meta just like a (single) feature-metas but describes the class

let-classifier

macro

(let-classifier fdef-expr & forms)

fnspec ==> (classifier-name [insts] exprs)

Define a classifier that uses Clojure code to evaluate insts instances and evaluate body exprs.

Example:

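(let-classifier
  ;; returns 1 when the "langid-1-id" attribute is "en", otherwise 0
  (langid-baseline [inst]
     (let [attrib (weka/attribute-by-name inst "langid-1-id")
           val (.stringValue inst attrib)
           rval (= "en" val)]
       (log/infof "langid: %s for: %s: res: %s" val inst rval)
       (if rval 1 0)))
  ;; the body refers to the classifier by the name given above
  (terse-results langid-baseline meta-set))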

make-classifiers

(make-classifiers)
(make-classifiers set-name-or-instance)

Make classifiers from either a key in *classifiers* or an instance of weka.classifiers.Classifier (meaning an already constructed instance). All classifiers are returned for the 0-arg option.
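A short sketch of both arities follows. To avoid guessing how the keys are written, it reads them from *classifiers* itself, on the assumption that the var holds a map (as "keyed" suggests). The names set-names, some-classifiers and all-classifiers are illustrative only.

```clojure
(require '[zensols.model.weka :as weka])

;; Names of the available classifier sets, read from the var rather than
;; hard-coded (assumes *classifiers* is a map).
(def set-names (keys weka/*classifiers*))

;; Construct the classifiers belonging to one set.
(def some-classifiers (weka/make-classifiers (first set-names)))

;; Construct every classifier listed in *classifiers* (0-arg form).
(def all-classifiers (weka/make-classifiers))
```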

populate-instances

(populate-instances insts feature-metas feature-sets)

Populate a weka.core.Instances instance from Clojure data structures.

  • insts a weka.core.Instances that will be populated
  • feature-metas a map of key/value pairs describing the features (they become weka.core.Attributes) where the values are described as types in create-attrib
  • feature-sets a sequence of maps with each map having key/value pairs of the features of the model to be populated in the returned weka.core.Instances

remove-attributes

(remove-attributes inst attrib-names & {:keys [invert-selection?]})

Remove a set of attributes from inst (weka.core.Instances) by their (string) names.

sparse-instances

(sparse-instances maps dim & {:keys [pattern class-attribute-name instance-name add-class? default-value], :or {pattern "f%d", class-attribute-name "class", instance-name "inst", add-class? true}})

Create a sparse weka.core.Instances from maps, a sequence of maps. In each map, the key is the class and the value is a map whose keys are indexes and whose values are weights. The dim parameter is the dimension of each instance.

Keys

  • :pattern a format using one integer as the index (default: f%d)
  • :class-attribute-name the name of the output class (values are given from the keys of maps; default: class)
  • :instance-name the name of the created Instances object (default: inst)
  • :add-class? if true, add the class that comes from the key in maps
  • :default-value if a double, values not present in the map are replaced with this value; otherwise they are left as missing values

value

(value insts n name)

Return the value of the attribute called name for instance n in insts (a weka.core.Instances).

value-for-instance

(value-for-instance val)
(value-for-instance type val)

Return a Java variable that plays nicely with the Weka framework. If no type is given it tries to determine the type on its own.

  • val is a Java primitive (wrapper)
  • type if given, is the type of val (see create-attrib)