zensols.model.weka
Wraps the Weka Java API. This is probably the wrong library for most uses; instead, take a look at zensols.model.eval-classifier and zensols.model.execute-classifier.
*classifiers*
dynamic
An (incomplete) set of Weka classifiers keyed by their speed, by their type, or (for singletons) by name.
- fast: train quickly
- slow: train slowly
- really-slow: train very, very slowly
- lazy: lazy category
- meta: meta classifiers (e.g. boosting)
- tree: tree based classifiers (typically train quickly)
Each singleton entry is a list like the others but has only a single element of the class. They include: zeror, svm, j48, random-forest, naivebays, logit, logitboost, smo, kstar.
*cross-fold-info*
dynamic
When two-pass cross fold validations are used, this is bound to the following map during the validation (see clone-instances):
- :train? true if creating folds for the train phase, otherwise the test phase is used
- :fold the number of the fold
- :state state shared between training and testing (i.e. context)
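As a rough illustration of reading this map, the sketch below formats it for a log message. The helper name describe-fold is hypothetical and not part of this namespace; only the keys :train?, :fold and :state come from the description above.

```clojure
;; Hypothetical helper (not part of zensols.model.weka): summarize a map
;; shaped like *cross-fold-info* so it can be logged during validation.
(defn describe-fold
  [{:keys [train? fold]}]
  (format "%s phase, fold %s" (if train? "train" "test") fold))

;; Works on any map with the documented keys.
(describe-fold {:train? true :fold 2 :state nil})

(comment
  ;; With the library on the classpath, and only while a two-pass
  ;; validation is running, the dynamic var itself could be passed in:
  (require '[zensols.model.weka :as weka])
  (describe-fold weka/*cross-fold-info*))
```

Calling a helper like this from a train-fn or test-fn given to clone-instances would be one way to trace which fold is being built.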
*missing-values-ok*
dynamic
Whether the classifier can handle missing values; if not, an exception is thrown for missing values.
append-instances
(append-instances src dst)
Merge two instances row wise by adding dst to src.
attribute-by-name
(attribute-by-name instances name)
Return a weka.core.Attribute instance by name from a weka.core.Instances.
attributes-for-instances
(attributes-for-instances insts & {:keys [sort?], :or {sort? true}})
Return a map with :name and :type for each attribute in a weka.core.Instances.
clone-instances
(clone-instances inst & {:keys [train-fn test-fn randomize-fn], :as opts})
Return a deep clone of inst, optionally with a specific training and test set. See *cross-fold-info* to get information during the validation for debugging and analysis.
- inst an (object) instance of weka.core.Instances (the whole dataset)
Keys
- train-fn a function that takes the following arguments: a weka.core.Instances created for the training set, the number of folds, the fold number, and a java.util.Random to pass to the Weka layer to shuffle the dataset
- test-fn just like train-fn, but used to create the test data set, and it doesn't take the java.util.Random instance
create-attrib
(create-attrib att-name type)
Create a Weka Attribute instance with att-name.
type is the type of attribute, which can be string, boolean, numeric, or a sequence of strings representing possible enumeration values (nominals in Weka speak).
instances
(instances inst-name feature-sets feature-metas)
(instances inst-name feature-sets feature-metas class-feature-meta & {:keys [clone?], :or {clone? true}})
Create a new weka.core.Instances instance.
- inst-name used to identify the model data set
- feature-sets a sequence of maps with each map having key/value pairs of the features of the model to be populated in the returned weka.core.Instances
- feature-metas a map of key/value pairs describing the features (they become weka.core.Attributes) where the values are string, boolean, numeric, or a sequence of strings representing possible enumeration values (nominals in Weka speak)
- class-feature-meta just like a (single) feature-metas but describes the class
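To make the argument shapes concrete, here is a minimal sketch of a feature-metas map and matching feature-sets. All feature names and values are made up for illustration. It assumes the type names are written as quoted symbols, which the text above does not confirm, so check create-attrib before relying on it. The undeclared-keys helper is hypothetical and only guards against typos in plain Clojure.

```clojure
;; Illustrative data only; feature names and values are invented.
;; ASSUMPTION: types are quoted symbols ('string, 'numeric, 'boolean).
(def feature-metas
  {:utterance   'string
   :token-count 'numeric
   :has-digit   'boolean})

;; One map per row, keyed the same way as feature-metas.
(def feature-sets
  [{:utterance "hello world" :token-count 2 :has-digit false}
   {:utterance "room 101"    :token-count 2 :has-digit true}])

;; Hypothetical helper: keys used in the rows but missing from the metas.
(defn undeclared-keys [metas sets]
  (into #{} (mapcat #(remove (set (keys metas)) (keys %))) sets))

(comment
  ;; With the library on the classpath, the three-argument arity would be
  ;; called along these lines:
  (require '[zensols.model.weka :as weka])
  (weka/instances "lang-id" feature-sets feature-metas))
```

Checking that undeclared-keys returns an empty set before building the data set is cheap and catches a misspelled feature key early.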
let-classifier
macro
(let-classifier fdef-expr & forms)
fnspec ==> (classifier-name [insts] exprs)
Define a classifier that uses Clojure code to evaluate insts instances and evaluate body exprs.
Example:
(let-classifier
  (langid-baseline [inst]
    (let [attrib (weka/attribute-by-name inst "langid-1-id")
          val (.stringValue inst attrib)
          rval (= "en" val)]
      (log/infof "langid: %s for: %s: res: %s" val inst rval)
      (if rval 1 0)))
  (terse-results langid-baseline meta-set))
make-classifiers
(make-classifiers)
(make-classifiers set-name-or-instance)
Make classifiers from either a key in *classifiers* or an instance of weka.classifiers.Classifier (meaning an already constructed instance). All classifiers are returned for the 0-arg option.
populate-instances
(populate-instances insts feature-metas feature-sets)
Populate a weka.core.Instances instance from Clojure data structures.
- insts a weka.core.Instances that will be populated
- feature-metas a map of key/value pairs describing the features (they become weka.core.Attributes) where the values are described as types in create-attrib
- feature-sets a sequence of maps with each map having key/value pairs of the features of the model to be populated in the returned weka.core.Instances
remove-attributes
(remove-attributes inst attrib-names & {:keys [invert-selection?]})
Remove a set of attributes from inst (weka.core.Instances) by (string) name.
sparse-instances
(sparse-instances maps dim & {:keys [pattern class-attribute-name instance-name add-class? default-value], :or {pattern "f%d", class-attribute-name "class", instance-name "inst", add-class? true}})
Create a sparse weka.core.Instances using a sequence of maps (maps). The keys of the maps are the classes and the values are maps, each with the index as the key and the weight as the value. The dim parameter is the dimension of each instance.
Keys
- :pattern a format using one integer as the index (default: f%d)
- :class-attribute-name the name of the output class (values are given from the keys of maps)
- :instance-name the name of the created Instances object; defaults to inst
- :add-class? if true, add the class that comes from the key in maps
- :default-value if a double, replace missing values not in the map with this value; otherwise missing values will be used
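The input shape can be sketched as follows. The class labels and weights are invented, and the example assumes zero-based indices, which the description above does not state. The attribute-names vector only shows what the default f%d pattern yields for indices 0 through dim - 1.

```clojure
;; Illustrative data only. Each map goes from a class label to a sparse
;; row of {index weight}. ASSUMPTION: indices are zero-based.
(def dim 4)

(def maps
  [{"spam" {0 1.0, 3 0.5}}
   {"ham"  {1 2.0}}])

;; Names produced by the default :pattern "f%d" for indices 0..dim-1.
(def attribute-names
  (mapv #(format "f%d" %) (range dim)))

(comment
  ;; With the library on the classpath; :default-value 0.0 asks for zeros
  ;; instead of missing values at indices absent from a row.
  (require '[zensols.model.weka :as weka])
  (weka/sparse-instances maps dim :default-value 0.0))
```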
(value insts n name)
Return the value for instance n in weka.core.Instances insts with attribute of name.
value-for-instance
(value-for-instance val)
(value-for-instance type val)
Return a Java variable that plays nicely with the Weka framework. If no type is given it tries to determine the type on its own.
- val is a Java primitive (wrapper)
- type if given, is the type of val (see create-attrib)