zensols.model.classifier

A utility library that wraps the Weka library. This library works with zensols.model.weka to do the following:

  • Cross validate models
  • Manage and sort results (i.e. cross validations)
  • Train models
  • Read/write ARFF files

This namespace uses the resource location system to configure the location of files and output analysis files. For more information about the configuration specifics see model-read-resource and analysis-dir, which both use resource-path.

You probably don’t want to use this library directly. Please look at zensols.model.eval-classifier and zensols.model.execute-classifier.

*arff-file*

dynamic

File to read from or write to for any operation that accesses ARFF files on the file system.

*best-result-criteria*

dynamic

Key used to sort results by their most optimal performance statistic. Valid values are: :accuracy, :wprecision, :wrecall, :wfmeasure, :kappa, :rmse
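
As a rough illustration of sorting by a criteria key, the sketch below orders plain result maps by one of the keys listed above. The result maps, the numbers and the sort-results helper are hypothetical, and treating :rmse as lower-is-better while every other key is higher-is-better is an assumption of this sketch rather than something confirmed from the library source.

```clojure
;; Hypothetical result maps; real results come from cross validation.
(def results
  [{:name "a" :accuracy 0.81 :rmse 0.25}
   {:name "b" :accuracy 0.93 :rmse 0.30}
   {:name "c" :accuracy 0.87 :rmse 0.31}])

(defn sort-results
  "Sort results best first by criteria. Assumes :rmse is an error measure
  (lower is better) and every other key is a score (higher is better)."
  [criteria results]
  (let [ascending (sort-by criteria results)]
    (if (= :rmse criteria)
      ascending
      (reverse ascending))))

(map :name (sort-results :accuracy results)) ; highest accuracy first
(map :name (sort-results :rmse results))     ; lowest error first
```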

*class-feature-meta*

dynamic

The class feature metadata (see zensols.model.weka/create-attrib).

*classifier-class*

dynamic

Class name for the classifier used. This defaults to J48.

*create-classifier-fn*

dynamic

Function used to create a classifier. Takes as input a weka.core.Instances.

*cross-fold-count*

dynamic

The default number of folds to use during cross fold validation (see compile-results).
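
Like the other dynamic variables on this page, the fold count would typically be rebound with binding for the scope of a call. The sketch below shows that mechanism with a local stand-in variable so that it runs without the library; the default of 10 and the fold-indexes helper are hypothetical and say nothing about how the library itself assigns instances to folds.

```clojure
;; Local stand-in for the library's dynamic variable (hypothetical default).
(def ^:dynamic *cross-fold-count* 10)

(defn fold-indexes
  "Assign each of n instance indexes to one of *cross-fold-count* folds
  in round-robin order. Returns a sorted map of fold number to indexes."
  [n]
  (->> (range n)
       (group-by #(mod % *cross-fold-count*))
       (into (sorted-map))))

;; Rebind the fold count for the dynamic scope of the body only.
(binding [*cross-fold-count* 3]
  (fold-indexes 7))
```

Outside the binding form the variable keeps its root value, so other callers are unaffected.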

*cross-val-fns*

dynamic

If this is non-nil then two pass validation is used. This is a map whose keys are described in zensols.model.eval-classifier/*two-pass-config*.

*get-data-fn*

dynamic

A function that generates a weka.core.Instances for cross validation, training, etc.

*operation-write-instance-fns*

dynamic

A map whose values are functions that are called to return a java.io.File for the operation represented by the respective key. An ARFF file is created at the file location. The keys are one of:

  • :train-classifier called when the classifier is training a model
  • :test-classifier called when the classifier is testing a model
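
A minimal sketch of what such a map might look like follows. The file locations and the file-for-operation helper are hypothetical, and the sketch assumes the functions take no arguments, which the text above does not state.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical output locations; each value is a no-argument function
;; (an assumption) that returns the java.io.File to write ARFF data to.
(def write-instance-fns
  {:train-classifier #(io/file "target" "train.arff")
   :test-classifier  #(io/file "target" "test.arff")})

(defn file-for-operation
  "Return the file for operation, or nil when no function is registered."
  [operation]
  (when-let [file-fn (get write-instance-fns operation)]
    (file-fn)))

(file-for-operation :train-classifier)
```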

*output-class-feature-meta*

dynamic

Default attribute name for the predicted label.

*rand-fn*

dynamic

A function that returns a java.util.Random used to randomize the train/test dataset.
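
One reason to supply such a function is reproducibility: a fixed seed gives the same shuffle on every run. The sketch below illustrates the idea in plain Clojure; seeded-rand-fn and shuffle-with are hypothetical helpers, not part of this namespace, and the sketch assumes the function takes no arguments.

```clojure
(defn seeded-rand-fn
  "Return a no-argument function that creates a java.util.Random
  with a fixed seed."
  [seed]
  (fn [] (java.util.Random. (long seed))))

(defn shuffle-with
  "Shuffle coll using the java.util.Random produced by rand-fn."
  [rand-fn ^java.util.Collection coll]
  (let [items (java.util.ArrayList. coll)]
    (java.util.Collections/shuffle items ^java.util.Random (rand-fn))
    (vec items)))

;; The same seed yields the same ordering on every call.
(= (shuffle-with (seeded-rand-fn 42) (range 10))
   (shuffle-with (seeded-rand-fn 42) (range 10)))
```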

analysis-report-resource

(analysis-report-resource)

Return the model directory on the file system as defined by the :analysis-report. See namespace documentation on how to configure.

classifier-name

(classifier-name classifier-instance)

Return a decent human readable name of a classifier instance.

classify-instance

(classify-instance classifier unlabeled return-keys)

Make predictions for all instances.

  • classifier instance of weka.classifiers.Classifier
  • unlabeled contains feature set data with an empty class label as a weka.core.Instances
  • return-keys what data to return:
      • :label the classified label
      • :distributions the probability distribution over the label

compile-results

(compile-results results)

Return an easier to use map of the result data produced by cross-validate-tests. The map contains all the performance statistics and:

  • :feature-metadata the feature metadata
  • :result a weka.classifiers.Evaluation instance
  • :all-results a sorted list of weka.classifiers.Evaluation instances

See cross-validate-tests for where the results data is created.

cross-validate-tests

(cross-validate-tests classifier attributes feature-metadata)

Run the cross validation for classifier and attributes (symbol set).

excel-results

(excel-results sheet-name-results out-file)

Save the results in Excel format.

excel-results-precision

dynamic

An integer specifying the length of the mantissa when creating the results spreadsheet in excel-results.

filter-attribute-data

(filter-attribute-data unfiltered attributes)

Create a filtered data set (weka.core.Instances) from unfiltered Instances. Parameter attributes is a set of string attribute names.

initialize

(initialize)

Initialize model resource locations.

This needs the system property clj.nlp.parse.model set to a directory that has the POS tagger model english-left3words-distsim.tagger (or whatever you configure in zensols.nlparse.stanford/create-context) in a directory called pos.

See the source documentation for more information.

model-exists?

(model-exists? name)

Return whether the model with name exists.

See model-read-resource.

model-read-resource

(model-read-resource name)

Return a file pointing to the model with name using the :model-read resource path (see zensols.actioncli.resource/resource-path).

model-write-resource

(model-write-resource name)

Return a file pointing to the model with name using the :model-write resource path (see zensols.actioncli.resource/resource-path).

print-eval-results

(print-eval-results eval)

Print the results, confusion matrix and class details of a weka.classifiers.Evaluation to standard out.

print-results

(print-results results & {:keys [title]})

Print the results, confusion matrix and class details of a single weka.classifiers.Evaluation, or a sequence of them, to standard out.

read-arff

(read-arff input-file)

Return a weka.core.Instances from an ARFF file.

read-model

(read-model name & {:keys [fail-if-not-exists?], :or {fail-if-not-exists? true}})

Get a saved model (classifier and attributes used). If name is a string, use model-read-resource to calculate the file name. Otherwise, it should be the file where the model exists.

See model-read-resource.

Keys

  • :fail-if-not-exists? if true then throw an exception if the model file is missing
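
The name handling described above can be sketched roughly as follows. This is not the library's implementation: model-file-stub is a hypothetical stand-in for model-read-resource, the model directory and .dat extension are invented for illustration, and returning nil for a missing file when :fail-if-not-exists? is false is an assumption.

```clojure
(require '[clojure.java.io :as io])

(defn model-file-stub
  "Hypothetical stand-in for model-read-resource."
  [name]
  (io/file "model" (str name ".dat")))

(defn resolve-model-file
  "Resolve name to a model file. A string goes through the resource
  lookup; anything else is treated as the file itself."
  [name & {:keys [fail-if-not-exists?] :or {fail-if-not-exists? true}}]
  (let [^java.io.File file (if (string? name)
                             (model-file-stub name)
                             (io/file name))]
    (cond
      (.exists file) file
      fail-if-not-exists? (throw (ex-info "No model file found"
                                          {:file file}))
      ;; Assumed behavior when the caller opts out of the exception.
      :else nil)))

(resolve-model-file "no-such-model" :fail-if-not-exists? false)
```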

test-classifier

(test-classifier classifier attributes train-data test-data)

Test/evaluate classifier (weka.classifiers.Classifier).

train-classifier

(train-classifier classifier attributes)

Train classifier (weka.classifiers.Classifier).

train-test-classifier

(train-test-classifier classifier feature-meta-sets feature-metadata train-instances test-instances)

write-arff

(write-arff instances)

Write a weka.core.Instances to an ARFF file and return that file.

write-model

(write-model name model)

Save a model (classifier and attributes used). If name is a string, use model-write-resource to calculate the file name. Otherwise, it should be the file to write the model to.

See model-write-resource.