zensols.model.eval-classifier

A client entry point library to help with evaluating machine learning models. This library not only wraps the Weka library but also provides additional functionality such as two-pass cross validation (see with-two-pass).

*default-set-type*

dynamic

The default type of test, which is one of:

  • :cross-validation: run an N-fold cross validation (default)
  • :train-test: train the classifier and then evaluate

*throw-cross-validate*

dynamic

If true, throw an exception during cross validation for any errors. Otherwise, the error is logged and cross-validation continues. This is useful when several classifiers are evaluated and some fail on the given dataset, but you still want the results from the others.
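The behavior described above can be illustrated with a minimal, self-contained sketch. It does not use the library: the dynamic var below is a local stand-in for *throw-cross-validate*, and evaluate-all, fake-eval and the :broken and :other classifier keys are hypothetical names invented for illustration. With the library itself, the flag would presumably be rebound with binding in the same way.

```clojure
;; Local stand-in for the library's dynamic var (an assumption for this
;; sketch; the real var lives in zensols.model.eval-classifier).
(def ^:dynamic *throw-cross-validate* false)

(defn evaluate-all
  "Hypothetical helper: apply eval-fn to every classifier key and collect
  the scores in a map.  A failing classifier is either rethrown or skipped,
  depending on *throw-cross-validate*."
  [eval-fn classifiers]
  (reduce (fn [results classifier]
            (try
              (assoc results classifier (eval-fn classifier))
              (catch Exception e
                (if *throw-cross-validate*
                  (throw e)
                  (do (println "skipping" classifier "-" (.getMessage e))
                      results)))))
          {}
          classifiers))

(defn fake-eval
  "Hypothetical evaluation function: fails for :broken, scores 0.9 otherwise."
  [classifier]
  (if (= classifier :broken)
    (throw (ex-info "cannot handle dataset" {:classifier classifier}))
    0.9))

;; Default (false): the failure is reported and the other results are kept.
(def lenient-results
  (evaluate-all fake-eval [:j48 :broken :other]))
;; => {:j48 0.9, :other 0.9}

;; Rebound to true: the first failure aborts the whole run.
(def strict-outcome
  (binding [*throw-cross-validate* true]
    (try
      (evaluate-all fake-eval [:j48 :broken :other])
      (catch Exception _ :threw))))
;; => :threw
```

Leaving the flag off suits exploratory runs over many classifiers; turning it on is more helpful when debugging a single failing classifier.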

analysis-file

(analysis-file)(analysis-file file-format)

compile-results

(compile-results classifier-sets feature-set-key)

Run cross-fold validation and compile the output into a results map sorted by performance.

See zensols.model.classifier/compile-results.

create-model

(create-model classifier-sets feature-set-key)

Create a model that can be trained. This runs cross-fold validations to find the best classifier and feature set, and returns a result that can be used with train-model and subsequently write-model.

See *throw-cross-validate*.

cross-fold-info

(cross-fold-info)

Return information about the current fold for two-pass validations. See weka/*cross-fold-info*.

display-features

(display-features & adb-keys)

Display features as configured in a model with zensols.model.execute-classifier/with-model-conf.

adb-keys are given to :create-feature-sets-fn as described in zensols.model.execute-classifier/with-model-conf. In addition it includes :max, which is the maximum number of instances to display.

eval-and-write

(eval-and-write classifier-sets set-key)(eval-and-write classifier-sets set-key file)

Perform a cross validation and write the results to an Excel formatted file.

See zensols.model.classifier/analysis-report-resource for where the file is written.

This uses eval-and-write-results to actually write the results.

See evaluations-file and *throw-cross-validate*.

eval-and-write-results

(eval-and-write-results results)(eval-and-write-results results output-file)

Perform a cross validation and write the results to an Excel formatted file. The data from results is obtained with run-tests.

See eval-and-write and *throw-cross-validate*.

evaluations-file

(evaluations-file)(evaluations-file fname)

Return the default file used to create an evaluations file with eval-and-write.

executing-two-pass?

(executing-two-pass?)

Return true if we’re currently using a two pass cross validation.

features-file

(features-file)

Return the default file used to create the features output file with write-features.

print-best-results

(print-best-results classifier-sets feature-set-key)

Print the highest (best) scored cross validation information.

See *throw-cross-validate*.

print-model-config

(print-model-config)

Pretty print the model configuration set with zensols.model.execute-classifier/with-model-conf.

read-arff

(read-arff)(read-arff file)

Read the ARFF file configured with zensols.model.execute-classifier/with-model-conf. If file is given, use that file instead of getting it from zensols.model.classifier/analysis-report-resource.

read-model

(read-model)

Read a model that was previously persisted to the file system.

See zensols.model.classifier/model-dir for where the model is read from.

run-tests

(run-tests classifier-sets feature-set-key)

Create result sets useful to functions like eval-and-write. This package is designed so that most use cases do not need to call this function directly.

See *throw-cross-validate*.

terse-results

(terse-results classifier-sets feature-set-key & {:keys [only-stats?], :or {only-stats? true}})

Return terse cross-validation results in an array:

  • classifier name
  • weighted F-measure
  • feature-metas

Keys

  • :only-stats? if true only return statistic data

See *throw-cross-validate*.

test-train-series-file

(test-train-series-file)(test-train-series-file fname)

Return the default file used to create an evaluations file with eval-and-write.

train-model

(train-model model & {:keys [set-type], :or {set-type *default-set-type*}})

Train a model created from create-model. The model is trained on the full available dataset. After the classifier is trained, you can save it to disk by calling write-model.

See *throw-cross-validate*.

train-test-results

(train-test-results classifier-sets feature-sets-key)

Test the performance of a model by training on a given set of data and evaluating on the test data.

See train-model for parameter details.

train-test-series

(train-test-series classifiers meta-set divide-ratio-config)

Test and train with different ratios and return the results. The returned data is writable directly as an Excel file. However, you can also save it as a CSV with write-csv-train-test-series.

The keys are the classifier name and the values are the 2D result matrix.

See *throw-cross-validate*.

two-pass-model

(two-pass-model model id-key anon-by-id-fn anons-fn)

Don’t use this function; use with-two-pass instead.

Create a two pass model, which should be merged with the model created with zensols.model.execute-classifier/with-model-conf.

See with-two-pass.

two-pass-test-instances

(two-pass-test-instances insts train-state org folds fold)

Don’t use this function; use with-two-pass instead.

This is called by the zensols.model.weka namespace.

two-pass-train-instances

(two-pass-train-instances insts state org folds fold)

Don’t use this function; use with-two-pass instead.

This is called by the zensols.model.weka namespace.

with-two-pass

macro

(with-two-pass model-conf opts & forms)

Like with-model-conf, but compute a context state (i.e. statistics needed by the model) on a per-fold basis when executing a cross-fold validation.

The model-conf parameter is the same model used with zensols.model.execute-classifier/with-model-conf.

Description

Two pass validation is a term used in this library. During cross-validation the entire data set is evaluated and (usually) statistics are computed or some other additional modeling happens.

Suppose, for example, that you want to count words (think of a Naive Bayes spam filter). If you create features from the entire dataset before cross-validation, you’re “cheating” because the features are based in part on data from the test folds, which the model should not have seen.

To get more accurate performance metrics you can provide functions that take the current training fold, compute your word counts, and create your features. During the testing phase, the computed data is provided to create features based only on that (current) fold.

To use two pass validation every feature set needs a unique key (the key itself is not needed as a feature). This key is then given to a function during validation to get the corresponding feature set that is to be stitched in.

Note: this is only useful if:

  1. You want to use cross fold validation to test your model.
  2. Your model priors (*context* in implementation parlance) are composed from the dataset preprocessing and are thus needed to get reliable performance metrics.
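The idea described above can be sketched without the library, using only clojure.core and clojure.string. Everything below is hypothetical and for illustration only: the toy dataset, the word-counts, features and folds helpers, and the feature names are assumptions, not part of this library's API. The sketch computes the context (word counts) from each training fold and uses it to create features for the matching test fold, then contrasts that with a context computed from the whole dataset.

```clojure
(require '[clojure.string :as str])

;; Hypothetical toy dataset.  Each instance carries a unique :id, which plays
;; the role of the key used to stitch features back in.
(def instances
  [{:id 1 :text "win cash now"    :label :spam}
   {:id 2 :text "cash prize win"  :label :spam}
   {:id 3 :text "meeting at noon" :label :ham}
   {:id 4 :text "lunch at noon"   :label :ham}])

(defn word-counts
  "First pass: build the context (word frequencies per label) from the
  training instances only."
  [train]
  (reduce (fn [ctx {:keys [text label]}]
            (reduce (fn [c w] (update-in c [label w] (fnil inc 0)))
                    ctx
                    (str/split text #"\s+")))
          {}
          train))

(defn features
  "Second pass: create features for one instance from a given context."
  [ctx {:keys [id text]}]
  (let [ws    (str/split text #"\s+")
        score (fn [label] (reduce + (map #(get-in ctx [label %] 0) ws)))]
    {:id id :spam-hits (score :spam) :ham-hits (score :ham)}))

(defn folds
  "Split insts into n folds round robin; return a seq of {:train :test} maps."
  [n insts]
  (let [parts (map (fn [i] (take-nth n (drop i insts))) (range n))]
    (for [i (range n)]
      {:test  (nth parts i)
       :train (apply concat (keep-indexed #(when (not= %1 i) %2) parts))})))

;; Two pass: the context is recomputed from the training part of every fold.
(def fold-features
  (vec (for [{:keys [train test]} (folds 2 instances)]
         (let [ctx (word-counts train)]
           (mapv #(features ctx %) test)))))
;; => [[{:id 1, :spam-hits 2, :ham-hits 0} {:id 3, :spam-hits 0, :ham-hits 2}]
;;     [{:id 2, :spam-hits 2, :ham-hits 0} {:id 4, :spam-hits 0, :ham-hits 2}]]

;; "Cheating": the context is computed once from the whole dataset, so the
;; test instance's own words leak into its features.
(def leaky-features
  (features (word-counts instances) (first instances)))
;; => {:id 1, :spam-hits 5, :ham-hits 0}
```

Instance 1 scores 5 spam hits with the leaky context but only 2 with the per-fold context, which is the kind of inflation that two pass validation is meant to avoid.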

Option Keys

In addition to all keys documented in zensols.model.execute-classifier/with-model-conf, the opts param is a map that also needs the following key/value pairs:

  • :id-key a function that takes a key as input and returns a feature set
  • :anon-by-id-fn is a function that takes a single integer argument, the ID of the annotation to retrieve
  • :anons-fn is a function that retrieves all annotations
  • :create-two-pass-context-fn like :create-context-fn, as documented in zensols.model.execute-classifier/with-model-conf but called for two pass cross validation; this allows a more general context and a specific two pass context to be created for the unique needs of the model.

Example

(with-two-pass (create-model-config)
  {:id-key sf/id-key
   :anon-by-id-fn #(->> % adb/anon-by-id :instance)
   :anons-fn adb/anons}
  (with-feature-context (sf/create-context :anons-fn adb/anons
                                           :set-type :train-test)
    (ec/terse-results [:j48] :set-test-two-pass :only-stats? true)))

See a working example for a more comprehensive code listing.

write-arff

(write-arff)(write-arff file)

Write the ARFF file configured with zensols.model.execute-classifier/with-model-conf. If file is given, use that file instead of getting it from zensols.model.classifier/analysis-report-resource.

write-csv-train-test-series

(write-csv-train-test-series res)(write-csv-train-test-series res out-file)

Write the results produced with train-test-series as a CSV file to the analysis directory.

write-features

(write-features)(write-features file)

Write features as configured in a model with zensols.model.execute-classifier/with-model-conf to a CSV spreadsheet file.

See features-file for the default file.

For the non-zero-arg form, see zensols.model.execute-classifier/with-model-conf.

write-model

(write-model model)(write-model model name)

Persist/write the model to disk.

See zensols.model.classifier/model-dir for information about where the model is written.