zensols.model.eval-classifier

A client entry point library to help with evaluating machine learning models. This library not only wraps the Weka library but also provides additional functionality such as two pass cross validation (see with-two-pass).

*default-set-type*

dynamic

The default type of test, which is one of:

:cross-validation: run an N fold cross validation (default)
:train-test: train the classifier and then evaluate

*throw-cross-validate*

dynamic

If true, throw an exception during cross validation for any errors. Otherwise, the error is logged and cross-validation continues. This is useful when several classifiers are run and some fail on the given dataset, but you still want the results from the others.
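
The log-and-continue behavior described above can be sketched in plain Clojure. This is only an illustration of the idea, not the library's implementation: *throw-on-error*, evaluate-all and flaky-eval are hypothetical names standing in for *throw-cross-validate* and the cross validation loop.

```clojure
(def ^:dynamic *throw-on-error*
  "Hypothetical stand-in for a flag like *throw-cross-validate*."
  false)

(defn evaluate-all
  "Apply eval-fn to each classifier name and collect the results in a map.
  When *throw-on-error* is true the first failure propagates; otherwise the
  failure is reported on stderr and that classifier is skipped."
  [eval-fn classifier-names]
  (reduce (fn [results cname]
            (try
              (assoc results cname (eval-fn cname))
              (catch Exception e
                (if *throw-on-error*
                  (throw e)
                  (do (binding [*out* *err*]
                        (println "skipping" cname ":" (.getMessage e)))
                      results)))))
          {}
          classifier-names))

(defn flaky-eval
  "Toy evaluation (hypothetical) that fails for one classifier."
  [cname]
  (if (= cname :bad)
    (throw (ex-info "cannot handle dataset" {:classifier cname}))
    0.9))

(comment
  ;; default: the failure is logged and the other results are kept
  (evaluate-all flaky-eval [:j48 :bad :zeror])
  ;; with the flag bound to true, the exception propagates
  (binding [*throw-on-error* true]
    (evaluate-all flaky-eval [:j48 :bad :zeror])))
```

With the library itself, the analogous step would presumably be to wrap a call in (binding [*throw-cross-validate* true] ...), since the var is dynamic.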

analysis-file

(analysis-file)(analysis-file file-format)

compile-results

(compile-results classifier-sets feature-set-key)

Run cross-fold validation and compile into a nice results map sorted by performance.

See zensols.model.classifier/compile-results.

classifier-sets is a key in zensols.model.weka/*classifiers* or a constructed classifier (see zensols.model.weka/make-classifiers)
feature-set-key identifies which feature set to use (see :feature-sets-set in zensols.model.execute-classifier/with-model-conf)

create-model

(create-model classifier-sets feature-set-key)

Create a model that can be trained. This runs cross fold validation to find the best classifier and feature set, and returns a result that can be used with train-model and subsequently write-model.

classifier-sets is a key in zensols.model.weka/*classifiers* or a constructed classifier (see zensols.model.weka/make-classifiers)
feature-set-key identifies which feature set to use (see :feature-sets-set in zensols.model.execute-classifier/with-model-conf)

See *throw-cross-validate*.

cross-fold-info

(cross-fold-info)

Return information about the current fold for two-pass validations. See weka/*cross-fold-info*

display-features

(display-features & adb-keys)

Display features as configured in a model with zensols.model.execute-classifier/with-model-conf.

adb-keys are given to :create-feature-sets-fn as described in zensols.model.execute-classifier/with-model-conf. In addition it includes :max, which is the maximum number of instances to display.

eval-and-write

(eval-and-write classifier-sets set-key)(eval-and-write classifier-sets set-key file)

Perform a cross validation and write the results to an Excel formatted file.

See zensols.model.classifier/analysis-report-resource for where the file is written.

classifier-sets is a key in zensols.model.weka/*classifiers* or a constructed classifier (see zensols.model.weka/make-classifiers)
set-key identifies which feature set to use (see :feature-sets-set in zensols.model.execute-classifier/with-model-conf)

This uses eval-and-write-results to actually write the results.

See evaluations-file and *throw-cross-validate*.

eval-and-write-results

(eval-and-write-results results)(eval-and-write-results results output-file)

Write cross validation results to an Excel formatted file. The data in results is obtained with run-tests.

See eval-and-write and *throw-cross-validate*.

evaluations-file

(evaluations-file)(evaluations-file fname)

Return the default file used to create an evaluations file with eval-and-write.

executing-two-pass?

(executing-two-pass?)

Return true if we’re currently using a two pass cross validation.

features-file

(features-file)

Return the default file used to create the features output file with write-features.

print-best-results

(print-best-results classifier-sets feature-set-key)

Print the highest (best) scored cross validation information.

classifier-sets is a key in zensols.model.weka/*classifiers* or a constructed classifier (see zensols.model.weka/make-classifiers)
feature-set-key identifies which feature set to use (see :feature-sets-set in zensols.model.execute-classifier/with-model-conf)

See *throw-cross-validate*.

print-model-config

(print-model-config)

Pretty print the model configuration set with zensols.model.execute-classifier/with-model-conf.

read-arff

(read-arff)(read-arff file)

Read the ARFF file configured with zensols.model.execute-classifier/with-model-conf. If file is given, use that file instead of getting it from zensols.model.classifier/analysis-report-resource.

read-model

(read-model)

Read a model that was previously persisted to the file system.

See zensols.model.classifier/model-dir for where the model is read from.

run-tests

(run-tests classifier-sets feature-set-key)

Create result sets useful to functions like eval-and-write. Most use cases do not need to call this function directly.

See *throw-cross-validate*.

terse-results

(terse-results classifier-sets feature-set-key & {:keys [only-stats?], :or {only-stats? true}})

Return terse cross-validation results in an array:

classifier name
weighted F-measure
feature-metas

classifier-sets is a key in zensols.model.weka/*classifiers* or a constructed classifier (see zensols.model.weka/make-classifiers)
feature-set-key identifies which feature set to use (see :feature-sets-set in zensols.model.execute-classifier/with-model-conf)

Keys

:only-stats? if true, return only statistical data

See *throw-cross-validate*.

test-train-series-file

(test-train-series-file)(test-train-series-file fname)

Return the default file used to create an evaluations file with eval-and-write.

train-model

(train-model model & {:keys [set-type], :or {set-type *default-set-type*}})

Train a model created from create-model. The model is trained on the full available dataset. After the classifier is trained, you can save it to disk by calling write-model.

model a model that was created with create-model

See *throw-cross-validate*.
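
The create-model, train-model and write-model steps described above might be chained as in the following sketch. It rests on several assumptions: that with-model-conf takes the model configuration followed by body forms (as with-two-pass does), that train-model returns the trained model, and that the zensols libraries and a working model configuration are available. The build-and-save function and the example.train namespace are hypothetical names, and [:j48] is borrowed from the with-two-pass example on this page.

```clojure
(ns example.train
  (:require [zensols.model.execute-classifier :refer [with-model-conf]]
            [zensols.model.eval-classifier :as ec]))

(defn build-and-save
  "Cross validate to pick a classifier, train it on the full dataset and
  persist it. model-conf is a model configuration as described in
  with-model-conf; feature-set-key is a key defined by that configuration.
  Hypothetical helper, not part of the library."
  [model-conf feature-set-key]
  (with-model-conf model-conf
    (-> (ec/create-model [:j48] feature-set-key) ; cross validation result
        ec/train-model                           ; trains with *default-set-type*
        ec/write-model)))                        ; writes under classifier/model-dir
```

A model saved this way could later be loaded with read-model.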

train-test-results

(train-test-results classifier-sets feature-sets-key)

Test the performance of a model by training on a given set of data and evaluating on the test data.

See train-model for parameter details.

train-test-series

(train-test-series classifiers meta-set divide-ratio-config)

Test and train with different ratios and return the results. The return data is writable directly as an Excel file. However, you can also save it as a CSV with write-csv-train-test-series.

The keys are the classifier name and the values are the 2D result matrix.

See *throw-cross-validate*.

two-pass-model

(two-pass-model model id-key anon-by-id-fn anons-fn)

Don’t use this function; use with-two-pass instead.

Create a two pass model, which should be merged with the model created with zensols.model.execute-classifier/with-model-conf.

See with-two-pass.

two-pass-test-instances

(two-pass-test-instances insts train-state org folds fold)

Don’t use this function; use with-two-pass instead.

This is called by the zensols.model.weka namespace.

two-pass-train-instances

(two-pass-train-instances insts state org folds fold)

Don’t use this function; use with-two-pass instead.

This is called by the zensols.model.weka namespace.

with-two-pass

macro

(with-two-pass model-conf opts & forms)

Like with-model-conf, but compute a context state (i.e. statistics needed by the model) on a per fold basis when executing a cross fold validation.

The model-conf parameter is the same model used with zensols.model.execute-classifier/with-model-conf.

Description

Two pass validation is a term used in this library. During cross-validation the entire data set is evaluated and (usually) statistics or some other additional modeling happens.

Suppose, for example, that you want to count words (think Naive Bayes spam filter). If you create features from the entire dataset before cross-validation, you’re “cheating” because the features are based on data in the test folds, which should be unseen during training.

To get more accurate performance metrics, you can provide functions that take the current training fold, compute your word counts, and create your features. During the testing phase, the computed data is provided to create features based only on that (current) fold.

To use two pass validation, every feature set needs a unique key (not needed as a feature). This key is then given to a function during validation to get the corresponding feature set that is to be stitched in.

Note: this is only useful if:

You want to use cross fold validation to test your model.
Your model priors (*context* in implementation parlance) are composed of dataset preprocessing, and thus are needed to get reliable performance metrics.
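
The word count scenario above can be sketched in plain Clojure. The sketch only illustrates the idea of computing context per fold from training data alone; it is not how this library implements two pass validation. The names word-counts, folds and per-fold-contexts, and the round-robin fold assignment, are assumptions made for the illustration.

```clojure
(ns two-pass-sketch
  (:require [clojure.string :as str]))

(defn word-counts
  "Count whitespace-separated, lower-cased words across a seq of documents."
  [docs]
  (->> docs
       (mapcat #(str/split (str/lower-case %) #"\s+"))
       (remove str/blank?)
       frequencies))

(defn folds
  "Split coll into k folds by round-robin assignment; returns a vector of
  vectors."
  [k coll]
  (->> (map-indexed vector coll)
       (group-by #(mod (first %) k))
       (sort-by key)
       (mapv #(mapv second (val %)))))

(defn per-fold-contexts
  "For each fold, compute word counts from the other (training) folds only.
  Returns a seq of maps with :test (the held-out documents) and :context
  (counts that never saw the held-out documents)."
  [k docs]
  (let [fs (folds k docs)]
    (for [i (range (count fs))]
      {:test    (nth fs i)
       :context (word-counts
                 (apply concat (concat (take i fs) (drop (inc i) fs))))})))

(comment
  ;; the first fold holds out the two "free ..." documents, so its context
  ;; contains no count for "free"
  (per-fold-contexts 2 ["free money now" "meeting at noon"
                        "free lunch" "noon meeting moved"]))
```

Computing word-counts once over all documents would instead give every fold a context that includes its own test documents, which is the leak two pass validation is meant to avoid.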

Option Keys

In addition to all keys documented in zensols.model.execute-classifier/with-model-conf, the opts param is a map that also needs the following key/value pairs:

:id-key a function that takes a key as input and returns a feature set
:anon-by-id-fn is a function that takes a single integer argument, the ID of the annotation to retrieve
:anons-fn is a function that retrieves all annotations
:create-two-pass-context-fn like :create-context-fn, as documented in zensols.model.execute-classifier/with-model-conf but called for two pass cross validation; this allows a more general context and a specific two pass context to be created for the unique needs of the model.

Example

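;; model configuration first, then the two pass options (see Option Keys)
(with-two-pass (create-model-config)
  {:id-key sf/id-key
   :anon-by-id-fn #(->> % adb/anon-by-id :instance)
   :anons-fn adb/anons}
  ;; body forms run with the two pass model configuration in effect
  (with-feature-context (sf/create-context :anons-fn adb/anons
                                           :set-type :train-test)
    (ec/terse-results [:j48] :set-test-two-pass :only-stats? true)))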

See a working example for a more comprehensive code listing.

write-arff

(write-arff)(write-arff file)

Write the ARFF file configured with zensols.model.execute-classifier/with-model-conf. If file is given, use that file instead of getting it from zensols.model.classifier/analysis-report-resource.

write-csv-train-test-series

(write-csv-train-test-series res)(write-csv-train-test-series res out-file)

Write the results produced with train-test-series as a CSV file to the analysis directory.

write-features

(write-features)(write-features file)

Write features as configured in a model with zensols.model.execute-classifier/with-model-conf to a CSV spreadsheet file.

See features-file for the default file.

For the non-zero-arg form, see zensols.model.execute-classifier/with-model-conf.

write-model

(write-model model)(write-model model name)

Persist/write the model to disk.

model a model that was trained with train-model

See zensols.model.classifier/model-dir for where the model is written.

Generated by Codox

Interface for machine learning modeling, testing and training 0.0.18
