zensols.model.classifier
A utility library that wraps the Weka library. This library works with zensols.model.weka to do the following:
- Cross validate models
- Manage and sort results (i.e. cross validations)
- Train models
- Read/write ARFF files
This namespace uses the resource location system to configure the location of files and output analysis files. For more information about the configuration specifics, see model-read-resource and analysis-dir, which both use resource-path.
You probably don’t want to use this library directly. Please look at zensols.model.eval-classifier and zensols.model.execute-classifier.
*arff-file*
dynamic
File to read from or write to for any operation that accesses the ARFF file(s) on the file system.
*best-result-criteria*
dynamic
Key used to sort results by their most optimal performance statistic. Valid values are: :accuracy, :wprecision, :wrecall, :wfmeasure, :kappa, :rmse
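As an illustration of sorting by a criteria key, here is a minimal sketch over plain maps. The result maps, their values, and the `sort-by-criteria` helper are hypothetical and are not part of this library. The sketch also assumes higher is better, which would not hold for an error measure such as :rmse.

```clojure
;; Hypothetical result maps; the numbers are made up for illustration.
(def results
  [{:name "a" :accuracy 0.81 :wfmeasure 0.79}
   {:name "b" :accuracy 0.86 :wfmeasure 0.80}
   {:name "c" :accuracy 0.74 :wfmeasure 0.83}])

(defn sort-by-criteria
  "Sort result maps so the highest value for `criteria` comes first.
  Assumes a statistic where higher is better."
  [criteria results]
  (sort-by criteria #(compare %2 %1) results))

;; Best result first when ranking by accuracy.
(map :name (sort-by-criteria :accuracy results))
;; => ("b" "a" "c")
```

Switching the key to :wfmeasure changes the ranking without touching the data, which is the role a criteria key plays.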
*create-classifier-fn*
dynamic
Function used to create a classifier. Takes as input a weka.core.Instances.
*cross-fold-count*
dynamic
The default number of folds to use during cross fold validation (see compile-results).
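Dynamic vars such as this one are normally overridden with `binding` for the duration of a call. A minimal sketch using a stand-in var defined locally: the default of 10 and the `fold-sizes` helper are assumptions for illustration, not part of this library and not a description of how Weka splits folds.

```clojure
;; Stand-in for the library var; the default value of 10 is an assumption.
(def ^:dynamic *cross-fold-count* 10)

(defn fold-sizes
  "Hypothetical helper: sizes of the folds when n instances are split
  into *cross-fold-count* folds of near-equal size."
  [n]
  (let [k *cross-fold-count*
        base (quot n k)
        extra (rem n k)]
    ;; the first `extra` folds take one additional instance
    (mapv #(if (< % extra) (inc base) base) (range k))))

(fold-sizes 25)
;; => [3 3 3 3 3 2 2 2 2 2]

;; Override the fold count only within the dynamic scope of `binding`.
(binding [*cross-fold-count* 4]
  (fold-sizes 25))
;; => [7 6 6 6]
```

Outside the `binding` form the var reverts to its root value, so callers do not need to restore it.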
*cross-val-fns*
dynamic
If this is non-nil then two pass validation is used. This is a map with the following keys:
- :train-fn a function that is called during training for each fold to stitch in partial feature-sets to get better results; almost always set to zensols.model.eval-classifier/two-pass-train-instances
- :test-fn just like :train-fn but called during testing; almost always set to zensols.model.eval-classifier/two-pass-train-instances
*get-data-fn*
dynamic
A function that generates a weka.core.Instances for cross validation, training, etc.
*operation-write-instance-fns*
dynamic
A map whose values are functions that return a java.io.File for the operation represented by the respective key. An ARFF file is created at the file location. The keys are one of:
- :train-classifier called when the classifier is training a model
- :test-classifier called when the classifier is testing a model
*rand-fn*
dynamic
A function that returns a java.util.Random used to randomize the train/test dataset.
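One reason to supply such a function is reproducibility: a seeded java.util.Random gives the same shuffle on every run. A minimal sketch with a stand-in var: the seed value and the `shuffle-with` helper are assumptions for illustration, not this library's implementation.

```clojure
;; Stand-in for the library var; the fixed seed of 1 is an assumption.
(def ^:dynamic *rand-fn* (fn [] (java.util.Random. 1)))

(defn shuffle-with
  "Hypothetical helper: shuffle coll using the Random returned by *rand-fn*."
  [coll]
  (let [items (java.util.ArrayList. ^java.util.Collection coll)]
    (java.util.Collections/shuffle items (*rand-fn*))
    (vec items)))

;; Each call creates a new Random with the same seed,
;; so both shuffles produce the same order.
(= (shuffle-with (range 10)) (shuffle-with (range 10)))
;; => true
```

Binding the var to a function that returns an unseeded `(java.util.Random.)` would instead give a different order on each call.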
analysis-report-resource
(analysis-report-resource)
Return the model directory on the file system as defined by :analysis-report. See the namespace documentation for how to configure it.
classifier-name
(classifier-name classifier-instance)
Return a decent human-readable name of a classifier instance.
classify-instance
(classify-instance classifier unlabeled return-keys)
Make predictions for all instances.
- classifier instance of weka.classifiers.Classifier
- unlabeled contains feature set data with an empty class label as a weka.core.Instances
- return-keys what data to return
  - :label the classified label
  - :distributions the probability distribution over the label
compile-results
(compile-results results)
Return an easier-to-use map of the result data from cross-validate-tests. The map contains all the performance statistics and:
- :feature-metadata the feature metadata
- :result a weka.classifiers.Evaluation instance
- :all-results a sorted list of weka.classifiers.Evaluation instances
See cross-validate-tests for where the results data is created.
cross-validate-tests
(cross-validate-tests classifier attributes feature-metadata)
Run the cross validation for classifier and attributes (symbol set).
excel-results
(excel-results sheet-name-results out-file)
Save the results in Excel format.
excel-results-precision
dynamic
An integer specifying the length of the mantissa when creating the results spreadsheet in excel-results.
filter-attribute-data
(filter-attribute-data unfiltered attributes)
Create a filtered data set (weka.core.Instances) from unfiltered Instances. Parameter attributes is a set of string attribute names.
initialize
(initialize)
Initialize model resource locations.
This needs the system property clj.nlp.parse.model set to a directory that has the POS tagger model english-left3words-distsim.tagger (or whatever you configure in zensols.nlparse.stanford/create-context) in a directory called pos.
See the source documentation for more information.
model-exists?
(model-exists? name)
Return whether a model with the given name exists.
See model-read-resource.
model-read-resource
(model-read-resource name)
Return a file pointing to the model with name using the :model-read resource path (see zensols.actioncli.resource/resource-path).
model-write-resource
(model-write-resource name)
Return a file pointing to the model with name using the :model-write resource path (see zensols.actioncli.resource/resource-path).
print-eval-results
(print-eval-results eval)
Print the results, confusion matrix and class details of a weka.classifiers.Evaluation to standard out.
print-results
(print-results results & {:keys [title]})
Print the results, confusion matrix and class details of a single weka.classifiers.Evaluation, or a sequence of them, to standard out.
read-model
(read-model name & {:keys [fail-if-not-exists?], :or {fail-if-not-exists? true}})
Get a saved model (classifier and attributes used). If name is a string, use model-read-resource to calculate the file name. Otherwise, it should be the file where the model exists.
See model-read-resource.
Keys
- :fail-if-not-exists? if true then throw an exception if the model file is missing
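The name handling and the :fail-if-not-exists? key described above can be sketched as follows. This is a minimal illustration under stated assumptions, not this library's code: `resolve-model-file` and `check-model-file` are hypothetical helpers, the directory argument stands in for whatever model-read-resource resolves, and returning nil when the file is missing is an assumption.

```clojure
(require '[clojure.java.io :as io])

(defn resolve-model-file
  "Hypothetical helper: a string name is resolved under model-dir;
  anything else is treated as a file (or path) already."
  [model-dir name]
  (if (string? name)
    (io/file model-dir name)
    (io/file name)))

(defn check-model-file
  "Hypothetical helper: return file if it exists. Otherwise throw when
  fail-if-not-exists? is true (the default), or return nil (an assumption)."
  [^java.io.File file & {:keys [fail-if-not-exists?]
                         :or {fail-if-not-exists? true}}]
  (cond
    (.exists file) file
    fail-if-not-exists? (throw (ex-info "model file not found" {:file file}))
    :else nil))

;; A string name is resolved relative to the directory;
;; a file is passed through unchanged.
(resolve-model-file "models" "my-model")
(resolve-model-file "models" (io/file "elsewhere" "my-model"))
```

The `:or` default mirrors the signature shown for read-model, so callers opt out of the exception explicitly.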
test-classifier
(test-classifier classifier attributes train-data test-data)
Test/evaluate classifier (weka.classifiers.Classifier).
train-classifier
(train-classifier classifier attributes)
Train classifier (weka.classifiers.Classifier).
train-test-classifier
(train-test-classifier classifier feature-meta-sets feature-metadata train-instances test-instances)
write-arff
(write-arff instances)
Write a weka.core.Instances to an ARFF file and return that file.
write-model
(write-model name model)
Save a model (classifier and attributes used). If name is a string, use model-write-resource to calculate the file name. Otherwise, it should be the file where the model is to be written.