zensols.dataset.thaw

Exactly like zensols.dataset.db, but uses the file system as the backing store.

Instead of using Elasticsearch, this namespace reads the rows of a JSON file created with zensols.dataset.db/freeze-dataset. Any program can create the file, since it is just a text file with the following keys:

  • :instance: the data instance, i.e. the parsed data (see zensols.dataset.db)
  • :class-label: the label of the class for the data instance
  • :id: the unique string ID of the instance
  • :set-type: either train or test, depending on which split the instance belongs to
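As a rough illustration of the keys listed above, the sketch below checks one parsed row in plain Clojure. The helper valid-row? and the example row are hypothetical and not part of zensols.dataset.thaw. The sketch also assumes that :set-type may arrive as either a string or a keyword after parsing, and it says nothing about how rows are laid out on disk.

```clojure
;; Hypothetical helper, not part of zensols.dataset.thaw: checks that a parsed
;; row carries the four documented keys.
(def required-keys #{:instance :class-label :id :set-type})

(defn valid-row?
  "Return true if `row` has every documented key, a string :id, and a
  :set-type of train or test (as a string or a keyword)."
  [row]
  (let [set-type (:set-type row)]
    (and (every? #(contains? row %) required-keys)
         (string? (:id row))
         (or (string? set-type) (keyword? set-type))
         (contains? #{"train" "test"} (name set-type)))))

;; An illustrative row; the :instance payload is made up for this sketch.
(def example-row
  {:instance {:text "an example utterance"}
   :class-label "greeting"
   :id "1"
   :set-type "train"})

(valid-row? example-row)
```

A check like this can be useful when the file is produced by a program other than zensols.dataset.db/freeze-dataset.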

default-connection-inst

ids

(ids & {:keys [set-type], :or {set-type :train-test}})

Return all IDs based on the dataset split (see the namespace docs).

Keys

  • :set-type is one of :train, :test or :train-test (all); it defaults to the value set with set-default-set-type, or :train if not set
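The effect of :set-type can be modeled with a small pure function. This is only a sketch of the documented behavior, not the library's implementation: select-ids and the sample rows are hypothetical, and the rows are assumed to store the split as the strings "train" and "test".

```clojure
;; Hypothetical model of split selection; not part of zensols.dataset.thaw.
(defn select-ids
  "Return the :id of every row in the requested split. :train-test selects
  all rows; :train and :test select only the matching rows."
  [rows set-type]
  (->> rows
       (filter #(or (= set-type :train-test)
                    (= (name set-type) (name (:set-type %)))))
       (map :id)))

;; Made-up rows for illustration.
(def sample-rows
  [{:id "1" :set-type "train"}
   {:id "2" :set-type "test"}
   {:id "3" :set-type "train"}])

(select-ids sample-rows :train)      ; => ("1" "3")
(select-ids sample-rows :test)       ; => ("2")
(select-ids sample-rows :train-test) ; => ("1" "2" "3")
```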

instance-by-id

(instance-by-id conn id)
(instance-by-id id)

Get a specific instance by its ID.

This returns a map that has the following keys:

instances

(instances & {:keys [set-type id-set], :or {set-type :train-test}})

Return all instance data based on the dataset split (see the namespace docs).

See instance-by-id for the data in each map sequence returned.

Keys

  • :set-type is one of :train, :test or :train-test (all); it defaults to the value set with set-default-set-type, or :train if not set
  • :include-ids? if non-nil, return keys in the map as well

instances-count

(instances-count)

Return the number of instances in the dataset.

set-default-connection

(set-default-connection)
(set-default-connection conn)

Set the default connection.

Parameter conn is used in place of what would otherwise be set with with-connection. This is convenient and saves typing, but the default is overridden wherever a with-connection is used further down the call stack.

If conn is omitted, the default connection is unset.
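The interaction between a default connection and a scoped one can be modeled with a dynamic var and an atom. This is a minimal sketch of the behavior described above, under the assumption that a scoped binding takes precedence over the default. The names *connection*, default-connection, set-default-conn, with-conn and current-connection are all hypothetical and are not the library's own definitions.

```clojure
;; Hypothetical model of default versus scoped connections.
(def ^:dynamic *connection* nil)      ; scoped connection, bound per call stack
(def default-connection (atom nil))   ; process-wide fallback

(defn set-default-conn
  "Set the fallback connection; with no argument, unset it."
  ([] (set-default-conn nil))
  ([conn] (reset! default-connection conn)))

(defmacro with-conn
  "Evaluate body with conn as the scoped connection."
  [conn & body]
  `(binding [*connection* ~conn] ~@body))

(defn current-connection
  "Return the scoped connection if one is bound, else the default."
  []
  (or *connection* @default-connection))

(set-default-conn :default)
(current-connection)                      ; => :default
(with-conn :scoped (current-connection))  ; => :scoped, the default is shadowed
(set-default-conn)
(current-connection)                      ; => nil
```

In this model the default only applies outside any scoped binding, which is why a nested scoped connection silently wins.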

thaw-connection

(thaw-connection name resource & {:keys [set-type-key], :or {set-type-key :set-type}})

Create a connection with name, analogous to zensols.dataset.db/elasticsearch-connection, but read from resource (any type usable by clojure.java.io/reader) as the backing store. The results are cached in memory.

with-connection

macro

(with-connection connection & body)

Execute a body with the form (with-connection connection …)
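Putting the functions on this page together, a typical session might look like the sketch below. It assumes zensols.dataset.thaw is on the classpath and uses only the signatures documented above. The namespace example.thaw-usage, the function summarize-dataset, the connection name "example" and the path argument are hypothetical.

```clojure
(ns example.thaw-usage
  (:require [clojure.java.io :as io]
            [zensols.dataset.thaw :as thaw]))

(defn summarize-dataset
  "Read the frozen dataset at `path` (a hypothetical file created with
  zensols.dataset.db/freeze-dataset) and return a small summary map."
  [path]
  (let [conn (thaw/thaw-connection "example" (io/file path))]
    ;; Every call in the body uses conn as its connection.
    (thaw/with-connection conn
      ;; Realize the IDs inside the scope in case the sequence is lazy.
      (let [train-ids (doall (thaw/ids :set-type :train))]
        {:total       (thaw/instances-count)
         :train-count (count train-ids)
         :first-train (some-> (first train-ids) thaw/instance-by-id)}))))
```

Calling (summarize-dataset "dataset.json") would then read the file once. Later calls on the same connection are served from the in-memory cache mentioned under thaw-connection.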