Basic overview of the concepts of overlap analysis context

An instance of the OverlapAnalysisContext class (see apidoc) represents the major settings and parameters required for generating and comparing molecular descriptors (fingerprints). Command line tools usually need the context to be specified explicitly by the user, typically with the options -context and -contextjs. The help printed by the involved command line tools (when option -h is passed) documents these options briefly.

Underlying APIs

Internally, an instance of the OverlapAnalysisContext class (see apidoc) is used for calculations. Command line tools use OverlapAnalysisContextFactory (see apidoc) to create the predefined instances specified by option -context. JavaScript context customization/creation hooks (which interact directly with the Java APIs) can be specified by option -contextjs; they are processed by class ContextJsTools (see apidoc).

Note that class OverlapAnalysisContext is an immutable (see Wikipedia) cumulative factory (see explanation) class: each method invoked creates a new instance of it. Descriptor parameter builders (like CfpParameters.Builder), on the other hand, are typically builders (see Wikipedia): method invocations modify the state of the builder itself, and the build() method creates the immutable parameter object (like CfpParameters).
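The difference between the two styles can be illustrated with a minimal, self-contained JavaScript sketch. The classes SketchContext and SketchParamsBuilder below are hypothetical stand-ins, not part of the ChemAxon API, and the setting names and values are arbitrary; they only mimic the two calling conventions described above.

```javascript
// Hypothetical stand-in for an immutable cumulative factory
// (the calling convention described for OverlapAnalysisContext).
class SketchContext {
  constructor(settings) {
    this.settings = Object.freeze({ ...settings });
    Object.freeze(this);
  }
  // Each "setter" returns a NEW instance; the receiver is left untouched.
  pagesize(n) {
    return new SketchContext({ ...this.settings, pagesize: n });
  }
  comparator(name) {
    return new SketchContext({ ...this.settings, comparator: name });
  }
}

// Hypothetical stand-in for a mutable builder
// (the calling convention described for CfpParameters.Builder).
class SketchParamsBuilder {
  constructor() {
    this.state = { length: 1024, bondCount: 7 }; // arbitrary defaults
  }
  // Each "setter" modifies the builder itself and returns it for chaining.
  length(n) { this.state.length = n; return this; }
  bondCount(n) { this.state.bondCount = n; return this; }
  // build() takes an immutable snapshot of the current builder state.
  build() { return Object.freeze({ ...this.state }); }
}

const base = new SketchContext({ pagesize: 50, comparator: "TANIMOTO" });
const custom = base.comparator("MANHATTAN"); // base is not modified

const builder = new SketchParamsBuilder();
builder.length(2048);                        // the builder itself is modified
const params = builder.build();

console.log(base.settings.comparator);   // TANIMOTO
console.log(custom.settings.comparator); // MANHATTAN
console.log(params.length);              // 2048
```

One practical consequence, if the real context behaves as described: in a script hook the value returned by a context method call has to be kept and used. Calling such a method and then ending the script with the plain ctx reference would presumably hand back the unchanged context.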

Using pre-defined contexts

Predefined contexts are referenced by option -context. The available ones are printed when option -h is passed to the involved command line tool:

bin/buildStorage.sh -h
....

Applicable context names:

"createSimpleCfp5Context" "createSimpleCfp6Context" "createSimpleCfp7Context" "createSimpleCfp8Context" "createSimpleCfp9Context" "createSimpleCfp10Context" "createSimpleEcfp4Context" "createSimpleEcfp6Context" "createSimpleEcfp8Context" "createSimpleEcfp10Context" "createSimplePharmaCalcContext" "screen3d" "screen3dr" "createSimpleCfp4Context"

....

Using custom JS hooks

A custom JavaScript fragment, whose last statement is expected to specify the context to be used, can be passed to option -contextjs. This script fragment can access the Java API and some preinitialized helper variables, which are also documented by the printed help.
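The "last statement" rule can be illustrated without the tool itself. The sketch below is plain JavaScript with hypothetical stand-ins: ctx, println and evalHook are defined locally and are not the real helper variables or the real hook evaluator. It only assumes that the result of a hook is the value of the last statement of the fragment, as described above.

```javascript
// Hypothetical stand-ins for what a hook sees; not the real objects.
const ctx = { name: "stand-in context" };
const printed = [];
function println(message) {
  printed.push(String(message)); // records the message; returns undefined
}

// Evaluates a script fragment and returns the value of its last statement.
function evalHook(script) {
  return eval(script);
}

// Ends with ctx: the fragment evaluates to the context.
const good = evalHook("println('Initialized context:'); ctx");

// Ends with println(...): the fragment evaluates to undefined.
const bad = evalHook("println('Initialized context:')");

console.log(good === ctx); // true
console.log(bad);          // undefined
```

A fragment that is used only for printing therefore still needs a context expression, such as a trailing ctx, as its final statement.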

Example JS hook: customize metric

This example shows how to change the metric of a specified context. A predefined context (specified with option -context) is used initially; it is then customized in the script hook.

...
-context "createSimpleCfp7Context" -contextjs "ctx.descriptorComparator(ctx.getDescriptorGenerator().getBinaryMetricsComparator(bm_MANHATTAN))"
...

Breakdown of the contents of the passed JavaScript fragment customizing the OverlapAnalysisContext used:

Script part                       Description
ctx                               This reference holds the OverlapAnalysisContext instance specified by option -context. See apidoc.
.descriptorComparator(...)        Updates the metric to be used. See apidoc.
ctx.getDescriptorGenerator()      The represented generator; its factory methods are used to create the new metric. See apidoc.
.getBinaryMetricsComparator(...)  Factory method for non-parametrized metrics. See apidoc.
bm_MANHATTAN                      Constant which can be passed to method .getBinaryMetricsComparator(...). See apidoc.
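For orientation, the following sketch shows what switching between a Tanimoto dissimilarity and a Manhattan distance means for binary fingerprints. It uses the textbook definitions on bit vectors packed into 32-bit words and is not the implementation behind the comparators above. All function names are hypothetical, and returning 0 for two empty fingerprints is an assumption made for this sketch.

```javascript
// Counts the set bits of a 32-bit word.
function popcount32(x) {
  let count = 0;
  while (x !== 0) {
    x &= x - 1; // clears the lowest set bit
    count++;
  }
  return count;
}

// Tanimoto dissimilarity: 1 - |A AND B| / |A OR B|.
// Assumption: two all-zero fingerprints are treated as identical (0).
function tanimotoDissimilarity(a, b) {
  let both = 0;
  let either = 0;
  for (let i = 0; i < a.length; i++) {
    both += popcount32(a[i] & b[i]);
    either += popcount32(a[i] | b[i]);
  }
  return either === 0 ? 0 : 1 - both / either;
}

// Manhattan distance of binary vectors: the number of differing bits.
function manhattanDistance(a, b) {
  let differing = 0;
  for (let i = 0; i < a.length; i++) {
    differing += popcount32(a[i] ^ b[i]);
  }
  return differing;
}

// Two tiny 64-bit example fingerprints (two 32-bit words each).
const fpA = Uint32Array.of(0b1111, 0b0);
const fpB = Uint32Array.of(0b0110, 0b1);

console.log(tanimotoDissimilarity(fpA, fpB).toFixed(2)); // 0.60
console.log(manhattanDistance(fpA, fpB));                // 3
```

The two metrics have different scales: the Tanimoto dissimilarity is normalized to the range 0 to 1, while the Manhattan distance is a raw bit count, so thresholds tuned for one do not carry over to the other.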

Example JS hook: customize a fingerprint

This example shows how to access the Java API to set fingerprint parameters. Note that the fast similarity search tools and this overlap-examples distribution prefer to use the "new" descriptors API (found in package com.chemaxon.descriptors).

....
-contextjs "ctx_from_descpb(bld_cfp.length(2048).bitsPerPattern(4).bondCount(2).rings(false)).standardizer(std_defaultaroma)"
....

Breakdown of the contents of the passed JavaScript fragment creating the OverlapAnalysisContext used:

Script part           Description
ctx_from_descpb(..)   Helper function which creates a default OverlapAnalysisContext from the associated DescriptorParameters builder.
bld_cfp               A builder instance for CfpParameters in default state.
.length(..)           Updates the builder with the length parameter (see apidoc).
.bitsPerPattern(..)   Updates the builder with the bitsPerPattern parameter (see apidoc).
.bondCount(..)        Updates the builder with the bondCount parameter (see apidoc).
.rings(..)            Updates the builder with the rings parameter (see apidoc).
.standardizer(..)     Updates the context (created by ctx_from_descpb(..)) by specifying a standardizer (see apidoc).
std_defaultaroma      Helper constant, a StandardizerWrapper instance wrapping default aromatization.

Example JS hook: define a custom binary vector descriptor

Custom binary vector descriptors holding externally defined fingerprints currently must be defined using the Java API. This example defines a descriptor expecting binary strings of 1024 bits:

....
-contextjs "ctx_from_descpb(bld_bv.length(1024).endianness(en_BIG_ENDIAN).stringFormat(sf_STRICT_BINARY_STRING))"
....

Breakdown of the contents of the passed JavaScript fragment creating the OverlapAnalysisContext used:

Script part               Description
ctx_from_descpb(..)       Helper function which creates a default OverlapAnalysisContext from the associated DescriptorParameters builder.
bld_bv                    A builder instance for BvParameters in default state.
.length(..)               Updates the builder with the length parameter (see apidoc).
.endianness(..)           Updates the builder with the endianness parameter (see apidoc).
en_BIG_ENDIAN             Constant which can be passed to .endianness(..) (see apidoc).
.stringFormat(..)         Updates the builder with the string format parameter (see apidoc).
sf_STRICT_BINARY_STRING   Constant which can be passed to .stringFormat(..) (see apidoc).

Helper function ctx_from_descpb

The definition of this helper function (as documented by the help of the command line tools) is the following JavaScript fragment:

ctx_from_descpb = function ctx_from_desc(d) {
    return Packages.com.chemaxon.overlap.OverlapAnalysisContext.initial(d.build().getDescriptorGenerator());
}

Breakdown of the used parts of the Java API:

Script part                                                    Apidoc link
Packages.com.chemaxon.overlap.OverlapAnalysisContext.initial   apidoc
d                                                              Expected to be a builder for a DescriptorParameters

Examples for diagnostics

During the evaluation of a scripting hook, the JavaScript command println(...) can be used for diagnostics. Most classes in the new descriptors API produce meaningful messages in their toString() methods. Some objects implement method toString(boolean multiline) or toMultilineString(), which return more readable String representations.

Note that the execution of most command line tools is verbose; they print the textual representation of the main settings eventually used.

Print already initialized context

cat data/vitamins.smi | bin/buildStorage.sh -in - -out tmp.bin \
    -context createSimpleCfp7Context \
    -contextjs "println('Initialized context:'); println(ctx.toString(true)); ctx"

Note that the last statement of the script passed to option -contextjs must be the OverlapAnalysisContext to be used. In this example we only want to print the context to the console, but println() returns undefined, so the already initialized context reference (ctx) is used as the last statement.

Output:

com.chemaxon.overlap.cli.BuildStorage
    args: [-in, -, -out, tmp.bin, -context, createSimpleCfp7Context, -contextjs, println('Initialized context:'); println(ctx.toString(true)); ctx]

Initialized context:
Overlap analysis context.
    Pagesize:       50
    Standardizer:   ThreadLocalized wrapper over chemaxon.standardizer.Standardizer@3cf05ce2 (actions count: 1)
    Generator:      CFP generator, parameters: bond count: 7 (bits per pattern: 1, length: 1024)
    Comparator:     Comparator BINARY_TANIMOTO, vector size: 1024 bits
    Extractor:      Extract packed long [] fingerprint representation (16 longs, 1024 bits)
    Unguarded calc: Tanimoto dissimilarity of binary fingerprints represented as packed long[]

Context
Overlap analysis context.
    Pagesize:       50
    Standardizer:   ThreadLocalized wrapper over chemaxon.standardizer.Standardizer@3cf05ce2 (actions count: 1)
    Generator:      CFP generator, parameters: bond count: 7 (bits per pattern: 1, length: 1024)
    Comparator:     Comparator BINARY_TANIMOTO, vector size: 1024 bits
    Extractor:      Extract packed long [] fingerprint representation (16 longs, 1024 bits)
    Unguarded calc: Tanimoto dissimilarity of binary fingerprints represented as packed long[]


Reading - time: 491 ms (30 x 16 ms each)
(Finished) Reading - time: 491 ms (30 x 16 ms each)
Error counts collected: Total: 30 OK: 30 Parse error: 0 Process error: 0
Index projector:        Skiplist index projector initialMasterSkips: 0 maxClientIndex: 29 maxMasterIndex: 29 master index skiplist: []
Writing tmp.bin time: 12 ms (1 x 12 ms each) (1 of 30; 3 %)
(Finished) Writing tmp.bin time: 12 ms (30 x 400 us each) (30 of 30; 100 %)
All done.

Look up available metrics of a predefined context

A DescriptorComparator instance represents a metric. Such instances are usually created by factory methods of the associated DescriptorGenerator instance. The Java API documentation describes which of these methods are available for the various descriptors. Script hooks can be useful for looking up the type of the DescriptorGenerator instance represented by a predefined context.

cat data/vitamins.smi | bin/buildStorage.sh -in - -out tmp.bin \
    -context createSimpleCfp7Context \
    -contextjs "println('Generator summary: '+ctx.getDescriptorGenerator()); println('Generator immediate type: '+ctx.getDescriptorGenerator().getClass())"

Output:

com.chemaxon.overlap.cli.BuildStorage
    args: [-in, -, -out, tmp.bin, -context, createSimpleCfp7Context, -contextjs, println('Generator summary: '+ctx.getDescriptorGenerator()); println('Generator immediate type: '+ctx.getDescriptorGenerator().getClass())]

Generator summary: CFP generator, parameters: bond count: 7 (bits per pattern: 1, length: 1024)
Generator immediate type: class com.chemaxon.descriptors.fingerprints.cfp.CfpGeneratorImpl
Exception in thread "main" java.lang.IllegalArgumentException: Script returned null: println('Generator summary: '+ctx.getDescriptorGenerator()); println('Generator immediate type: '+ctx.getDescriptorGenerator().getClass())
	at com.chemaxon.overlap.ContextJsTools.evalJs(ContextJsTools.java:186)
	at com.chemaxon.overlap.ContextJsTools.initializeContext(ContextJsTools.java:220)
	at com.chemaxon.overlap.cli.BuildStorage.main(BuildStorage.java:95)

This execution fails because the scripting hook did not return a valid OverlapAnalysisContext instance; however, we now have the immediate type of the associated DescriptorGenerator: com.chemaxon.descriptors.fingerprints.cfp.CfpGeneratorImpl. We can look up its apidoc and identify the applicable DescriptorComparator factory methods: