Self-contained examples

The examples/ directory contains example scripts. Each script implements a workflow-specific, self-contained example and serves as a starting point for evaluating the provided functionality. The examples use a working directory to store downloaded and generated files; it is usually a subdirectory of the distribution's examples-tmp/ directory.

Most of these scripts can be launched without further arguments after unpacking the distribution and installing the supplied license file. Some of them require publicly available datasets to be downloaded first, and the behavior of the scripts can be customized; see details below. See the Getting started guide for an overview of some of these examples.
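As an illustration of launching an example without arguments, the following is a minimal sketch rather than part of the distribution. The helper name run_example is hypothetical, and running the script from the distribution root is an assumption: this document does not state which working directory the scripts expect.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the distribution).
# $1: root of the unpacked distribution (assumed to contain examples/)
# $2: file name of the example script, e.g. rest-api-example.sh
run_example() {
    dist_root=$1
    script=$2
    if [ ! -f "$dist_root/examples/$script" ]; then
        echo "no such example script: $dist_root/examples/$script" >&2
        return 2
    fi
    # Assumption: the script is executable and is started from the
    # distribution root. The subshell keeps the caller's directory unchanged.
    (cd "$dist_root" && "./examples/$script")
}

# Usage (the path is a placeholder for wherever the distribution was unpacked):
#   run_example /path/to/distribution rest-api-example.sh
```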

Please note that a possible race condition in file I/O might cause an exception (Exception in thread "main" java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Empty stream cannot be read.) when reading files or standard input.
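If this failure is intermittent, as a race condition would suggest, rerunning the failed command may be enough. This is an assumption, not a documented workaround. The sketch below shows a generic retry wrapper; the function name retry and the RETRY_DELAY variable are hypothetical. Note that it retries on any non-zero exit status, not only on this particular exception.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to N times until it succeeds.
# $1: maximum number of attempts; remaining arguments: the command to run.
# RETRY_DELAY (seconds between attempts) defaults to 1.
retry() {
    max_tries=$1
    shift
    attempt=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after $attempt attempt(s): $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying: $*" >&2
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-1}"
    done
}

# Usage, assuming the current directory is the distribution root:
#   retry 3 ./examples/rest-api-small.sh
```

A retry only masks the symptom; if the exception appears on every attempt, the cause is likely something other than the race condition mentioned above.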

Command line processing examples

These examples do not launch the embedded server or the web-based UI. They exit once processing and searching are done.

REST API/Web UI examples

These examples launch the embedded web server contained in the gui.sh tool. They calculate descriptors and launch an embedded server that provides a REST API and a Web UI for real-time searches. Scripts exposing larger datasets collect profiling and execution statistics and expose them on the Web UI. The following scripts use only the shipped datasets, so they can be launched without downloading any further public datasets:

| Script | Memory used | Preparation time | Load time | Molecule count | Descriptor count | Molecule sets | Descriptors |
|---|---|---|---|---|---|---|---|
| rest-api-example.sh | java default | < 20 s | 1 s | 30 | 51 | vitamins, N/A (custom descriptors with no attached molecules) | CFP-7, custom binary and floats |
| rest-api-small.sh | 0.5 G | < 1 min | 1 s | 249 k | 249 k | nci-250k | CFP-7 |
| rest-api-medium.sh | 1 G | 5 min | 4 s | 1.9 M | 3.8 M | vitamins, drugbank-all, pubchem-rnd-100k, nci-250k, chembl_21 | CFP-7, ECFP-4 |
| rest-api-medium-maccs.sh | 2 G | 20 min | 4 s | 1.9 M | 5.8 M | vitamins, drugbank-all, pubchem-rnd-100k, nci-250k, chembl_21 | CFP-7, ECFP-4, MACCS-166 |

The following scripts exercise overlap analysis calculation (see Introduction to overlap analysis):

| Script | Memory used | Preparation time |
|---|---|---|
| rest-api-example.sh | java default | < 20 s |
| overlap-example.sh | java default | 40 min |

The following scripts expose one or more publicly available datasets that are not included in the distribution. These datasets must be downloaded before execution, either manually (as described in the document Prepare example molecule sets) or with the examples/download-molecules.sh script. The Sets to download column lists the options to pass to this script.

| Script | Memory used | Preparation time | Load time | Sets to download | Molecule count | Descriptor count | Molecule sets | Descriptors |
|---|---|---|---|---|---|---|---|---|
| rest-api-large.sh | 8 G | 11 min | 38 s | -E | ~ 19 M | ~ 19 M | vitamins, drugbank-all, pubchem-rnd-100k, nci-250k, chembl_21, emolecules-plus | CFP-7 |
| rest-api-large-ecfp.sh | 10 G | 25 min | 28 s | -E | ~ 19 M | ~ 38 M | vitamins, drugbank-all, pubchem-rnd-100k, nci-250k, chembl_21, emolecules-plus | CFP-7, ECFP-4 |
| rest-api-large-ecfp-maccs.sh | 12 G | 2 h 30 min | 40 s | -E | ~ 19 M | ~ 58 M | vitamins, drugbank-all, pubchem-rnd-100k, nci-250k, chembl_21, emolecules-plus | CFP-7, ECFP-4, MACCS-166 |
| rest-api-xlarge.sh | 20 G | 40 min | 22 s | -E -S | ~ 26 M | ~ 26 M | vitamins, drugbank-all, pubchem-rnd-100k, nci-250k, chembl_21, emolecules-plus, surechembl | CFP-7 |
| rest-api-xlarge-ecfp.sh | 20 G | 1 h 00 min | 32 s | -E -S | ~ 26 M | ~ 52 M | vitamins, drugbank-all, pubchem-rnd-100k, nci-250k, chembl_21, emolecules-plus, surechembl | CFP-7, ECFP-4 |
| rest-api-xlarge-ecfp-maccs.sh | 20 G | 5 h 00 min | 51 s | -E -S | ~ 26 M | ~ 78 M | vitamins, drugbank-all, pubchem-rnd-100k, nci-250k, chembl_21, emolecules-plus, surechembl | CFP-7, ECFP-4, MACCS-166 |
| rest-api-xxlarge.sh | 28 G | 57 min | 57 s | -E -S -Z | ~ 54 M | ~ 54 M | vitamins, drugbank-all, pubchem-rnd-100k, nci-250k, chembl_21, emolecules-plus, surechembl, zinc-all | CFP-7 |
| rest-api-xxlarge-ecfp.sh | 28 G | 1 h 26 min | 62 s | -E -S -Z | ~ 54 M | ~ 108 M | vitamins, drugbank-all, pubchem-rnd-100k, nci-250k, chembl_21, emolecules-plus, surechembl, zinc-all | CFP-7, ECFP-4 |
| rest-api-xxlarge-ecfp-maccs.sh | 28 G | 6 h 56 min | 239 s | -E -S -Z | ~ 54 M | ~ 162 M | vitamins, drugbank-all, pubchem-rnd-100k, nci-250k, chembl_21, emolecules-plus, surechembl, zinc-all | CFP-7, ECFP-4, MACCS-166 |

Notes:

Other scripts

Notes on non-workflow-specific content of the scripts

The scripts usually provide the following common functionality.