The tautomerization models behind the JChem tautomer search

The JChem tautomer search decides whether a query molecule and a target molecule are tautomers of each other. It can use either of two tautomerization models for this: generic tautomerization or normal canonical tautomerization.

To decide tautomer equivalence, the search algorithm first generates the relevant tautomer form of the query and of the target under the selected model, then performs a graph equivalence check on the two generated forms. If the generated forms are identical, the search considers the query and the target to be tautomers.
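The two-step check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not JChem's implementation: `are_tautomers`, the `to_form` callback, `toy_model`, and the `TOY_FORMS` table are hypothetical names, and the strings in the table are placeholder labels rather than real tautomer forms. Plain string equality stands in for the graph equivalence check.

```python
from typing import Callable

# Hypothetical stand-in for a tautomerization model: maps an input
# structure to its generated tautomer form. The entries are placeholder
# labels, not output produced by JChem.
TOY_FORMS = {
    "keto-form": "FORM-A",
    "enol-form": "FORM-A",
    "unrelated": "FORM-B",
}


def toy_model(structure: str) -> str:
    """Return the placeholder tautomer form for a known toy structure."""
    return TOY_FORMS[structure]


def are_tautomers(query: str, target: str,
                  to_form: Callable[[str], str]) -> bool:
    """Generate the tautomer form of both inputs, then compare them.

    String equality stands in for the graph equivalence check.
    """
    return to_form(query) == to_form(target)


print(are_tautomers("keto-form", "enol-form", toy_model))  # True
print(are_tautomers("keto-form", "unrelated", toy_model))  # False
```

Passing the model in as a function mirrors the fact that the search can run with either tautomerization model: the comparison step stays the same, and only the form generator changes.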

The following description gives an overview of generic and normal canonical tautomerization.

Generic Tautomer

The generic tautomer represents all theoretically possible tautomer forms of the input molecule. It is generated based on the following algorithm:

img

img

img

img

The output of this generation process is the generic tautomer form of the input molecule, in which the identified distinct tautomer regions are shown.

Normal Canonical Tautomer

In contrast to the generic tautomer, the normal canonical form represents only a subset of all possible tautomers of the input structure.

The normal canonical forms are generated based on the following algorithm:

The output of this generation process is the normal canonical form of the molecule.

Examples

The following examples show how generic and normal canonical tautomerization behave for the five most common tautomerization types.

Oxo-enol tautomerization

Molecules Generic tautomers Normal canonical tautomers
img img img
img img img
img img img
img img img

Amine-imine tautomerization

Molecules Generic tautomers Normal canonical tautomers
img img img
img img img
img img img
img img img

Amide-imide tautomerization

Molecules Generic tautomers Normal canonical tautomers
img img img
img img img

Lactam-lactim tautomerization

Molecules Generic tautomers Normal canonical tautomers
img img img

Nitroso-oxime tautomerization

In the case of nitroso-oxime tautomerization, the generated generic tautomer forms are the same, while the normal canonical tautomers differ. This indicates that both forms are stable and exist.

Molecules Generic tautomers Normal canonical tautomers
img img img

Difference between the two models

The following examples show molecule pairs whose generic forms are identical while their normal canonical forms differ. The generic tautomerization model therefore treats the two molecules as tautomers of each other. The normal canonical model does not, so under that model the two molecules are considered distinct.

Molecules Generic tautomers Normal canonical tautomers
img img img
img img img
img img img
img img img
img img img
img img img
img img img

Speed

The generic tautomer generation was measured to be about 5x faster than the normal canonical generation (5.2 s versus 25.3 s wall-clock time on the input file nci_rnd_1000.smiles). These small-scale speed tests were run on a MacBook Pro (2.7 GHz Intel Core i5, 8 GB DDR3).

$ time cxcalc -N ih generictautomer nci_rnd_1000.smiles >nci_rnd_1000_generic.smiles

real    0m5.225s
user    0m12.194s
sys     0m0.573s

$ time cxcalc -N ih canonicaltautomer --normal nci_rnd_1000.smiles >nci_rnd_1000_n_canonical.smiles

real    0m25.303s
user    1m9.342s
sys     0m1.683s
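To make the comparison explicit, the ratios can be computed directly from the timings above. The Python snippet below is only a small helper sketch: `parse_time` is a hypothetical function written for this example, and the durations are copied from the two runs shown.

```python
import re


def parse_time(value: str) -> float:
    """Convert a `time`-style duration such as '1m9.342s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Durations copied from the two runs above.
generic = {"real": "0m5.225s", "user": "0m12.194s"}
normal_canonical = {"real": "0m25.303s", "user": "1m9.342s"}

for key in ("real", "user"):
    ratio = parse_time(normal_canonical[key]) / parse_time(generic[key])
    print(f"{key}: normal canonical took {ratio:.1f}x as long as generic")
```

For this single run the wall-clock ratio comes out at roughly 4.8 and the user CPU time ratio at roughly 5.7, which is consistent with the "about 5x" figure.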