jklustor-overlap-0.0.2-20140619012940 (ChemAxon)


com.chemaxon.overlap.storage
Class PagedDescriptorStorage<D extends com.chemaxon.descriptors.common.Descriptor>

Object
  extended by com.chemaxon.overlap.storage.PagedDescriptorStorage<D>
Type Parameters:
D - Stored descriptor type
All Implemented Interfaces:
Updater<D>

public final class PagedDescriptorStorage<D extends com.chemaxon.descriptors.common.Descriptor>
extends Object
implements Updater<D>

Descriptor storage.

Storage is organized into fixed size pages. All pages are full, except the last one, which can be partially filled. Descriptors are indexed sequentially across the pages.
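The paging scheme described above can be illustrated with a minimal sketch that uses only the JDK. The class PagedListSketch and all of its members are hypothetical names introduced for illustration; this is not the implementation of PagedDescriptorStorage, only one plausible way to keep every page full except the last one while handing out sequential indexes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of fixed size paging; not the library's code. */
final class PagedListSketch<T> {
    private final int pageSize;
    private final List<List<T>> pages = new ArrayList<>();
    private int size = 0;

    PagedListSketch(int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.pageSize = pageSize;
    }

    /** Appends an element and returns its sequential index. */
    int add(T element) {
        if (size % pageSize == 0) {
            // All existing pages are full (or none exists yet): open a new last page.
            pages.add(new ArrayList<>(pageSize));
        }
        pages.get(pages.size() - 1).add(element);
        return size++;
    }

    /** Looks up an element by its sequential index. */
    T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        // Page number and offset within the page follow from integer division.
        return pages.get(index / pageSize).get(index % pageSize);
    }

    int size() {
        return size;
    }

    int pageCount() {
        return pages.size();
    }

    /** Number of elements on the last (possibly partially filled) page. */
    int lastPageFill() {
        return pages.isEmpty() ? 0 : pages.get(pages.size() - 1).size();
    }

    public static void main(String[] args) {
        PagedListSketch<String> storage = new PagedListSketch<>(4);
        for (int i = 0; i < 10; i++) {
            storage.add("d" + i);
        }
        System.out.println(storage.pageCount() + " pages, last page holds " + storage.lastPageFill());
    }
}
```

With a page size of 4 and 10 elements, the sketch ends up with 3 pages, the last one holding 2 elements.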

Licensing: this class can be used with valid LicenseGlobals.OVERLAP license.

Author:
Gabor Imre

Field Summary
static int MAX_RESULT_QUEUE_SIZE
          Max non reported queue elements.
 
Constructor Summary
PagedDescriptorStorage(int pagesize, com.chemaxon.descriptors.common.DescriptorGenerator<D> generator)
          Construct new empty descriptor storage.
PagedDescriptorStorage(int pagesize, com.chemaxon.descriptors.common.DescriptorGenerator<D> generator, InputStream is, com.chemaxon.calculations.common.SubProgressObserver po)
          Construct from a String serialized form.
PagedDescriptorStorage(int pagesize, com.chemaxon.descriptors.common.DescriptorGenerator<D> generator, ObjectInputStream ois, com.chemaxon.calculations.common.SubProgressObserver po)
          Construct from a byte [] serialized form.
 
Method Summary
 void addAll(InputStream is, String opts, int skipCount, int maxProcessCount, StandardizerWrapper standardizer, com.chemaxon.calculations.common.SubProgressObserver po, ExecutorService e, MoleculeCallback moleculeCallback)
          Read all molecules from a structure file into the similarity subsystem.
(package private)  void addAll(Iterator<StructureRecord> input, StandardizerWrapper standardizer, com.chemaxon.calculations.common.ProgressObserver po, ExecutorService e, MoleculeCallback moleculeCallback)
          Import from an input iterator.
 int addDescriptor(D d)
          Add a single descriptor to the similarity subsystem.
 int addMolecule(chemaxon.struc.Molecule m)
          Add a single molecule to the similarity subsystem.
<T extends Serializable>
UnguardedPagedOverlap<T>
createBruteForceOverlap(Function<D,T> extractor, UnguardedDissimilarityCalculator<T> comparator)
          Create a brute force overlap calculator from the current state of the storage.
(package private)  void dequeueFinisheds(Queue<Future<List<ProcessQueueItem<D>>>> resultsQueue, MoleculeCallback moleculeCallback, com.chemaxon.calculations.common.ProgressObserver po)
          Dequeue finished results if any.
static
<D extends com.chemaxon.descriptors.common.Descriptor,T extends Serializable>
UnguardedPagedOverlap<T>
deserializeUnguarded(int pagesize, com.chemaxon.descriptors.common.DescriptorGenerator<D> generator, Function<D,T> extractor, UnguardedDissimilarityCalculator<T> comparator, ObjectInputStream ois, com.chemaxon.calculations.common.SubProgressObserver po)
          Deserialize an UnguardedPagedOverlap from a binary serialized form.
(package private)  void enqueueNextBatch(Iterator<StructureRecord> input, StandardizerWrapper standardizer, Queue<Future<List<ProcessQueueItem<D>>>> resultsQueue, ExecutorService e)
          Enqueue next batch of processes.
(package private)  List<D> getPage(int pageno)
           
(package private)  int getPageCount()
          Count of pages.
(package private)  Iterator<List<D>> iteratePages()
           
 int size()
          Stored descriptor count.
 void toBytes(ObjectOutputStream os, com.chemaxon.calculations.common.SubProgressObserver po)
          Deprecated. Use toBytes(java.io.ObjectOutputStream, com.chemaxon.calculations.common.SubProgressObserver, long) with a sound reset interval.
 void toBytes(ObjectOutputStream os, com.chemaxon.calculations.common.SubProgressObserver po, long resetInterval)
          Dump descriptors to a binary file.
 void toStrings(PrintStream ps, com.chemaxon.calculations.common.SubProgressObserver po)
          Write String representations to a PrintStream.
 void toStrings(PrintStream ps, com.chemaxon.calculations.common.SubProgressObserver po, ExecutorService e)
          Write String representations to a PrintStream using concurrent conversions.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MAX_RESULT_QUEUE_SIZE

public static final int MAX_RESULT_QUEUE_SIZE
Max non reported queue elements.

This is the maximum number of enqueued Future references waiting for final storage/error reporting.

See Also:
Constant Field Values
Constructor Detail

PagedDescriptorStorage

public PagedDescriptorStorage(int pagesize,
                              com.chemaxon.descriptors.common.DescriptorGenerator<D> generator)
Construct new empty descriptor storage.

Note that to acquire guard object reference, an empty molecule is generated in the constructor.

Parameters:
pagesize - Size of each page (molecules/descriptors)
generator - Represented descriptor generator
Throws:
chemaxon.license.LicenseException - when appropriate license is not available

PagedDescriptorStorage

public PagedDescriptorStorage(int pagesize,
                              com.chemaxon.descriptors.common.DescriptorGenerator<D> generator,
                              ObjectInputStream ois,
                              com.chemaxon.calculations.common.SubProgressObserver po)
                       throws IOException,
                              ClassNotFoundException
Construct from a byte [] serialized form.

Note that the supplied DescriptorGenerator must be parametrized the same way as the one used for binary serialization. Compatibility of generators is not checked; however, in some but not all cases incompatibility results in a RuntimeException thrown by the used DescriptorGenerator.fromByteArray(byte[]).

Note that to acquire guard object reference, an empty molecule is generated in the constructor.

Compatible serialized form is generated by toBytes(java.io.ObjectOutputStream, com.chemaxon.calculations.common.SubProgressObserver). Note that serialized form is not necessarily compatible between different versions (including underlying Marvin/JChem)!

Parameters:
pagesize - Size of each page (molecules/descriptors)
generator - Represented descriptor generator
ois - ObjectInputStream to read descriptors byte form. Note that this stream is not closed upon finish or abort
po - ProgressObserver to track progress. Note that SubProgressObserver.done() is invoked upon completion
Throws:
IOException - re-thrown from passed ObjectInputStream
ClassNotFoundException - re-thrown from passed ObjectInputStream
IllegalArgumentException - upon error reading
CancellationException - upon cancellation from progress observer
chemaxon.license.LicenseException - when appropriate license is not available

PagedDescriptorStorage

public PagedDescriptorStorage(int pagesize,
                              com.chemaxon.descriptors.common.DescriptorGenerator<D> generator,
                              InputStream is,
                              com.chemaxon.calculations.common.SubProgressObserver po)
Construct from a String serialized form.

Note that the supplied DescriptorGenerator must be parametrized the same way as the one used for String serialization. Compatibility of generators is not checked; however, in some but not all cases incompatibility results in a RuntimeException thrown by the used DescriptorGenerator.fromString(java.lang.String).

Note that to acquire guard object reference, an empty molecule is generated in the constructor.

Parameters:
pagesize - Size of each page (molecules/descriptors)
generator - Represented descriptor generator
is - InputStream to read descriptors line by line
po - ProgressObserver to track progress. SubProgressObserver.done() is invoked upon completion
Throws:
IllegalArgumentException - upon error reading
CancellationException - upon cancellation from progress observer
chemaxon.license.LicenseException - when appropriate license is not available
Method Detail

deserializeUnguarded

public static <D extends com.chemaxon.descriptors.common.Descriptor,T extends Serializable> UnguardedPagedOverlap<T> deserializeUnguarded(int pagesize,
                                                                                                                                          com.chemaxon.descriptors.common.DescriptorGenerator<D> generator,
                                                                                                                                          Function<D,T> extractor,
                                                                                                                                          UnguardedDissimilarityCalculator<T> comparator,
                                                                                                                                          ObjectInputStream ois,
                                                                                                                                          com.chemaxon.calculations.common.SubProgressObserver po)
                                                                          throws IOException,
                                                                                 ClassNotFoundException
Deserialize an UnguardedPagedOverlap from a binary serialized form.

Note that the supplied DescriptorGenerator must be parametrized the same way as the one used for binary serialization. Compatibility of generators is not checked; however, in some but not all cases incompatibility results in a RuntimeException thrown by the used DescriptorGenerator.fromByteArray(byte[]).

Compatible serialized form is generated by toBytes(java.io.ObjectOutputStream, com.chemaxon.calculations.common.SubProgressObserver). Note that serialized form is not necessarily compatible between different versions (including underlying Marvin/JChem)!

Type Parameters:
D - Generated descriptor type
T - Unguarded form of the descriptors
Parameters:
pagesize - Size of each page
generator - Generator to be used for deserialization
extractor - Function to extract unguarded descriptor content for storage
comparator - Unguarded comparator to be represented by the constructed instance
ois - ObjectInputStream to read descriptors byte form. Note that this stream is not closed upon finish or abort
po - ProgressObserver to track progress. Note that SubProgressObserver.done() is invoked upon completion
Returns:
Deserialized similarity search engine
Throws:
IOException - re-thrown from passed ObjectInputStream
ClassNotFoundException - re-thrown from passed ObjectInputStream
IllegalArgumentException - upon error reading
CancellationException - upon cancellation from progress observer

toStrings

public void toStrings(PrintStream ps,
                      com.chemaxon.calculations.common.SubProgressObserver po)
               throws CancellationException
Write String representations to a PrintStream.

Any error from the underlying DescriptorGenerator.toString(com.chemaxon.descriptors.common.Descriptor) will propagate from this method and the execution will be aborted.

Parameters:
ps - PrintStream to write the String representations to. Note that ps will not be closed upon finish.
po - Observer to follow progress. Observer is switched to determinate state with each descriptor representing a work unit. Done will be reported upon completion/cancellation.
Throws:
CancellationException - upon cancellation

toBytes

@Deprecated
public void toBytes(ObjectOutputStream os,
                               com.chemaxon.calculations.common.SubProgressObserver po)
             throws IOException
Deprecated. Use toBytes(java.io.ObjectOutputStream, com.chemaxon.calculations.common.SubProgressObserver, long) with a sound reset interval.

Dump descriptors to a binary file with no stream reset.

Parameters:
os - Object output stream to write. Stream is not closed upon completion.
po - ProgressObserver to track progress. Observer is closed by invoking SubProgressObserver.done() upon completion, failure or cancellation
Throws:
CancellationException - when cancelled through the given observer
IOException - thrown from passed ObjectOutputStream

toBytes

public void toBytes(ObjectOutputStream os,
                    com.chemaxon.calculations.common.SubProgressObserver po,
                    long resetInterval)
             throws IOException
Dump descriptors to a binary file.

Warning! This method usually resets the given ObjectOutputStream by calling its ObjectOutputStream.reset() method periodically.

This method differs from serialization: only the descriptors are written, the associated descriptor generator is not. Also, page size is not retained, so it is possible to read descriptors back to different page sizes.

The underlying DescriptorGenerator instance must be reconstructed upon deserialization. This method currently does not write descriptor generator related information, but this behavior might change in the future.

Export format in the current version:

Parameters:
os - Object output stream to write. Stream is not closed upon completion.
po - ProgressObserver to track progress. Observer is closed by invoking SubProgressObserver.done() upon completion, failure or cancellation
resetInterval - Reset the stream by invoking ObjectOutputStream.reset() each time the given number of descriptors has been written. Value must be greater than zero. TODO: consider optimal value for resetInterval.
Throws:
CancellationException - when cancelled through the given observer
IOException - thrown from passed ObjectOutputStream
See Also:
http://wordpress.nejaa-den.com/outofmemoryexception-memory-leak-in-the-java-class-objectoutputstream-and-objectinputstream/, http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6525563, http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4363937
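
The periodic reset described for resetInterval can be sketched with plain JDK streams. ObjectOutputStream keeps a reference to every object it has written so that it can emit back-references; calling reset() at intervals discards that table and keeps memory use bounded. The class ResetIntervalSketch, its methods and the count-prefixed layout below are assumptions made for illustration only; the actual export format of toBytes is not listed in this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of periodic ObjectOutputStream.reset(); not the library's format. */
final class ResetIntervalSketch {

    /** Writes a count followed by the items, resetting the stream every resetInterval items. */
    static byte[] write(List<byte[]> items, long resetInterval) {
        if (resetInterval <= 0) {
            throw new IllegalArgumentException("resetInterval must be greater than zero");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeInt(items.size()); // assumed layout: element count first
            long written = 0;
            for (byte[] item : items) {
                oos.writeObject(item);
                written++;
                if (written % resetInterval == 0) {
                    // Drop the stream's table of already written objects.
                    oos.reset();
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Reads back what write(...) produced; reset markers are handled by ObjectInputStream. */
    static List<byte[]> read(byte[] data) {
        List<byte[]> result = new ArrayList<>();
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int count = ois.readInt();
            for (int i = 0; i < count; i++) {
                result.add((byte[]) ois.readObject());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        } catch (ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
        return result;
    }
}
```

Unlike toBytes, which leaves the passed stream open, this sketch owns its in-memory stream and therefore closes it.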

toStrings

public void toStrings(PrintStream ps,
                      com.chemaxon.calculations.common.SubProgressObserver po,
                      ExecutorService e)
               throws CancellationException
Write String representations to a PrintStream using concurrent conversions.

Callbacks (po) and stream access are made on the calling thread. This method blocks until completion or until it is aborted due to an underlying exception.

Parameters:
ps - PrintStream to write the String representations to. Note that ps will not be closed upon finish.
po - Observer to follow progress. Observer is switched to determinate state with each descriptor representing a work unit. Done will be reported upon completion/cancellation.
e - Executor service to use for string serialization
Throws:
CancellationException - upon cancellation
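
The division of work described above (conversions on an ExecutorService, stream access on the calling thread) can be sketched as follows. The class ConcurrentToStringsSketch and its methods are hypothetical, and java.util.function.Function stands in for the descriptor to String conversion; progress reporting and cancellation are left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical illustration: convert concurrently, write sequentially on the caller's thread. */
final class ConcurrentToStringsSketch {

    static <D> void writeAll(List<D> items, Function<D, String> converter,
                             PrintStream ps, ExecutorService e) {
        // Submit every conversion; the futures keep the input order.
        List<Future<String>> pending = new ArrayList<>();
        for (D item : items) {
            Callable<String> task = () -> converter.apply(item);
            pending.add(e.submit(task));
        }
        // Only the calling thread touches the PrintStream.
        for (Future<String> future : pending) {
            try {
                ps.println(future.get());
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for a conversion", ex);
            } catch (ExecutionException ex) {
                // An error in a conversion aborts the whole write.
                throw new IllegalStateException("Conversion failed", ex.getCause());
            }
        }
    }

    /** Small demonstration writing three converted values into an in-memory stream. */
    static String demo() {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            PrintStream ps = new PrintStream(buffer);
            writeAll(Arrays.asList(1, 2, 3), i -> "fp-" + i, ps, executor);
            ps.flush();
            return buffer.toString();
        } finally {
            executor.shutdown();
        }
    }
}
```

This sketch submits all conversions up front, which is only reasonable for small inputs; a bounded queue of pending futures would be needed for large storages.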

size

public int size()
Stored descriptor count.

Returns:
stored descriptor count

getPageCount

int getPageCount()
Count of pages.

Returns:
Count of pages

getPage

List<D> getPage(int pageno)

iteratePages

Iterator<List<D>> iteratePages()

addAll

public void addAll(InputStream is,
                   String opts,
                   int skipCount,
                   int maxProcessCount,
                   StandardizerWrapper standardizer,
                   com.chemaxon.calculations.common.SubProgressObserver po,
                   ExecutorService e,
                   MoleculeCallback moleculeCallback)
Description copied from interface: Updater
Read all molecules from a structure file into the similarity subsystem.

Consecutive members of a structure file have consecutive indexes associated. Usually the first molecule in the file has index value 0 associated. To allow segmented reading, this method can be called multiple times to append additional structures.

Consistency considerations: the storage is left in a consistent state in case of the following abnormal or unexpected terminations:

Notes on multithreading:

TODO: shorten parameters list using a builder

Specified by:
addAll in interface Updater<D extends com.chemaxon.descriptors.common.Descriptor>
Parameters:
is - Input stream to read from. Note that the stream is not closed when returning.
opts - Input options or null to pass to underlying MFileFormatUtil.createRecordReader(java.io.InputStream, java.lang.String)
skipCount - Skip the given number of structures. Skipped structures are also reported to the given progress observer like ordinary processed structures; however, they won't generate calls into the supplied MoleculeCallback.
maxProcessCount - Read at most given number of structures. Count starts after skipping structures.
standardizer - Standardizer to apply on molecules. See StandardizerWrappers for utility methods. Note that supplied wrapper must be thread safe.
po - ProgressObserver to track file read. Total reported work units are assigned to read and processed/skipped molecules count. The given observer is closed upon returning
e - ExecutorService to run descriptor generation for pages
moleculeCallback - Callback to report back assigned indexes/processing errors.
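
The interplay of skipCount and maxProcessCount can be sketched with a plain iterator. The class SkipLimitSketch, its nested Result type and the field names are hypothetical; the sketch only mirrors what the parameter descriptions state: skipped records count as reported work units but are not passed on, and the process limit starts counting after the skipped records.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration of skipCount / maxProcessCount semantics. */
final class SkipLimitSketch {

    /** Records selected for processing plus the number of reported work units. */
    static final class Result<R> {
        final List<R> processed;
        final int reportedWorkUnits;

        Result(List<R> processed, int reportedWorkUnits) {
            this.processed = processed;
            this.reportedWorkUnits = reportedWorkUnits;
        }
    }

    static <R> Result<R> read(Iterator<R> input, int skipCount, int maxProcessCount) {
        List<R> processed = new ArrayList<>();
        int reported = 0;
        int skipped = 0;
        // Skipped records are consumed and reported, but not handed on.
        while (skipped < skipCount && input.hasNext()) {
            input.next();
            skipped++;
            reported++;
        }
        // The limit applies to the records that follow the skipped ones.
        while (processed.size() < maxProcessCount && input.hasNext()) {
            processed.add(input.next());
            reported++;
        }
        return new Result<>(processed, reported);
    }
}
```

For a six record input with skipCount 2 and maxProcessCount 3, the third, fourth and fifth records are selected and five work units are reported. Calling such a method repeatedly on consecutive segments corresponds to the segmented reading mentioned in the description.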

enqueueNextBatch

void enqueueNextBatch(Iterator<StructureRecord> input,
                      StandardizerWrapper standardizer,
                      Queue<Future<List<ProcessQueueItem<D>>>> resultsQueue,
                      ExecutorService e)
Enqueue next batch of processes.

Exceptions from input are propagated.

Parameters:
input - Source of inputs
standardizer - Standardizer to use
resultsQueue - Queue for results
e - Executor service to use

dequeueFinisheds

void dequeueFinisheds(Queue<Future<List<ProcessQueueItem<D>>>> resultsQueue,
                      MoleculeCallback moleculeCallback,
                      com.chemaxon.calculations.common.ProgressObserver po)
Dequeue finished results if any.

Wait if results queue is full.

Parameters:
resultsQueue - List of results to check
moleculeCallback - Callback to notify event details
po - ProgressObserver to report processed (possibly failed) input molecule count
Throws:
CancellationException - upon cancellation from progressObserver
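
The enqueue/dequeue cycle with a bounded number of pending Future references can be sketched with JDK classes alone. The class BoundedResultQueueSketch, the constant MAX_PENDING and the squaring task are hypothetical stand-ins; the sketch shows only the general pattern of draining finished results from the head of the queue and blocking when the queue is full.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration of a bounded queue of pending results. */
final class BoundedResultQueueSketch {
    /** Assumed bound, playing the role of a maximum result queue size. */
    static final int MAX_PENDING = 4;

    static <R> R await(Future<R> future) {
        try {
            return future.get();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("Task failed", ex.getCause());
        }
    }

    /** Moves finished head results into sink; blocks on the head while the queue is full. */
    static <R> void dequeueFinished(Queue<Future<R>> queue, List<R> sink) {
        while (!queue.isEmpty()
                && (queue.size() >= MAX_PENDING || queue.peek().isDone())) {
            sink.add(await(queue.remove()));
        }
    }

    /** Squares the inputs on a small pool while keeping at most MAX_PENDING futures queued. */
    static List<Integer> squareAll(List<Integer> inputs) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            Queue<Future<Integer>> queue = new ArrayDeque<>();
            List<Integer> results = new ArrayList<>();
            for (Integer input : inputs) {
                Callable<Integer> task = () -> input * input;
                queue.add(executor.submit(task));
                dequeueFinished(queue, results);
            }
            // Collect whatever is still pending once the input is exhausted.
            while (!queue.isEmpty()) {
                results.add(await(queue.remove()));
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }
}
```

Because results are always taken from the head of a FIFO queue, they are stored in input order regardless of which task finishes first.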

addAll

void addAll(Iterator<StructureRecord> input,
            StandardizerWrapper standardizer,
            com.chemaxon.calculations.common.ProgressObserver po,
            ExecutorService e,
            MoleculeCallback moleculeCallback)
Import from an input iterator.

Parameters:
input - Iterator to input from. Will not be closed upon error/completion.
standardizer - Standardizer to use if required
po - Progress observer to track progress; won't be closed upon completion
e - see Updater
moleculeCallback - see Updater

addMolecule

public int addMolecule(chemaxon.struc.Molecule m)
Description copied from interface: Updater
Add a single molecule to the similarity subsystem.

Note that the given molecule must be standardized before calling this method.

Specified by:
addMolecule in interface Updater<D extends com.chemaxon.descriptors.common.Descriptor>
Parameters:
m - Molecule to be added
Returns:
Associated index of the structure

addDescriptor

public int addDescriptor(D d)
Description copied from interface: Updater
Add a single descriptor to the similarity subsystem.

Note that descriptors have a compatibility related API contract (currently references returned by Descriptor.getDescriptorGenerator() must be equal for compatible descriptors) which must be satisfied by the passed descriptor.

Specified by:
addDescriptor in interface Updater<D extends com.chemaxon.descriptors.common.Descriptor>
Parameters:
d - Descriptor to be added
Returns:
Associated index of the represented structure

createBruteForceOverlap

public <T extends Serializable> UnguardedPagedOverlap<T> createBruteForceOverlap(Function<D,T> extractor,
                                                                                 UnguardedDissimilarityCalculator<T> comparator)
Create a brute force overlap calculator from the current state of the storage.

The supplied function is applied to all represented descriptors and the resulting bare forms are stored in the returned instance.

Type Parameters:
T - Type of unguarded form
Parameters:
extractor - Unguarded form extractor function to use
comparator - Unguarded dissimilarity calculator to use on extracted unguarded form
Returns:
Overlap calculator
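
The idea of extracting bare forms once and then comparing them by brute force can be sketched as below. Everything here is hypothetical: GuardedFingerprint stands in for a descriptor, java.util.function.Function and ToDoubleBiFunction stand in for the extractor and the unguarded dissimilarity calculator, and Tanimoto dissimilarity on long[] bit sets is just one possible metric. It is not the implementation of UnguardedPagedOverlap.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical illustration of a brute force search over extracted bare forms. */
final class BruteForceSketch<T extends Serializable> {

    /** Stand-in for a descriptor that carries a guard reference next to its data. */
    static final class GuardedFingerprint {
        final Object guard;
        final long[] bits;

        GuardedFingerprint(Object guard, long[] bits) {
            this.guard = guard;
            this.bits = bits;
        }
    }

    private final List<T> bare = new ArrayList<>();
    private final ToDoubleBiFunction<T, T> dissimilarity;

    private BruteForceSketch(ToDoubleBiFunction<T, T> dissimilarity) {
        this.dissimilarity = dissimilarity;
    }

    /** Applies the extractor to every descriptor and stores only the bare forms. */
    static <D, T extends Serializable> BruteForceSketch<T> create(
            List<D> descriptors, Function<D, T> extractor, ToDoubleBiFunction<T, T> dissimilarity) {
        BruteForceSketch<T> result = new BruteForceSketch<>(dissimilarity);
        for (D descriptor : descriptors) {
            result.bare.add(extractor.apply(descriptor));
        }
        return result;
    }

    /** Index of the stored element least dissimilar to the query, or -1 if empty. */
    int nearest(T query) {
        int best = -1;
        double bestValue = Double.POSITIVE_INFINITY;
        for (int i = 0; i < bare.size(); i++) {
            double value = dissimilarity.applyAsDouble(query, bare.get(i));
            if (value < bestValue) {
                bestValue = value;
                best = i;
            }
        }
        return best;
    }

    /** Tanimoto dissimilarity of two equally long bit sets. */
    static double tanimotoDissimilarity(long[] a, long[] b) {
        int common = 0;
        int union = 0;
        for (int i = 0; i < a.length; i++) {
            common += Long.bitCount(a[i] & b[i]);
            union += Long.bitCount(a[i] | b[i]);
        }
        return union == 0 ? 0.0 : 1.0 - (double) common / union;
    }

    /** Builds a three element index and queries it with the bit pattern 0110. */
    static int demoNearest() {
        Object guard = new Object();
        List<GuardedFingerprint> descriptors = Arrays.asList(
                new GuardedFingerprint(guard, new long[] {0b0011L}),
                new GuardedFingerprint(guard, new long[] {0b1100L}),
                new GuardedFingerprint(guard, new long[] {0b0111L}));
        BruteForceSketch<long[]> index = create(
                descriptors, d -> d.bits, BruteForceSketch::tanimotoDissimilarity);
        return index.nearest(new long[] {0b0110L});
    }
}
```

Storing only the extracted forms means the guard reference is not consulted during comparison, which is presumably the point of an "unguarded" calculator; the returned instance reflects the descriptors present when it was created.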
