jklustor-overlap-0.0.2-20140619012940 (ChemAxon)
Object
com.chemaxon.overlap.storage.PagedDescriptorStorage<D>
D
- Stored descriptor type
public final class PagedDescriptorStorage<D extends com.chemaxon.descriptors.common.Descriptor>
Descriptor storage.
Storage is organized into fixed size pages. All pages are full, except the last one, which can be partially filled. Descriptors are indexed sequentially across pages.
Licensing: this class can be used with a valid LicenseGlobals.OVERLAP license.
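The paging scheme described above can be sketched with plain JDK collections. This is only an illustration under stated assumptions: `PagedListSketch` and all of its members are hypothetical names, and nothing here reflects how PagedDescriptorStorage is actually implemented. It shows that with fixed size pages a sequential index maps to a page number (`index / pageSize`) and an offset inside the page (`index % pageSize`), and that only the last page may be partially filled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a paged store; not the ChemAxon implementation. */
class PagedListSketch<E> {
    private final int pageSize;
    private final List<List<E>> pages = new ArrayList<>();
    private int size = 0;

    PagedListSketch(int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.pageSize = pageSize;
    }

    /** Appends an element and returns its sequential index. */
    int add(E element) {
        // Open a new page when there is none yet or the last one is full.
        if (pages.isEmpty() || pages.get(pages.size() - 1).size() == pageSize) {
            pages.add(new ArrayList<>(pageSize));
        }
        pages.get(pages.size() - 1).add(element);
        return size++;
    }

    /** Resolves a sequential index to (page, offset) and returns the element. */
    E get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index + ", size: " + size);
        }
        return pages.get(index / pageSize).get(index % pageSize);
    }

    int size() {
        return size;
    }

    int pageCount() {
        return pages.size();
    }

    List<E> page(int pageNo) {
        return Collections.unmodifiableList(pages.get(pageNo));
    }

    public static void main(String[] args) {
        PagedListSketch<String> store = new PagedListSketch<>(4);
        for (int i = 0; i < 10; i++) {
            store.add("d" + i);
        }
        // prints "3 pages, last holds 2"
        System.out.println(store.pageCount() + " pages, last holds " + store.page(2).size());
    }
}
```

Ten elements with a page size of 4 give two full pages and a third page holding the remaining two elements.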
Field Summary | |
---|---|
static int |
MAX_RESULT_QUEUE_SIZE
Max non reported queue elements. |
Constructor Summary | |
---|---|
PagedDescriptorStorage(int pagesize,
com.chemaxon.descriptors.common.DescriptorGenerator<D> generator)
Construct new empty descriptor storage. |
|
PagedDescriptorStorage(int pagesize,
com.chemaxon.descriptors.common.DescriptorGenerator<D> generator,
InputStream is,
com.chemaxon.calculations.common.SubProgressObserver po)
Construct from a String serialized form. |
|
PagedDescriptorStorage(int pagesize,
com.chemaxon.descriptors.common.DescriptorGenerator<D> generator,
ObjectInputStream ois,
com.chemaxon.calculations.common.SubProgressObserver po)
Construct from a byte [] serialized form. |
Method Summary | ||
---|---|---|
void |
addAll(InputStream is,
String opts,
int skipCount,
int maxProcessCount,
StandardizerWrapper standardizer,
com.chemaxon.calculations.common.SubProgressObserver po,
ExecutorService e,
MoleculeCallback moleculeCallback)
Read all molecules from a structure file into the similarity subsystem. |
|
(package private) void |
addAll(Iterator<StructureRecord> input,
StandardizerWrapper standardizer,
com.chemaxon.calculations.common.ProgressObserver po,
ExecutorService e,
MoleculeCallback moleculeCallback)
Import from an input iterator. |
|
int |
addDescriptor(D d)
Add a single descriptor to the similarity subsystem. |
|
int |
addMolecule(chemaxon.struc.Molecule m)
Add a single molecule to the similarity subsystem. |
|
<T extends Serializable> UnguardedPagedOverlap<T> |
createBruteForceOverlap(Function<D,T> extractor,
UnguardedDissimilarityCalculator<T> comparator)
Create a brute force overlap calculator from the current state of the storage. |
|
(package private) void |
dequeueFinisheds(Queue<Future<List<ProcessQueueItem<D>>>> resultsQueue,
MoleculeCallback moleculeCallback,
com.chemaxon.calculations.common.ProgressObserver po)
Dequeue finished results if any. |
|
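static <D extends com.chemaxon.descriptors.common.Descriptor,T extends Serializable> UnguardedPagedOverlap<T> |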
deserializeUnguarded(int pagesize,
com.chemaxon.descriptors.common.DescriptorGenerator<D> generator,
Function<D,T> extractor,
UnguardedDissimilarityCalculator<T> comparator,
ObjectInputStream ois,
com.chemaxon.calculations.common.SubProgressObserver po)
Deserialize an UnguardedPagedSimilarity from a binary serialized form. |
|
(package private) void |
enqueueNextBatch(Iterator<StructureRecord> input,
StandardizerWrapper standardizer,
Queue<Future<List<ProcessQueueItem<D>>>> resultsQueue,
ExecutorService e)
Enqueue next batch of processes. |
|
(package private) List<D> |
getPage(int pageno)
|
|
(package private) int |
getPageCount()
Count of pages. |
|
(package private) Iterator<List<D>> |
iteratePages()
|
|
int |
size()
Stored descriptor count. |
|
void |
toBytes(ObjectOutputStream os,
com.chemaxon.calculations.common.SubProgressObserver po)
Deprecated. Use toBytes(java.io.ObjectOutputStream, com.chemaxon.calculations.common.SubProgressObserver, long) with a sound reset interval. |
|
void |
toBytes(ObjectOutputStream os,
com.chemaxon.calculations.common.SubProgressObserver po,
long resetInterval)
Dump descriptors to a binary file. |
|
void |
toStrings(PrintStream ps,
com.chemaxon.calculations.common.SubProgressObserver po)
Write String representations to a PrintStream . |
|
void |
toStrings(PrintStream ps,
com.chemaxon.calculations.common.SubProgressObserver po,
ExecutorService e)
Write String representations to a PrintStream using concurrent conversions. |
Methods inherited from class Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int MAX_RESULT_QUEUE_SIZE
This is the max number of enqueued Future references waiting for final storage/error reporting.
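A cap on pending Future references can be sketched as follows. This is an assumption-laden illustration, not the class's actual logic: `BoundedFutureQueueSketch`, `runAll`, `await` and the value of `MAX_PENDING` are all hypothetical, and the real value of MAX_RESULT_QUEUE_SIZE is not stated in this document. The idea shown is that the submitting thread blocks on the oldest pending result once the cap is reached, so the queue never grows beyond the limit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical sketch of a bounded results queue; not the ChemAxon implementation. */
class BoundedFutureQueueSketch {
    /** Assumed cap standing in for MAX_RESULT_QUEUE_SIZE; the value is illustrative. */
    static final int MAX_PENDING = 4;

    /** Runs all tasks, keeping at most MAX_PENDING unreported futures; results keep submission order. */
    static <T> List<T> runAll(List<Callable<T>> tasks, ExecutorService executor) {
        Queue<Future<T>> pending = new ArrayDeque<>();
        List<T> results = new ArrayList<>();
        for (Callable<T> task : tasks) {
            // Queue is full: wait for the oldest result before submitting more work.
            if (pending.size() >= MAX_PENDING) {
                results.add(await(pending.remove()));
            }
            pending.add(executor.submit(task));
            // Dequeue already finished results from the head without blocking.
            while (!pending.isEmpty() && pending.peek().isDone()) {
                results.add(await(pending.remove()));
            }
        }
        // Input exhausted: drain whatever is still pending.
        while (!pending.isEmpty()) {
            results.add(await(pending.remove()));
        }
        return results;
    }

    private static <T> T await(Future<T> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Because results are always taken from the head of a FIFO queue, they are reported in submission order even when later tasks finish first.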
Constructor Detail |
---|
public PagedDescriptorStorage(int pagesize, com.chemaxon.descriptors.common.DescriptorGenerator<D> generator)
Note that to acquire guard object reference, an empty molecule is generated in the constructor.
pagesize
- Size of each page (molecules/descriptors)
generator
- Represented descriptor generator
chemaxon.license.LicenseException
- when appropriate license is not available
public PagedDescriptorStorage(int pagesize, com.chemaxon.descriptors.common.DescriptorGenerator<D> generator, ObjectInputStream ois, com.chemaxon.calculations.common.SubProgressObserver po) throws IOException, ClassNotFoundException
Construct from a byte [] serialized form.
Note that the supplied DescriptorGenerator must be parametrized the same way as the one used for serialization. Compatibility of generators is not checked; in some, but not all, cases incompatibility results in a RuntimeException thrown by the used DescriptorGenerator.fromByteArray(byte[]).
Note that to acquire guard object reference, an empty molecule is generated in the constructor.
Compatible serialized form is generated by toBytes(java.io.ObjectOutputStream, com.chemaxon.calculations.common.SubProgressObserver). Note that serialized form is not necessarily compatible between different versions (including underlying Marvin/JChem)!
pagesize
- Size of each page (molecules/descriptors)
generator
- Represented descriptor generator
ois
- ObjectInputStream to read descriptors byte form. Note that this stream is not closed upon finish or abort
po
- ProgressObserver to track progress. Note that SubProgressObserver.done() is invoked upon completion
IOException
- re-thrown from passed ObjectInputStream
ClassNotFoundException
- re-thrown from passed ObjectInputStream
IllegalArgumentException
- upon error reading
CancellationException
- upon cancellation from progress observer
chemaxon.license.LicenseException
- when appropriate license is not available
public PagedDescriptorStorage(int pagesize, com.chemaxon.descriptors.common.DescriptorGenerator<D> generator, InputStream is, com.chemaxon.calculations.common.SubProgressObserver po)
Construct from a String serialized form.
Note that the supplied DescriptorGenerator must be parametrized the same way as the one used for String serialization. Compatibility of generators is not checked; in some, but not all, cases incompatibility results in a RuntimeException thrown by the used DescriptorGenerator.fromString(java.lang.String).
Note that to acquire guard object reference, an empty molecule is generated in the constructor.
pagesize
- Size of each page (molecules/descriptors)
generator
- Represented descriptor generator
is
- InputStream to read descriptors line by line
po
- ProgressObserver to track progress. SubProgressObserver.done() is invoked upon completion
IllegalArgumentException
- upon error reading
CancellationException
- upon cancellation from progress observer
chemaxon.license.LicenseException
- when appropriate license is not available
Method Detail |
---|
public static <D extends com.chemaxon.descriptors.common.Descriptor,T extends Serializable> UnguardedPagedOverlap<T> deserializeUnguarded(int pagesize, com.chemaxon.descriptors.common.DescriptorGenerator<D> generator, Function<D,T> extractor, UnguardedDissimilarityCalculator<T> comparator, ObjectInputStream ois, com.chemaxon.calculations.common.SubProgressObserver po) throws IOException, ClassNotFoundException
Deserialize an UnguardedPagedSimilarity from a binary serialized form.
Note that the supplied DescriptorGenerator must be parametrized the same way as the one used for serialization. Compatibility of generators is not checked; in some, but not all, cases incompatibility results in a RuntimeException thrown by the used DescriptorGenerator.fromByteArray(byte[]).
Compatible serialized form is generated by toBytes(java.io.ObjectOutputStream, com.chemaxon.calculations.common.SubProgressObserver). Note that serialized form is not necessarily compatible between different versions (including underlying Marvin/JChem)!
D
- Generated descriptor type
T
- Unguarded form of the descriptors
pagesize
- Size of each page
generator
- Generator to be used for deserialization
extractor
- Function to extract unguarded descriptor content for storage
comparator
- Unguarded comparator to be represented by the constructed instance
ois
- ObjectInputStream to read descriptors byte form. Note that this stream is not closed upon finish or abort
po
- ProgressObserver to track progress. Note that SubProgressObserver.done() is invoked upon completion
IOException
- re-thrown from passed ObjectInputStream
ClassNotFoundException
- re-thrown from passed ObjectInputStream
IllegalArgumentException
- upon error reading
CancellationException
- upon cancellation from progress observer
public void toStrings(PrintStream ps, com.chemaxon.calculations.common.SubProgressObserver po) throws CancellationException
Write String representations to a PrintStream.
Any error from the underlying DescriptorGenerator.toString(com.chemaxon.descriptors.common.Descriptor)
will propagate from this method and the execution will be aborted.
ps
- PrintStream to write progress. Note that ps will not be closed upon finish.
po
- Observer to follow progress. Observer is switched to determinate state with each descriptor representing a work unit. Done will be reported upon completion/cancellation.
CancellationException
- upon cancellation
@Deprecated public void toBytes(ObjectOutputStream os, com.chemaxon.calculations.common.SubProgressObserver po) throws IOException
Deprecated. Use toBytes(java.io.ObjectOutputStream, com.chemaxon.calculations.common.SubProgressObserver, long) with a sound reset interval.
os
- Object output stream to write. Stream is not closed upon completion.
po
- ProgressObserver to track progress. Observer is closed by invoking SubProgressObserver.done() upon completion, failure or cancellation
CancellationException
- when cancelled through the given observer
IOException
- thrown from passed ObjectOutputStream
public void toBytes(ObjectOutputStream os, com.chemaxon.calculations.common.SubProgressObserver po, long resetInterval) throws IOException
Warning! This method usually resets the given ObjectOutputStream by calling its ObjectOutputStream.reset() method periodically.
This method differs from serialization: only the descriptors are written, the associated descriptor generator is not. Also, page size is not retained, so it is possible to read descriptors back to different page sizes.
The underlying DescriptorGenerator instance must be reconstructed upon deserialization. This method currently does not write descriptor generator related information, but this behavior might change in the future.
Export format in the current version:
- ObjectOutputStream.writeInt(int) invoked with the total descriptor count as the parameter
- ObjectOutputStream.writeUnshared(java.lang.Object) invoked for each descriptor, byte [] representation of each descriptor is passed as the parameter (created by DescriptorGenerator.toByteArray(com.chemaxon.descriptors.common.Descriptor)).
- ObjectOutputStream.reset() is invoked to avoid memory leak in serialization
os
- Object output stream to write. Stream is not closed upon completion.
po
- ProgressObserver to track progress. Observer is closed by invoking SubProgressObserver.done() upon completion, failure or cancellation
resetInterval
- Reset stream by invoking ObjectOutputStream.reset() periodically after given descriptors written. Value must be greater than zero. todo: consider optimal value for resetInterval.
CancellationException
- when cancelled through the given observer
IOException
- thrown from passed ObjectOutputStream
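The export format listed for toBytes can be reproduced with JDK streams alone. The following is a minimal sketch under stated assumptions: `DescriptorDumpSketch`, `write`, `read` and `roundTrip` are hypothetical names, descriptors are represented directly as `byte[]` instead of going through a DescriptorGenerator, and the exact point at which the real method calls reset() is assumed to be after every `resetInterval` descriptors. It also illustrates why the page size is not retained: the reader chooses its own page size when regrouping the descriptors.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the documented dump format; not the ChemAxon implementation. */
class DescriptorDumpSketch {

    /** Writes the count, then each byte[] unshared, resetting the stream periodically. */
    static void write(ObjectOutputStream os, List<byte[]> descriptors, long resetInterval)
            throws IOException {
        if (resetInterval <= 0) {
            throw new IllegalArgumentException("resetInterval must be greater than zero");
        }
        os.writeInt(descriptors.size());
        long sinceReset = 0;
        for (byte[] descriptor : descriptors) {
            os.writeUnshared(descriptor);
            if (++sinceReset == resetInterval) {
                os.reset(); // assumed placement of the periodic reset
                sinceReset = 0;
            }
        }
        os.flush(); // the stream is deliberately not closed here
    }

    /** Reads the dump back and groups descriptors into pages of the given size. */
    static List<List<byte[]>> read(ObjectInputStream ois, int pageSize)
            throws IOException, ClassNotFoundException {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int count = ois.readInt();
        List<List<byte[]>> pages = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            if (i % pageSize == 0) {
                pages.add(new ArrayList<>());
            }
            pages.get(pages.size() - 1).add((byte[]) ois.readUnshared());
        }
        return pages;
    }

    /** Convenience helper: dumps to memory and reads back with a freely chosen page size. */
    static List<List<byte[]>> roundTrip(List<byte[]> descriptors, long resetInterval, int pageSize) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream os = new ObjectOutputStream(buffer)) {
                write(os, descriptors, resetInterval);
            }
            try (ObjectInputStream ois =
                    new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return read(ois, pageSize);
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `DescriptorDumpSketch.roundTrip(descriptors, 3, 4)` regroups ten dumped descriptors into pages of four regardless of how they were paged when written.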
public void toStrings(PrintStream ps, com.chemaxon.calculations.common.SubProgressObserver po, ExecutorService e) throws CancellationException
Write String representations to a PrintStream using concurrent conversions.
Callback (po) and stream access are made on the calling thread. This method blocks until completion or abortion due to an underlying exception.
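The threading contract above (conversions run on the executor, while the stream is only touched by the caller) can be sketched as follows. All names here (`ConcurrentToStringsSketch`, `writeAll`, `render`) are hypothetical, a plain `Function` stands in for the descriptor-to-String conversion, and progress reporting is omitted; this is not the actual implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch of concurrent conversion with single-threaded output. */
class ConcurrentToStringsSketch {

    /** Converts items on the executor; writes lines in input order on the calling thread. */
    static <D> void writeAll(List<D> items, Function<D, String> converter,
                             PrintStream ps, ExecutorService executor) {
        List<Future<String>> futures = new ArrayList<>(items.size());
        for (D item : items) {
            Callable<String> task = () -> converter.apply(item);
            futures.add(executor.submit(task));
        }
        // Blocks until every conversion has finished or one of them failed.
        for (Future<String> future : futures) {
            try {
                ps.println(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        // ps is intentionally left open for the caller.
    }

    /** Convenience helper that owns both the executor and the stream. */
    static <D> String render(List<D> items, Function<D, String> converter, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream ps = new PrintStream(buffer, true)) {
            writeAll(items, converter, ps, executor);
        } finally {
            executor.shutdown();
        }
        return buffer.toString();
    }
}
```

This sketch keeps one Future per item in memory; a bounded queue would be the natural refinement for large inputs.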
ps
- PrintStream to write progress. Note that ps will not be closed upon finish.
po
- Observer to follow progress. Observer is switched to determinate state with each descriptor representing a work unit. Done will be reported upon completion/cancellation.
e
- Executor service to use for string serialization
CancellationException
- upon cancellation
public int size()
int getPageCount()
List<D> getPage(int pageno)
Iterator<List<D>> iteratePages()
public void addAll(InputStream is, String opts, int skipCount, int maxProcessCount, StandardizerWrapper standardizer, com.chemaxon.calculations.common.SubProgressObserver po, ExecutorService e, MoleculeCallback moleculeCallback)
Updater
Consecutive members of a structure file have consecutive indexes associated. Usually the first molecule in the file has index value 0 associated. To allow segmented reading, this method can be called multiple times to append additional structures.
Consistency considerations: the storage is left in a consistent state in case of the following abnormal or unexpected terminations:
SubProgressObserver
Notes on multithreading:
addAll
in interface Updater<D extends com.chemaxon.descriptors.common.Descriptor>
is
- Input stream to read from. Note that the stream is not closed when returning.
opts
- Input options or null to pass to underlying MFileFormatUtil.createRecordReader(java.io.InputStream, java.lang.String)
skipCount
- Skip given number of structures. Skipped structures are also reported to the given progress observer like ordinary processed structures, however they won't generate calls into the supplied MoleculeCallback.
maxProcessCount
- Read at most given number of structures. Count starts after skipping structures.
standardizer
- Standardizer to apply on molecules. See StandardizerWrappers for utility methods. Note that supplied wrapper must be thread safe.
po
- ProgressObserver to track file read. Total reported work units are assigned to read and processed/skipped molecules count. The given observer is closed upon returning
e
- ExecutorService to run descriptor generation for pages
moleculeCallback
- Callback to report back assigned indexes/processing errors.
void enqueueNextBatch(Iterator<StructureRecord> input, StandardizerWrapper standardizer, Queue<Future<List<ProcessQueueItem<D>>>> resultsQueue, ExecutorService e)
Exceptions from input are propagated.
input
- Source of inputs
standardizer
- Standardizer to use
resultsQueue
- Queue for results
e
- Executor service to use
void dequeueFinisheds(Queue<Future<List<ProcessQueueItem<D>>>> resultsQueue, MoleculeCallback moleculeCallback, com.chemaxon.calculations.common.ProgressObserver po)
Wait if results queue is full.
resultsQueue
- List of results to check
moleculeCallback
- Callback to notify event details
po
- ProgressObserver to report processed (possibly failed) input molecule count
CancellationException
- upon cancellation from progressObserver
void addAll(Iterator<StructureRecord> input, StandardizerWrapper standardizer, com.chemaxon.calculations.common.ProgressObserver po, ExecutorService e, MoleculeCallback moleculeCallback)
input
- Iterator to input from. Will not be closed upon error/completion.
standardizer
- Standardizer to use if required
p
- see Updater
po
- Progress observer to track progress; won't be closed upon completion
e
- see Updater
moleculeCallback
- see Updater
public int addMolecule(chemaxon.struc.Molecule m)
Updater
Note that the given molecule must be standardized before calling this method.
addMolecule
in interface Updater<D extends com.chemaxon.descriptors.common.Descriptor>
m
- Molecule to be added
public int addDescriptor(D d)
Updater
Note that descriptors have a compatibility related API contract (currently references returned by Descriptor.getDescriptorGenerator() must be equal for compatible descriptors) which must be satisfied by the passed descriptor.
addDescriptor
in interface Updater<D extends com.chemaxon.descriptors.common.Descriptor>
d
- Descriptor to be added
public <T extends Serializable> UnguardedPagedOverlap<T> createBruteForceOverlap(Function<D,T> extractor, UnguardedDissimilarityCalculator<T> comparator)
The supplied function is applied to all represented descriptors and the resulting bare forms are stored in the returned instance.
T
- Type of unguarded form
extractor
- Unguarded form extractor function to use
comparator
- Unguarded dissimilarity calculator to use on extracted unguarded form