jklustor-overlap-0.0.2-20140619012940 (ChemAxon)


com.chemaxon.overlap.bruteforce
Class UnguardedPagedOverlap<T extends Serializable>

Object
  extended by com.chemaxon.overlap.bruteforce.UnguardedPagedOverlap<T>
Type Parameters:
T - Unguarded form of the descriptors
All Implemented Interfaces:
Serializable

public final class UnguardedPagedOverlap<T extends Serializable>
extends Object
implements Serializable

Brute force paged similarity search.

This class is an immutable similarity query engine implementation. The query works on unguarded descriptors for storage and computational efficiency.

Licensing: this class can be used with a valid LicenseGlobals.OVERLAP license.
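
The brute force paged search described above can be pictured with a small, self-contained sketch. It is not the ChemAxon implementation: the class PagedBruteForceSketch, its methods, the choice of long[] bit sets as the unguarded descriptor form and the Tanimoto dissimilarity are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for illustration; not the ChemAxon class. */
class PagedBruteForceSketch {

    /** Tanimoto dissimilarity of two bit sets stored as equal-length long arrays. */
    static double tanimotoDissimilarity(long[] a, long[] b) {
        int common = 0;
        int union = 0;
        for (int i = 0; i < a.length; i++) {
            common += Long.bitCount(a[i] & b[i]);
            union += Long.bitCount(a[i] | b[i]);
        }
        return union == 0 ? 0.0 : 1.0 - (double) common / union;
    }

    /** Compares the query with every target, page by page; returns the global index of the closest, or -1. */
    static int mostSimilarIndex(List<List<long[]>> pages, long[] query) {
        int bestIndex = -1;
        double bestScore = Double.POSITIVE_INFINITY;
        int globalIndex = 0;
        for (List<long[]> page : pages) {
            for (long[] target : page) {
                double d = tanimotoDissimilarity(query, target);
                if (d < bestScore) {
                    bestScore = d;
                    bestIndex = globalIndex;
                }
                globalIndex++;
            }
        }
        return bestIndex;
    }

    /** Two demo pages holding three 4-bit fingerprints. */
    static List<List<long[]>> demoPages() {
        List<long[]> first = Arrays.asList(new long[][] {{0b0011L}, {0b1100L}});
        List<long[]> second = Arrays.asList(new long[][] {{0b0111L}});
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        // The third fingerprint (global index 2) shares the most bits with the query.
        System.out.println(mostSimilarIndex(demoPages(), new long[] {0b0110L})); // prints 2
    }
}
```

Working on raw long[] values instead of wrapper objects is what "unguarded" is taken to mean in this sketch: no per-descriptor compatibility checks are made inside the inner loop.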

Author:
Gabor Imre
See Also:
Serialized Form

Nested Class Summary
(package private) static class UnguardedPagedOverlap.SearchMostSimilars<T>
          Search multiple pages for the n most similar structures.
(package private) static class UnguardedPagedOverlap.SearchPage<T>
          Search one page of descriptors against a set of queries.
 
Field Summary
static int PAGES_GROUP_SIZE_FOR_SINGLE_QUERY
          Number of pages to group when processing single query.
 
Constructor Summary
UnguardedPagedOverlap(Function<D,T> extractor, UnguardedDissimilarityCalculator<T> comparator, List<ImmutableList<D>> pages, List<D> page)
          Construct new immutable reference from pages of Descriptor instances.
UnguardedPagedOverlap(UnguardedDissimilarityCalculator<T> comparator, ImmutableList<ImmutableList<T>> pages, int size)
          Construct new immutable reference from prepared storage.
 
Method Summary
 ImmutableList<double[]> calculateFullMatrix(List<T> queries, com.chemaxon.calculations.common.SubProgressObserver po, ExecutorService ex, boolean managed)
          Calculate full dissimilarity matrix.
 KnnResults calculateSelfKnn(int k, int queriesGroup, com.chemaxon.calculations.common.SubProgressObserver po, ExecutorService ex)
          Brute force find kNN among the contents of the represented descriptors.
 ImmutableList<SimilarityResultNode> findMostSimilar(List<T> queries, com.chemaxon.calculations.common.SubProgressObserver po, ExecutorService ex)
          Brute force find most similar structure for a set of structures.
 SimilarityResults findMostSimilar(UnguardedPagedOverlap<T> queries, int queriesGroup, com.chemaxon.calculations.common.SubProgressObserver po, ExecutorService ex)
          Brute force find most similar structure for a set of structures.
 ImmutableList<SimilarityResultNode> findMostSimilarOnSingleThread(List<T> queries, com.chemaxon.calculations.common.SubProgressObserver po)
          Single threaded reference for multi query most similar lookup.
 SimilarityResultNode findMostSimilarOnSingleThread(T query, com.chemaxon.calculations.common.SubProgressObserver po)
          Find most similar for a given query structure.
 ImmutableList<SimilarityResultNode> findMostSimilars(T query, int count, com.chemaxon.calculations.common.SubProgressObserver po, ExecutorService ex)
          Find most similar structures for a single query.
 ImmutableList<SimilarityResultNode> findMostSimilarsOnSingleThread(T query, com.chemaxon.calculations.common.SubProgressObserver po, int maxCount)
          Find most similars for a given query structure.
 String getDescriptorAsString(int index, boolean recognizeArrays)
          Retrieve descriptor as String.
(package private)  UnmodifiableIterator<T> iterateDescriptors()
          Iterate descriptors.
(package private)  UnmodifiableIterator<ImmutableList<T>> iterateDescriptors(int pagesize)
          Iterate descriptors.
 int size()
          Total number of descriptors stored.
 void traverse(UnguardedVisitor<T> visitor, com.chemaxon.calculations.common.SubProgressObserver po)
          Traverse storage on single thread.
 UnguardedPagedOverlap<T> withComparator(UnguardedDissimilarityCalculator<T> comparator)
          Construct another UnguardedPagedOverlap instance representing a different comparator.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PAGES_GROUP_SIZE_FOR_SINGLE_QUERY

public static final int PAGES_GROUP_SIZE_FOR_SINGLE_QUERY
Number of pages to group when processing single query.

See Also:
Constant Field Values

Constructor Detail

UnguardedPagedOverlap

public UnguardedPagedOverlap(UnguardedDissimilarityCalculator<T> comparator,
                             ImmutableList<ImmutableList<T>> pages,
                             int size)
Construct new immutable reference from prepared storage.

Parameters:
comparator - Comparator to be used
pages - List of pages
size - Total number of descriptors
Throws:
chemaxon.license.LicenseException - when appropriate license is not available

UnguardedPagedOverlap

public UnguardedPagedOverlap(Function<D,T> extractor,
                             UnguardedDissimilarityCalculator<T> comparator,
                             List<ImmutableList<D>> pages,
                             List<D> page)
Construct new immutable reference from pages of Descriptor instances.

Type Parameters:
D - Type of descriptors to transform
Parameters:
extractor - Function to extract unguarded descriptor content for storage
comparator - Unguarded comparator to be represented by the constructed instance
pages - List of descriptor pages. This parameter can not be null.
page - Additional descriptor page, considered as last page. Ignored if null given.
Throws:
chemaxon.license.LicenseException - when appropriate license is not available
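
The extraction step performed by this constructor can be sketched as follows. This is an illustration under stated assumptions, not the library code: ExtractPagesSketch and its methods are hypothetical, java.util.function.Function stands in for the Function type of the signature, and plain strings play the role of the descriptors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical stand-in for illustration; not the ChemAxon constructor. */
class ExtractPagesSketch {

    /** Applies the extractor to every descriptor, keeping the page layout. */
    static <D, T> List<List<T>> extractPages(
            Function<D, T> extractor, List<List<D>> pages, List<D> lastPage) {
        Objects.requireNonNull(pages, "pages must not be null");
        List<List<T>> stored = new ArrayList<>();
        for (List<D> page : pages) {
            stored.add(extractPage(extractor, page));
        }
        if (lastPage != null) { // a null extra page is simply ignored
            stored.add(extractPage(extractor, lastPage));
        }
        return Collections.unmodifiableList(stored);
    }

    private static <D, T> List<T> extractPage(Function<D, T> extractor, List<D> page) {
        List<T> out = new ArrayList<>(page.size());
        for (D descriptor : page) {
            out.add(extractor.apply(descriptor));
        }
        return Collections.unmodifiableList(out);
    }

    /** Demo input: strings play the descriptors, their lengths the unguarded form. */
    static List<List<String>> demoPages() {
        return Arrays.asList(Arrays.asList("a", "bb"), Arrays.asList("ccc"));
    }

    public static void main(String[] args) {
        System.out.println(extractPages(String::length, demoPages(), null));
        // prints [[1, 2], [3]]
        System.out.println(extractPages(String::length, demoPages(), Arrays.asList("dddd")));
        // prints [[1, 2], [3], [4]]
    }
}
```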

Method Detail

withComparator

public UnguardedPagedOverlap<T> withComparator(UnguardedDissimilarityCalculator<T> comparator)
Construct another UnguardedPagedOverlap instance representing a different comparator.

This operation is memory efficient: page storage is immutable, so it is reused rather than copied.

Parameters:
comparator - Comparator to be used
Returns:
Instance representing comparison with the given comparator
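
Why swapping the comparator is cheap can be shown with a minimal sketch. SharedPagesSketch and all of its members are hypothetical names, and int[] vectors with Manhattan and Euclidean distances are arbitrary stand-ins for the descriptor type and the comparators; the only point illustrated is that the new instance keeps a reference to the same page storage.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical stand-in illustrating storage reuse; not the ChemAxon class. */
final class SharedPagesSketch {
    private final ToDoubleBiFunction<int[], int[]> comparator;
    private final List<List<int[]>> pages; // never modified after construction

    SharedPagesSketch(ToDoubleBiFunction<int[], int[]> comparator, List<List<int[]>> pages) {
        this.comparator = comparator;
        this.pages = pages;
    }

    /** New engine, same page storage: only the comparator reference differs. */
    SharedPagesSketch withComparator(ToDoubleBiFunction<int[], int[]> other) {
        return new SharedPagesSketch(other, pages);
    }

    boolean sharesStorageWith(SharedPagesSketch other) {
        return pages == other.pages;
    }

    double dissimilarity(int page, int offset, int[] query) {
        return comparator.applyAsDouble(pages.get(page).get(offset), query);
    }

    static double manhattan(int[] a, int[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    static double euclidean(int[] a, int[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** One demo page holding the vectors (0, 0) and (3, 4), compared with Manhattan distance. */
    static SharedPagesSketch demo() {
        List<int[]> page = Arrays.asList(new int[][] {{0, 0}, {3, 4}});
        return new SharedPagesSketch(SharedPagesSketch::manhattan, Collections.singletonList(page));
    }

    public static void main(String[] args) {
        SharedPagesSketch first = demo();
        SharedPagesSketch second = first.withComparator(SharedPagesSketch::euclidean);
        System.out.println(first.sharesStorageWith(second));                // prints true
        System.out.println(first.dissimilarity(0, 1, new int[] {0, 0}));  // prints 7.0
        System.out.println(second.dissimilarity(0, 1, new int[] {0, 0})); // prints 5.0
    }
}
```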

size

public int size()
Total number of descriptors stored.

Returns:
Size

traverse

public void traverse(UnguardedVisitor<T> visitor,
                     com.chemaxon.calculations.common.SubProgressObserver po)
Traverse storage on single thread.

Note that the callback must not modify passed descriptors.

Parameters:
visitor - Callback to invoke
po - ProgressObserver to track progress. Upon completion SubProgressObserver.done() will be invoked.

iterateDescriptors

UnmodifiableIterator<T> iterateDescriptors()
Iterate descriptors.

Returns:
Iterator for stored descriptors

iterateDescriptors

UnmodifiableIterator<ImmutableList<T>> iterateDescriptors(int pagesize)
Iterate descriptors.

Parameters:
pagesize - Number of descriptors to page together
Returns:
Iterator

getDescriptorAsString

public String getDescriptorAsString(int index,
                                    boolean recognizeArrays)
Retrieve descriptor as String.

This operation is recommended for debug only. Execution might be slow.

Parameters:
index - Index of descriptor
recognizeArrays - Recognize arrays and traverse its elements
Returns:
Descriptor as String

findMostSimilars

public ImmutableList<SimilarityResultNode> findMostSimilars(T query,
                                                            int count,
                                                            com.chemaxon.calculations.common.SubProgressObserver po,
                                                            ExecutorService ex)
Find most similar structures for a single query.

This method utilizes concurrent execution and blocks until completion. ProgressObserver callback is invoked from the calling thread only.

Parameters:
query - Query descriptor
count - Number of expected most similar structures
po - ProgressObserver to track progress. Upon completion SubProgressObserver.done() will be invoked.
ex - ExecutorService to run workers
Returns:
List of most similar structures, in order of increasing dissimilarity
Throws:
CancellationException - when cancelled through supplied progress observer
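
One way such a concurrent, blocking lookup could be organized is sketched below: one task per page, each keeping its local best hits, merged on the calling thread. This is an assumption about the general technique, not a description of the actual implementation. ConcurrentTopNSketch and Hit are hypothetical names, and descriptors are reduced to plain double values with absolute difference as the dissimilarity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for illustration; not the ChemAxon class. */
class ConcurrentTopNSketch {

    /** Hypothetical result node: global target index plus its dissimilarity to the query. */
    static final class Hit {
        final int index;
        final double dissimilarity;

        Hit(int index, double dissimilarity) {
            this.index = index;
            this.dissimilarity = dissimilarity;
        }
    }

    /** Increasing dissimilarity; ties are broken by index to keep the sketch deterministic. */
    static final Comparator<Hit> BY_DISSIMILARITY =
            Comparator.<Hit>comparingDouble(h -> h.dissimilarity).thenComparingInt(h -> h.index);

    static List<Hit> topN(List<double[]> pages, double query, int count, ExecutorService ex)
            throws InterruptedException, ExecutionException {
        List<Callable<List<Hit>>> tasks = new ArrayList<>();
        int offset = 0;
        for (double[] page : pages) {
            final int pageOffset = offset;
            tasks.add(() -> { // one worker task per page
                List<Hit> local = new ArrayList<>();
                for (int i = 0; i < page.length; i++) {
                    local.add(new Hit(pageOffset + i, Math.abs(page[i] - query)));
                }
                local.sort(BY_DISSIMILARITY);
                return new ArrayList<>(local.subList(0, Math.min(count, local.size())));
            });
            offset += page.length;
        }
        List<Hit> merged = new ArrayList<>();
        for (Future<List<Hit>> future : ex.invokeAll(tasks)) { // blocks until every page is done
            merged.addAll(future.get());
        }
        merged.sort(BY_DISSIMILARITY);
        return new ArrayList<>(merged.subList(0, Math.min(count, merged.size())));
    }

    /** Runs the demo on a two-thread pool and returns the indexes of the three best hits. */
    static int[] demoTopIndexes() {
        ExecutorService ex = Executors.newFixedThreadPool(2);
        try {
            List<double[]> pages = Arrays.asList(new double[][] {{1.0, 5.0, 9.0}, {4.0, 6.5}});
            List<Hit> hits = topN(pages, 5.0, 3, ex);
            int[] indexes = new int[hits.size()];
            for (int i = 0; i < indexes.length; i++) {
                indexes[i] = hits.get(i).index;
            }
            return indexes;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            ex.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demoTopIndexes())); // prints [1, 3, 4]
    }
}
```

Keeping only count hits per page bounds the size of the merge step, since no page can contribute more than count entries to the final list.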

calculateFullMatrix

public ImmutableList<double[]> calculateFullMatrix(List<T> queries,
                                                   com.chemaxon.calculations.common.SubProgressObserver po,
                                                   ExecutorService ex,
                                                   boolean managed)
Calculate full dissimilarity matrix.

Calculate full dissimilarity matrix between the represented targets and the given queries. Note that the resulting matrix might be excessively large.

The resulting structure is allocated upon startup.

Parameters:
queries - Query descriptors
po - ProgressObserver to track progress. Upon completion SubProgressObserver.done() will be invoked.
ex - ExecutorService to run workers
managed - if true, use JMX for monitoring
Returns:
List of dissimilarity values for each query. Element i contains the dissimilarity vector for query i, which holds the dissimilarity of that query to each target.
Throws:
CancellationException - when cancelled through supplied progress observer
IllegalStateException - possible exceptions due to JMX connection are wrapped to IllegalStateException
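
The layout of the returned structure, and why its size deserves attention, can be sketched as follows. FullMatrixSketch and its methods are hypothetical, descriptors are reduced to double values, and the size estimate counts only the 8 bytes of each double, ignoring array and list overhead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for illustration; not the ChemAxon class. */
class FullMatrixSketch {

    /** Row i holds the dissimilarity of query i to every target, in target order. */
    static List<double[]> fullMatrix(double[] targets, double[] queries) {
        List<double[]> rows = new ArrayList<>(queries.length);
        for (int q = 0; q < queries.length; q++) {
            rows.add(new double[targets.length]); // all rows are allocated before any comparison
        }
        for (int q = 0; q < queries.length; q++) {
            double[] row = rows.get(q);
            for (int t = 0; t < targets.length; t++) {
                row[t] = Math.abs(queries[q] - targets[t]);
            }
        }
        return rows;
    }

    /** Bytes taken by the double values alone; throws ArithmeticException on overflow. */
    static long payloadBytes(long queryCount, long targetCount) {
        return Math.multiplyExact(Math.multiplyExact(queryCount, targetCount), (long) Double.BYTES);
    }

    public static void main(String[] args) {
        List<double[]> matrix = fullMatrix(new double[] {1, 4}, new double[] {2, 6});
        System.out.println(Arrays.toString(matrix.get(0))); // prints [1.0, 2.0]
        System.out.println(Arrays.toString(matrix.get(1))); // prints [5.0, 2.0]
        // 100 000 queries against 1 000 000 targets need 800 000 000 000 bytes of doubles.
        System.out.println(payloadBytes(100_000L, 1_000_000L)); // prints 800000000000
    }
}
```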

calculateSelfKnn

public KnnResults calculateSelfKnn(int k,
                                   int queriesGroup,
                                   com.chemaxon.calculations.common.SubProgressObserver po,
                                   ExecutorService ex)
                            throws ExecutionException
Brute force find kNN among the contents of the represented descriptors.

Parameters:
k - Number of nearest neighbors to find
queriesGroup - Number of queries to group
po - ProgressObserver to track progress. Upon completion SubProgressObserver.done() will be invoked.
ex - ExecutorService to run workers
Returns:
Storage of the kNN found. Note that self ids are not part of the kNN lists.
Throws:
ExecutionException - Re-thrown
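
What "self ids are not part of the kNN lists" means can be shown with a deliberately naive single-threaded sketch. SelfKnnSketch is a hypothetical name, descriptors are reduced to double values, and sorting all candidates per query is chosen for brevity; query grouping and concurrency are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for illustration; not the ChemAxon class. */
class SelfKnnSketch {

    /** For each descriptor, returns the indexes of its k nearest other descriptors. */
    static int[][] selfKnn(double[] descriptors, int k) {
        int n = descriptors.length;
        int[][] neighbors = new int[n][];
        for (int q = 0; q < n; q++) {
            final double query = descriptors[q];
            List<Integer> candidates = new ArrayList<>();
            for (int t = 0; t < n; t++) {
                if (t != q) { // the self id never enters the candidate list
                    candidates.add(t);
                }
            }
            candidates.sort(Comparator
                    .<Integer>comparingDouble(t -> Math.abs(descriptors[t] - query))
                    .thenComparingInt(Integer::intValue)); // ties broken by index
            int size = Math.min(k, candidates.size());
            neighbors[q] = new int[size];
            for (int i = 0; i < size; i++) {
                neighbors[q][i] = candidates.get(i);
            }
        }
        return neighbors;
    }

    public static void main(String[] args) {
        int[][] knn = selfKnn(new double[] {0.0, 1.0, 3.0, 7.0}, 2);
        System.out.println(Arrays.deepToString(knn)); // prints [[1, 2], [0, 2], [1, 0], [2, 1]]
    }
}
```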

findMostSimilar

public SimilarityResults findMostSimilar(UnguardedPagedOverlap<T> queries,
                                         int queriesGroup,
                                         com.chemaxon.calculations.common.SubProgressObserver po,
                                         ExecutorService ex)
Brute force find most similar structure for a set of structures.

For each query index the most similar target index is recorded.

Parameters:
queries - Query descriptors
queriesGroup - Number of queries to group
po - ProgressObserver to track progress. Upon completion SubProgressObserver.done() will be invoked.
ex - ExecutorService to run workers
Returns:
Storage of most similars found

findMostSimilar

public ImmutableList<SimilarityResultNode> findMostSimilar(List<T> queries,
                                                           com.chemaxon.calculations.common.SubProgressObserver po,
                                                           ExecutorService ex)
Brute force find most similar structure for a set of structures.

This method utilizes concurrent execution and blocks until completion. ProgressObserver callback is invoked from the calling thread only.

Note that when the best dissimilarity score is associated with multiple queries, one of them is picked. The selection in this case is non-deterministic.

Parameters:
queries - Query descriptors (in the unguarded form)
po - ProgressObserver to track progress. Upon completion SubProgressObserver.done() will be invoked.
ex - ExecutorService to run workers
Returns:
List of most similar nodes associated to the query structures
Throws:
CancellationException - when cancelled through supplied progress observer

findMostSimilarOnSingleThread

public ImmutableList<SimilarityResultNode> findMostSimilarOnSingleThread(List<T> queries,
                                                                         com.chemaxon.calculations.common.SubProgressObserver po)
                                                                  throws CancellationException
Single threaded reference for multi query most similar lookup.

Parameters:
queries - Queries to search
po - Observer; will be closed upon finish/abort/error. One work unit is assigned to one comparison.
Returns:
Results
Throws:
CancellationException - When cancelled through supplied progress observer
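
The observer contract stated above (one work unit per comparison, observer closed on finish, abort or error) can be sketched like this. The Observer interface and its method names are assumptions introduced for the example and do not reproduce the SubProgressObserver API; only java.util.concurrent.CancellationException is a real JDK type.

```java
import java.util.Arrays;
import java.util.concurrent.CancellationException;

/** Hypothetical stand-in for illustration; not the ChemAxon class. */
class ObservedScanSketch {

    /** Hypothetical observer; these method names are assumptions. */
    interface Observer {
        boolean isCancelled();
        void worked(long units);
        void done();
    }

    /** Counts reported units and requests cancellation once a limit is reached. */
    static final class CountingObserver implements Observer {
        final long cancelAfter;
        long units;
        boolean closed;

        CountingObserver(long cancelAfter) {
            this.cancelAfter = cancelAfter;
        }

        @Override public boolean isCancelled() { return units >= cancelAfter; }
        @Override public void worked(long n) { units += n; }
        @Override public void done() { closed = true; }
    }

    /** Returns, for each query, the index of the closest target. */
    static int[] mostSimilarPerQuery(double[] targets, double[] queries, Observer po) {
        try {
            int[] best = new int[queries.length];
            for (int q = 0; q < queries.length; q++) {
                double bestScore = Double.POSITIVE_INFINITY;
                best[q] = -1;
                for (int t = 0; t < targets.length; t++) {
                    if (po.isCancelled()) {
                        throw new CancellationException("cancelled through the observer");
                    }
                    double d = Math.abs(queries[q] - targets[t]);
                    if (d < bestScore) {
                        bestScore = d;
                        best[q] = t;
                    }
                    po.worked(1); // one work unit per comparison
                }
            }
            return best;
        } finally {
            po.done(); // closed on finish, cancellation and error alike
        }
    }

    public static void main(String[] args) {
        CountingObserver po = new CountingObserver(Long.MAX_VALUE);
        int[] best = mostSimilarPerQuery(new double[] {1, 5, 9}, new double[] {4, 10}, po);
        System.out.println(Arrays.toString(best) + " units=" + po.units + " closed=" + po.closed);
        // prints [1, 2] units=6 closed=true
    }
}
```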

findMostSimilarOnSingleThread

public SimilarityResultNode findMostSimilarOnSingleThread(T query,
                                                          com.chemaxon.calculations.common.SubProgressObserver po)
                                                   throws CancellationException
Find most similar for a given query structure.

This method is intended only for test/diagnostics. Users of this API usually need to invoke findMostSimilar(java.util.List, com.chemaxon.calculations.common.SubProgressObserver, java.util.concurrent.ExecutorService) instead.

This method blocks until ready and uses a single (the calling) thread to do the calculation.

Parameters:
query - Query descriptor
po - Progress observer to track progress. Completion is reported by invoking SubProgressObserver.done() upon completion, cancellation or error
Returns:
The most similar structure found
Throws:
CancellationException - upon cancellation through the given progress observer

findMostSimilarsOnSingleThread

public ImmutableList<SimilarityResultNode> findMostSimilarsOnSingleThread(T query,
                                                                          com.chemaxon.calculations.common.SubProgressObserver po,
                                                                          int maxCount)
                                                                   throws CancellationException
Find most similars for a given query structure.

This method is intended only for test/diagnostics. Users of this API usually need to invoke findMostSimilar(java.util.List, com.chemaxon.calculations.common.SubProgressObserver, java.util.concurrent.ExecutorService) instead.

This method blocks until ready and uses a single (the calling) thread to do the calculation.

Parameters:
query - Query descriptor
po - Progress observer to track progress. Completion is reported by invoking SubProgressObserver.done() upon completion, cancellation or error
maxCount - Max results count
Returns:
List of at most the given number of most similars, in increasing dissimilarity order
Throws:
CancellationException - upon cancellation through the given progress observer
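
A common way to collect at most maxCount best hits in a single pass is a bounded heap, sketched below. Whether the library uses this technique is not stated in this documentation; BoundedTopNSketch and Hit are hypothetical names and descriptors are reduced to double values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for illustration; not the ChemAxon class. */
class BoundedTopNSketch {

    /** Hypothetical result node: target index plus its dissimilarity to the query. */
    static final class Hit {
        final int index;
        final double score;

        Hit(int index, double score) {
            this.index = index;
            this.score = score;
        }
    }

    /** Returns at most maxCount hits, in increasing dissimilarity order. */
    static List<Hit> mostSimilars(double[] targets, double query, int maxCount) {
        // The head of this queue is the worst hit currently kept.
        PriorityQueue<Hit> worstFirst = new PriorityQueue<>(
                Comparator.<Hit>comparingDouble(h -> h.score).reversed());
        for (int i = 0; i < targets.length; i++) {
            worstFirst.add(new Hit(i, Math.abs(targets[i] - query)));
            if (worstFirst.size() > maxCount) {
                worstFirst.poll(); // evict the current worst hit
            }
        }
        List<Hit> result = new ArrayList<>(worstFirst);
        result.sort(Comparator.<Hit>comparingDouble(h -> h.score));
        return result;
    }

    /** Extracts the target indexes of the given hits, preserving order. */
    static int[] indexes(List<Hit> hits) {
        int[] out = new int[hits.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = hits.get(i).index;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] targets = {10, 2, 7, 4};
        System.out.println(Arrays.toString(indexes(mostSimilars(targets, 5, 2))));  // prints [3, 2]
        System.out.println(Arrays.toString(indexes(mostSimilars(targets, 5, 10)))); // prints [3, 2, 1, 0]
    }
}
```

The second call shows the "at most" part of the contract: asking for ten hits from four targets yields four.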
