MadFast Similarity Search

Blazing fast similarity searching tools

MadFast is a high-end toolkit for ultra-fast chemical similarity search using in-memory data storage and an optimized multi-threaded implementation. The outstanding search performance extends the chemical space available for live search to hundreds of millions of compounds. Rapid fingerprint generation and short initialization time, along with the large set of comparison methods, let you optimize the similarity space. MadFast is a Java application available through several interfaces: command line, REST API and Web UI. Extensive documentation and usage examples are provided.

MadFast development is driven by user feedback. We are interested in your related workflows, feature requests and use cases. If you have any comments, questions or suggestions, please feel free to contact us at [email protected]

  • Available for Linux, or Mac (experimental).
  • What's new? Browse the history of changes for the latest release: new features, improvements and changes are documented.
  • Try it out! Online demo filled with various datasets.
  • Browse documentation of the latest version. Also included in the downloaded distribution.

New developments:

See History of changes for the description of all changes.

Property space in overlap analysis

Use additional properties of the input molecules in the overlap analysis visualization. Numerical values can be imported from SD properties or calculated using Chemical Terms expressions. These properties can be displayed and filtered alongside the dissimilarity values on the visualization page.

The documents "Store additional data" and "Introduction to overlap analysis" provide an overview of this functionality.

Similarity based overlap analysis

Similarity-based overlap analysis (full matrix calculation) of large libraries, up to millions of compounds, is possible with the fast multi-query similarity search implementation.

A 1M by 1M exhaustive similarity search using 1024-bit binary fingerprints takes:

  • ~30 minutes on c3.8xlarge AWS instance
  • ~8 minutes on x1.32xlarge AWS instance
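
To make the scale of a full matrix calculation concrete, here is a minimal Java sketch of an exhaustive comparison between two sets of 1024-bit binary fingerprints. It is not MadFast's implementation or API: the class and method names are hypothetical, and the Tanimoto metric is assumed purely for illustration, since MadFast supports a large set of comparison methods.

```java
import java.util.stream.IntStream;

/** Illustrative sketch only; names are hypothetical and not part of MadFast. */
final class TanimotoSketch {

    /** Number of 64-bit words needed to hold a 1024-bit fingerprint. */
    static final int WORDS = 1024 / 64;

    /** Tanimoto dissimilarity (1 - similarity) of two equally sized bit vectors. */
    static double dissimilarity(long[] a, long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Fingerprints must have the same length");
        }
        int common = 0; // bits set in both fingerprints
        int union = 0;  // bits set in at least one fingerprint
        for (int i = 0; i < a.length; i++) {
            common += Long.bitCount(a[i] & b[i]);
            union += Long.bitCount(a[i] | b[i]);
        }
        // Convention assumed here: two empty fingerprints count as identical.
        return union == 0 ? 0.0 : 1.0 - (double) common / union;
    }

    /** Full dissimilarity matrix: one row per query, one column per target. */
    static double[][] fullMatrix(long[][] queries, long[][] targets) {
        double[][] matrix = new double[queries.length][targets.length];
        // Each row is independent, so rows can be computed in parallel.
        IntStream.range(0, queries.length).parallel().forEach(q -> {
            for (int t = 0; t < targets.length; t++) {
                matrix[q][t] = dissimilarity(queries[q], targets[t]);
            }
        });
        return matrix;
    }
}
```

A 1M by 1M matrix means 10^12 pairwise comparisons, so the timings quoted above correspond to roughly 5.6 × 10^8 comparisons per second (30 minutes) and 2.1 × 10^9 comparisons per second (8 minutes). At that size the sketch's dense `double[][]` result would not fit in memory; it is meant for small inputs only.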

Real time similarity search

Visualization of the similarity search results in the Web UI lets you experience real-time responsiveness when searching large numbers of structures.

Think big! MadFast delivers the 40 most similar structures in:

  • ~80 ms per 16 M structures.
  • ~5 s per 1 billion structures.

Resource requirements:

  • 250-350 MB of memory usage per million molecules.
  • 1 million structures per minute preparation (import) speed.
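
The figures above can be turned into a rough sizing estimate for a given library. The Java sketch below is a back-of-the-envelope calculation, not a MadFast tool: the class name is hypothetical, and it assumes that search time, memory and import time scale linearly with the number of structures. The two quoted search timings support that assumption, since both work out to about 5 ms per million structures.

```java
/** Back-of-the-envelope sizing from the published figures; not a MadFast tool. */
final class CapacityEstimate {

    // Published figures, expressed per million structures.
    static final double MIN_MB_PER_MILLION = 250.0;
    static final double MAX_MB_PER_MILLION = 350.0;
    static final double IMPORT_MINUTES_PER_MILLION = 1.0;
    // ~80 ms per 16 M structures, i.e. 5 ms per million (matches ~5 s per billion).
    static final double SEARCH_MS_PER_MILLION = 80.0 / 16.0;

    /** Lower memory bound in GB, assuming 1 GB = 1000 MB. */
    static double minMemoryGb(double millions) {
        return millions * MIN_MB_PER_MILLION / 1000.0;
    }

    /** Upper memory bound in GB, assuming 1 GB = 1000 MB. */
    static double maxMemoryGb(double millions) {
        return millions * MAX_MB_PER_MILLION / 1000.0;
    }

    /** Estimated preparation (import) time in minutes. */
    static double importMinutes(double millions) {
        return millions * IMPORT_MINUTES_PER_MILLION;
    }

    /** Estimated time in milliseconds to retrieve the most similar structures. */
    static double searchMillis(double millions) {
        return millions * SEARCH_MS_PER_MILLION;
    }

    public static void main(String[] args) {
        // Library size in millions of structures; defaults to 100 million.
        double millions = args.length > 0 ? Double.parseDouble(args[0]) : 100.0;
        System.out.printf("Memory: %.0f-%.0f GB%n", minMemoryGb(millions), maxMemoryGb(millions));
        System.out.printf("Import: %.0f minutes%n", importMinutes(millions));
        System.out.printf("Search: %.0f ms%n", searchMillis(millions));
    }
}
```

Actual numbers depend on hardware, descriptor type and configuration, so treat the output as an order-of-magnitude guide only.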

Ad-hoc focused chemical space analysis

MadFast enables the utilization of various descriptors, descriptor configurations and comparison metrics. The Web UI is designed to display search result sets from multiple data sets with dissimilarity distribution histograms.
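
As an illustration of what a dissimilarity distribution histogram summarizes, the following Java sketch bins dissimilarity values into equal-width intervals. It is not taken from MadFast: the class name is hypothetical, and it assumes dissimilarities normalized to the range [0, 1], which holds for metrics such as Tanimoto but not for every comparison method.

```java
/** Illustrative sketch only; not part of MadFast. */
final class DissimilarityHistogram {

    /**
     * Counts values per equal-width bin over [0, 1].
     * The last bin is closed on both sides, so a value of exactly 1.0 lands in it.
     */
    static int[] histogram(double[] dissimilarities, int binCount) {
        if (binCount <= 0) {
            throw new IllegalArgumentException("binCount must be positive");
        }
        int[] counts = new int[binCount];
        for (double d : dissimilarities) {
            // The negated range check also rejects NaN.
            if (!(d >= 0.0 && d <= 1.0)) {
                throw new IllegalArgumentException("Dissimilarity out of range: " + d);
            }
            int bin = Math.min((int) (d * binCount), binCount - 1);
            counts[bin]++;
        }
        return counts;
    }
}
```

Comparing such histograms for the same query against several data sets gives a quick view of how densely each set populates the neighborhood of the query.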

Overlap heatmaps

The command-line tool for launching similarity searches (searchStorage) has been improved with similarity heatmap image generation and output formatting.

Additional media

Gábor Imre (ChemAxon): ChemAxon's MadFast Similarity Search

MadFast is an ongoing development effort to provide useful tools for similarity based search, overlap analysis and clustering. A short introduction was given on the current state of the development and future plans. Presented at ChemAxon's Annual Meeting Boston, 2016.

Stephen Pickett (GlaxoSmithKline): Fast similarity searching – making the virtual real

Similarity searching is a key component of many chemoinformatic processes including compound collection design, compound clustering and lead hopping. We have command-line tools for bespoke analysis and cartridge-based systems for similarity searching standard compound sets. However, we would like to use these applications in a more interactive manner whilst compounds are being designed: "who else is working on something similar?", "can we buy this compound?", "is it in our virtual set?". Several years ago ChemAxon approached us about the MadFast prototype and we have been collaborating to develop the application. In this talk I will give some benchmark statistics against existing in-house command-line tools and describe how MadFast is being deployed to provide interactive searching as part of LiveDesign. Presented at ChemAxon's Annual Meeting Budapest, 2016.