Cambridge MedChem Consulting

Frequent Hitters, False Positives, Promiscuous Compounds

Once you have your list of hits, there needs to be a recognition that many will be false positives, and all hits should be treated with some suspicion at this point. First check purity and then retest; if possible, get fresh samples or resynthesise. Since this may require significant resources, there have been a number of efforts to identify potential false positives computationally.

Within any screening campaign there are a number of compounds that on further investigation fail to reproduce the initial activity in subsequent assays. There have been a number of studies trying to elucidate the structural features that would allow identification of these artifactual hits before significant effort has been expended following them up.

Shoichet has described a detergent-based assay for the detection of frequent hitters (Nat Protoc. 2006; 1(2): 550-553). This work is based on the observation that at micromolar concentrations in aqueous solution, many small molecules self-associate into colloidal aggregates that non-specifically inhibit enzymes and other proteins. From their work Shoichet and co-workers estimate that at 5 µM, about 1-2% of "drug-like" molecules seem to behave in this way. A fraction of this size is comparable to the hit rate often targeted in an HTS campaign. They looked for inhibition of AmpC beta-lactamase (AmpC) in the presence or absence of Triton X-100; compounds that showed inhibition in the absence of detergent but not in its presence were flagged as potential aggregators. They also provide a checklist of observations that might indicate false positives.

  1. Is inhibition significantly attenuated by small amounts of non-ionic detergent? If so, the compound is likely to be acting through aggregation. They typically use 0.01% Triton X-100 vol/vol; other non-ionic detergents might also work.
  2. In assays that cannot tolerate detergent (e.g., cell-based assays), it might be possible to use high concentrations of serum albumin. A drawback of this method is that albumin can also sequester well-behaved molecules; however, the molecules will need to act in the presence of plasma proteins in vivo.
  3. Is inhibition significantly attenuated by increasing enzyme concentration? If so, the compound is likely to be an aggregator. Except when the receptor concentration-to-Ki ratio is high, increasing receptor concentration should not affect percentage inhibition. Of course, when the receptor is membrane-bound or intracellular, this is difficult to probe.
  4. Is inhibition competitive? If so, the compound is unlikely to be an aggregator. Does the inhibitor retain activity after spinning for several minutes in a microfuge? If not, particle formation is likely (see below).
  5. Can you directly observe particles in the 50-1,000 nm size range? They have typically used dynamic light scattering (DLS) for this. Formation of particles does not guarantee promiscuous inhibition, but it is a worrying sign.
  6. Is the dose-response curve unusually steep? There are classical reasons for steep dose-response curves, but this too is a worrying sign.
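
Two of the checks above (detergent sensitivity and curve steepness) lend themselves to a simple triage script. The following is a minimal sketch only: the `AssayResult` record, the function name and all three cut-off values are assumptions made for illustration and are not taken from the published protocol, so they should be tuned to the assay in hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AssayResult:
    compound_id: str
    inhibition_no_detergent: float    # percent inhibition without detergent
    inhibition_with_detergent: float  # percent inhibition with e.g. 0.01% Triton X-100
    hill_slope: float                 # fitted Hill coefficient of the dose-response curve


# Illustrative cut-offs only (assumptions, not values from the protocol).
MIN_INHIBITION = 30.0        # ignore compounds that barely inhibit without detergent
ATTENUATION_FRACTION = 0.5   # "significantly attenuated" = at least half the inhibition lost
STEEP_HILL_SLOPE = 2.0       # "unusually steep" = Hill slope above 2


def aggregation_warnings(result: AssayResult) -> List[str]:
    """Return the warning signs (checks 1 and 6) raised by one compound."""
    warnings = []
    base = result.inhibition_no_detergent
    if base >= MIN_INHIBITION:
        # Fraction of the original inhibition that disappears when detergent is added.
        lost = (base - result.inhibition_with_detergent) / base
        if lost >= ATTENUATION_FRACTION:
            warnings.append("detergent-sensitive inhibition")
    if result.hill_slope > STEEP_HILL_SLOPE:
        warnings.append("steep dose-response curve")
    return warnings


# Made-up example data: one suspicious compound and one well-behaved one.
results = [
    AssayResult("cmpd-1", 85.0, 10.0, 3.1),
    AssayResult("cmpd-2", 70.0, 68.0, 1.0),
]
for r in results:
    print(r.compound_id, aggregation_warnings(r))
```

A compound that raises either warning is not proven to be an aggregator; it is simply a candidate for the follow-up experiments in the list (centrifugation, DLS, varying enzyme concentration).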

More recently Baell et al (J. Med. Chem. XXXX, XXX, 000, DOI) have provided a suggested list of filters that might be used to remove "Pan Assay Interference Compounds (PAINS)" from screening collections. They looked at compounds that were active in multiple assays.

Breakdown of Primary Screening Hits According to the Number of HTS Screening Campaigns in Which They Registered As a Hit

Count (campaigns)    Number of Compds    Percent of Compds
6                                 362                 0.39
5                                 785                 0.84
4                                 915                 0.98
3                                1220                 1.31
2                                4689                 5.03
1                               12077                12.96
0                               73164                78.49
Total                           93212               100

Whilst more than 90% of the compounds were either inactive or were only a hit in a single assay, a significant number of the compounds (0.39%) were active in all 6 assays tested. Exploring the structures that appeared promiscuous, they were able to identify certain structural types, a selection of which are shown in the scheme below. A full listing is available in the supplementary materials. The rhodanines are among the structures most widely reported to be false positives.
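
The percentages quoted here can be checked directly from the counts in the table. The short sketch below recomputes them; the variable and function names are my own and purely illustrative.

```python
# Number of compounds by how many of the six HTS campaigns they registered
# as a hit in (counts copied from the table above).
hit_counts = {6: 362, 5: 785, 4: 915, 3: 1220, 2: 4689, 1: 12077, 0: 73164}

total = sum(hit_counts.values())


def percent(n_compounds: int) -> float:
    """Share of the whole collection, as a percentage rounded to 2 decimals."""
    return round(100.0 * n_compounds / total, 2)


# Per-row percentages, matching the third column of the table.
percent_by_hits = {hits: percent(n) for hits, n in hit_counts.items()}

# Compounds that were inactive or hit in only one campaign.
at_most_one = percent(hit_counts[0] + hit_counts[1])

# Compounds that hit in two or more campaigns.
two_or_more = percent(sum(n for hits, n in hit_counts.items() if hits >= 2))

print(total, percent_by_hits, at_most_one, two_or_more)
```

On these counts, compounds that were inactive or hit in a single campaign make up a little over 91% of the collection, which leaves between 8% and 9% that registered as a hit in two or more campaigns.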

In the supplementary information they provided the corresponding filters in Sybyl Line Notation (SLN) format; however, they have also been converted to SMARTS format and incorporated in a sieve file for use in flagging compound collections. If you are a Vortex user there is also a Vortex script available, nodes are available for Knime, and it is now even available on mobile devices with MolPrime+. In my testing there are subtle differences in the way that the aromaticity of some heterocyclic rings is defined between SLN and SMILES, so it is unlikely that they will produce exactly the same results, especially for molecular structures well outside the original chemical space. You also get very slightly different results depending on the origin of the sdf/SMILES structure file and whether the search tool can handle tautomers.
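
Because different implementations of the filters can disagree, it can be worth running a collection through two tools and looking at the discrepancies rather than trusting either one alone. The sketch below assumes each tool's output has already been parsed into a mapping from compound ID to the set of filter names it matched; the function name, the compound IDs and the filter names are hypothetical placeholders, not real PAINS filter identifiers or any tool's actual output format.

```python
from typing import Dict, Set


def compare_flags(
    tool_a: Dict[str, Set[str]], tool_b: Dict[str, Set[str]]
) -> Dict[str, Dict[str, Set[str]]]:
    """Return, per compound, the filter names reported by only one of two tools."""
    differences = {}
    for compound_id in sorted(set(tool_a) | set(tool_b)):
        # A compound missing from one tool's output is treated as unflagged there.
        flags_a = tool_a.get(compound_id, set())
        flags_b = tool_b.get(compound_id, set())
        if flags_a != flags_b:
            differences[compound_id] = {
                "only_a": flags_a - flags_b,
                "only_b": flags_b - flags_a,
            }
    return differences


# Hypothetical parsed output from an SLN-based and a SMARTS-based search.
sln_flags = {"cmpd-1": {"filter_1"}, "cmpd-2": {"filter_2", "filter_3"}}
smarts_flags = {"cmpd-1": {"filter_1"}, "cmpd-2": {"filter_2"}, "cmpd-3": {"filter_4"}}

for compound_id, diff in compare_flags(sln_flags, smarts_flags).items():
    print(compound_id, diff)
```

Compounds that appear in the output are the ones to inspect by eye, since the disagreement often comes down to aromaticity perception or tautomer handling.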

There is an excellent summary article by the authors in Nature (DOI):

Naivety about promiscuous, assay-duping molecules is polluting the literature and wasting resources, warn Jonathan Baell and Michael A. Walters.

Academic researchers, drawn into drug discovery without appropriate guidance, are doing muddled science. When biologists identify a protein that contributes to disease, they hunt for chemical compounds that bind to the protein and affect its activity. A typical assay screens many thousands of chemicals. ‘Hits’ become tools for studying the disease, as well as starting points in the hunt for treatments.

But many hits are artefacts — their activity does not depend on a specific, drug-like interaction between molecule and protein. A true drug inhibits or activates a protein by fitting into a binding site on the protein. Artefacts have subversive reactivity that masquerades as drug-like binding and yields false signals across a variety of assays.

Similar structures have also been highlighted by the ALARM NMR technique described by Huth et al (J. Am. Chem. Soc., 2005, 127 (1), pp 217–224), a rapid and reliable NMR-based method for identifying reactive false positives, including those that oxidize or alkylate a protein target. The structures flagged include rhodanines, catechols, quinones, benzofurazans, benzthioxalones, 2-amino-3-carbonylthiophenes, and certain dihydropyridines.

Updated 14 December 2015