Cambridge MedChem Consulting

Proof of concept funding

UKRI has just announced proof of concept funding to support the commercialisation of research through spinouts, social ventures, licensing or other commercialisation pathways. Details are here.

Applications from any discipline are welcome. No pre-existing UK Research and Innovation (UKRI) funding is required. The programme will not support discovery-driven research. You must be based at a UK research organisation. The full economic cost (FEC) can be up to £250,000 for a duration of 12 months, with a minimum of £100,000 for 6 months. UKRI will fund 80% of the FEC.
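
As a quick worked example of the arithmetic, the sketch below splits a project's full economic cost into the 80% UKRI contribution and the remainder. The function and constant names are my own illustration, and the bounds check only uses the overall £100,000 to £250,000 range quoted above; check the call text for how the limits relate to project duration and for who covers the remaining 20%.

```python
# Illustrative only: function and constant names are hypothetical, not from UKRI.
MIN_FEC = 100_000   # minimum full economic cost quoted above (GBP)
MAX_FEC = 250_000   # maximum full economic cost quoted above (GBP)
UKRI_PERCENT = 80   # UKRI funds 80% of FEC


def split_fec(fec_pounds: int) -> tuple[int, int]:
    """Return (UKRI contribution, remainder) in whole pounds."""
    if not MIN_FEC <= fec_pounds <= MAX_FEC:
        raise ValueError(f"FEC must be between £{MIN_FEC:,} and £{MAX_FEC:,}")
    ukri = fec_pounds * UKRI_PERCENT // 100
    return ukri, fec_pounds - ukri


for fec in (MIN_FEC, MAX_FEC):
    ukri, remainder = split_fec(fec)
    print(f"FEC £{fec:,}: UKRI £{ukri:,}, remainder £{remainder:,}")
```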

This UKRI funding opportunity aims to de-risk the commercialisation of research. This will allow research organisations and their partners to deliver better commercialisation outcomes via the establishment of successful university spinouts or social ventures, as well as developing applicable solutions through other commercialisation routes to deliver societal and economic impacts and benefits from research.

Antiviral Competition opens Jan 13th

As part of its open science mission, the ASAP Discovery Consortium is conducting a computational methods competition encompassing several modalities critical to small molecule drug discovery. This competition will be run in collaboration with OpenADMET, which is a new ARPA-H funded project under the Open Molecular Software Foundation (OMSF).

This competition will be composed of three sub-challenges:

Ligand Poses: ASAP has produced a large volume of X-ray crystallography data over its years of operation. Along this trajectory, SARS-CoV-2 Mpro was structurally enabled much earlier than MERS-CoV. This sub-challenge will recreate that situation. Given a training set of SARS-CoV-2 Mpro X-ray structures, participants will be asked to predict poses of a test set of compounds for MERS-CoV Mpro. The crystallography experiments for this sub-challenge were performed by the University of Oxford and Diamond Light Source. See here for the crystallography conditions.

Potency: Given a training set of dose-response fluorescence potency data for both targets (SARS and MERS Mpro), participants will be challenged to predict potencies for a blind set of compounds for both targets. The assays for this sub-challenge were performed by the Weizmann Institute of Science. See here for the experimental conditions.

ADMET: This sub-challenge will consist of multiple ADMET endpoints. Participants will receive training data for all endpoints and will be asked to predict the same endpoints for a blind set of compounds. The assays for this sub-challenge were performed by Bienta.

Full details and preliminary data are available online. https://polarishub.io/blog/antiviral-competition.

ChEMBL 35 is out

The year ends with an update to ChEMBL. This release contains 2.5 million compounds and 1.7 million assays including over 15K drugs or molecules in development.

chembl35

You can download the dataset in various formats from https://chembl.gitbook.io/chembl-interface-documentation/downloads.

Full details of the update are on the ChEMBL blog. https://chembl.blogspot.com/2024/12/heres-nice-christmas-gift-chembl-35-is.html.

Season's Greetings

As many of you know, I don't send Christmas cards; instead I give the monies I would have spent to MS Research. Have a great time and a successful New Year.

NZsnow

Comparison of protein structure prediction algorithms

The majority of drug targets are proteins, and knowledge of the 3D structure of the protein can be very helpful for structure-based design. Whilst the PDB contains 227,933 structures, there are still a number of proteins that lack structural information. In 2018 DeepMind released AlphaFold, an artificial intelligence program designed to predict protein 3D structure from the amino-acid sequence DOI. Since then there have been a series of updates that have added the ability to handle small molecules, co-factors, nucleic acids, protein complexes etc. AlphaFold has been used in collaboration with the EBI to create AlphaFold DB, which provides open access to over 200 million protein structures, covering the human proteome and the proteomes of 47 other key organisms important in research and global health. A recent addition is Foldseek, a protein structural search program that allows users to search the AlphaFold Database.

pfCAalphafold

David Baker, Demis Hassabis and John Jumper were awarded the 2024 Nobel Prize for Chemistry. One half of the prize has been awarded to David Baker “for computational protein design” and the other half jointly to Demis Hassabis and John M. Jumper “for protein structure prediction.”

Whilst AlphaFold gets much of the publicity, it has spawned a number of related programs. Comparing the different options is difficult, especially when looking at the various licensing options. Fortunately, Brian Naughton has posted a very useful summary. http://blog.booleanbiotech.com/alphafold3-boltz-chai1.html.

CSD, ChEMBL, PDBe now interlinked

Three critical databases for drug discovery are now interlinked: the Cambridge Structural Database (CSD), a curated repository of small molecule crystal structures; ChEMBL, a manually curated database of bioactive molecules with their associated biological data; and PDBe, a founding member of the Worldwide Protein Data Bank (wwPDB), which collects, organises and disseminates data on biological macromolecular structures.

The BioChemGraph (BCG) project tackles the challenge of linking diverse data in biology by creating a resource that integrates data from the PDBe, ChEMBL, and the CSD. This has been achieved by mapping UniProt accession IDs and compound InChIKeys, linking more than 17,000 experimentally determined protein-ligand complexes from the PDB to about 39,000 ChEMBL bioactivity records. By providing this link it is possible not only to identify binding affinity for the selected target but also to find much more information about the small molecule ligand, such as off-target activities, calculated physicochemical properties and any ADME/T data that might be available. All data can be downloaded as a TSV file, as shown below.

pde_chembl
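
The linking idea described above (matching on the protein's UniProt accession together with the ligand's InChIKey) can be sketched with a small join. This is only an illustration of the principle, not the BioChemGraph pipeline: the column names are hypothetical and every identifier and value in the toy tables is a made-up placeholder.

```python
import csv
import io

# Toy tab-separated tables. All identifiers and values are made-up placeholders,
# and the column names are hypothetical, not the real BioChemGraph schema.
PDB_LIGANDS_TSV = (
    "pdb_id\tuniprot_accession\tinchikey\n"
    "1abc\tP00001\tAAAAAAAAAAAAAA-AAAAAAAAAA-N\n"
    "2xyz\tP00002\tBBBBBBBBBBBBBB-BBBBBBBBBB-N\n"
)
CHEMBL_ACTIVITIES_TSV = (
    "uniprot_accession\tinchikey\tstandard_type\tstandard_value_nM\n"
    "P00001\tAAAAAAAAAAAAAA-AAAAAAAAAA-N\tIC50\t12\n"
    "P00001\tCCCCCCCCCCCCCC-CCCCCCCCCC-N\tIC50\t850\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def link_records(complexes, activities):
    """Pair each complex with the activities sharing its (protein, ligand) key."""
    by_key = {}
    for act in activities:
        key = (act["uniprot_accession"], act["inchikey"])
        by_key.setdefault(key, []).append(act)
    linked = []
    for cpx in complexes:
        key = (cpx["uniprot_accession"], cpx["inchikey"])
        for act in by_key.get(key, []):
            linked.append({**cpx, **act})
    return linked


linked = link_records(read_tsv(PDB_LIGANDS_TSV), read_tsv(CHEMBL_ACTIVITIES_TSV))
for row in linked:
    print(row["pdb_id"], row["uniprot_accession"],
          row["standard_type"], row["standard_value_nM"])
```

Requiring both keys to match means an activity is only attached when the same ligand was measured against the same protein; the second toy activity (same protein, different ligand) is therefore left unlinked.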

ChEMBL and PDBe have collaborated to set up an automatic pipeline for generating these data. As a result, the data will be updated weekly, in sync with the PDBe release every Wednesday at 00.00 UTC.

InChIs have also been used to connect with the Cambridge Structural Database via UniChem: 235,000 CSD identifiers have been linked to corresponding entries in UniChem, a “universal translator” for chemistry that uses InChIs to connect chemical structures and their identifiers across various databases. UniChem enables researchers to seamlessly access information about a specific molecule across a wide variety of data sources. There are currently 41 data sources (https://www.ebi.ac.uk/unichem/sources).

AlphaProteo generates novel proteins

Protein-protein interactions are always a challenge to optimise, and it looks like the latest offering from Google DeepMind may be of significant help.

Protein binders that can bind tightly to a target protein are hard to design. Traditional methods are time intensive, requiring multiple rounds of extensive lab work. After the binders are created, they undergo additional experimental rounds to optimize binding affinity, so they bind tightly enough to be useful.

AlphaProteo generates novel proteins that bind to other proteins. Given the structure of a target molecule and a set of preferred binding locations on that molecule, AlphaProteo generates a candidate protein that binds to the target at those locations.

Whilst the code is not available, note:

If you’re a biologist, whose research could benefit from target-specific protein binding, and you’d like to register interest in being a trusted tester for AlphaProteo, please reach out to us on alphaproteo@google.com.

Privileged Structures

The term "privileged structures" was first coined by Ben Evans (DOI: 10.1021/jm00120a002), who recognised the potential of certain regularly occurring structural motifs as templates for derivatization to discover novel ligands for binding to proteins. In this seminal paper they identified a benzodiazepine and a substituted indole as key structures in their work to yield CCK antagonists.

Two very popular privileged structures are N-benzyl piperidine and N-benzyl piperazine. They offer a variety of different interactions (pi-stacking, hydrophobic, electrostatic) with a relatively well defined 3D structure.

NBP

A recent publication gives a very nice summary of their use in drug discovery DOI.

Abstract The N-benzyl piperidine (N-BP) structural motif is commonly employed in drug discovery due to its structural flexibility and three-dimensional nature. Medicinal chemists frequently utilize the N-BP motif as a versatile tool to fine-tune both efficacy and physicochemical properties in drug development. It provides crucial cation-π interactions with the target protein and also serves as a platform for optimizing stereochemical aspects of potency and toxicity. This motif is found in numerous approved drugs and clinical/preclinical candidates. This review focuses on the applications of the N-BP motif in drug discovery campaigns, emphasizing its role in imparting medicinally relevant properties. We provide an overview of approved drugs, the clinical and preclinical pipeline, and discuss its utility for specific therapeutic targets and indications, along with potential challenges.

Drug-Induced Liver injury prediction

Many compounds can cause liver injury; after oral administration, the liver is the first major organ exposed. LiverTox is a database of information on the diagnosis, cause, frequency, patterns, and management of liver injury attributable to prescription and nonprescription medications, herbals and dietary supplements.

Checking for the potential to cause liver injury is an important part of the drug discovery process and there are a number of in vitro and in vivo assays that can be used.

High dose studies in safety species are undertaken to identify potential toxicities and to determine safety margins. Clinically, the most relevant reactions include liver necrosis, hepatitis, cholestasis, vascular changes and steatosis. A drug can cause liver toxicity via multiple mechanisms: it can be the result of a direct action of the parent compound or arise indirectly through reactive metabolites. The drug or its metabolites may cause liver toxicity after specific receptor binding, or reactive metabolites can react with hepatic macromolecules, all leading to direct cytotoxicity. In addition, immune-mediated idiosyncratic drug reactions have been responsible for numerous serious hepatotoxic events in humans.

It would be useful to be able to predict ahead of synthesis whether a molecule is likely to cause liver injury, and that is the function of DILIPredictor DOI. Using data from several thousand molecules, a variety of different assays (both in vitro and in vivo) and different species, the authors have developed a predictive model. The attraction of this approach is that, in addition to giving an early flag of potential DILI, it also highlights potential species differences and can give an insight into the mechanism.

DILIPredictor requires only chemical structures as input for prediction and is publicly available at https://broad.io/DILIPredictor for use via a web interface (please don't submit confidential molecules), with all code available for download from GitHub.

https://github.com/srijitseal/DILI_Predictor

I installed it as follows, since it currently does not run with the latest version of Python.

conda create -n DILIpred python=3.10
conda activate DILIpred

pip install DILIpred

It can then be used as follows.

(DILIpred) chrisswain@Mac-Studio ~ % dilipred -smiles "C[C@@H]1C2=NN=C(N2CCN1C(=O)C3=CC=C(C=C3)F)C4=NC(=NS4)C"

If you use DILIPred in your work, please cite: Improved Detection of Drug-Induced Liver Injury by Integrating Predicted In Vivo and In Vitro Data Srijit Seal, Dominic Williams, Layla Hosseini-Gerami, Manas Mahale, Anne E. Carpenter, Ola Spjuth, and Andreas Bender doi: https://doi.org/10.1021/acs.chemrestox.4c00015

100%███████████████████████████████████████████████████████████████ 1/1 [00:01<00:00, 1.14s/it]
2024-07-12 08:29:44.777 | CRITICAL | dilipred.main:predict:458 - The compound is predicted DILI-Positive
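
If you have more than a handful of molecules, the same command can be driven from a short script. The sketch below is a minimal wrapper that assumes only the `dilipred -smiles "<SMILES>"` form shown above; the helper names are hypothetical, and I have not checked whether successive runs overwrite the output file, so treat it as a starting point rather than a tested workflow.

```python
import shutil
import subprocess


# Only the `dilipred -smiles "<SMILES>"` form shown above is assumed here;
# the helper names are hypothetical.
def build_command(smiles: str) -> list[str]:
    """Build the argument list for one dilipred call."""
    if not smiles or any(ch.isspace() for ch in smiles):
        raise ValueError("SMILES must be a non-empty string without whitespace")
    return ["dilipred", "-smiles", smiles]


def run_batch(smiles_list):
    """Run dilipred once per molecule and collect the return codes."""
    if shutil.which("dilipred") is None:
        print("dilipred not found on PATH; activate the conda environment first")
        return {}
    results = {}
    for smiles in smiles_list:
        # Passing a list (not a shell string) avoids quoting problems with
        # characters such as '@', '(' and '=' in SMILES.
        completed = subprocess.run(build_command(smiles), check=False)
        results[smiles] = completed.returncode
    return results


if __name__ == "__main__":
    run_batch(["C[C@@H]1C2=NN=C(N2CCN1C(=O)C3=CC=C(C=C3)F)C4=NC(=NS4)C"])
```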

The detailed output is written to a file that the program creates.

source,assaytype,description,value,pred,SHAP contribution to Toxicity,SHAP,smiles,smilesr
DILI,DILIstFDA,This is the predicted FDA DILIst label,0.8187885151383966,1,N/A,N/A,C[C@@H]1C2=NN=C(N2CCN1C(=O)C3=CC=C(C=C3)F)C4=NC(=NS4)C,Cc1nsc(-c2nnc3n2CCN(C(=O)c2ccc(F)cc2)C3C)n1
Diverse DILI C,Heterogenous Data ,"Transient liver function abnormalities, adverse hepatic effects",0.7393781727510759,True,Positive,0.003926801602711443,C[C@@H]1C2=NN=C(N2CCN1C(=O)C3=CC=C(C=C3)F)C4=NC(=NS4)C,Cc1nsc(-c2nnc3n2CCN(C(=O)c2ccc(F)cc2)C3C)n1
BESP,Mechanisms of Liver Toxicity,BESP Bile Salt Export Pump Inhibition,0.5655727513227511,True,Positive,0.0003070244326849143,C[C@@H]1C2=NN=C(N2CCN1C(=O)C3=CC=C(C=C3)F)C4=NC(=NS4)C,Cc1nsc(-c2nnc3n2CCN(C(=O)c2ccc(F)cc2)C3C)n1
Mitotox,Mechanisms of Liver Toxicity,Mitotox ,0.10973983865879627,False,Positive,0.0006130947805277731,C[C@@H]1C2=NN=C(N2CCN1C(=O)C3=CC=C(C=C3)F)C4=NC(=NS4)C,Cc1nsc(-c2nnc3n2CCN(C(=O)c2ccc(F)cc2)C3C)n1
Reactive Metabolite,Mechanisms of Liver Toxicity,Reactive Metabolite Formation,0.19967540492325553,False,Negative,-0.001048102741727997,C[C@@H]1C2=NN=C(N2CCN1C(=O)C3=CC=C(C=C3)F)C4=NC(=NS4)C,Cc1nsc(-c2nnc3n2CCN(C(=O)c2ccc(F)cc2)C3C)n1
Human hepatotoxicity,Human hepatotoxicity,"Human hepatotoxicity, hepatobiallry",0.7196576912119554,True,Positive,0.007802364907899422,C[C@@H]1C2=NN=C(N2CCN1C(=O)C3=CC=C(C=C3)F)C4=NC(=NS4)C,Cc1nsc(-c2nnc3n2CCN(C(=O)c2ccc(F)cc2)C3C)n1
Animal hepatotoxicity A,Animal hepatotoxicity,"Rat, chronic oral administration, Hepatic histopathologic effects, ToxRefDB",0.5867747455286331,True,Positive,0.0030731971130978854,C[C@@H]1C2=NN=C(N2CCN1C(=O)C3=CC=C(C=C3)F)C4=NC(=NS4)C,Cc1nsc(-c2nnc3n2CCN(C(=O)c2ccc(F)cc2)C3C)n1
Animal hepatotoxicity B,Animal hepatotoxicity,"Hepatocellular hypertrophy, rats, ORAD, HESS",0.6646590439473917,True,Positive,0.0013360236463188587,C[C@@H]1C2=NN=C(N2CCN1C(=O)C3=CC=C(C=C3)F)C4=NC(=NS4)C,Cc1nsc(-c2nnc3n2CCN(C(=O)c2ccc(F)cc2)C3C)n1
Preclinical hepatotoxicity,Animal hepatotoxicity,"Preclinical hepatotoxicity data from PharmaPendium, Leadscopre, and internal repository with 14- to 28-day rat study data",0.8576928962241468,True,Positive,0.011692057666492625,C[C@@H]1C2=NN=C(N2CCN1C(=O)C3=CC=C(C=C3)F)C4=NC(=NS4)C,Cc1nsc(-c2nnc3n2CCN(C(=O)c2ccc(F)cc2)C3C)n1
Diverse DILI A,Heterogenous Data ,Large-scale and diverse ddrug induced liver injury dataset,0.6324274304660036,True,Positive,0.003398315762493277,C[C@@H]1C2=NN=C(N2CCN1C(=O)C3=CC=C(C=C3)F)C4=NC(=NS4)C,Cc1nsc(-c2nnc3n2CCN(C(=O)c2ccc(F)cc2)C3C)n1
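
The output above is comma-separated, so it is easy to pull out the endpoints that were flagged. The sketch below is a minimal example using Python's standard csv module on three rows copied from that output. It assumes the `pred` column is the binary call (`1` or `True` meaning flagged) and that `value` is the predicted probability; both are my reading of the output rather than something documented here, and the function name is hypothetical.

```python
import csv
import io

# Three rows copied from the output above; in practice you would open the
# results file that dilipred writes instead of using this inline string.
RESULTS_CSV = '''source,assaytype,description,value,pred,SHAP contribution to Toxicity,SHAP,smiles,smilesr
DILI,DILIstFDA,This is the predicted FDA DILIst label,0.8187885151383966,1,N/A,N/A,C[C@@H]1C2=NN=C(N2CCN1C(=O)C3=CC=C(C=C3)F)C4=NC(=NS4)C,Cc1nsc(-c2nnc3n2CCN(C(=O)c2ccc(F)cc2)C3C)n1
Diverse DILI C,Heterogenous Data ,"Transient liver function abnormalities, adverse hepatic effects",0.7393781727510759,True,Positive,0.003926801602711443,C[C@@H]1C2=NN=C(N2CCN1C(=O)C3=CC=C(C=C3)F)C4=NC(=NS4)C,Cc1nsc(-c2nnc3n2CCN(C(=O)c2ccc(F)cc2)C3C)n1
Mitotox,Mechanisms of Liver Toxicity,Mitotox ,0.10973983865879627,False,Positive,0.0006130947805277731,C[C@@H]1C2=NN=C(N2CCN1C(=O)C3=CC=C(C=C3)F)C4=NC(=NS4)C,Cc1nsc(-c2nnc3n2CCN(C(=O)c2ccc(F)cc2)C3C)n1
'''


def flagged_endpoints(csv_text):
    """Return (source, value) pairs for flagged rows, highest value first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: pred is "1" or "True" when the endpoint is predicted positive.
    hits = [(row["source"], float(row["value"]))
            for row in reader
            if row["pred"] in {"1", "True"}]
    return sorted(hits, key=lambda item: item[1], reverse=True)


for source, value in flagged_endpoints(RESULTS_CSV):
    print(f"{source}: {value:.2f}")
```

Using a CSV parser rather than splitting on commas matters here because several description fields contain commas inside quotes.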

Overall a useful tool to have to hand.

CCDC: Curated Data Set of Protein Structures

Fantastic news from the Cambridge Crystallographic Data Centre (CCDC): a curated data set of protein structures from the Protein Data Bank (PDB) with predicted hydrogen positions is now available for download. With hydrogen positions accurately computed for all components, the dataset provides a comprehensive snapshot of protein cavities in the PDB, identifying potential binding sites for small molecules.

The news article is here https://www.ccdc.cam.ac.uk/discover/blog/accelerating-drug-discovery-with-the-ccdc-aws-and-intel/.

This large subset of the Protein Data Bank, which has been processed using the CCDC's protonation workflow so that reasonable proton positions have been modelled, can be downloaded here.

https://www.ccdc.cam.ac.uk/support-and-resources/downloads/.