
March 30, 2023

Medicinal chemistry transformations from patent literature

Designing and optimizing novel drugs requires both creativity and knowledge. The Matched Molecular Pairs (MMP) method is one way of supporting this process. A matched molecular pair is two compounds that differ by a single, well-defined structural change, called a transformation. Commonly, MMP analysis is used to connect structural changes in drug molecules to the corresponding changes in assay readouts (Figure 1).
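To make the idea concrete, here is a minimal Python sketch of this kind of analysis. It is an illustration only, not the method used in this work: the pair records, the shorthand transformation labels and the readout values are invented toy data, and `mean_delta_by_transformation` is a hypothetical helper name. The sketch groups matched pairs by transformation and averages the change in readout.

```python
from collections import defaultdict
from statistics import mean

# Toy matched pairs (all values invented for illustration):
# (transformation label, readout of compound A, readout of compound B)
pairs = [
    ("H>>F", 6.0, 6.5),
    ("H>>F", 5.0, 5.25),
    ("CH3>>CF3", 7.0, 6.5),
]

def mean_delta_by_transformation(pairs):
    """Average readout change (B minus A) for each transformation."""
    deltas = defaultdict(list)
    for transformation, readout_a, readout_b in pairs:
        deltas[transformation].append(readout_b - readout_a)
    return {t: mean(values) for t, values in deltas.items()}

result = mean_delta_by_transformation(pairs)
print(result)
```

In a real project the pairs would come from an MMP fragmentation of measured compounds, and the spread of the deltas would matter as much as their mean.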

Here, the MMP method was used to extract all synthetically accessible transformations described in the patent database SureChEMBL. This gives an overview of how often medicinal chemists have used particular transformations, irrespective of the parameters they were optimizing (Table 1).
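As a rough sketch of what such an overview involves, the snippet below tallies how often each transformation occurs in a list of extracted pairs and reports how many distinct transformations exceed a given number of examples. The input list and the helper name `count_above` are assumptions made for illustration; the extraction pipeline itself is not described here.

```python
from collections import Counter

# Hypothetical extracted transformations, one entry per matched pair.
observed = ["H>>F", "H>>F", "H>>F", "H>>Cl", "H>>Cl", "CH3>>CF3"]

# Occurrences per transformation, independent of any assay data.
counts = Counter(observed)

def count_above(counts, threshold):
    """Number of distinct transformations with more than `threshold` examples."""
    return sum(1 for n in counts.values() if n > threshold)

n_above = count_above(counts, 1)
print(counts)
print(n_above)
```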


 

SureChEMBL is a publicly available, large-scale resource containing compounds extracted from the full text, images and attachments of patent documents. The data are extracted from the patent literature by an automated text- and image-mining pipeline, updated daily.

Numbers    Data points used to create MMP set
61         Years of deposited patent applications
600        MB of text
1.35 M     Patent applications
20 M       Exemplified compounds
1.4 M      Unique transformations
1-20       Transformation size (number of atoms)
>1,000     Transformations with >1,000 examples
>50,000    Transformations with >50 examples

Table 1. Data behind extracted transformations

 

Figure 2. Number of occurrences of the top 300 transformations (blue bars) in small-molecule drug discovery projects, extracted from SureChEMBL, with a few selected examples highlighted (orange bars)

Use cases for patent literature MMP transformations

The MMP transformations from SureChEMBL can be used in different ways to create analogues of a seed compound:

  1. Based on the most common transformations [2]: automatic creation of compounds that are “expected” to be made in a project, making sure you don’t forget any.
  2. Based on the least common transformations: creation of “unexpected” analogues, compounds a medicinal chemist would not immediately think of but that could increase the chance of finding novel compounds.

These analogues can then be filtered through any additional virtual screening cascade prior to selection for synthesis (Figure 3).
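The two selection modes above can be sketched as a simple ranking over transformation counts. This is a minimal illustration under stated assumptions: the counts are invented, the transformation labels are shorthand rather than real SMIRKS, and `select_transformations` is a hypothetical helper. Applying the selected transformations to a seed compound would require a cheminformatics toolkit and is not shown.

```python
# Hypothetical transformation counts (invented numbers).
counts = {
    "H>>F": 5200,
    "H>>Cl": 3100,
    "CH3>>CF3": 900,
    "OH>>NH2": 60,
    "Ph>>cPr": 12,
}

def select_transformations(counts, n, rare=False):
    """Return the n most common (or, with rare=True, least common) transformations."""
    ranked = sorted(counts, key=counts.get, reverse=not rare)
    return ranked[:n]

expected = select_transformations(counts, 2)               # "expected" analogues
unexpected = select_transformations(counts, 2, rare=True)  # "unexpected" analogues
print(expected)
print(unexpected)
```

In practice a minimum number of examples might be applied before picking rare transformations, so that extraction noise is not mistaken for novelty.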

 

Figure 3. Example workflows applying SureChEMBL MMP transformations to create Design Sets

 

Download the 500 most common transformations from SureChEMBL

[1] Hussain and Rea, Journal of Chemical Information and Modeling 2010, 50 (3), 339-348.