Organic Chemistry, Short talk
OC-015

Assessment of the Synthetic Feasibility of Generated Chemical Space by Computer Assisted Synthesis Planning

A. Thakkar¹, V. Chadimova², E. J. Bjerrum², O. Engkvist², J. L. Reymond¹*
¹Department of Chemistry and Biochemistry, Freiestrasse 3, 3012 Bern; ²Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden

Computer-assisted synthesis planning has gained considerable interest in recent years owing to the resurgence of artificial intelligence (AI) and the prospect of accelerating the discovery and synthesis of new chemical entities.[1] Our previous work describes a retrosynthetic prediction tool trained on the largest collection of datasets to date and demonstrates its applicability to a set of compounds obtained from virtual libraries.[2] We have since augmented this tool with a model called ‘Ring Breaker’, which assesses synthetic disconnections for complex ring systems.[3] Ring Breaker is trained specifically on ring-forming reactions and extends the search for synthetic pathways by identifying routes that use ring formations. To maximise the number of synthetic options available during the search, we also add an applicability filter, which indicates which of the proposed reactions are applicable in silico.

In this study, we build on our previous work in computer-assisted synthesis planning (CASP) by tackling the problem of synthetic accessibility.[2, 3] The improvements to our baseline retrosynthetic tool allow a better estimate of the synthetic feasibility of a diverse set of compounds obtained from ChEMBL, GDBChEMBL, GDBMedChem and DrugBank, as determined by running full retrosynthetic predictions. The outcomes of these predictions serve as an estimate of synthetic feasibility and are used to train a variety of machine learning and deep learning models that act as surrogates for the prediction of full synthetic routes. The resulting surrogate model can score the synthetic accessibility of generated compounds from virtual libraries, or it can be used within the generation process to maximise the synthetic feasibility of the compounds produced.
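The surrogate idea described above can be illustrated with a minimal sketch. It is not the model used in this work: the abstract does not specify features, architecture or training data, so everything below is an assumption made for illustration. Hashed character n-grams of the SMILES string stand in for a real molecular fingerprint, a small logistic regression stands in for the machine/deep learning models, and the labels in `TRAIN` are invented placeholders for planner outcomes (1 = a route was found, 0 = no route found), not results or claims about these molecules. The names `featurize`, `train` and `feasibility_score` are hypothetical.

```python
import math
import zlib

N_BITS = 2048  # size of the hashed feature space (arbitrary choice)


def featurize(smiles, n_bits=N_BITS, max_n=3):
    """Return sorted indices of active bits for hashed character n-grams.

    Assumption: this is a dependency-free stand-in for a molecular
    fingerprint; a real pipeline would use a cheminformatics toolkit.
    """
    active = set()
    for n in range(1, max_n + 1):
        for i in range(len(smiles) - n + 1):
            gram = smiles[i:i + n]
            # crc32 is deterministic across runs, unlike the built-in hash().
            active.add(zlib.crc32(gram.encode("utf-8")) % n_bits)
    return sorted(active)


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(active, w, b):
    """Probability of label 1 for a sparse binary feature vector."""
    return sigmoid(b + sum(w[j] for j in active))


def train(examples, n_bits=N_BITS, epochs=1000, lr=0.1):
    """Fit logistic regression by batch gradient descent; returns (w, b)."""
    data = [(featurize(s, n_bits), y) for s, y in examples]
    w = [0.0] * n_bits
    b = 0.0
    for _ in range(epochs):
        grad_w = {}
        grad_b = 0.0
        for active, y in data:
            err = predict(active, w, b) - y
            grad_b += err
            for j in active:
                grad_w[j] = grad_w.get(j, 0.0) + err
        for j, g in grad_w.items():
            w[j] -= lr * g / len(data)
        b -= lr * grad_b / len(data)
    return w, b


def feasibility_score(smiles, w, b):
    """Surrogate score in [0, 1]; higher means 'more like solved compounds'."""
    return predict(featurize(smiles, len(w)), w, b)


# Invented placeholder labels standing in for retrosynthesis outcomes.
TRAIN = [
    ("CCO", 1),
    ("CC(=O)O", 1),
    ("c1ccccc1", 1),
    ("CC(=O)Nc1ccccc1", 1),
    ("C1C2CC12", 0),
    ("C12C3C1C23", 0),
    ("C12C3C1C4C2C34", 0),
    ("C12C3C4C1C5C2C3C45", 0),
]

weights, bias = train(TRAIN)
for smi in ["CCOC(C)=O", "C1CC12CC2"]:  # unseen compounds, scored without a route search
    print(smi, round(feasibility_score(smi, weights, bias), 3))
```

The point of the design is cost: once trained, the surrogate scores a compound with a single cheap function call instead of a full retrosynthetic search, which is what would make it usable for scoring large virtual libraries or inside a generative loop.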

[1] Jensen, K. F.; Coley, C. W.; Eyke, N. S., Autonomous discovery in the chemical sciences part I: Progress. Angewandte Chemie International Edition 2019.
[2] Thakkar, A.; Kogej, T.; Reymond, J.-L.; Engkvist, O.; Bjerrum, E. J., Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain. Chemical Science 2020, 11 (1), 154-168.
[3] Thakkar, A.; Selmi, N.; Reymond, J.-L.; Engkvist, O.; Bjerrum, E. J., ‘Ring Breaker’: Neural Network Driven Synthesis Prediction of the Ring System Chemical Space. ChemRxiv 2020.