Molecular representations in AI-driven drug discovery: a review and practical guide.

Literature Information

DOI: 10.1186/s13321-020-00460-5
PMID: 33431035
Journal: Journal of Cheminformatics
Impact Factor: 5.7
JCR Quartile: Q1
Publication Year: 2020
Times Cited: 128
Keywords: Artificial intelligence, Cheminformatics, Drug discovery, Linear notation, Macromolecules
Literature Type: Journal Article, Review
ISSN: 1758-2946
Pages: 56
Issue: 12(1)
Authors: Laurianne David, Amol Thakkar, Rocío Mercado, Ola Engkvist

TL;DR

This review highlights the evolution of electronic molecular representations crucial for computational analysis in drug discovery, emphasizing their significance in the context of AI-driven methodologies. By presenting various popular chemical representations and their applications, the paper aims to assist researchers unfamiliar with these tools in navigating the intersection of chemistry and artificial intelligence.

Abstract

The technological advances of the past century, marked by the computer revolution and the advent of high-throughput screening technologies in drug discovery, opened the path to the computational analysis and visualization of bioactive molecules. For this purpose, it became necessary to represent molecules in a syntax that would be readable by computers and understandable by scientists of various fields. A large number of chemical representations have been developed over the years, their numerosity being due to the fast development of computers and the complexity of producing a representation that encompasses all structural and chemical characteristics. We present here some of the most popular electronic molecular and macromolecular representations used in drug discovery, many of which are based on graph representations. Furthermore, we describe applications of these representations in AI-driven drug discovery. Our aim is to provide a brief guide on structural representations that are essential to the practice of AI in drug discovery. This review serves as a guide for researchers who have little experience with the handling of chemical representations and plan to work on applications at the interface of these fields.

Primary Questions Addressed

  1. What are the key challenges in standardizing molecular representations for AI applications in drug discovery?
  2. How do different molecular representations impact the accuracy of AI models in predicting drug efficacy?
  3. In what ways can emerging technologies enhance the visualization of complex molecular structures for researchers?
  4. What role do graph-based representations play in improving the interpretability of AI-driven drug discovery processes?
  5. How can interdisciplinary collaboration between chemists and data scientists optimize the use of molecular representations in drug discovery?

Key Findings

Research Background and Objectives

The review by David et al. (2020) focuses on the evolving role of molecular representations in the context of AI-driven drug discovery. As computational techniques have advanced, the need for standardized and machine-readable representations of chemical structures has become critical for effective cheminformatics applications. The authors aim to provide a comprehensive overview of various molecular representations, their applications in drug discovery, and a practical guide for researchers new to this field.

Main Methods/Materials/Experimental Design

The authors categorize molecular representations into several types, emphasizing the importance of molecular graphs, linear notations, and chemical descriptors. The review outlines the following key representations:

  1. Molecular Graphs: Represent molecules as graphs where nodes represent atoms and edges represent bonds.
  2. Linear Notations: Include systems like SMILES and InChI, which provide compact representations suitable for computational processing.
  3. Chemical Descriptors: Encode physicochemical properties of compounds rather than their exact structures.
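
To make the first two categories concrete, here is a minimal sketch of a molecular graph held as a plain adjacency list, alongside a SMILES string for the same molecule. The choice of ethanol, the variable names, and the helper functions are illustrative assumptions; they are not taken from the review, and a real workflow would normally use a cheminformatics toolkit rather than hand-built dictionaries.

```python
from collections import defaultdict

# Ethanol, heavy atoms only (hydrogens are left implicit).
# Illustrative example chosen for this sketch, not taken from the review.
smiles = "CCO"                      # linear notation for the same molecule
atoms = {0: "C", 1: "C", 2: "O"}    # node index -> element symbol
bonds = [(0, 1, 1), (1, 2, 1)]      # (atom i, atom j, bond order)


def build_adjacency(bond_list):
    """Return an adjacency list: atom index -> list of (neighbor, bond order)."""
    adjacency = defaultdict(list)
    for i, j, order in bond_list:
        adjacency[i].append((j, order))
        adjacency[j].append((i, order))
    return dict(adjacency)


def heavy_atom_degree(adjacency, atom_index):
    """Number of heavy-atom neighbors bonded to the given atom."""
    return len(adjacency.get(atom_index, []))


adjacency = build_adjacency(bonds)

for index, symbol in atoms.items():
    print(index, symbol, adjacency[index])
```

The node and edge lists carry the same connectivity that the SMILES string encodes in a single line; the graph form is easier to traverse, while the string is easier to store and search in a database.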

Key Results and Findings

  • Molecular Graphs: These representations allow for the encoding of both 2D and 3D information, making them versatile for various applications, including AI-driven predictions and simulations.
  • Linear Notations: SMILES and InChI are highlighted for their compactness and ease of use in databases, though they have limitations in representing certain complex molecular features.
  • Chemical Descriptors: The review emphasizes the utility of molecular fingerprints and structural keys in facilitating similarity searches and quantitative structure-activity relationship (QSAR) modeling.
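
The similarity searches mentioned for fingerprints can be sketched as follows. This example assumes each fingerprint is stored as a set of "on" bit indices and uses the Tanimoto coefficient, a common choice that this summary does not itself name. The bit values and molecule names are made up for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention assumed here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints: each set lists the bit positions that are set.
query = {1, 4, 7, 9}
library = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3},
}

# Rank library molecules from most to least similar to the query.
ranked = sorted(
    library,
    key=lambda name: tanimoto(query, library[name]),
    reverse=True,
)

for name in ranked:
    print(name, round(tanimoto(query, library[name]), 2))
```

Because the score depends only on shared and unshared bits, two molecules with different structures can still score as similar if their fingerprints overlap, which is one reason descriptors are said to encode properties rather than exact structures.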

Main Conclusions/Significance/Innovativeness

The authors conclude that effective molecular representation is crucial to AI-driven and cheminformatics-based drug discovery. They highlight the need for continued development and standardization of representations so that cheminformatics tools and applications can interoperate. The review is intended primarily as a guide for researchers with little experience in handling chemical representations who plan to work at the interface of chemistry and AI.

Research Limitations and Future Directions

While the review provides a thorough overview, it acknowledges that the coverage of representations is not exhaustive and focuses on areas with active research. Future directions include:

  • The need for improved representations that can accommodate complex molecular features, such as delocalized bonds and coordination compounds.
  • Development of hybrid representations that can leverage the strengths of both graph-based and linear notations.
  • Enhanced integration of AI techniques in predicting molecular properties and optimizing drug design processes.

Overall, the review surveys the molecular representations currently used in drug discovery and points to the areas, listed above, where they still fall short.

References

  1. Development and Application of a Data-Driven Reaction Classification Model: Comparison of an Electronic Lab Notebook and Medicinal Chemistry Literature. - Gian Marco Ghiandoni;Michael J Bodkin;Beining Chen;Dimitar Hristozov;James E A Wallace;James Webster;Valerie J Gillet - Journal of chemical information and modeling (2019)
  2. WURCS: the Web3 unique representation of carbohydrate structures. - Kenichi Tanaka;Kiyoko F Aoki-Kinoshita;Masaaki Kotera;Hiromichi Sawaki;Shinichiro Tsuchiya;Noriaki Fujita;Toshihide Shikanai;Masaki Kato;Shin Kawano;Issaku Yamada;Hisashi Narimatsu - Journal of chemical information and modeling (2014)
  3. Annotation of Peptide Structures Using SMILES and Other Chemical Codes-Practical Solutions. - Piotr Minkiewicz;Anna Iwaniak;Małgorzata Darewicz - Molecules (Basel, Switzerland) (2017)
  4. GlycoCT-a unifying sequence format for carbohydrates. - S Herget;R Ranzinger;K Maass;C-W V D Lieth - Carbohydrate research (2008)
  5. Algorithm for reaction classification. - Hans Kraut;Josef Eiblmaier;Guenter Grethe;Peter Löw;Heinz Matuszczyk;Heinz Saller - Journal of chemical information and modeling (2013)
  6. CGRtools: Python Library for Molecule, Reaction, and Condensed Graph of Reaction Processing. - Ramil I Nugmanov;Ravil N Mukhametgaleev;Tagir Akhmetshin;Timur R Gimadiev;Valentina A Afonina;Timur I Madzhidov;Alexandre Varnek - Journal of chemical information and modeling (2019)
  7. CHUCKLES: a method for representing and searching peptide and peptoid sequences on both monomer and atomic levels. - M A Siani;D Weininger;J M Blaney - Journal of chemical information and computer sciences (1994)
  8. Mercury 4.0: from visualization to analysis, design and prediction. - Clare F Macrae;Ioana Sovago;Simon J Cottrell;Peter T A Galek;Patrick McCabe;Elna Pidcock;Michael Platings;Greg P Shields;Joanna S Stevens;Matthew Towler;Peter A Wood - Journal of applied crystallography (2020)
  9. Using SMILES strings for the description of chemical connectivity in the Crystallography Open Database. - Miguel Quirós;Saulius Gražulis;Saulė Girdzijauskaitė;Andrius Merkys;Antanas Vaitkus - Journal of cheminformatics (2018)
  10. New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. - Jonathan B Baell;Georgina A Holloway - Journal of medicinal chemistry (2010)

Literature Citing This Work

  1. From Big Data to Artificial Intelligence: chemoinformatics meets new challenges. - Igor V Tetko;Ola Engkvist - Journal of cheminformatics (2020)
  2. Advances in de Novo Drug Design: From Conventional to Machine Learning Methods. - Varnavas D Mouchlis;Antreas Afantitis;Angela Serra;Michele Fratello;Anastasios G Papadiamantis;Vassilis Aidinis;Iseult Lynch;Dario Greco;Georgia Melagraki - International journal of molecular sciences (2021)
  3. Proposal of the Annotation of Phosphorylated Amino Acids and Peptides Using Biological and Chemical Codes. - Piotr Minkiewicz;Małgorzata Darewicz;Anna Iwaniak;Marta Turło - Molecules (Basel, Switzerland) (2021)
  4. Applications of Virtual Screening in Bioprospecting: Facts, Shifts, and Perspectives to Explore the Chemo-Structural Diversity of Natural Products. - Kauê Santana;Lidiane Diniz do Nascimento;Anderson Lima E Lima;Vinícius Damasceno;Claudio Nahum;Rodolpho C Braga;Jerônimo Lameira - Frontiers in chemistry (2021)
  5. Progress on open chemoinformatic tools for expanding and exploring the chemical space. - José L Medina-Franco;Norberto Sánchez-Cruz;Edgar López-López;Bárbara I Díaz-Eufracio - Journal of computer-aided molecular design (2022)
  6. Cheminformatic Characterization of Natural Antimicrobial Products for the Development of New Lead Compounds. - Samson Olaitan Oselusi;Alan Christoffels;Samuel Ayodele Egieyeh - Molecules (Basel, Switzerland) (2021)
  7. Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design. - Yue Cao;Payel Das;Vijil Chenthamarakshan;Pin-Yu Chen;Igor Melnyk;Yang Shen - Proceedings of machine learning research (2021)
  8. Should We Embed in Chemistry? A Comparison of Unsupervised Transfer Learning with PCA, UMAP, and VAE on Molecular Fingerprints. - Mario Lovrić;Tomislav Đuričić;Han T N Tran;Hussain Hussain;Emanuel Lacić;Morten A Rasmussen;Roman Kern - Pharmaceuticals (Basel, Switzerland) (2021)
  9. The Middle Science: Traversing Scale In Complex Many-Body Systems. - Aurora E Clark;Henry Adams;Rigoberto Hernandez;Anna I Krylov;Anders M N Niklasson;Sapna Sarupria;Yusu Wang;Stefan M Wild;Qian Yang - ACS central science (2021)
  10. Representation of molecules for drug response prediction. - Xin An;Xi Chen;Daiyao Yi;Hongyang Li;Yuanfang Guan - Briefings in bioinformatics (2022)

... (118 more literatures)

