Pharma Focus Asia

Machine Learning for New Drugs

Sneha Rai, Computational and Structural Biology Laboratory, Department of Biological Sciences and Engineering, Netaji Subhas University of Technology

Venugopal Bhatia, Computational and Structural Biology Laboratory, Department of Biological Sciences and Engineering, Netaji Subhas University of Technology

Sonika Bhatnagar, Computational and Structural Biology Laboratory, Department of Biological Sciences and Engineering, Netaji Subhas University of Technology

Hyperlipidemia is a major risk factor for diseases such as cardiovascular disease, cancer, Type II Diabetes and Alzheimer’s disease. This link arises because lipids act as signalling molecules in many biochemical pathways in the body. In this work, we took a systems view of the pathways targeted by lipid-lowering drugs to determine the driver nodes, that is, the nodes that can control the pathway network. We then applied a Random Forest machine learning classifier to the approved drugs that interact with these driver nodes to select drugs that could be repurposed for Hyperlipidemia-induced complex diseases.

Hyperlipidemia and its associated diseases

Common health problems such as obesity, type 2 diabetes, hypertension and heart disease, which arise from the combined effects of multiple genes, environmental factors and lifestyle habits, are called complex diseases. A major risk factor for these diseases is hyperlipidemia, a complex disorder marked by elevated blood lipid levels. Hyperlipidemia may arise from genetic as well as non-genetic factors. Lipids play a crucial role in health and disease: they are not just the body’s energy store, but also act as signals in a large number of interconnected biochemical pathways. Each pathway consists of a set of proteins that together perform a function in the body. An imbalance in lipid levels can therefore disturb signalling throughout the body, which is why the lipid profile is a critical part of a routine health check-up.

High levels of LDL, or “bad” cholesterol, increase the risk of cardiovascular disease, asthma, Alzheimer’s disease, autoimmune disorders and kidney disease. High lipid levels have also been noted in patients recovering from COVID-19, where they may lead to liver injury. Because cholesterol assists viral entry into human cells, cholesterol-lowering agents may help fight the virus, and lipid-lowering drugs such as statins have been reported to lower the risk of heart disease and liver injury after COVID-19.

Network Systems Biology and machine learning for finding new drugs

Network biology provides a powerful platform for integrating different types of biological data into a single framework to study complex diseases. Network measures are increasingly used both to study complex diseases and to find new drug targets.

Bringing a new drug to market still takes many years. Drug repurposing offers a faster route by finding new uses for drugs that are already approved. Because the safety of these drugs has already been established, they can be developed for another disease with far less delay. Advances in artificial intelligence and machine learning can help identify such repurposing candidates.

A network view of Hyperlipidemia

When we examined the 35 approved drugs for hyperlipidemia and their associated biochemical pathways, we found that the pathways were interconnected. This is not surprising, since a biological function is usually carried out by a set of related proteins in one or more pathways. We therefore built a pathway network that also included the approved lipid-lowering drugs. This drug-target network (DTN) had 452 proteins, 35 approved Hyperlipidemia drugs and 12,410 edges, and the drugs interacted with only 34.7% of the network proteins. In network control theory, driver nodes are the key elements that drive communication and signalling in a network. Identifying them requires a directed network, so we needed a directed DTN (DDTN). The direction of each interaction is encoded in the biochemical pathways of an organism, so we identified the main signalling pathways in our DTN and merged 34 of them to build the DDTN.
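The merging step and the coverage figure can be illustrated with a short Python sketch. It assumes that each pathway is available as a list of directed (source, target) protein pairs and each drug as a set of target proteins; the protein and drug names (P1 to P5, D1, D2) are hypothetical toy data, not the article's network.

```python
from collections import defaultdict

def merge_pathways(pathways):
    """Union directed (source, target) edge lists into one adjacency map.

    Returns {protein: set of downstream proteins}; an edge that appears in
    several pathways is stored only once.
    """
    graph = defaultdict(set)
    for edges in pathways:
        for src, dst in edges:
            graph[src].add(dst)
            graph.setdefault(dst, set())  # register sink nodes too
    return dict(graph)

def target_coverage(drug_targets, proteins):
    """Fraction of network proteins targeted by at least one drug."""
    targeted = set()
    for targets in drug_targets.values():
        targeted |= targets
    return len(targeted & proteins) / len(proteins)

# Hypothetical toy pathways that share the edge P2 -> P3.
pathway_a = [("P1", "P2"), ("P2", "P3")]
pathway_b = [("P2", "P3"), ("P3", "P4"), ("P5", "P4")]
ddtn = merge_pathways([pathway_a, pathway_b])
n_edges = sum(len(downstream) for downstream in ddtn.values())

drug_targets = {"D1": {"P2"}, "D2": {"P2", "P4"}}
print(len(ddtn), n_edges)                        # 5 4
print(target_coverage(drug_targets, set(ddtn)))  # 0.4
```

Storing successors in sets makes the union idempotent, which is why the shared P2 -> P3 edge is counted only once in the merged network.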

Next, Cytoscape was used to identify 78 driver nodes. Most driver nodes were encoded by non-essential genes and were associated with one or more disease traits. We also found that these driver nodes were either established drug targets or targets of drugs in clinical trials. We therefore proposed that the drugs associated with the driver nodes in our DDTN could be repurposed to treat the complex diseases associated with Hyperlipidemia. Because there were 130 such drugs, we applied machine learning to identify the most promising ones.
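The article does not state which Cytoscape app or algorithm identified the driver nodes. A common approach from structural controllability theory derives a minimum driver node set from a maximum matching of the directed network: every node left without a matched incoming link must receive an external control signal. The Python sketch below implements that idea with Kuhn's augmenting-path algorithm, plus a hypothetical helper `candidate_drugs` that collects the drugs hitting the driver nodes. It is an illustration under these assumptions, not the authors' pipeline.

```python
def driver_nodes(edges):
    """Minimum driver node set of a directed network via maximum matching.

    Each edge src -> dst may 'match' dst to src; a matching uses every node at
    most once as a source and at most once as a target. Nodes left without a
    matched incoming edge must be controlled directly: they are driver nodes.
    """
    out_links, nodes = {}, set()
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)
        nodes.update((src, dst))

    match_in = {}  # target node -> source node it is matched to

    def augment(src, visited):
        # Kuhn's algorithm: search for an augmenting path starting at src.
        for dst in out_links.get(src, []):
            if dst in visited:
                continue
            visited.add(dst)
            if dst not in match_in or augment(match_in[dst], visited):
                match_in[dst] = src
                return True
        return False

    for src in out_links:
        augment(src, set())

    drivers = sorted(n for n in nodes if n not in match_in)
    # With a perfect matching, one input signal on any node still suffices.
    return drivers if drivers or not nodes else [min(nodes)]

def candidate_drugs(drivers, drug_targets):
    """Hypothetical helper: drugs with at least one target among the driver nodes."""
    driver_set = set(drivers)
    return sorted(d for d, targets in drug_targets.items() if targets & driver_set)

star = [("H", "A"), ("H", "B"), ("H", "C")]
print(driver_nodes(star))                                              # ['B', 'C', 'H']
print(candidate_drugs(driver_nodes(star), {"X": {"A"}, "Y": {"B"}}))   # ['Y']
```

The recursive search is adequate for networks of a few hundred nodes, such as the 452-protein DTN; for much larger networks an iterative Hopcroft-Karp matching would be the safer choice.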

A machine learning approach for finding new drugs for Hyperlipidemia-induced complex diseases

To apply a machine learning classifier, we needed positive, negative and test sets of drugs. The positive set consisted of drugs that are approved or under investigation for Hyperlipidemia or its associated diseases, whereas the negative set consisted of drugs that raise lipid levels as a side effect. The positive set was retrieved from DrugBank, while the negative set was compiled through a literature search. Finally, the test set consisted of all the approved drugs associated with driver nodes. For every drug in each set, 1445 molecular descriptors, such as atom count, bond count, carbon types, hydrogen bond acceptor count and hydrogen bond donor count, were calculated.
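Descriptor software computes hundreds of features from the full chemical structure. As a much simpler illustration of one descriptor family mentioned above (atom counts), the sketch below derives a total atom count and per-element counts from a molecular formula. The function names and the element list are hypothetical, and the toy parser ignores parentheses, charges and isotopes, so it is not a substitute for a real descriptor package.

```python
import re
from collections import Counter

def atom_counts(formula):
    """Per-element atom counts from a simple formula such as 'C9H8O4'.

    Toy parser: no parentheses, hydrates, isotopes or charges.
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def formula_descriptors(formula, elements=("C", "H", "N", "O", "F")):
    """Hypothetical descriptor vector: total atoms, then one count per element."""
    counts = atom_counts(formula)
    return [sum(counts.values())] + [counts[e] for e in elements]

# Aspirin (C9H8O4): 21 atoms in total, no nitrogen or fluorine.
print(formula_descriptors("C9H8O4"))  # [21, 9, 8, 0, 4, 0]
```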

With this dataset, we trained a random forest classifier to narrow down the drugs that could be repurposed for Hyperlipidemia. The model was evaluated with five-fold cross-validation to check the accuracy of its predictions. Candidates with a random forest prediction score ≥ 0.65 were selected. Finally, we searched the literature for any reported direct or indirect role of the selected drugs in lipid lowering or lipid metabolism.
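The cross-validation split and the score cut-off can be sketched in plain Python without the classifier itself; in practice a library implementation, for example scikit-learn's `RandomForestClassifier` with its `predict_proba` method, would supply the scores. The article does not say whether the folds were stratified; the sketch assumes stratification because the classes are imbalanced (50 positive vs 84 negative drugs). The seed, drug names and scores are hypothetical.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal each class's samples round-robin across the folds.
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds

def select_candidates(scores, threshold=0.65):
    """Keep drugs whose predicted lipid-lowering probability is >= threshold."""
    kept = [drug for drug, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda drug: scores[drug], reverse=True)

# 50 lipid-lowering (label 1) and 84 lipid-raising (label 0) drugs.
labels = [1] * 50 + [0] * 84
folds = stratified_folds(labels)
for f, test_idx in enumerate(folds):
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    # A classifier would be trained on train_idx and scored on test_idx here.
    print(f, len(train_idx), len(test_idx))

# Hypothetical scores for three test-set drugs.
print(select_candidates({"drug_A": 0.81, "drug_B": 0.64, "drug_C": 0.65}))
# ['drug_A', 'drug_C']
```

Dealing each class round-robin keeps ten positive drugs in every fold, so no fold is evaluated on an unrepresentative class mix.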

Our positive, negative and test sets consisted of 50 lipid-lowering, 84 lipid-raising and 130 approved drugs, respectively. The model showed an average accuracy of 76.8% in five-fold cross-validation, with a precision of 0.92 and a recall of 0.72 for the positive predictions. The area under the Receiver Operating Characteristic (ROC) curve was 0.79 ± 0.06, indicating that the model could distinguish lipid-lowering drugs from lipid-raising ones.
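The reported metrics follow standard definitions, sketched below in plain Python: accuracy, precision and recall from a confusion matrix, and ROC AUC computed as the probability that a randomly chosen positive drug scores higher than a randomly chosen negative one (ties count half). The labels, scores and 0.5 cut-off are made-up toy values; reproducing the article's figures would require its per-fold predictions.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = lipid-lowering)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))

# Toy example: three positive and three negative drugs.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(confusion_metrics(y_true, y_pred))  # each value is 2/3
print(roc_auc(y_true, scores))            # 8/9
```

Because AUC uses the full score ranking rather than one cut-off, it can be high even when precision and recall at a particular threshold differ, as with the article's 0.92 precision and 0.72 recall.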

Novel drugs to be repurposed

Based on our integrated approach, nine drugs were predicted that can be repurposed for Hyperlipidemia and its associated diseases. These included:

  1. Nedocromil, an asthma medication that also has anti-inflammatory properties. Mast cell stabilizers like Nedocromil could be repurposed for the treatment of hyperlipidemia (HL) and cardiovascular disease (CVD), and for the toxic effects of hyperlipidemia in the brain.
  2. Thyroid hormone drugs such as Levothyroxine, Dextrothyroxine and Liothyronine. In hypothyroid patients, these are known to reduce serum levels of total cholesterol, LDL-c, phospholipids and triglycerides and to increase HDL-c. These drugs may be used for HL and its associated conditions.
  3. Sucralfate, a peptic ulcer medication that acts on bile acids to reduce serum cholesterol levels. Sucralfate could potentially be repurposed to treat HL and its associated cardiovascular problems.
  4. Adenosine (WS070117), which decreases serum lipid levels in animal models. Adenosine is suggested for repurposing in HL accompanied by hypertension and asthma.
  5. L-Arginine and L-Citrulline, amino acids that have improved the serum lipid profile in rats fed a high-fat diet. Because L-Arginine dilates blood vessels, it may be useful in patients with acute myocardial infarction; it also lowers cholesterol and raises ‘good’ cholesterol (HDL-c). L-Citrulline helps prevent atherogenesis, the narrowing of arteries caused by fatty deposits.
  6. Triflusal, a derivative of salicylic acid that inhibits platelet aggregation and also has neuroprotective properties. Triflusal may be repurposed for long-term inflammatory disorders, including psoriasis.

Overall, our work shows how systems biology and machine learning can be combined to help understand and treat complex lifestyle diseases.

Note: This article is based on our recent work published in 2021 (reference 4), where the complete text can be found.

References

  1. Craig J. (2008) Complex diseases: Research and applications. Nature Education 1(1): 184.
  2. Mitchell KJ. (2012) What is complex about complex disorders? Genome Biology 13(1): 237.
  3. Radenkovic D, Chawla S, Pirro M, Sahebkar A and Banach M. (2020) Cholesterol in Relation to COVID-19: Should We Care about It? Journal of Clinical Medicine 9(6): 1909.
  4. Rai S, Bhatia V and Bhatnagar S. (2021) Drug repurposing for hyperlipidemia associated disorders: An integrative network biology and machine learning approach. Computational Biology and Chemistry 92: 107505.
Sneha Rai

Sneha Rai completed her Ph.D. at the Computational and Structural Biology Lab, Department of Biological Sciences and Engineering, Netaji Subhas Institute of Technology (DU). She holds a master's degree in Bioinformatics from Banaras Hindu University and has six years of bioinformatics research experience. Her research interests include structural bioinformatics, systems biology, protein/gene network analysis, data mining, computer-aided drug design, computational genomics, NGS data analysis, and variant analysis and its clinical interpretation.

Venugopal Bhatia

Venugopal Bhatia graduated from NSIT in 2019 with a B.E. in Biotechnology. As an undergraduate researcher at the Computational and Structural Biology Laboratory, he worked on two projects that were later published in Scientific Reports (Nature) and Computational Biology and Chemistry (Elsevier). After a brief stint in industry, he is now pursuing a master's degree at Yale.

Sonika Bhatnagar

Sonika Bhatnagar, Ph.D. (Biophysics, AIIMS), works in the broad area of computational molecular biology. Her laboratory applies drug target selection, drug design and discovery, machine learning, and integrative genomics and proteomics to cardiovascular drug design and infectious disease therapeutics.
