The Role of Bioinformatics in Diabetes Drug Development and Precision Medicine

Evidence-Based Diabetes Management, May 2014, Volume 20, Issue SP8

According to the 2011 National Diabetes Fact Sheet, diabetes affects nearly 26 million Americans, 95% of whom suffer from type 2 diabetes mellitus (T2DM).1 A 2014 report published by the Pharmaceutical Research and Manufacturers of America (PhRMA) documented the development of 180 medications to treat diabetes or diabetes-related conditions, a majority of which are to treat T2DM.2

The drugs being developed are intended to improve on current therapies and to reduce the health toll and healthcare costs associated with the disease. Among the drugs under development is a human peptide, a bioactive part of a gene that regenerates pancreatic islets; additionally, there are novel inhibitors of the protein dipeptidyl peptidase-4 (DPP-4) being developed, as well as a drug that targets sorbitol, a sugar alcohol determined to be responsible for diabetic neuropathy.2 These advances build on research that scientists have conducted to understand disease mechanisms, including gene sequencing and protein structure elucidation.

GenBank, an all-inclusive, open-access database maintained by the National Center for Biotechnology Information (NCBI), plays an important role in this process. GenBank includes nucleotide sequences for more than 280,000 species and the supporting bibliographies, with submissions from individual laboratories as well as large-scale sequencing projects. Additionally, sequences from issued patents are submitted by the US Patent and Trademark Office.3 Recognizing the vast potential of this knowledge-sharing database, researchers all over the world have actively contributed to building up the resource. The information either goes to GenBank or is submitted through its European counterpart, the European Bioinformatics Institute (EBI), or its Japanese counterpart, the DNA Data Bank of Japan (DDBJ).4

All the leading journals require researchers to submit their sequences to GenBank and cite the corresponding accession number in the published article. New sequences can be submitted directly to EBI, DDBJ, or GenBank, and the 3 databases are synchronized daily so that all the information is available from each of them. The data are available virtually in real time, with minimal delay in access to the latest submissions, and free of cost.
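As a minimal sketch of how a cited accession number can be used to look up a record programmatically, the Python below builds a request URL for the efetch service of NCBI's E-utilities. The helper name genbank_record_url and the accession NM_000207 are illustrative assumptions, not details from this article, and the download step is left commented out so that the example runs without network access.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities efetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_record_url(accession: str) -> str:
    """Build a URL requesting one nucleotide record in GenBank flat-file format."""
    params = {
        "db": "nuccore",     # nucleotide collection
        "id": accession,     # accession number cited in a published article
        "rettype": "gb",     # GenBank flat-file format
        "retmode": "text",   # plain text rather than XML
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


url = genbank_record_url("NM_000207")  # illustrative accession (assumption)
print(url)

# To actually download the record (requires network access):
# from urllib.request import urlopen
# with urlopen(url) as response:
#     print(response.read().decode()[:500])
```

Because the 3 databases are synchronized, the same accession number should identify the same record regardless of where it was originally submitted.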

Other commonly used nucleotide and protein databases include the European Molecular Biology Laboratory (EMBL; EBI is run by EMBL), SwissProt, PROSITE, and Human Genome Database (GDB).5 Taken together, these databases are essentially a bioinformatics tool that helps integrate biological information with computational software. The information gained can be applied to understand disease etiology (in terms of mutations in genes and proteins) and individual variables, and ultimately aid drug development.

According to the National Institutes of Health Biomedical Information Science and Technology Initiative, bioinformatics is defined as “research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral, or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.”6

Development of GenBank

Initially called the Los Alamos Sequence Database, this resource was conceptualized in 1979 by Walter Goad, a nuclear physicist and a pioneer in bioinformatics at Los Alamos National Laboratory (LANL).7 GenBank followed in 1982 with funding from the National Institutes of Health, the National Science Foundation, and the Departments of Energy and Defense. LANL collaborated with various bioinformatics and technology companies for sequence data management and to promote open access communications. By 1992, GenBank transitioned to being managed by the National Center for Biotechnology Information (NCBI).8

Submissions to the database include original mRNA sequences, prokaryotic and eukaryotic genes, rRNA, viral sequences, transposons, microsatellite sequences, pseudogenes, cloning vectors, noncoding RNAs, and microbial genome sequences. Following a submission (using the Web-based BankIt or Sequin programs), the GenBank staff reviews the documents for originality and then assigns an accession number to the sequence, followed by quality assurance checks (vector contamination, adequate translation of coding regions, correct taxonomy, correct bibliographic citation) and release to the public database.3,8

How Are Researchers Utilizing This Database?

BLAST (Basic Local Alignment Search Tool) software, an NCBI tool for searching databases such as GenBank, allows researchers to look for sequence similarities by directly entering a sequence of interest, without the need for the gene name or its synonyms.4 An orphan (unknown) or de novo nucleotide sequence, which may have been cloned in a laboratory, can gain perspective following a BLAST search and a match with another, better-characterized sequence in the database. Further, by adding restrictions to the BLAST search, only specific regions of the genome (such as gene-coding regions) can be examined instead of all 3 billion bases.4 BLAST can also translate a DNA sequence to a protein, which can then be used to search a protein database.
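The translation step mentioned above can be illustrated with a short Python sketch. It is not BLAST's own implementation: it is an assumption-laden toy that applies the standard genetic code to a single reading frame, and the function name translate and the example sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code, ordered to match codons
# generated from BASES (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (first reading frame) up to the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCCTGTGGATGCGCTAA"))  # MALWMR
```

A translated search would typically consider all 6 reading frames (3 on each strand); the sketch handles only one to keep the idea visible.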

BLAST, which was developed at NCBI, works only with big chunks of nucleotide sequences, and not with shorter reads, according to Santosh Mishra, PhD, director of bioinformatics and codirector of the Collaborative Genomics Center at the Vaccine and Gene Therapy Institute (VGTI) of Florida. Mishra, who worked as a postdoctoral research associate with Goad at LANL, was actively involved in developing GenBank. His work contributed to the generation of the “flat file” format, and he also worked on improving the query-response time of the search engine.

Additionally, he initiated the “feature table” in GenBank, the documentation that helps GenBank, EMBL, and DDBJ exchange data on a daily basis. According to Mishra, the STAR aligner, developed at Cold Spring Harbor, works better with reference sequences, while Trinity, developed at the Broad Institute in Cambridge, Massachusetts, is useful for de novo sequences. (The Broad Institute made news last month with its work on identifying gene mutations that prevent diabetes in adults who have known risk factors, such as obesity.)

Advantages and Disadvantages of the GenBank Platform

The biggest single advantage of GenBank is the open-access format, which allows for a centralized repository in a uniform format. The tremendous amount of data generated by laboratories (such as from microarrays and microRNA arrays) cannot be published in a research article. However, the data, tagged and uploaded on GenBank, can be linked to the journals’ websites and the links can be provided in the print versions of the articles as well.4

On the flip side, the biggest advantage of being an open-access platform is also the biggest disadvantage of the database. There is always the possibility that scientists will register faulty genetic sequences, which will not be caught unless they are peer reviewed.

Although several quality control mechanisms are built into the system, only reuse of the data by other scientists can uncover errors in the existing records. Additionally, GenBank encourages its users to submit feedback and update records, which unfortunately is not a very proactive process.4

Bioinformatics and Pharmacogenomics in Drug Discovery/Development

Accelerating the drug development process saves costs for the pharmaceutical industry, especially with the way the industry functions today. The company that discovers or invents a new chemical entity, which could metamorphose into a new drug candidate, can squeeze the maximum profit out of the drug before the patent expires and competitors catch up. Essentially, companies jump at every opportunity to accelerate any aspect of the discovery/development process. Resources like GenBank and EBI are data mines that can speed up the entire process in the following ways:

Target identification: Drug candidates can be identified (following a high-throughput screen of chemical libraries) and developed only after a “druggable target” is discovered for a disease condition. Typically, about 1 in 1000 synthesized compounds will progress to the clinic, and only 1 in 10 drugs undergoing clinical trials reaches the market.9
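Taken together, the 2 attrition figures quoted above imply an overall success rate. The short Python calculation below makes that explicit, under the simplifying assumption (not stated in the source) that the 2 stages can simply be multiplied.

```python
from fractions import Fraction

reach_clinic = Fraction(1, 1000)   # synthesized compounds that progress to the clinic
reach_market = Fraction(1, 10)     # drugs in clinical trials that reach the market

# Assumption: the two stage-wise rates combine multiplicatively.
overall = reach_clinic * reach_market
print(f"About 1 in {overall.denominator:,} synthesized compounds reaches the market")
```

On these figures, roughly 1 in 10,000 synthesized compounds would be expected to reach the market.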

Optimizing and validating a target is essential because of the prohibitively high cost of conducting trials, and the number of potential targets for drug discovery is increasing exponentially.10 With the mining and storage of information from huge data sets, like the human genome sequence, the nucleotide sequences that encode target proteins have become readily available, as has the potential to identify new targets. This can exponentially increase the content of the drug pipelines of pharmaceutical companies.10

According to Arathi Krishnakumar, PhD, a protein biochemist and a senior research investigator with the department of Exploratory Biology and Genomics, Bristol-Myers Squibb (BMS), “For compounds that have no obvious targets from a typical phenotypic screening, proteomics offers tools for target identification or target deconvolution. Monitoring the global phosphorylation status of proteins that are downstream of tyrosine kinase inhibitors, also termed phosphoproteomics, is a very attractive tool that can also be used for target as well as biomarker identification. These events can be used as reporters (biomarkers) for specific upstream kinase(s).”

The previous issue of Evidence-Based Diabetes Management reported on the identification of a mutation in the gene SLC30A8 that protected individuals from developing T2DM. The mutation was identified by genetic tests conducted in more than 150,000 individuals, a collaborative effort among the Broad Institute, Massachusetts General Hospital, Pfizer, and Amgen.11

Target validation: Establishing a robust association between a likely target and the disease, to confirm that target modulation translates into a beneficial therapeutic outcome, would not only validate the drug development process but also help absorb the risks associated with clinical trial failure of the molecule being developed.10

Says Krishnakumar, “Target validation is typically done with knock-out or knock-down of the proposed target using RNAi and then monitoring the disease phenotype in relevant cellular models. Proteomics tools are also highly valuable in monitoring specific events on proteins like post-translational modifications, including phosphorylation, methylation, oxidation, etc, new product generation, degradation products, protein-protein interaction, etc, all of which could be direct or indirect consequences of target activation or engagement.”

Cost reduction: The drug development process is not just lengthy (product development can take 10 to 15 years9), but prohibitively expensive as well. Averaging $140 million in the 1970s, the cost of developing a drug was estimated at a whopping $1.2 billion in the early 2000s,12 and a recent Forbes analysis estimated the cost at $5 billion.13

Worth noting is that the final cost of any drug, which spans everything from discovery to approval, also absorbs the cost of all the clinical trial failures.10 Clearly, bioinformatics tools improve the efficiency of target discovery and validation processes, reduce the time spent on the discovery phase, and make the entire process more cost-effective.

Mishra believes GenBank is a good starting point in the drug discovery process. When a new sequence (of known or unknown function) is identified/isolated in the laboratory, a GenBank search will help identify homologues (human or in other organisms) with a 70% to 80% match. Functional studies would then ensue, along with cell and tissue distribution studies.
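To make the idea of a 70% to 80% match concrete, the Python sketch below computes percent identity between 2 sequences that are assumed to be already aligned to the same length. Real database searches rely on alignment tools such as BLAST; the function name percent_identity and both toy sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of matching positions between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        a == b and a != "-"            # gap-to-gap columns do not count as matches
        for a, b in zip(seq_a.upper(), seq_b.upper())
    )
    return 100.0 * matches / len(seq_a)


query = "ACGTACGTAC"   # hypothetical newly isolated sequence
hit = "ACGTTCGTAA"     # hypothetical database sequence
print(percent_identity(query, hit))  # 80.0
```

Here 8 of the 10 aligned positions agree, so the pair falls at the upper end of the range Mishra describes.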

Industry Partnerships

With the value of personalized medication gaining acceptance, the study of pharmacogenomics (genetic variants that determine a person’s drug response; one size does not fit all) is extremely helpful to tailor the optimal drug, dose, and treatment options for a patient to improve efficacy as well as avoid adverse events (AEs).10 According to the Agency for Healthcare Research and Quality of the HHS, AEs annually result in more than 770,000 injuries and deaths and may cost up to $5.6 million per hospital.14

To this end, EMBL-EBI is actively involved in industry partnerships (the partnerships were initiated in 1996), which include Astellas, Merck Serono, AstraZeneca, Novartis, GlaxoSmithKline, BMS, and several others.15 With the high-throughput data that research and development (R&D) activities generate, open-source software and informatics associated with resources such as GenBank and EBI could greatly improve efficiency and reduce the cost of drug discovery and development.

Translational Bioinformatics and Precision Medicine

Healthcare today is primarily symptom driven, and intervention usually occurs late in the pathological process, when the treatment may not be as effective. Identifying predisease states that could provide a window into the forthcoming risk of developing a disease, identifying reliable markers, and developing useful therapies would be the key to managing disease treatment,16 not just to improve efficiency but also to reduce healthcare costs, which are projected to increase steadily and to account for 19.9% of the gross domestic product (GDP) by 2022.17

With precision medicine or personalized medicine, molecular profiles generated from a patient’s genomic (coupled with other “-omics” such as epigenomics, proteomics, and metabolomics) information could help accurately drive the diagnostic, prognostic, and therapeutic plans, tailored to the patient’s physiological status. Predictive models can also be developed for different biological contexts, such as disease, populations, and tissues.15 However, the deluge of data generated by bioinformatics tools requires a framework to regulate, compile, and interpret the information.

Most importantly, the key stakeholders (government, research industry, biological community, pharmaceutical industry, insurance companies, patient groups, and regulatory bodies17) that would drive the widespread acceptance and implementation of precision medicine need to be brought up to speed with the enormous progress made in the field and the promise it brings. There would also be a revolutionary change in the approach to conducting clinical trials—the phase 3 studies conducted in the target population could focus on a more select patient group, which could improve both clinical and economic efficacy.18

At BMS, Krishnakumar’s group actively provides support to clinical trials by developing assays for clinical samples. When it comes to administration of biologics such as antibodies, individual variations such as expression levels of various proteins and their affinity for an antibody necessitate dose-titration in order to personalize treatment to improve efficacy. The developing field of translational bioinformatics creates a platform to bring all the data together, which can then be used to generate a treatment plan personalized to a patient (Figure 1). It has been defined as “the development of storage, analytic, and interpretive methods to optimize the transformation of increasingly voluminous biomedical data into proactive, predictive, preventative, and participatory health.”15

The primary goal of translational bioinformatics is to connect the dots and develop disease networks that can be used as predictive models. In other words, harmonization of the data from different sources (genome, proteome, transcriptome, metabolome, and patient’s pathological data) could help in making better-informed treatment decisions. Within medical R&D, a commonly held belief is that cures for diseases could be found residing within existing data, if only the data could be made to give up their secrets.19 The current status of the scientific, medical, and healthcare fields is that experts in each field have set their minds on developing the best technologies; unfortunately, the technologies are compartmentalized and they work in parallel.

The great need, which has been recognized and implemented in limited areas, is to create platforms where the data can be merged to produce meaningful outcomes.
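As a rough illustration of what merging such data might look like at the smallest scale, the Python sketch below combines records from 3 separate sources into a single profile per patient. The patient IDs, field names, and values are all invented for illustration, and the harmonize helper is hypothetical; real platforms must also deal with consent, data standards, and quality control.

```python
from collections import defaultdict

# Hypothetical, made-up records from three separate sources, keyed by patient ID.
genome = {"P001": {"SLC30A8_variant": True}, "P002": {"SLC30A8_variant": False}}
proteome = {"P001": {"marker_protein_level": 1.8}}
clinical = {"P001": {"hba1c_percent": 6.1}, "P002": {"hba1c_percent": 7.4}}


def harmonize(**sources):
    """Merge per-source records into one profile per patient, namespaced by source."""
    profiles = defaultdict(dict)
    for source_name, records in sources.items():
        for patient_id, fields in records.items():
            for field, value in fields.items():
                profiles[patient_id][f"{source_name}.{field}"] = value
    return dict(profiles)


profiles = harmonize(genome=genome, proteome=proteome, clinical=clinical)
for patient_id, profile in sorted(profiles.items()):
    print(patient_id, profile)
```

Prefixing each field with its source keeps the origin of every value traceable, and a patient who is missing from one source (P002 has no proteome record here) simply has fewer fields rather than causing an error.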

Data Integration Platforms to Boost Evidence-Based Decisions

Implementing these huge changes would necessitate that physicians and providers be more adept at interpreting molecular data, which essentially requires improved education models that include relevant courses during graduate training. Also, development of software that can interpret the data would provide a tremendous advantage to researchers, clinicians, scientists, pathologists, and maybe patients as well.

An application developed by Remedy Informatics, TIMe, boosts the process further. TIMe merges data, registries, applications, analyses, and any other relevant content. TIMe promises to enable faster, more informed decisions in clinical practice, research, and business operations. It also is expected to improve treatment effectiveness, quality of care, and patient outcomes.20

In Europe, a collaborative project (DIRECT) was initiated in 2012 by 4 pharmaceutical companies and 21 academic institutions, with the objective of stratifying diabetic patients based on biomarker identification, which would allow targeted intervention, monitoring of treatment response, stratified trials, and identification of nonresponders or those who might be intolerant to treatment.21 The pharmaceutical companies participating in this effort include sanofi-aventis, Eli Lilly, Servier, and Novo Nordisk.

DIRECT, in turn, constitutes 1 of 3 consortia under development by the Innovative Medicines Initiative (IMI). IMI, a partnership between the European Commission and the European Federation of Pharmaceutical Industries and Associations, includes DIRECT, IMIDIA (to slow disease progression by improving beta-cell function), and SUMMIT (developing surrogate markers for late-stage micro- and macro-vascular complications), collaborative efforts aimed to develop novel individualized therapies with improved efficacy and safety.22

Applications of Translational Bioinformatics

Once the genomic and/or proteomic data have been generated, what next? How are providers employing these data to their advantage and to guide treatment? There are several reports on clinical studies that are being successfully conducted on the foundation of precision as well as evidence-based medicine. Researchers at the University of Southampton have developed a blood test that identifies young children at risk for developing obesity. The test, to be conducted in children as young as 5 years, differentiates between those with high body fat and those with low body fat when they grow older. This test, which identifies epigenetic changes (DNA methylation), showed that a 10% increase in DNA methylation at age 5 years translates into 12% more body fat by the time the children are 14 years old, independent of gender, physical activity, or their timing of puberty.

The principal investigators on the study believe that identifying at-risk children could help make lifestyle modifications early on to help disease management.23

Genetic analysis using genotyping and sequencing techniques in a family with type 1 diabetes mellitus (T1DM) resulted in the identification of a mutation in the SIRT1 gene, which produces a defunct SIRT1-L107P protein.

Essentially, the study identified that the mutant protein was responsible for an autoimmune disorder in the family: 4 members suffered from T1DM and 1 developed ulcerative colitis. Overexpression of SIRT1-L107P in beta cells in vitro resulted in increased production of nitric oxide, the cytokine TNF-α, and the chemokine KC, compared with the controls. Additionally, SIRT1 knockout mice were more susceptible to islet destruction and hyperglycemia following induction of pancreatic insulitis.24 The study provides a foundation for the application of SIRT1 activators, already under development for aging and other metabolic disorders, in T1DM therapy.

Bioinformatics studies have also yielded microRNAs (miRNAs), which are small (~22 nucleotides), noncoding RNA molecules that can repress the translation of messenger RNA (mRNA) or promote its degradation, thereby silencing gene expression.25 Initially thought of as “junk” sequences on the DNA since they are noncoding nucleotides, miRNAs (24,521 listed in miRBase, a database maintained by the University of Manchester26) have now found their place in clinical trials as biomarkers (cancer,27 multiple sclerosis,28 psoriasis29) and are also being developed as “drugs” by companies like Mirna Therapeutics Inc.30

The “Adaptive” Clinical Trial Design

The ‘omic’ revolution has also had a tremendous impact on clinical trial design. The FDA definition of an adaptive clinical study is “a study that includes a prospectively planned opportunity for modification of one or more specified aspects of the study design and hypotheses based on analysis of data from subjects in the study.”31 The trial design includes interim analysis points that would allow researchers to alter the trial (treatment dose or schedule, randomization) based on results from earlier study participants. Two of the 20 ongoing adaptive trials recently published positive results.

The adaptive design was implemented in a phase 2/3 study of dulaglutide, a once-weekly glucagon-like peptide-1 analogue being developed for T2DM. Stage 1 of the trial included an adaptive dose-finding design that could lead to dose selection or early termination due to futility. The trial was expected, should the dose selection be achieved, to enter the second stage to continue evaluation of the selected doses. Completion of the 2 stages was expected to serve as a confirmatory phase 3 trial.32
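The general logic of an interim decision point (select doses or stop for futility) can be sketched in a few lines of Python. This is a deliberately simplified illustration with invented dose levels, response values, and threshold; it is not the algorithm used in the dulaglutide study, whose actual decision rules are described in the cited design paper.

```python
def interim_decision(response_by_dose, futility_threshold):
    """Return doses to carry forward, or an empty list to signal early termination."""
    return sorted(
        dose for dose, response in response_by_dose.items()
        if response >= futility_threshold
    )


# Hypothetical stage 1 results: dose level -> observed mean improvement (arbitrary units).
stage1 = {1: 0.2, 2: 0.9, 4: 1.3}
selected = interim_decision(stage1, futility_threshold=0.5)

if selected:
    print("Proceed to stage 2 with doses:", selected)
else:
    print("Stop early for futility")
```

The key feature of an adaptive design is that a rule of this kind is specified prospectively, before any data are unblinded, rather than chosen after the results are seen.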

A software company, Aptiv Solutions, has developed 2 software packages: FACTS for the design and simulation of early-phase adaptive clinical trials, and ADDPLAN DF for early-phase dose-finding studies.33

Genetic Testing to Determine Disease Susceptibility

Genetic testing for T1DM and T2DM is possible. Expression of HLA-DR3 or HLA-DR4 in Caucasians, HLA-DR7 in African Americans, and HLA-DR9 in Japanese has been identified as a marker for increased susceptibility to T1DM.34 Recent studies have also identified a mutation in SIRT1 in T1DM patients, as referred to earlier.24 Several T2DM susceptibility genes have been identified, including PPARγ, ABCC8, KCNJ11, and CALPN10, while maturity-onset diabetes of the young has been associated with a host of other genes.35

J. Craig Venter, PhD, a biologist and entrepreneur who competed with the Human Genome Project to sequence the human genome, recently announced the launch of a new company, Human Longevity. The company plans to sequence 40,000 human genomes per year to gain insights into the molecular causes of aging and age-associated diseases such as cancer and heart disease.36

The Healthcare Equation

Insurance companies are rapidly adapting to this changing scene of “big data” in their own right. Back in 2011, Aetna announced a partnership with the Center for Biomedical Informatics at Harvard Medical School with the aim of improving the quality and affordability of healthcare (healthcare informatics).

The researchers at Harvard aimed to:

• Evaluate the outcomes of various treatments for specific conditions based on quality and cost

• Determine factors that predict adherence for chronic diseases

• Study how claims data and clinical data, available through electronic health records, can best be used to predict outcomes

• Improve the ability to predict adverse events through a proactive study of claims and clinical data.37

The possibilities are enormous, with application in all disease fields. Translational bioinformatics integrates the various data sources and paves a path for precision medicine that would be immensely valuable to all stakeholders (patients, pharmaceutical companies, scientists, and physicians) alike.

References

1. Statistics about diabetes. American Diabetes Association website. Accessed April 28, 2014.

2. Medicines in development: diabetes. PhRMA website. Accessed April 28, 2014.

3. Benson DA, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2014;42:D32-D37.

4. McEntyre J, Lipman J. GenBank - a model community resource? Nature. Published 2007. Accessed February 27, 2013.

5. Bioinformaticsweb website. Accessed March 5, 2014.

6. NIH working definition of bioinformatics and computational biology. National Institutes of Health website. Accessed March 5, 2014.

7. Walter Goad papers. American Philosophical Society website. Accessed March 4, 2014.

8. The GenBank submissions handbook. NCBI website. Accessed March 4, 2014.

9. Tamimi NA, Ellis P. Drug development: from concept to marketing! Nephron Clin Pract. 2009;113(3):c125-c131.

10. Katara P. Role of bioinformatics and pharmacogenomics in drug discovery and development process. Netw Model Anal Health Inform Bioinforma. 2013;2:225-230.

11. Caffrey MK. Study on protective gene mutations for T2DM appears to have strong implications for drug development. Evidence-Based Diabetes Management. 2014;20(SP4):SP102.

12. 2013 profile: pharmaceutical research industry. PhRMA website. Accessed March 6, 2014.

13. Herper M. The cost of creating a new drug now $5 billion, pushing big pharma to change. Forbes website. Published August 11, 2013. Accessed March 6, 2014.

14. Reducing and preventing adverse drug events to decrease hospital costs. Agency for Healthcare Research and Quality website. Accessed March 10, 2014.

15. Industry partnerships. The European Bioinformatics Institute website. Accessed March 5, 2014.

16. Readhead B, Dudley J. Translational bioinformatics approaches to drug development. Adv Wound Care (New Rochelle). 2013;2(9):470-489.

17. National health expenditure projections 2012-2022. CMS website. Accessed March 10, 2014.

18. Mirnezami R, Nicholson J, Darzi A. Preparing for precision medicine. NEJM. 2012;366(6):489-491.

19. Kennedy GD. Enterprise informatics: key to precision medicine, scientific breakthroughs, and competitive advantage. Remedy Informatics; 2013.

20. TIMe™: the informatics marketplace. Remedy Informatics website. Accessed March 14, 2014.

21. DIRECT - innovative medicines initiative: DIabetes REsearCh on patient straTification. DIRECT website. Accessed April 28, 2014.

22. The IMI diabetes platform. IMI website. Accessed April 28, 2014.

23. Blood test may help predict whether a child will become obese. ScienceDaily website. Published March 25, 2014. Accessed April 15, 2014.

24. Biason-Lauber A, Böni-Schnetzler M, Hubbard BP, et al. Identification of a SIRT1 mutation in a family with type 1 diabetes. Cell Metab. 2013;17(3):448-455.

25. Cortez MA, Ivan C, Zhou P, Wu X, Ivan M, Calin GA. microRNAs in cancer: from bench to bedside. Adv Cancer Res. 2010;108:113-157.

26. miRBase: the microRNA database. miRBase. Accessed March 14, 2014.

27. The role of microRNA-29b in the oral squamous cell carcinoma. Accessed March 14, 2014.

28. A pilot study to assess microRNA biomarkers in early and later stage multiple sclerosis. Accessed March 14, 2014.

29. miRNAs and mRNAs in Psoriasis. Accessed March 14, 2014.

30. Bader AG, Lammers P. The therapeutic potential of microRNAs. Innov Pharm Technol. Published March 2011. Accessed March 14, 2014.

31. Guidance for industry: adaptive design clinical trials for drugs and biologics. FDA website. Published February 2010. Accessed March 27, 2013.

32. Geiger MJ, Skrivanek Z, Gaydos B, et al. An adaptive, dose-finding, seamless phase 2/3 study of a long-acting glucagon-like peptide-1 analog (dulaglutide): trial design and baseline characteristics. J Diabetes Sci Technol. 2012;6(6):1319-1327.

33. Design & simulation software. Aptiv Solutions website. Accessed April 28, 2014.

34. Genetics of diabetes. American Diabetes Association website. Accessed April 28, 2014.

35. Genetics and diabetes. World Health Organization website. Accessed April 28, 2014.

36. Pollack A. A genetic entrepreneur sets his sights on aging and death. The New York Times. March 5, 2014:B1.

37. Aetna and the center for biomedical informatics at Harvard Medical School announce research collaboration [press release]. Hartford, CT: Aetna; November 17, 2011. Accessed March 14, 2014.