Evidence-Based Oncology May 2014

The Role of Bioinformatics in Oncology Drug Development – and Precision Medicine

Surabhi Dangi-Garimella, PhD
Oncology drug development, a burgeoning therapeutic field for pharmaceutical companies, is also extremely time-consuming and expensive. Navigating a single drug moiety through the tedious process of preclinical studies, clinical trials, and, of course, the FDA's approval process requires a net investment of 12 to 15 years and over a billion dollars.1 This, together with the failure rates of clinical trials (5 of the top 10 clinical trial failures in 2013 involved drugs for cancer indications2), makes it imperative that the discovery and development process be streamlined to make it cost-effective and timely.

GenBank, an all-inclusive, open-access database maintained by the National Center for Biotechnology Information (NCBI), has a very important role to play in this process. GenBank includes nucleotide sequences for more than 280,000 species and the supporting bibliographies, with submissions from individual laboratories as well as large-scale sequencing projects. Additionally, sequences from issued patents are submitted by the US Patent and Trademark Office.3 Recognizing the vast potential of this knowledge-sharing database, researchers all over the world have actively contributed to building up the resource. Sequences are submitted either directly to GenBank or through its European counterpart, the European Bioinformatics Institute (EBI), or its Japanese counterpart, the DNA Data Bank of Japan (DDBJ).4 All the leading journals require researchers to deposit their sequences in GenBank or one of its partner databases and to cite the corresponding accession number in the published article. Because the 3 databases are synchronized daily, a sequence submitted to any one of them becomes accessible through all 3. The data are available virtually in real time, with minimal delay, and free of cost.

Other commonly used databases include the nucleotide sequence database of the European Molecular Biology Laboratory (EMBL; EBI is run by EMBL), the protein databases SwissProt and PROSITE, and the Human Genome Database (GDB).5 Taken together, these databases form a bioinformatics resource that integrates biological information with computational software. The information gained can be applied to understand disease etiology (in terms of mutations in genes and proteins) and variability among individuals, and ultimately to aid drug development.

According to the National Institutes of Health Biomedical Information Science and Technology Initiative, bioinformatics is defined as “research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral, or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.”6

Development of GenBank

Initially called the Los Alamos Sequence Database, this resource was conceptualized in 1979 by Walter Goad, a nuclear physicist and a pioneer in bioinformatics at Los Alamos National Laboratory (LANL).7 GenBank followed in 1982, with funding from the National Institutes of Health, the National Science Foundation, and the Departments of Energy and Defense. LANL collaborated with various bioinformatics and technology companies on sequence data management and to promote open-access communication. By 1992, management of GenBank had been transferred to NCBI.8

Submissions to the database include original mRNA sequences, prokaryotic and eukaryotic genes, rRNA, viral sequences, transposons, microsatellite sequences, pseudogenes, cloning vectors, noncoding RNAs, and microbial genome sequences. After a sequence is submitted (using the Web-based BankIt or the Sequin program), GenBank staff review the submission for originality and assign the sequence an accession number, then perform quality assurance checks (for vector contamination, adequate translation of coding regions, correct taxonomy, and correct bibliographic citation) before releasing it to the public database.3,8
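The quality assurance step "adequate translation of coding regions" can be illustrated with a simplified check. The sketch below is hypothetical and is not GenBank's actual validation software: it assumes the standard genetic code (NCBI translation table 1) and flags a putative coding sequence whose length is not a multiple of 3, that lacks an ATG start, that contains an internal stop codon, or that lacks a terminal stop codon. The names `translate` and `check_cds` are illustrative.

```python
# Hypothetical, simplified coding-sequence (CDS) sanity check; not GenBank's validator.
# Standard genetic code (NCBI translation table 1), codon bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq):
    """Translate complete codons; unknown codons (e.g., containing N) become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def check_cds(seq):
    """Return a list of problems found in a putative CDS (an empty list means none found)."""
    seq = seq.upper()
    protein = translate(seq)
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("does not start with ATG")
    if "*" in protein[:-1]:
        problems.append("internal stop codon")
    if not protein.endswith("*"):
        problems.append("no terminal stop codon")
    return problems


print(check_cds("ATGGCCTAA"))     # []
print(check_cds("ATGTAAGCCTAA"))  # ['internal stop codon']
```

Real annotation checks are more permissive and more thorough: they allow alternative start codons and other genetic codes, and compare the translation against the protein sequence supplied with the record.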

How Are Researchers Utilizing This Database?

BLAST (Basic Local Alignment Search Tool), a companion search tool to GenBank, allows researchers to query for sequence similarity by directly entering a sequence of interest, without needing the gene name or its synonyms.4 An orphan (unknown) or de novo nucleotide sequence, which may have been cloned in a laboratory, can be placed in context when a BLAST search matches it with another, better-characterized sequence in the database. Further, by adding restrictions to the BLAST search, researchers can examine only specific regions of the genome (such as gene-coding regions) instead of all 3 billion bases of the human genome.4 BLAST can also translate a DNA sequence into protein, which can then be used to search a protein database.
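BLAST's speed comes from a seed-and-extend strategy: it first looks for short words shared by the query and the database sequences, and only then extends those seeds into scored alignments. The sketch below illustrates only the seeding idea, under simplifying assumptions (exact word matches on one strand, a toy in-memory "database," and a hypothetical word length `w`). It is not BLAST's implementation and omits extension, scoring, and statistics.

```python
from collections import defaultdict


def build_word_index(database, w):
    """Map every length-w word to the (sequence id, position) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq) - w + 1):
            index[seq[pos:pos + w]].append((seq_id, pos))
    return index


def find_seeds(query, index, w):
    """Return exact word hits as (query position, sequence id, database position)."""
    hits = []
    for qpos in range(len(query) - w + 1):
        for seq_id, dpos in index.get(query[qpos:qpos + w], []):
            hits.append((qpos, seq_id, dpos))
    return hits


def rank_by_seed_count(query, database, w):
    """Rank database sequences by how many seed hits they share with the query."""
    counts = defaultdict(int)
    for _, seq_id, _ in find_seeds(query, build_word_index(database, w), w):
        counts[seq_id] += 1
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Toy database with hypothetical sequence ids.
toy_db = {"seqA": "ACGTACGTTT", "seqB": "GGGGCCCC"}
print(rank_by_seed_count("ACGTAC", toy_db, w=4))  # [('seqA', 4)]
```

Indexing the database once lets each query word be looked up directly rather than scanned for, which is the essential reason seeding is fast.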

BLAST, which was developed at NCBI, works only with long stretches of nucleotide sequence and not with shorter reads, according to Santosh Mishra, PhD, director of bioinformatics and codirector of the Collaborative Genomics Center at the Vaccine and Gene Therapy Institute (VGTI) of Florida. Mishra, who worked as a postdoctoral research associate with Goad at LANL, was actively involved in developing GenBank. His work contributed to the creation of the "flat file" format, and he also worked on improving the query-response time of the search engine. Additionally, he initiated the "feature table" in GenBank, the standardized annotation within each record that helps GenBank, EMBL, and DDBJ exchange data on a daily basis.

According to Mishra, the STAR aligner, developed at Cold Spring Harbor Laboratory, works better for aligning reads to reference sequences, while Trinity, developed at the Broad Institute in Cambridge, Massachusetts, is useful for assembling de novo sequences. (The Broad Institute made news last month with its work identifying gene mutations that protect against diabetes in adults who have known risk factors, such as obesity.)
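The difference between the two approaches is that a reference-based aligner maps reads onto an existing sequence, whereas a de novo assembler must reconstruct the sequence from overlaps among the reads themselves. Trinity is built around de Bruijn graphs; the toy sketch below shows only that core idea, assuming error-free reads, a single linear sequence, and no repeated (k-1)-mers. It is not Trinity's algorithm, which also handles sequencing errors and alternative splice isoforms, and the function names are illustrative.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer in the reads adds one edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the output deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def assemble(reads, k):
    """Spell out an Eulerian path through the de Bruijn graph (Hierholzer's algorithm)."""
    graph = build_de_bruijn(reads, k)
    in_degree = defaultdict(int)
    for targets in graph.values():
        for node in targets:
            in_degree[node] += 1
    # Start where out-degree exceeds in-degree by one (the sequence's first (k-1)-mer).
    start = next(
        (node for node in sorted(graph) if len(graph[node]) - in_degree[node] == 1),
        min(graph),
    )
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    # Consecutive (k-1)-mers overlap by k-2 bases, so each step adds one new base.
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # toy, error-free overlapping reads
print(assemble(reads, k=4))  # ATGGCGTGCA
```

A reference-based aligner skips graph construction entirely, which is why it is the better choice when a good reference sequence already exists.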

Advantages and Disadvantages of the GenBank Platform

The single biggest advantage of GenBank is its open access, which provides a centralized repository of data in a uniform format. The tremendous amount of data generated by laboratories (such as from microarrays and microRNA arrays) cannot be published in full in a research article. However, the data, tagged and uploaded to GenBank, can be linked from the journals' websites, and the links can be provided in the print versions of the articles as well.4

On the flip side, open access, GenBank's biggest advantage, is also its biggest disadvantage. There is always a chance that scientists will deposit faulty genetic sequences, which will not be caught unless they are peer reviewed. Although several quality control mechanisms are built into the system, errors in existing records may be discovered only when other scientists reuse the data. GenBank also encourages its users to submit feedback and update records, but this process is, unfortunately, not very proactive.4

Bioinformatics and Pharmacogenomics in Drug Discovery/Development

Accelerating the drug development process saves costs for the pharmaceutical industry, especially given how the industry functions today. A company that discovers or invents a new chemical entity that could become a new drug candidate aims to earn the maximum profit from the drug before its patent expires and competitors enter the market, so every year saved in development leaves more of the patent term for sales. Essentially, companies jump at every opportunity to accelerate any aspect of the discovery and development process. Resources such as GenBank and EBI are data mines that can speed up the entire process in the following ways:

Target identification

Drug candidates can be identified (following a high-throughput screen of chemical libraries) and developed only after a “druggable target” is discovered for a disease condition. Typically, about 1 in 1000 synthesized compounds will progress to the clinic, and only 1 in 10 drugs undergoing clinical trials reaches the market.9 Because conducting trials is prohibitively expensive, optimizing and validating a target is essential, and the number of potential targets for drug discovery is increasing exponentially.10 Mining and storing information from huge data sets, such as the human genome sequence, has made the nucleotide sequences of target proteins readily available and has made it possible to identify new targets. This could exponentially increase the content of pharmaceutical companies' drug pipelines.10
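The two attrition rates quoted above can be combined into a rough overall figure. The worked arithmetic below assumes, as a simplification, that the two stages are sequential and that the quoted rates can simply be multiplied:

```latex
\[
P(\text{synthesized compound reaches market})
  \approx \underbrace{\frac{1}{1000}}_{\text{compound} \to \text{clinic}}
  \times \underbrace{\frac{1}{10}}_{\text{clinic} \to \text{market}}
  = \frac{1}{10\,000}
\]
```

Under this assumption, roughly 1 in 10,000 synthesized compounds would become a marketed drug, which is why better target selection at the start of the pipeline matters so much.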

According to Arathi Krishnakumar, PhD, a protein biochemist and a senior research investigator with the department of Exploratory Biology and Genomics, Bristol-Myers Squibb (BMS), “For compounds that have no obvious targets from a typical phenotypic screening, proteomics offers tools for target identification or target deconvolution. Monitoring the global phosphorylation status of proteins that are downstream of tyrosine kinase inhibitors—also termed phosphoproteomics—is a very attractive tool that can also be used for target as well as biomarker identification. These events can be used as reporters (biomarkers) for specific upstream kinase(s).”
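As a simplified illustration of using phosphorylation changes as readouts, the sketch below flags phosphosites whose measured intensity drops after inhibitor treatment. Everything here is an assumption for illustration: the function name, the site labels, the intensity values, the pseudocount, and the 2-fold (log2 fold change of -1 or lower) cutoff. Real phosphoproteomic analyses rely on replicate measurements, normalization, and statistical testing rather than a single fold-change threshold.

```python
import math


def flag_inhibited_sites(control, treated, min_log2_drop=1.0, pseudocount=1.0):
    """Return {site: log2 fold change} for sites whose phosphorylation falls after treatment.

    control and treated map hypothetical site labels to measured intensities; the
    pseudocount avoids division by zero and log of zero for undetected sites.
    """
    flagged = {}
    for site, control_intensity in control.items():
        if site not in treated:
            continue  # site not quantified in both conditions
        log2_fc = math.log2((treated[site] + pseudocount) / (control_intensity + pseudocount))
        if log2_fc <= -min_log2_drop:
            flagged[site] = round(log2_fc, 2)
    return flagged


# Made-up intensities for two hypothetical phosphosites.
control = {"SITE_A": 1000.0, "SITE_B": 500.0}
treated = {"SITE_A": 249.0, "SITE_B": 499.0}
print(flag_inhibited_sites(control, treated))  # {'SITE_A': -2.0}
```

Sites that fall consistently after treatment are the candidates that could serve as reporters for the upstream kinase.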

Target validation

Establishing a robust association between a likely target and the disease, which confirms that modulating the target translates into a beneficial therapeutic outcome, not only validates the drug development effort but also helps mitigate the risk that the molecule being developed will fail in clinical trials.10

Says Krishnakumar, “Target validation is typically done with knock-out or knock-down of the proposed target using RNAi and then monitoring the disease phenotype in relevant cellular models. Proteomics tools are also highly valuable in monitoring specific events on proteins like post translational modifications, including phosphorylation, methylation, oxidation, etc, new product generation, degradation products, protein-protein interaction, etc, all of which could be direct or indirect consequences of target activation or engagement.”

Cost reduction

The drug development process is not only lengthy (product development can take 10 to 15 years9) but also prohibitively expensive. Averaging $140 million in the 1970s, the cost of developing a drug was estimated at a whopping $1.2 billion in the early 2000s,11 and a recent Forbes analysis put the cost at $5 billion.12
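As a rough illustration of how steep that rise is, the arithmetic below compares the 1970s and early-2000s figures. The 30-year interval between them is an assumption made for illustration (the text gives only decades), and the result ignores inflation adjustments and any differences in how the two estimates were derived:

```latex
\[
\frac{\$1.2\ \text{billion}}{\$140\ \text{million}} = \frac{1200}{140} \approx 8.6,
\qquad
r = \left(\frac{1200}{140}\right)^{1/30} - 1 \approx 0.074
\]
```

That is, an increase of roughly 8.6-fold, or about 7.4% compound growth per year over the assumed 30 years.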

It is worth noting that the final cost of any drug, which comprises all costs from discovery to approval, also includes the cost of absorbing all the clinical trial failures.10 Bioinformatics tools can improve the efficiency of target discovery and validation, reduce the time spent in the discovery phase, and make the entire process more cost-effective.
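The point about absorbing failures can be made concrete with a simple model. Assuming, purely for illustration, that every candidate entering clinical trials costs the same amount C and that the success rate is the 1 in 10 cited earlier for drugs in clinical trials, the cost attributable to each approved drug is:

```latex
\[
C_{\text{per approval}} = \frac{C}{p_{\text{success}}} = \frac{C}{1/10} = 10\,C
\]
```

Each approved drug therefore carries the cost of roughly 9 failed candidates in addition to its own. In practice, failures occur at different phases and cost different amounts, so this is only a first approximation.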

Mishra believes GenBank is a good starting point in the drug discovery process. When a new sequence (of known or unknown function) is identified or isolated in the laboratory, a GenBank search can help identify homologues, in humans or other organisms, with a 70% to 80% match. Functional studies would then follow, along with cell and tissue distribution studies.
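One common way to quantify such a "match" is percent identity over a pairwise alignment. The sketch below assumes the two sequences are already aligned (gaps written as "-") and, as one of several conventions, excludes gapped columns from the denominator; tools differ on this point, so the same alignment can yield slightly different percentages. The function names and the 70% cutoff are illustrative.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of ungapped alignment columns in which the two sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def is_candidate_homolog(aligned_a, aligned_b, threshold=70.0):
    """Flag a pair whose identity reaches a hypothetical cutoff (70% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(round(percent_identity("ACGT-ACGTA", "ACGTTACCTT"), 1))  # 77.8
```

Identity alone is only a first filter; a short alignment can reach 70% by chance, which is why the functional and distribution studies that follow are needed.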

Industry Partnerships
