Research Article
Precision medicine
Real-World Evidence In Support Of Precision Medicine: Clinico-Genomic Cancer Data As A Case Study
- Vineeta Agarwala is a resident in the Department of Medicine, Stanford University, in California, and director of product management at Flatiron Health, in New York City.
- Sean Khozin is associate director (acting) of the Oncology Center of Excellence, Food and Drug Administration, in Silver Spring, Maryland.
- Gaurav Singal is vice president for data strategy and product development, Foundation Medicine, in Cambridge, and a physician in the Department of Medicine, Brigham and Women's Hospital, in Boston, both in Massachusetts.
- Claire O’Connell is a product manager at Flatiron Health.
- Deborah Kuk is a quantitative scientist at Flatiron Health.
- Gerald Li is a data scientist at Foundation Medicine.
- Anala Gossai is a quantitative scientist at Flatiron Health.
- Vincent Miller is chief medical officer at Foundation Medicine.
- Amy P. Abernethy is chief medical officer and chief scientific officer at Flatiron Health.
Abstract
The majority of US adult cancer patients today are diagnosed and treated outside the context of any clinical trial (that is, in the real world). Although these patients are not part of a research study, their clinical data are still recorded. Indeed, data captured in electronic health records form an ever-growing, rich digital repository of longitudinal patient experiences, treatments, and outcomes. Likewise, genomic data from tumor molecular profiling are increasingly guiding oncology care. Linking real-world clinical and genomic data, as well as information from other co-occurring data sets, could create study populations that provide generalizable evidence for precision medicine interventions. However, the infrastructure required to link such composite data, ensure their quality, and learn from them rapidly is complex. We outline the challenges and describe a novel approach to building a real-world clinico-genomic database of patients with cancer. This work represents a case study in how data collected during routine patient care can inform precision medicine efforts for the population at large. We suggest that health policies can promote innovation by defining appropriate uses of real-world evidence, establishing data standards, and incentivizing data sharing.
Real-world evidence is generated using data derived from the experience of patients outside of conventional clinical trials.1,2 Real-world data can, in principle, capture the experience of the majority of adult oncology patients, as compared to only the 5 percent who have the opportunity to participate in clinical trials. The proliferation and widespread adoption of electronic health records (EHRs), as well as other emerging digital health solutions, have made real-world data an attractive source for clinical and translational research. Availability of these data has also created a dynamic policy landscape, further bolstered by the recent enactment of the 21st Century Cures Act of 2016, which requires the Food and Drug Administration (FDA) to develop guidance for scenarios in which real-world data may inform regulatory decisions (for example, new drug approvals, label expansions, and new indications for existing therapies). Both drug developers and regulators are recognizing that new sources of data (in addition to the results of randomized controlled trials) could play a major role in modernizing drug development.3 A growing body of work demonstrates that real-world evidence may be used to replicate, extend, and supplement data sets from traditional prospective clinical trials. Several studies suggest that real-world data can recapitulate clinical trial findings.4,5 EHR data may provide an important opportunity to characterize outcomes for populations whose members enroll in clinical trials at lower rates (such as elderly patients, members of minority groups, and patients with poor performance status). Real-world data can also help investigators understand the comparative efficacy and cost-effectiveness of multiple therapies used for the same indication and identify late-term adverse effects that may occur following completion of a typical clinical trial.
Additional use cases for real-world evidence include the development of contemporaneous external control arms (the use of routinely captured data to model the standard-of-care arm in clinical trials) and pragmatic clinical trials (randomized trials in which patients are enrolled and provide consent at a routine clinic visit, and outcomes of interest are obtained from routine documentation in EHRs). Finally, real-world data may also support clinical trial design and operational planning for studies, including site selection and patient recruitment (especially for patients with rare cancer types and for understudied populations). These applications are pertinent to personalization of treatment decisions, as they inform how real patients will respond to treatments in typical clinical settings, a fundamental need when applying research evidence to individual patients at the point of care.
For real-world data to be useful, high-quality data sets are needed. Key aspects of data quality include the representation of general populations, documentation of data completeness and reliability, clear provenance to allow each data point to be traced to its source, transparency in study designs and data analysis plans, and up-to-date reflection of contemporary clinical practice.6 Furthermore, to inform the practice of precision medicine, real-world data must include not only clinical but also deep biological and genomic data. In this article we explore the hurdles that currently exist in assembling large-scale real-world clinico-genomic data, and we describe one potential solution.
Real-World Challenges: A Clinical Vignette
In an era of EHRs and advanced genomic data platforms, continuous aggregation and queries of clinical and genomic data from routinely treated cancer patients should logically be attainable. Implementation is challenged, however, by current limitations in EHR design and the lack of interoperability between clinical and molecular information systems.
A clinical vignette, based on a single real-world patient’s oncologic journey, helps illustrate these challenges (exhibit 1). A patient, JD, is newly diagnosed with non-small-cell lung cancer (NSCLC) via a biopsy performed at her local community cancer center. The biopsy results are added to her EHR. Her cancer is at an early stage, so she undergoes surgical resection and adjuvant chemotherapy. Nearly four years later, a mass is detected on a surveillance chest computed tomography (CT) scan, and a biopsy confirms recurrent NSCLC. The scan and a detailed note by JD’s oncologist describing her disease are uploaded into her EHR.
Exhibit 1 Timeline of a deidentified patient’s journey in the clinico-genomic database (CGDB)

Her oncologist then sends the tumor biopsy sample to a commercial diagnostics laboratory, where next-generation sequencing of JD’s tumor is performed across hundreds of cancer-related genes. Along with the biopsy, the oncologist sends demographic information about JD, as well as what tumor type she believes JD has (NSCLC). No other information about the extent or stage of her disease is sent to the lab. The lab sequences the sample and detects a list of genetic alterations. The lab also annotates the actionability of each alteration based on a thorough review of the literature, FDA filings, and large public databases of genomic data (an approach taken by many diagnostic labs across the country).7–9 For some alterations, there is evidence that a particular class of drugs may be efficacious (or that a drug may confer resistance). However, for other alterations, known as variants of unknown significance, no such data exist.
For JD, potentially actionable alterations are reported in the genes KRAS and STK11. Around a dozen variants of unknown significance are also reported to JD’s oncologist. Given the lack of sufficient evidence to inform therapy choice based on these variants, JD’s oncologist first focuses on the actionable alterations, beginning with the short variant in the commonly mutated gene KRAS and the KRAS amplification. Based on these alterations, the lab lists treatment with a MEK pathway inhibitor (trametinib; trade name Mekinist) as one possible option. The oncologist also reviews JD’s STK11 variant, for which the lab provides a literature summary showing possible efficacy of mTOR pathway inhibitors such as everolimus (trade name Afinitor) or temsirolimus (trade name Torisel). These drugs have not been tested in lung cancer trials but have been shown to produce responses in patients who have STK11 alterations in other tumor types, such as breast cancer. There is no database in which an oncologist can look up outcomes following treatment with available therapies for NSCLC patients whose tumors share JD’s mutational profile.
Based largely on her own anecdotal experience with trametinib in her NSCLC patients, JD’s oncologist selects trametinib as a first-line agent. Published evidence on trametinib use in patients with KRAS alterations is mixed, with some studies raising the possibility that patients with KRAS-altered NSCLC may respond to trametinib while others suggest otherwise.10,11 Nonetheless, JD does well on this drug for eight months, at which point disease progression is noted on her scans. Her oncologist treats through this progression based on the best available evidence (her clinical judgment) for another eight months. However, after sixteen months of trametinib therapy, further disease progression is noted, and JD is switched to a standard chemotherapy regimen with carboplatin and paclitaxel. (Her oncologist recommended this choice despite the availability of checkpoint inhibitor therapy, reflecting the reality of real-world variance in treatment patterns.) JD receives standard chemotherapy for two months and passes away from disease complications two months after her last dose of therapy.
Therapy choices, medication administrations, responses to therapy, scan results, and the date of JD’s death are all documented in the EHR but are not shared with anyone, including the diagnostics laboratory that initially listed trametinib as a possible treatment option. This is true for many other patients like JD across the US and around the world. Such breaks in the flow of clinico-genomic information in cancer care delivery are a major barrier to improving personalized treatment recommendations based on actionable genomic alterations.
Aggregating Patient Stories To Learn
There is much to learn from JD’s story, especially if data from her experience could flow easily to be combined with the stories of other patients with similar genomic alterations and therapy regimens. Was it the KRAS short variant that helped JD respond to trametinib for as long as she did? Did the KRAS amplification or the KRAS rearrangement of unknown significance play a role? What other genomic and clinical characteristics contributed to her response? And what were the outcomes of other patients with similar genomic profiles who received different treatment regimens? To answer questions like these, data from large cohorts of similar patients must be shared and aggregated, and nuances of both clinical and genomic findings need to be captured.
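Once clinical and genomic records sit in one place, questions like these become database queries. The Python sketch below is a hypothetical illustration, not the CGDB's actual schema or query interface: the `Patient` fields, alteration labels, and regimens are invented for the example. It finds patients whose tumors share at least one of JD's alterations and summarizes time on therapy by regimen. Taking a median of raw durations ignores censoring (patients still on therapy), so it is only a first look.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Patient:
    patient_key: str            # deidentified key (hypothetical)
    alterations: frozenset      # alteration labels, e.g. "KRAS amplification"
    regimen: str                # regimen received after sequencing
    months_on_therapy: float    # observed time on that regimen (censoring ignored)


def find_similar(patients, query, min_shared=1):
    """Return (patient, shared alterations) pairs, largest overlap first."""
    matches = []
    for p in patients:
        shared = p.alterations & query
        if len(shared) >= min_shared:
            matches.append((p, shared))
    # list.sort is stable, so patients with equal overlap keep their input order.
    matches.sort(key=lambda m: len(m[1]), reverse=True)
    return matches


def summarize_by_regimen(matches):
    """Map regimen -> (number of patients, median months on therapy)."""
    by_regimen = defaultdict(list)
    for p, _ in matches:
        by_regimen[p.regimen].append(p.months_on_therapy)
    return {r: (len(v), median(v)) for r, v in by_regimen.items()}


if __name__ == "__main__":
    jd_profile = frozenset({"KRAS short variant", "KRAS amplification", "STK11 alteration"})
    cohort = [
        Patient("A", frozenset({"KRAS short variant", "STK11 alteration"}), "trametinib", 6.0),
        Patient("B", frozenset({"EGFR L858R"}), "erlotinib", 10.0),
        Patient("C", frozenset({"KRAS short variant"}), "carboplatin/paclitaxel", 4.0),
        Patient("D", frozenset({"KRAS short variant", "KRAS amplification"}), "trametinib", 9.0),
    ]
    print(summarize_by_regimen(find_similar(cohort, jd_profile)))
```

Requiring only one shared alteration keeps the comparison cohort large; raising `min_shared` trades sample size for closer genomic similarity.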
Data aggregation is challenging in practice. Clinical and genomic data in oncology today are housed in separate locations. JD’s clinical data, alongside those of thousands of other patients treated at her cancer center, are recorded and stored only in the clinic’s EHR. Similarly, genomic data from the many patients undergoing tumor profiling reside in diagnostic lab databases, which are siloed across different commercial and hospital-based laboratories.
Several consortia have launched efforts to aggregate clinical and genomic data across cancer centers,12–14 but many challenges remain. First, most such efforts lack granular, systematic collection of clinical data describing patient treatments and outcomes. While many projects are collecting patient age at diagnosis, cancer stage at diagnosis, and tumor histology, these variables represent only an initial approximation of what is needed for clinico-genomic research in oncology. Instead of data collected at a single point in time (for example, the time of diagnosis), longitudinal clinical data are needed to track the efficacy and toxicity of each real-world therapeutic regimen that a patient sequentially receives and to ultimately identify the tumor genomic contributors to these outcomes. Today, most observational studies that leverage EHR data rely on a one-time chart review. Infrastructure is needed to track, over time, when a patient’s electronic chart has updates (for example, a new line of therapy or updated imaging) that may be added to a continuously growing clinical journey. A valuable clinico-genomic database must be live, constantly updated as patients’ clinical journeys evolve.
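Keeping a database live implies a routine for spotting charts that have changed since they were last curated and folding the new findings into each patient's timeline. The sketch below assumes hypothetical per-chart timestamps (`last_abstracted`, `latest_document`) and a simple event record; it does not describe any particular EHR's interface or the authors' pipeline.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ChartStatus:
    patient_key: str
    last_abstracted: date    # when curators last reviewed this chart
    latest_document: date    # date of the newest note, report, or order in the EHR


@dataclass(frozen=True)
class Event:
    day: date
    kind: str                # e.g. "line_of_therapy", "imaging", "progression"
    detail: str


def charts_needing_update(charts):
    """Keys of charts with documents newer than their last abstraction."""
    return [c.patient_key for c in charts if c.latest_document > c.last_abstracted]


def merge_events(journey, new_events):
    """Add newly abstracted events to a journey, dropping exact duplicates, in date order."""
    merged = list(journey)
    seen = set(journey)
    for event in new_events:
        if event not in seen:
            merged.append(event)
            seen.add(event)
    return sorted(merged, key=lambda e: e.day)
```

Re-abstracting only changed charts keeps curation effort proportional to new clinical activity rather than to the size of the whole database.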
A second key challenge is related to data standards for real-world data. Few standards exist for clinical data, and most have not been widely adopted.15–17 EHR data are recorded for the primary purpose of clinical care, and efforts to standardize data collection at the point of care might not necessarily produce research-grade data. Most EHR data remain unstructured (for example, there are free-text physician notes and pathology reports), and ultimately downstream methods and systems are needed to curate these clinical data for research purposes. Additionally, the completeness of EHR-derived data sets remains limited, and the reliability of analytic methods used to learn from sparse data sets (such as imputation) is not yet well understood. Although more established standards for genomic data exist, different commercial, hospital, and academic laboratories continue to use a wide array of (sometimes divergent) diagnostic assay and variant interpretation methodologies.18,19
Third, preserving patient privacy (while critically important) poses an impediment to linking clinical and genomic data. Today, a provider and a diagnostic lab cannot easily exchange identified patient data, even if patients wanted them to do so. To create a national cancer data ecosystem20 in which every patient who receives routine care can contribute both clinical and genomic data into an aggregated pool, protocols must ensure that identified patient data never leave health care entities, but that patient-level data are nonetheless linkable across both providers and labs. Deidentification processes, security policies, and third-party intermediary solutions could be developed to achieve this end, while preserving a firm commitment to patient privacy and compliance with the Health Insurance Portability and Accountability Act (HIPAA) of 1996.
Finally, human capacity and resources are a significant constraint. Initiatives requiring individual cancer centers to participate in a data-sharing network are often acutely limited in sample size by the resources available for clinical data curation at each center. Moreover, even when data are available, building effective research teams may require innovation in training. Researchers are typically trained in genomic data analysis or clinical outcomes analysis, but not in both. Individual scientists may need cross-training, or highly interdisciplinary teams may be required to make sense of rich clinico-genomic data.
Building A Clinico-Genomic Database: Overcoming Challenges
We attempted to overcome several of these traditional barriers to data flow and tested a novel approach for creating a real-world clinico-genomic database (CGDB) through which cancer patients could be tracked longitudinally.21
First, we identified sources of clinical and genomic data available at large scale. For clinical data, we curated EHR data from a geographically distributed group of more than two hundred US cancer centers in the Flatiron Health (FH) network. These data were sourced from oncology clinics and academic medical centers on whose behalf FH (an oncology technology provider) performs operational activities as a business associate, including the provision of an EHR used by many practices in the network. Because these activities fall under the Treatment, Payment, and Health Care Operations exemption of HIPAA, health care providers do not require patients’ informed authorization to disclose their personal health information to FH, which in turn received permission to aggregate and deidentify this information for the purposes of research approved by an Institutional Review Board.
At FH, data were aggregated across multiple source EHRs. Structured data (such as demographic characteristics, diagnosis codes, medication administration, and laboratory testing) were extracted from records and harmonized to a standard data model (for example, all laboratory tests for a specific analyte were mapped to standard units).22 Many additional data elements were then abstracted from unstructured documents using a system in which human staff trained in chart abstraction are assisted by technology in collecting targeted information (for example, the date on which a clinician documented disease progression) from free text in patients’ records. This process was centralized: The same policies and procedures were applied to all patient charts across the FH network, and all abstractors were trained to use the same policies. To achieve high data quality, inter- and intra-abstractor agreement (that is, agreement in data collection from the same chart between two different chart abstractors and agreement in data collected by the same abstractor at two different points in time) were continuously monitored and maintained above a high threshold across all data collection modules. Provenance was recorded for all data points collected, documenting who abstracted the data, at what time, and based on which documents in the EHR. Finally, to fill gaps, EHR data were supplemented with external data sets—most notably, mortality data from national and commercial sources.23
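The article does not say which agreement statistic was monitored. As an assumed illustration, the sketch below computes raw percent agreement and Cohen's kappa, a common chance-corrected measure, for two sets of values abstracted from the same charts: two abstractors for inter-abstractor agreement, or one abstractor at two time points for intra-abstractor agreement. The example labels are hypothetical.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of charts where both abstractions recorded the same value."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty lists")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Expected agreement if each abstraction drew values independently
    # at its own observed category frequencies.
    p_e = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / n**2
    if p_e == 1:
        return 1.0  # both abstractions used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical example: was disease progression documented (Y/N) in 10 charts?
first = ["Y", "Y", "N", "N", "Y", "N", "Y", "Y", "N", "N"]
second = ["Y", "N", "N", "N", "Y", "N", "Y", "Y", "N", "Y"]
print(percent_agreement(first, second), cohens_kappa(first, second))
```

Kappa is reported alongside raw agreement because, for rare findings, two abstractors can agree on most charts simply by both recording "no."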
For genomic data, we accessed next-generation sequencing data produced by Foundation Medicine (FMI), a laboratory that has sequenced more than 160,000 tumor samples from across the US. These tests were ordered as part of routine clinical care, just as JD’s oncologist had ordered testing to guide her care. When ordering FMI sequencing, providers attest to verbal consent from patients to allow the lab to deidentify test results and use the aggregate data set for future research purposes. We estimated that about one-fifth of all patients who have undergone tumor sequencing by FMI would have been seen at a cancer center within the FH network and would thus hypothetically have both clinical (EHR) data available from FH and genomic data available from FMI. These data sets had never been linked before.
Having identified high-quality sources of clinical and genomic data, we next turned to the challenge of how to link these data sets at the patient level (for example, how to link JD’s EHR data to her tumor sequencing data) while ensuring absolute protection of patient identity. To do this, we developed a HIPAA-compliant process in which FH and FMI each generated tokens for every patient in the respective data sets. These tokens were deterministically generated in the same way at both FH and FMI from demographic data inputs (date of birth, first name, and last name). However, the tokens themselves were strictly deidentified: In other words, even with JD’s token, her demographic data or identity could not be retrieved. By engaging a third-party honest broker to ingest deidentified, token-paired clinical and genomic data from FH and FMI, respectively, and then replace the tokens with new patient keys, we were able to generate a linked CGDB of more than 25,000 distinct patients (see schematic in appendix).24 Before the linked data set was returned to us, it was also certified as statistically deidentified by an external privacy expert. Neither protected health information nor identifiable patient data were ever shared outside FH or FMI, and neither organization would be able to identify JD within the linked data set. However, we could analyze both her EHR and her tumor genomic data together for the first time (exhibit 1). We could also query the CGDB to search for other patients similar to JD.
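The article does not specify how the tokens were computed. One common pattern, assumed here purely for illustration, is a keyed hash (HMAC-SHA256) over normalized demographics: both data holders share a secret key, so identical inputs yield identical tokens, while a party without the key, such as the broker, cannot recover names or dates of birth by guessing and rehashing them. The normalization rules, the date format, and the `honest_broker_link` function are hypothetical; real record linkage must also cope with typos, name changes, and different people who share a name and birth date.

```python
import hashlib
import hmac
import secrets
import unicodedata


def _clean(name: str) -> str:
    """Uppercase, strip accents, keep letters only ("O'Connell" -> "OCONNELL")."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in no_accents.upper() if ch.isalpha())


def make_token(shared_key: bytes, first: str, last: str, dob_iso: str) -> str:
    """Deterministic, one-way token; dob_iso is assumed to be "YYYY-MM-DD"."""
    message = f"{_clean(first)}|{_clean(last)}|{dob_iso.strip()}".encode("utf-8")
    return hmac.new(shared_key, message, hashlib.sha256).hexdigest()


def honest_broker_link(clinical: dict, genomic: dict) -> list:
    """Join token-keyed records and replace each token with a fresh random key.

    The broker sees only tokens and deidentified payloads, never the key
    or the underlying demographics.
    """
    linked = []
    for token in sorted(clinical.keys() & genomic.keys()):
        new_key = secrets.token_hex(8)  # random, unrelated to the token
        linked.append((new_key, clinical[token], genomic[token]))
    return linked
```

Both sites must apply identical normalization before hashing; otherwise the same patient receives two different tokens and silently fails to link.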
Characteristics Of The Database
We relinked and refreshed the clinico-genomic database (including returning to every living patient’s chart to curate updated information) every three months for the period March 2016–March 2018. As of September 2017, approximately 20 percent of patients in the CGDB had been treated at academic centers, while the remainder had been treated at community-based oncology practices. The age and sex distributions of patients in the database are representative of real-world US cancer patients undergoing tumor sequencing. Fewer than half of the patients in the database had documented dates of death; the majority were continuing to receive care and can be followed longitudinally as they achieve remission, progress to new therapies, or advance to end-of-life care. Follow-up times ranged from less than six months, for recently diagnosed patients, to nearly three years. There were over 3,500 patients with lung cancer, 2,000 with colon cancer, and 2,000 with breast cancer (exhibit 2). In addition to individual genomic alterations across more than 315 genes, the available raw sequencing data allowed us to calculate tumor mutation burden (the number of somatic mutations per megabase of sequenced DNA) and determine microsatellite instability status (MSI, which provides evidence of impaired DNA mismatch repair and predisposition to mutations) for each patient (exhibit 3).
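As a rough illustration of the arithmetic behind tumor mutation burden, the sketch below divides a count of qualifying somatic mutations by the size of the sequenced region in megabases. Which variants qualify (here, coding substitutions and short indels, excluding likely germline variants and known driver alterations) is a simplified assumption, not Foundation Medicine's validated algorithm, and the variant records are hypothetical.

```python
from dataclasses import dataclass

# Assumed set of variant classes that count toward TMB in this sketch.
COUNTED_CLASSES = {"missense", "nonsense", "synonymous", "frameshift", "inframe_indel"}


@dataclass(frozen=True)
class Variant:
    gene: str
    variant_class: str
    somatic: bool        # False if judged to be germline
    known_driver: bool   # True for well-characterized driver/hotspot alterations


def tumor_mutation_burden(variants, megabases_sequenced: float) -> float:
    """Qualifying somatic mutations per megabase of sequenced DNA."""
    if megabases_sequenced <= 0:
        raise ValueError("sequenced region must be positive")
    qualifying = sum(
        1
        for v in variants
        if v.somatic and not v.known_driver and v.variant_class in COUNTED_CLASSES
    )
    return qualifying / megabases_sequenced
```

Because a targeted panel covers only a small fraction of the genome, the denominator is the size of the sequenced region, not the whole genome.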
Exhibit 2 Numbers of patients in the clinico-genomic database (CGDB), by tumor type, September 2017

Exhibit 3 Percentages of patients with tumors with high microsatellite instability in the clinico-genomic database (CGDB), by tumor type, September 2017

The database recapitulates known survival trends for subpopulations with defined biomarkers (for example, non-small-cell lung cancer patients whose tumors harbor mutations in the genes EGFR or ALK) who receive targeted therapies (for example, tyrosine kinase inhibitors such as erlotinib, which target EGFR). Consistent with recent literature, exploratory analyses of data in the CGDB suggest that a signature of high tumor mutation burden predicts the duration of response to checkpoint inhibitor immunotherapies.21 With each successive linkage of clinical (EHR) and genomic sequencing data, the database has grown in number of cases, volume of longitudinal information available for each patient, and variety of data collected. Over time, as new therapies are used in real-world populations, their effectiveness can be evaluated and stratified by tumor genotype. Research to identify novel diagnostic predictors of therapeutic response is under way. These data could also be used to study the interplay between targeted therapies and other patient outcomes, such as adverse effects, financial toxicity, and quality of life.
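Survival comparisons like these are typically summarized with Kaplan-Meier curves. The sketch below is a plain-Python Kaplan-Meier estimator applied per biomarker group; the `(group, time, event)` record layout is hypothetical. It deliberately ignores complications that matter in real-world genomic cohorts, such as delayed entry (patients join the cohort only after sequencing) and confounding between groups, and a production analysis would more likely rely on an established survival library.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up durations; events: 1 if the event (e.g. death) was observed,
    0 if the patient was censored. Returns [(time, survival)] at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients who share this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # both events and censorings leave the risk set
    return curve


def km_by_group(records):
    """records: iterable of (group, time, event). Returns {group: curve}."""
    grouped = defaultdict(lambda: ([], []))
    for group, time, event in records:
        grouped[group][0].append(time)
        grouped[group][1].append(event)
    return {g: kaplan_meier(ts, es) for g, (ts, es) in grouped.items()}
```

Censored patients contribute to the risk set until they leave follow-up, which is why patients still in care are not simply dropped.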
Innovation In Clinical Evidence Generation Using Clinico-Genomic Data
A recent FDA approval highlights the value of creating broad access to real-world clinico-genomic data in oncology. In May 2017 the FDA granted accelerated approval to a particular cancer immunotherapy (pembrolizumab; trade name Keytruda) for use in adult and pediatric patients based on the presence of a genomic marker (high microsatellite instability, or MSI-H) rather than an anatomically defined tumor type.25 This represented the first ever “tissue/site agnostic” approval in oncology; it emerged from a growing body of scientific and clinical evidence suggesting that tumors sharing key molecular aberrations may also share profiles of clinical response to certain therapies, regardless of the tumors’ anatomic origin.
The scarcity of clinical data available to inform tissue-agnostic research in oncology points to a gap that could be filled by using real-world data. The approval for pembrolizumab in patients with MSI-H was based on data from 149 patients with fifteen different tumor types across five single-arm trials. These data provided initial evidence that patients with MSI-H tumors have superior response rates to pembrolizumab across many tumor types, but there are hundreds of as-yet-undescribed patients in the real world who are also receiving checkpoint inhibitors (such as pembrolizumab) every day. In many of these patients, tumor molecular profiling is performed and MSI status is available. But, just as in the case of JD, these patients’ responses to therapy are being tracked only by their oncologists and are not, for the most part, being systematically recorded. Real-world data could augment this evidence base over time.
We queried the CGDB and determined the real-world prevalence of MSI-H across numerous tumor types (exhibit 3). Tumor type for each sample was confirmed by a pathologist. As expected, MSI-H was most common in endometrial cancer. Interestingly, tumors of unknown origin (unknown primary cancers) were MSI-H more often than several other tumor types, which suggests that the management of this disease entity should perhaps include MSI-H testing. The rarity of MSI-H overall (it occurred in about 5 percent of late-stage colorectal cancers, about 2 percent of unknown primary cancers, and less than 1 percent of many other tumor types, such as lung cancers) underscores the need for very large sample sizes to ultimately assess the efficacy of immunotherapy agents in the MSI-H subpopulation.
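A back-of-the-envelope calculation shows why. If a biomarker has prevalence p, sequencing n patients yields about n·p positives on average, so reaching k positives requires roughly n = ⌈k/p⌉ sequenced patients. The target of 100 MSI-H patients below is hypothetical; the prevalences are the approximate figures quoted above, with 1 percent used as an upper bound for tumor types where MSI-H occurred in less than 1 percent (so the true requirement there is larger). This is an expected value only; reaching the target with high probability would take more patients.

```python
import math
from fractions import Fraction


def patients_to_sequence(target_cases: int, prevalence: Fraction) -> int:
    """Sequenced patients needed for target_cases expected biomarker-positive patients."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    # Fractions keep the division exact, so ceil() is not thrown off by rounding.
    return math.ceil(target_cases / prevalence)


approximate_prevalence = {
    "late-stage colorectal": Fraction(5, 100),
    "unknown primary": Fraction(2, 100),
    "lung (upper bound)": Fraction(1, 100),
}
for tumor, p in approximate_prevalence.items():
    print(tumor, patients_to_sequence(100, p))
```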
We observed 251 patients with MSI-H tumors, of whom 38 (15.1 percent) were treated with immune checkpoint inhibitors (data not shown). As the sample size in the CGDB grows over time, more in-depth studies regarding the use and efficacy of immune checkpoint inhibitors in patients with MSI-H advanced malignancies can be conducted. Use of these therapies is already shifting: The proportion of MSI-H patients in the CGDB who received checkpoint inhibitor therapy increased from 10.1 percent in May 2017 (before the FDA approval of pembrolizumab later that month) to 15.1 percent in September 2017.
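The 38-of-251 figure is reported without an uncertainty estimate. To show how much a denominator of this size constrains the estimate, the sketch below adds a Wilson score 95 percent confidence interval, one standard choice for a binomial proportion; the interval is an illustration, not a result reported by the authors. The May 2017 figure cannot be treated the same way because its denominator is not given.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


treated, msi_h_total = 38, 251
share = treated / msi_h_total
low, high = wilson_interval(treated, msi_h_total)
print(f"{share:.1%} treated (95% CI {low:.1%} to {high:.1%})")
```

With about 250 MSI-H patients, the interval is several percentage points wide on each side of the point estimate.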
Prospective clinical trials to compare immunotherapy response in MSI-H and other patients are still a key component of research and will be going forward, but this example highlights the potential power of real-world data in rapid evidence generation. The CGDB described here contains only a fraction of all real-world clinico-genomic data. If these data were aggregated even more broadly, they could provide an ongoing source of additional evidence in support of tissue-agnostic drug development.
Health Policy Requirements For Clinico-Genomic Real-World Evidence
High-quality clinico-genomic data sets are required to serve the broader goals of generating and using real-world evidence for precision medicine. A continuously growing clinico-genomic data set provides a framework for ongoing inquiry in which biomarker-defined populations can be studied for novel target identification and comparative effectiveness. These data can be used not only to support new drug approvals based on the presence of genomic biomarkers (for example, pembrolizumab for MSI-H patients) but also to rationally narrow already existing approvals to only those patients who are most likely to benefit (as the FDA did for erlotinib in 2016, for example, after initially granting a broad approval independent of EGFR status in 2004).26 Narrowing approvals based on evidence is especially important in controlling the total cost of cancer care and in preventing unnecessary toxicities.
Health policy will play a key role in enabling the creation of clinico-genomic data sets at a national scale. Defining the uses of real-world data and modernizing evidence generation was a central theme of the 21st Century Cures Act, but more is needed. Health care data and technology policies that reinforce data interoperability, while ensuring privacy with clear guidelines on data deidentification, are critical. Although our case study demonstrates the feasibility of data linkages within the current boundaries of HIPAA, it represents just one of many solutions. Standardization of health data exchange practices and policies that incentivize data collection, sharing, and integration at the point of care could greatly enhance similar efforts in other settings. For example, the Centers for Medicare and Medicaid Services’ Oncology Care Model,27 a program designed to tie payments to care quality, demonstrated how a change in reimbursement policy could spur innovation as EHR vendors and oncology practices prepared to gather the data points required to assess quality. Such a policy change facilitated the adoption of value-based care, but it also indirectly incentivized the capture of additional information that can be aggregated into real-world data sets going forward.
Data and molecular testing standards are critical to making sense of real-world clinical (EHR) and genomic data. In oncology, we can rely on some existing standards such as those in the national tumor registry system. However, new variables that reflect the evolving forefront of care (for example, complex biomarkers such as MSI-H) are rapidly entering clinical decision making, and it is essential that these variables be integrated into data sets in a harmonized, quality-controlled manner. Precision medicine often requires the study of rare cohorts, and the ability to rapidly query a data set and identify patients who meet certain criteria requires that the criteria be prespecified, accurately captured, and represented similarly across all source data sets.
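As a small illustration of representing a biomarker consistently across sources, the sketch below maps source-specific MSI labels onto one controlled vocabulary before querying for a rare cohort. The label spellings and record fields are hypothetical; real mappings would be built and reviewed against each lab's and EHR's actual reporting conventions.

```python
from enum import Enum
from typing import Optional


class MSIStatus(Enum):
    HIGH = "MSI-H"
    STABLE = "MSS"
    EQUIVOCAL = "equivocal"
    UNKNOWN = "unknown"


# Hypothetical source-specific spellings mapped to one controlled vocabulary.
_SYNONYMS = {
    "msi-h": MSIStatus.HIGH,
    "msi-high": MSIStatus.HIGH,
    "microsatellite instability-high": MSIStatus.HIGH,
    "mss": MSIStatus.STABLE,
    "ms-stable": MSIStatus.STABLE,
    "microsatellite stable": MSIStatus.STABLE,
    "msi-ambiguous": MSIStatus.EQUIVOCAL,
    "equivocal": MSIStatus.EQUIVOCAL,
}


def harmonize_msi(raw: Optional[str]) -> MSIStatus:
    """Normalize case and whitespace; unrecognized or missing labels become UNKNOWN."""
    if raw is None:
        return MSIStatus.UNKNOWN
    return _SYNONYMS.get(raw.strip().lower(), MSIStatus.UNKNOWN)


def msi_h_cohort(records):
    """records: dicts with 'patient_key' and optional 'msi_raw'; returns MSI-H keys."""
    return [
        r["patient_key"]
        for r in records
        if harmonize_msi(r.get("msi_raw")) is MSIStatus.HIGH
    ]
```

Mapping unrecognized labels to `UNKNOWN` rather than guessing means a new or misspelled label shrinks the cohort instead of silently contaminating it.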
A scalable model for precision medicine requires continuous data sharing across multiple providers and labs, with tightly linked clinical and molecular data that match individual patient and disease characteristics with available and emerging therapeutic options. Oncology is at the forefront of the movement toward precision medicine, providing a fertile ground for collaborative research and a template for what is possible. High-quality, real-world, longitudinal, linked clinico-genomic data provide a path forward for continued acceleration of drug development in oncology and represent a modern addition to the traditional arsenal of clinical evidence.
ACKNOWLEDGMENTS
The authors acknowledge Management Science Associates, Inc., which served as the third-party honest broker to facilitate data linkage. The authors thank Daniel Barth-Jones, an assistant professor of clinical epidemiology at Columbia University, for his supervision of the statistical deidentification process required to ensure that the clinico-genomic data remained strictly deidentified both before and after linkage.
NOTES
- 1 Real-world evidence—what is it and what can it tell us? N Engl J Med. 2016;375(23):2293–7.
- 2 Real-world data for clinical evidence generation in oncology. J Natl Cancer Inst. 2017;109(11).
- 3 New “21st Century Cures” legislation: speed and ease vs science. JAMA. 2017;317(6):581–2.
- 4 Use of health care databases to support supplemental indications of approved medications. JAMA Intern Med. 2018;178(1):55–63.
- 5 Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials. Cochrane Database Syst Rev. 2014;(4):MR000034.
- 6 Harnessing the power of real-world evidence (RWE): a checklist to ensure regulatory-grade data quality. Clin Pharmacol Ther. 2018;103(2):202–5.
- 7 GenomeVIP: a cloud platform for genomic variant discovery and interpretation. Genome Res. 2017;27(8):1450–9.
- 8 DNA-Mutation Inventory to Refine and Enhance Cancer Treatment (DIRECT): a catalog of clinically relevant cancer mutations to enable genome-directed anticancer therapy. Clin Cancer Res. 2013;19(7):1894–901.
- 9 CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer. Nat Genet. 2017;49(2):170–4.
- 10 A randomized phase II study of the MEK1/MEK2 inhibitor trametinib (GSK1120212) compared with docetaxel in KRAS-mutant advanced non-small-cell lung cancer (NSCLC). Ann Oncol. 2015;26(5):894–901.
- 11 Prognostic and predictive value in KRAS in non-small-cell lung cancer: a review. JAMA Oncol. 2016;2(6):805–12.
- 12 The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45(10):1113–20.
- 13 The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2(5):401–4.
- 14 Toward a shared vision for cancer genomic data. N Engl J Med. 2016;375(12):1109–12.
- 15 Representing knowledge consistently across health systems. Yearb Med Inform. 2017;26(1):139–47.
- 16 Data interchange using i2b2. J Am Med Inform Assoc. 2016;23(5):909–15.
- 17 Evaluating common data models for use with a longitudinal community registry. J Biomed Inform. 2016;64:333–41.
- 18 Panel’s “moonshot” goals released. Cancer Discov. 2016;6(11):1202–3.
- 19 Facilitating a culture of responsible and effective sharing of cancer genome data. Nat Med. 2016;22(5):464–71.
- 20 All the world’s a stage: facilitating discovery science and improved cancer care through the Global Alliance for Genomics and Health. Cancer Discov. 2015;5(11):1133–6.
- 21 Development and validation of a real-world clinicogenomic database. J Clin Oncol. 2017;35(15 suppl):2514.
- 22 Opportunities and challenges in leveraging electronic health record data in oncology. Future Oncol. 2016;12(10):1261–74.
- 23 Development and validation of a high-quality composite real-world mortality endpoint. Health Serv Res. Forthcoming 2018.
- 24 To access the appendix, click on the Details tab of the article online.
- 25 Food and Drug Administration. FDA grants accelerated approval to pembrolizumab for first tissue/site agnostic indication [Internet]. Silver Spring (MD): FDA; [last updated 2017 May 30; cited 2018 Mar 29]. Available from: https://www.fda.gov/Drugs/InformationOnDrugs/ApprovedDrugs/ucm560040.htm
- 26 Food and Drug Administration. Erlotinib (Tarceva) [Internet]. Silver Spring (MD): FDA; [last updated 2016 Oct 10; cited 2018 Apr 10]. Available from: https://www.fda.gov/Drugs/InformationOnDrugs/ApprovedDrugs/ucm525739.htm
- 27 Design challenges of an episode-based payment model in oncology: the Centers for Medicare & Medicaid Services Oncology Care Model. J Oncol Pract. 2017;13(7):e632–45.