This is the first story in a two-part series about the Center for Genomic Interpretation's efforts to help payors better understand the quality of next-generation sequencing tests.
NEW YORK – Insurer Highmark is asking labs selling in-network next-generation sequencing services to submit data beyond what is required under the Clinical Laboratory Improvement Amendments or for accreditation by the College of American Pathologists, and demonstrate that their tests can accurately gauge patients' cancer risks and predict which oncology drugs they might respond to.
"Certifications from CAP and CLIA are necessary but are not sufficient guarantees of quality from high-complexity labs," Highmark said in an August letter to 18 of its biggest in-network labs performing NGS-based germline and somatic oncology panels. The Blue Cross Blue Shield-affiliated insurer said it is seeking information from an independent third-party group on the accuracy of test results, including interpretation of variants of uncertain significance (VUS). However, Matt Fickie, Highmark's senior medical director, suggested that a network of small labs could potentially work together to create something similar and meet this requirement.
Labs won't have to do this for every somatic or germline oncology NGS panel they offer commercially, but for a representative test of each kind. They have until March 1, 2023, to provide the data.
In the letter, Highmark recommended labs work with the Center for Genomic Interpretation (CGI) or a similar third-party group to perform the validation. If they decide to go with CGI, a nonprofit started in 2017 by variant classification scientist Julie Eggington, they will undergo an in silico-based quality assessment. Within its ELEVATEGenetics Brilliant program, CGI asks labs to analyze synthetic variants inserted into electronic DNA sequence files and produce test reports for mock patients. The organization then scores the labs' ability to detect the variants; describe the DNA and protein changes they're causing using standard nomenclature; and interpret what the variations mean in a clinical context.
While Highmark's in-network labs don't have to use CGI, Fickie isn't aware of another group using in silico methods quite like Eggington to validate not only labs' ability to detect variants using NGS but also their ability to interpret variants. For high-complexity labs, "CLIA and CAP are a joke, and they all know that," Fickie said. "You need to be accountable not just for your processes, which is what CLIA and CAP do, but also the [reported] outcome, that part that is subject to interpretation."
This view and the recommendation to use CGI are sure to be controversial among industry players, particularly when it comes to using in silico-based methods to evaluate the quality of labs' variant interpretations. Some labs will balk at having to compare their variant interpretations to CGI's, arguing that the field of genetic testing is still in its infancy and there is a lot of subjectivity in the science and art of classifying variants.
It's well documented that genetic testing labs can come to different conclusions about the clinical significance of the same variant. This commonly happens if the first lab didn't have access to the second lab's data, said Heidi Rehm, medical director of the clinical research sequencing platform at the Broad Institute, and while labs can usually resolve the discrepancy with data sharing, those initial interpretations aren't necessarily wrong if each lab applied the data they had in hand.
In the same way, CGI's interpretations may differ from those of the labs it is evaluating because it doesn't have access to their unpublished data or because it may be applying current variant classification guidelines differently. "It would be hard to say that CGI is right and others are wrong. Who's to say that everyone should compare themselves to CGI?" said Rehm, who has urged insurers to stipulate data sharing in public databases like ClinVar as a condition of test coverage.
Fickie views CGI's quality assessment program as the current "gold standard" in using in silico-based methods to understand the analytical and clinical validity of NGS tests, but he acknowledged that not everyone in the lab industry agrees with CGI's methods. Eggington, who led variant classification efforts at companies like Myriad Genetics and 23andMe, has become an increasingly polarizing figure since cofounding CGI by calling out quality issues at genetic testing labs and supporting greater test oversight. "She's an iconoclast," Fickie said. "She's out there spilling the beans."
"But that's why we leave this open," he added, emphasizing that Highmark wants to start a conversation with industry about ways of assessing NGS test accuracy. "We're open to the idea that there are other ways to do it," Fickie said, though he's betting that labs will eventually realize "that their best bet is to start with [CGI]."
In a recent meeting with Highmark, Invitae Chief Medical Officer Robert Nussbaum urged that instead of requiring additional third-party validation, the insurer should incentivize labs to submit variants and the evidence supporting their classifications to ClinVar, which, in turn, would foster peer review and discussion of variant interpretation discordances. This, Nussbaum said, "is the best approach to quality variant interpretation over time," but he disagreed that there is currently a "gold standard" method for determining the quality of a lab's variant interpretations. Nussbaum and Rehm were principal investigators on the National Institutes of Health grant that established ClinVar in 2013, and since then, Invitae has become the most prolific submitter to the database.
Whether labs are correctly interpreting the clinical significance of variants is, for the most part, currently not vetted by US health regulators. This aspect of the testing process, many in the lab industry assert, is akin to the practice of medicine and, therefore, cannot be addressed under CLIA or regulated by the US Food and Drug Administration. In fact, the genetic testing industry has historically argued that a lab's activities, from developing a test in house to furnishing results to patients, constitute the practice of medicine and are therefore outside of the FDA's statutory authority. The agency has evaluated the classifications of a representative set of variants detected by lab-developed companion diagnostics it has approved, but it currently doesn't regulate most lab-developed tests on the market.
Amid these regulatory gaps, Fickie asserted that insurers "have an opportunity to be an enforcer" of genetic test quality. This same realization is what prompted Eggington to take the in silico-based validation program to payors.
Before that, she had been shopping it to labs as a way to internally flag performance issues but there wasn't much interest, Eggington found, because there was no incentive for labs to go beyond what they had to for CLIA certification or CAP accreditation. But for the past year, CGI has been quietly running pilots with insurance companies to demonstrate how CGI's program can be used to identify NGS test quality issues not being addressed under the present regulatory system.
Payors have certainly noticed utilization, spending, and billing fraud increasing in the genetic testing space, and are becoming more vigilant about their costs in the sector. Eggington hopes that at a time when "labs are becoming increasingly aggressive in the claims they're making," the learnings from these pilots will spur insurers to seek additional test validation data, like Highmark is doing, and pressure labs to improve NGS test performance.
"Maybe a lab is CLIA certified, CAP accredited, maybe the test has New York State approval, or even FDA approval, but the insurer can still ask labs to demonstrate that they can do this right," Eggington said. "The insurance companies can say, 'I, the payor, am going to accredit you as being a high-quality lab because I care about doing precision medicine right for my plan members.'"
CGI's pilots have certainly raised questions among payors about how accurately CLIA-certified and CAP-accredited labs are detecting, naming, and interpreting variants within their NGS testing services. "It is scary, a lot of what [insurers are] finding," said Eggington, though nondisclosure agreements hinder her from revealing the insurers or labs CGI has engaged, the specific genetic variants labs are being tested on, or the specific quality issues uncovered through the pilots. "I hope they understand the nuance of how we're presenting this data to them."
'Deeply problematic'
Within its ELEVATEGenetics Brilliant program, CGI asks labs to sequence a piece of publicly available DNA and submit an electronic file, called a FASTQ file, of the resulting sequence reads. CGI sends these files and the technical specifications of the corresponding test to a company called P&V Licensing, where variants are inserted into the DNA sequence files at predetermined variant allele frequencies and developed into "dry samples" that represent mock patients with medical and demographic details.
The in silico method CGI is using was developed by John Pfeifer, Eric Duncavage, and Haley Abel, experts in pathology, sequencing, and bioinformatics at Washington University School of Medicine, St. Louis. Pfeifer, a professor of pathology and immunology at WashU, subsequently cofounded P&V Licensing LLC, a service through which labs can use mutagenized variant files to evaluate the analytical validity of their NGS assays. CGI has expanded the use of this method to also look at the quality of variant interpretations, which speaks to the clinical validity of tests.
CGI sends the mutagenized files created by P&V back to the labs, which use their bioinformatics tools and genetics experts to identify variants relative to a reference sequence and determine what the variants mean clinically for the mock patients. The labs draft a test report of the findings just like they would for a real patient and submit it to CGI for scoring.
Since the mock patient samples are created from the very FASTQ DNA sequence files each lab has generated, Eggington underscored that the quality assessment accounts for each lab's unique NGS test chemistry and workflow. CGI evaluates each test according to the performance metrics the lab claims in the technical specifications and does not fault a lab for not reporting genetic alterations present at variant allele frequencies (VAF) below a test's stated limit of detection.
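The detection rule described above can be illustrated with a short sketch. It is not CGI's scoring code, which has not been published; the class, the function, the placeholder variant labels, and the choice to credit a lab for reporting a variant even below its claimed limit of detection are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpikedVariant:
    name: str   # placeholder label, not a real variant
    vaf: float  # allele fraction at which the variant was inserted


def score_detection(spiked, reported_names, claimed_lod):
    """Sort inserted variants into detected, missed, and not scored.

    A variant the lab did not report counts as a miss only if it was
    inserted at or above the lab's own claimed limit of detection.
    """
    result = {"detected": [], "missed": [], "not_scored": []}
    for variant in spiked:
        if variant.name in reported_names:
            result["detected"].append(variant.name)
        elif variant.vaf < claimed_lod:
            # Below the stated limit of detection: not held against the lab.
            result["not_scored"].append(variant.name)
        else:
            result["missed"].append(variant.name)
    return result


# Hypothetical mock patient: three inserted variants, lab claims a 5% LoD.
spiked = [
    SpikedVariant("VAR_A", 0.20),
    SpikedVariant("VAR_B", 0.06),
    SpikedVariant("VAR_C", 0.02),
]
outcome = score_detection(spiked, reported_names={"VAR_A"}, claimed_lod=0.05)
print(outcome)
```

In this made-up case the lab is charged with one miss, the variant inserted at 6 percent VAF, while the variant inserted at 2 percent is left unscored because it sits below the lab's stated limit.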
So far, in its pilots with insurers, CGI has mostly focused on whether labs performing comprehensive genomic profiling tests can correctly flag on-label FDA-approved cancer treatment opportunities for mock patients based on detected variants. The interpretations of the variants used to test labs in this part of the challenge are undisputed in the field, Eggington said, adding, "That doesn't mean that labs are going to get it right, because it might be a really hard variant to detect."
When it comes to germline variants associated with cancer risk and variants linked to off-label treatments, Eggington recognizes there's a higher chance that labs might have a different interpretation than CGI's based on differences in variant classification methodology as well as proprietary data. Therefore, the organization doesn't score whether a lab has interpreted them "correctly," but rather "how adventurous the lab likes to be," Eggington said. Is the lab synthesizing the evidence on variants conservatively and trying to be more specific by minimizing false positives? Or does the lab tend to be more sensitive and try to limit false negatives?
In this second part of the challenge, Eggington compares labs to CGI's variant classifications, which are made based on a "stringent" application of the American College of Medical Genetics and Genomics/Association for Molecular Pathology's 2015 consensus guidelines for germline variants, and the AMP/American Society of Clinical Oncology/CAP's 2017 consensus guidelines for somatic cancer variants. CGI also compares labs' classifications against those made by expert panels and shared in the public database ClinVar.
In CGI's pilots, payors haven't focused much on this second part, because they've been "shocked" by the errors labs have made in recommending FDA-approved drugs. "Honestly, we didn't think we'd see a problem with labs interpreting variants associated with FDA-approved drugs," she said. "I thought labs would consistently be able to identify the variants that their test technical specifications say they can measure, at the variant allele frequencies they say they can detect, and match it to the correct drug. It turns out this is deeply problematic."
Fickie also understands that variant interpretations can differ between labs for various reasons. But if initially a lab's interpretations vary significantly from CGI's, Highmark doesn't need to see that. The lab can work with CGI, he said, to fix areas of concern or disagreement, take the assessment again, and submit the best score.
In the pilots with payors, there have been some "nitpicky disagreements" over CGI's interpretations for a few variants, Eggington acknowledged, but for the most part the reports have been eye-opening for labs. Some have launched investigations to figure out why the errors happened and what fixes are needed to avoid them in the future, she said. "Which is exactly what we wanted to happen."
Labs that don't want to work with CGI can propose an alternative. Some labs that have tests approved by the New York State Department of Health, which asks for analytical and clinical validity evidence beyond what is currently required under CLIA, have wondered if that data will appease Highmark.
Fickie would consider it if the lab is performing the test according to the NYSDOH-approved specifications for all patients in the country or if they're selling the test only in New York. But when labs' performance claims for the same test are different for patients outside of New York than their NYSDOH-approved specs, it's confusing for insurers to parse which is true. "It is bizarre, let's be honest. Why are there different tests for New York and the rest of the country?" Fickie said.
For example, Tempus claims in a document outlining validation data for its xT 648-gene NGS panel that it can reliably gauge insertions and deletions at 5 percent VAF, but in a document outlining the NYSDOH-approved specs, the indel limit of detection is at 10 percent VAF.
According to a Tempus spokesperson, the xT assays for New York and non-New York residents are performed under "identical conditions" in the company's CAP-accredited and CLIA-certified lab, and there is no difference in "actual performance." The specs in the documents differ, the spokesperson explained, because NYSDOH's evidence requirements for test approval differ from what is required for validation under CLIA and CAP. The spokesperson added that "the only differences are in reportability of findings, which vary based on submission of claims to NYS."
"While we certainly believe (and have demonstrated to CAP/CLIA requirements) that the assay can detect indels reliably at 5 percent VAF, we are not currently permitted to report that or make that claim in NYS," the spokesperson said. "We are only allowed to report in NYS what was contained in our last NYS submission … in late 2021, and it has not been updated since that time. As a result, we report indels at the 10 percent VAF level in NYS to ensure we remain in compliance with their requirements." Tempus said it did not receive a letter from Highmark on additional credentialing requirements.
Caris Life Sciences claims in a New York-specific technical specification document for its comprehensive molecular profiling test that the positive predictive agreement for gauging base substitutions via whole-exome DNA analysis is 95 percent at 5 percent or higher mutant allele frequency; but the PPA is 99 percent in another document describing the test's performance for patients outside New York. For indels, according to NYSDOH validation, the PPA is 99 percent at 5 percent or higher mutant allele frequency, versus 97 percent in the other document.
Caris did not respond to questions about these different specs, but said it is aware of Highmark's credentialing requirements. "As a matter of policy, we do not comment on the details of our interactions of this nature," the firm said.
According to Fickie, Highmark will scrutinize labs proposing to submit their NYSDOH validation data for this credentialing requirement. "We're going to go on the website and look at what you do outside of New York," he said. "We're going to try to push back on that and challenge people."
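The side-by-side check Fickie describes can be pictured as a simple comparison of the figures quoted above. The sketch below is only an illustration: the numbers come from the documents cited in this story, while the dictionary layout, the labels, and the function name are hypothetical and do not reflect any tool Highmark is known to use.

```python
# Performance claims as quoted in this story. Keys and labels are
# hypothetical; "NY" is the NYSDOH-approved spec, "non-NY" the other document.
claims = {
    ("Tempus xT", "indel limit of detection (% VAF)"): {"NY": 10, "non-NY": 5},
    ("Caris profiling test", "substitution PPA (%)"): {"NY": 95, "non-NY": 99},
    ("Caris profiling test", "indel PPA (%)"): {"NY": 99, "non-NY": 97},
}


def find_discrepancies(claims):
    """Return every claim whose New York and non-New York values differ."""
    rows = []
    for (test, metric), values in claims.items():
        if values["NY"] != values["non-NY"]:
            rows.append((test, metric, values["NY"], values["non-NY"]))
    return rows


discrepancies = find_discrepancies(claims)
for test, metric, ny, other in discrepancies:
    print(f"{test}: {metric} is {ny} in New York vs. {other} elsewhere")
```

A mismatch flagged this way says nothing about which figure reflects real-world performance; as the Tempus response shows, it may stem from differences in what a lab is permitted to report.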
Let payors decide
CGI began working with insurers to spur these kinds of conversations by demonstrating how important it is to consider a lab's ability to detect, name, and interpret variants when evaluating test performance. When it comes to detection, for example, CGI's pilots have shown insurers how a lab may fail to identify a variant when it is present in the sample close to the test's claimed limit of detection.
Variant nomenclature mistakes can also occur due to human error or when NGS variant calling pipelines pull information from databases and resources that use different naming conventions for describing genetic and protein sequence changes. Failing to correctly identify a detected variant can confound a lab's interpretation of it, because "what labs are going to look up in the academic literature and in [variant] databases, is that name," said Eggington.
It is also possible to err in determining whether a patient should receive an on-label FDA-approved treatment based on a detected variant. For a variant in a tumor to be considered druggable by an FDA-approved therapy, it has to disrupt a specific exon in a certain gene. "Some labs will say any variant, anywhere near a splice site, is going to screw up that exon and the patient should get the … on-label drug recommendation. That's not true," she explained. "Exon/intron boundaries can't tolerate certain kinds of alterations, but they can tolerate a lot of change. … So, not being careful about RNA splicing, for example, can give a bad classification."
One of the main learnings Eggington wants to impress on payors is that labs currently design NGS comprehensive cancer profiling tests to lean more sensitive to increase the chance of a therapeutically actionable result. Other studies have hinted at this, she said, citing a JCO Precision Oncology paper from 2019, in which researchers compared an unnamed NGS lab's tumor-only classification of variants in several cancer-associated genes, such as BRCA1/2, against nonconflicting germline classifications in the public database ClinVar.
Germline testing and tumor profiling have different goals in that the former can be used to gauge disease risk, and increasingly, to guide treatment, while the latter is used to personalize cancer therapy. Moreover, germline and somatic variants are classified using different sources of evidence and guidelines.
When the authors of this paper looked at 93 variants reported as pathogenic by the tumor profiling lab, 81 were also pathogenic in ClinVar, but two variants were benign and 10 were deemed VUS in the public database. One of the discordant classifications was for BRCA2 c.4061C>T (p.Thr1354Met), which the lab reported as pathogenic, noting this variant had previously been seen in tumors. Germline testing labs that had shared data on this variant in ClinVar agreed it was benign. Today, this variant is still benign in ClinVar with three stars, meaning it has been reviewed by an expert panel.
The authors of the JCO Precision Oncology paper noted that their aim in describing these variant classification discordances was not to say that the tumor profiling lab was wrong but to help clinicians understand that for certain genetic findings in patients' tumors, they may need to check the germline classifications in ClinVar and order testing to assess cancer risk.
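For readers who want the proportions behind those counts, the arithmetic can be worked out as follows. The counts are the ones reported above; the variable names are illustrative only.

```python
# ClinVar germline classifications for the 93 variants that the tumor
# profiling lab reported as pathogenic (counts as reported above).
clinvar_counts = {"pathogenic": 81, "benign": 2, "VUS": 10}

total = sum(clinvar_counts.values())
shares = {label: round(100 * n / total, 1) for label, n in clinvar_counts.items()}
discordant = total - clinvar_counts["pathogenic"]
discordant_share = round(100 * discordant / total, 1)

print(total)             # 93
print(shares)            # {'pathogenic': 87.1, 'benign': 2.2, 'VUS': 10.8}
print(discordant_share)  # 12.9
```

In other words, roughly 13 percent of the variants the lab called pathogenic carried a benign or uncertain germline classification in ClinVar.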
In its pilots with insurers, where CGI is gauging if a lab's variant interpretation is more sensitive or specific, the organization might test comprehensive genomic profiling labs on a variant like this BRCA2 alteration. If a lab reports such a variant as being associated with response to a PARP inhibitor, then CGI would characterize the lab's interpretation as "sensitive."
"CGI doesn't really know if treating a patient with this variant with a PARP inhibitor helps them or not," Eggington said. "Without more evidence, CGI is not in a position to describe this classification as correct or incorrect."
Testing labs with variants like this in CGI's pilots provides opportunities to show payors how, by applying current variant classification guidelines, labs can arrive at vastly different interpretations. In Eggington's experience, tumor profiling labs tend to have a "very permissive" reading of somatic variant classification guidelines and the FDA's biomarker evidence tiers, and are prone to classifying variants as druggable as long as they've been observed in tumors.
The tumor profiling lab in this case claimed to have seen BRCA2 c.4061C>T in a patient's cancer. But this variant, Eggington noted, has been seen 37 times in the Broad Institute's Genome Aggregation Database and is carried by 0.04 percent of non-Finnish Europeans, making it a rare variant that's common enough to be well studied and understood as not disrupting normal BRCA2 gene function.
When determining whether a cancer patient will respond to a PARP inhibitor based on a BRCA1/2 variant, however, it is not enough to have seen the variant in a tumor. It's also crucial to consider whether the alteration is causing loss of function and hindering the gene's ability to repair DNA damage. Germline genetic testing guidelines, in Eggington's view, place a lot of weight on whether a variant is causing loss of function when determining its pathogenicity, and "are much more relevant" when determining whether a patient should receive a PARP inhibitor based on a BRCA1/2 tumor variant. But presently, she said, a lot of tumor profiling labs could ignore the germline classification and still report such a variant as druggable.
"That may be what's happened here," Eggington said, but added, "whether this is right or wrong, and what the payors will think about it, that's not for [CGI] to say."
Outside of an academic discussion, in the real world, these types of variant classification discrepancies are impacting patient care. Emily Moody, the first author of the JCO Precision Oncology paper and a cancer genetic counselor (who previously worked with Eggington at the now defunct molecular testing company Courtagen), recently reviewed discrepant results between tumor profiling and germline tests for a major healthcare center and estimated that one in 20 variants deemed pathogenic via a tumor profiling test would be considered uncertain or benign in the germline context.
In these instances, germline testing wouldn't be ordered for the patient, but an oncologist could still prescribe a PARP inhibitor. The FDA has approved PARP inhibitors to treat breast, ovarian, pancreatic, and prostate cancer, in some cases specifying that patients must harbor germline pathogenic BRCA1/2 variants and in other cases noting patients can have somatic or germline pathogenic variants. Because these drugs are mostly approved for people with advanced cancer, Moody would expect insurers to cover a PARP inhibitor as long as the patient had a BRCA1/2 variant deemed pathogenic by either a germline or tumor profiling test.
When Eggington questioned the tumor profiling lab, which she said is a well-known provider of comprehensive genomic profiling, about its pathogenic classification of the BRCA2 variant from the JCO Precision Oncology paper, the reaction was: "So what if we're overinterpreting variants? What if we're right? The drug will work," Eggington recalled. She acknowledged that cancer patients are more likely to have their tumors profiled on large NGS panels when they're at the end of life and have tried and failed other therapies, and oncologists want to find "anything they can treat their patients with. Anything at all."
On the other hand, if the doctor prescribes a drug based on a lab's very sensitive variant classification and it doesn't work, the patient may have to wait months for the therapy to wash out of their system before they can try something else. Plus, precision oncology drugs are costly and have associated toxicities. "There are a lot of implications to giving that patient the wrong drug based on a hunch," Eggington said. "When we return these results to payors, we talk them through what it means in each scenario to be sensitive versus specific, and then, we let them decide."
Payors need help
Highmark, which for the past year has been internally mulling ways of improving lab quality management, has been talking with CGI as part of that process and is the first insurer to recommend labs have their tests validated by the organization as part of a new credentialing requirement. But Fickie believes there's a growing realization among payors that they need to look at test performance beyond CAP and CLIA.
"[Other payors] are going to see what we're doing, and they're going to be setting similar policies, and these [labs] are all going to go to CGI in the next 12 months, not just for Highmark," predicted Fickie, who prior to Highmark was a primary care provider and genetics consultant at Mount Auburn Hospital in Cambridge, Massachusetts. Fickie was also associate medical director at EviCore, a Cigna subsidiary that manages genetic testing benefits for various health plans, including Highmark.
"We're going to be the opening shot," he said.
Several commercial payors contacted for this story, including Cigna, declined to confirm whether they were engaged in CGI's pilots. Cigna directed questions to Lon Castle, chief medical officer of molecular genetics and personalized medicine at EviCore, who said he's familiar with CGI's ELEVATEGenetics Brilliant program and wasn't surprised to hear insurers are testing it out.
"Precision medicine is where we all want to go for cancer and other hereditary diseases," but, Castle said, insurers have to be very diligent about the quality of tests and ensure labs are providing accurate results to patients. "You want to make sure that they're all doing it equally well and they're coming to the same sorts of conclusions regarding … those variants," he said. "The most important thing if we're going to succeed with precision medicine, is we have to get this right."
Insurers certainly have an economic interest in ensuring that they're paying for high-quality genetic tests. The US Department of Health and Human Services' Office of Inspector General estimated that Medicare spending on genetic tests doubled from 2017 to 2018 to $1 billion. While OIG attributed this to greater utilization of new and expensive tests, the government believes fraudulent lab billing schemes are also a factor.
Meanwhile, the genetic testing market has added on average 22,000 new tests per year since 2015, and according to health IT firm Concert Genetics, the total number of tests rose to nearly 167,000 in 2020. With this increased capacity, genetic testing labs are uncovering genetic variants faster than the field can make sense of them, while payors are struggling to keep up with all the validation data, research, and CPT codes generated by these new products.
"Payors need help," said Megan Czarniecki, senior VP of payor solutions at InformedDNA, which launched a service in May through which insurers can license its evidence-based coverage policies for genetic tests. "We're much better at testing than we are at interpreting" what genetic variants mean, Czarniecki said.
Amid increasing utilization and regulatory gaps, insurers may also have a legal interest in ensuring they're paying for accurate tests. Highmark's new validation requirements come at a time when patients and families receiving genetic tests are able to more closely scrutinize the test results they're getting and challenge labs' test performance claims in court.
Quest Diagnostics, for example, was sued in 2016 for allegedly misinterpreting an SCN1A variant in a child with an epileptic disorder, though a South Carolina federal district court judge dismissed that case in 2020, since the plaintiff could not prove that doctors relied on the test report to diagnose the child. Earlier this year, Natera was hit with two class action lawsuits over how the company advertised the accuracy of its noninvasive prenatal testing assay.
Jennifer Wagner, a lawyer with expertise in the regulations and policies governing the use of genomic technologies, is not aware of a case where an insurance company has been sued for reimbursing a genetic test that gave inaccurate results, though it's not out of the realm of possibility. "To the extent that insurers are dictating which labs' genetic tests will be covered, you could say that they have a pretty direct role in whether a particular test is used," she said.
Moreover, if quality assessment conducted by the insurer or another organization shows that a test was performing poorly, then does the insurer have a legal responsibility to stop the use of that test, even if it was ordered by a doctor? "There's a question about what role insurers have in remedying any kind of injury that might result from that," Wagner said. Within CGI's pilots, it's been very important to insurers that the in silico framework ask labs to detect and interpret synthetic variants for mock patients, according to Eggington, because payors don't want to be liable for labs giving inaccurate results to real patients.
Upon learning about Highmark's credentialing requirement and the request for data beyond what is required for CLIA certification and CAP accreditation, Neal Lindeman, a pathologist at Mass General Brigham and vice chair of the molecular oncology committee at CAP, said the insurer appears to be taking on a regulatory role. "That's dangerous," he cautioned, because there is a conflict of interest. "What payors could do if they oversee regulation is they could cut deals with labs and approve assays in exchange for a favorable charge structure," he said. "I don't think that's good for healthcare."
In deciding whether to pay for a medical intervention, however, insurers can and do ask to see additional evidence, and even encroach on the practice of medicine. "Insurance companies second guess the practice of medicine all the time," reflected Gail Javitt, director at the law firm Hyman, Phelps & McNamara, and an expert on lab test regulation. "That's what they do."
In its letter, Highmark didn't say what the consequences will be for labs that don't submit additional validation data come next March, but loss of in-network status could be a "potential consequence," Fickie suggested.
Eggington is hoping that insurers will gain insight into the poor-performing labs through CGI and weed them out of the market. "This whole effort is to help [payors] identify the good tests and the good labs, so they can move forward confidently with writing policy to cover precision medicine and just stop reimbursing the bad labs and the bad tests," she said.