Cancer Variant Curation Expected to Benefit from New Somatic Variant Data Standard

NEW YORK (GenomeWeb) – Sequencing-based tumor profiling is conducted by a growing number of academic cancer centers and diagnostic companies, but they each have their own method for describing and classifying somatic cancer variants, making it difficult to share or curate such variants across institutions.

To change that, the Cancer Somatic Working Group of the Clinical Genome Resource (ClinGen), a National Institutes of Health-funded initiative, has developed a set of standardized data elements, called Minimal Variant Level Data (MVLD), to facilitate the exchange of somatic variant data. Last month, the group published a description of the MVLD framework in Genome Medicine. It hopes the framework will be widely adopted and "create a common language for curation and clinical interpretation of somatic alterations."

The new data standard will go hand in hand with forthcoming guidelines on the interpretation and reporting of somatic cancer variants by the Association for Molecular Pathology. The new AMP guidelines, developed in collaboration with the American College of Medical Genetics and Genomics, the College of American Pathologists, and the American Society of Clinical Oncology, were presented at AMP's annual meeting last month and are expected to be published in a journal later this month.

Subha Madhavan, director of the Innovation Center for Biomedical Informatics at Georgetown University Medical Center and co-chair of ClinGen's working group, told GenomeWeb that the group was established in early 2015 with the goal of creating standards for describing and classifying somatic cancer variants and, potentially, a centralized resource for these variants.

The group has about 60 members, primarily from the US. They include representatives of academic medical centers; companies, including Foundation Medicine, Caris Life Sciences, Invitae, Illumina, and MolecularMatch; professional organizations, such as AMP, CAP, and ACMG; and federal government agencies, including the NIH and the US Food and Drug Administration. Having commercial partners was important, Madhavan said, to ensure that the standards developed by the group would be adopted by industry.

Initially, the group analyzed how its member organizations currently handle somatic variant data from patients and found that they all have their own computational pipelines and frameworks for interpretation. Typically, variants are classified into four tiers, depending on the level of evidence for actionability, with the highest level for variants associated with an FDA-approved therapy and the lowest level for variants that only have preclinical evidence.

While the groups all had similar types of frameworks, they differed in details, making it hard to compare and exchange data. "It's clear that we all would benefit if we could share information across different organizations because each organization does not have enough patients with a particular mutation and a particular disease type," Madhavan said.

Using a standardized data model, she said, would allow clinical labs to share their information and submit it to other databases, for example ClinVar, which is maintained by the NIH; Clinical Interpretations of Variations in Cancer (CIViC); or My Cancer Genome.

To develop such standards, the working group took a page from the MIAME (minimum information about a microarray experiment) standards that were created by microarray researchers 15 years ago. "Our goal was to come up with a minimum set of standards that groups will be willing to share on a particular variant that would be useful for them to cross-compare," Madhavan said.

For the MVLD, they came up with a set of 18 data elements in three categories: six allele descriptive fields, which address the position of the variant in the genome, transcript, or protein; six allele interpretive fields, which address information like the variant type and its effect on a transcript or protein; and six somatic interpretive fields, relating to cancer type, biomarker class, therapeutic context, effect of the variant on therapy, and level of evidence.
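As a rough illustration of how the three categories could be laid out as a data model, here is a minimal Python sketch. It covers only the kinds of fields named in the paragraph above, not the full set of 18 elements, and every class name, field name, and example value in it is a hypothetical placeholder rather than the official MVLD schema or curated data.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AlleleDescriptive:
    # Where the variant sits: genome, transcript, or protein coordinates.
    genomic_position: str
    transcript_change: Optional[str] = None
    protein_change: Optional[str] = None


@dataclass
class AlleleInterpretive:
    # What kind of variant it is and its effect on a transcript or protein.
    variant_type: str
    consequence: Optional[str] = None


@dataclass
class SomaticInterpretive:
    # Cancer-specific interpretation of the variant.
    cancer_type: str
    biomarker_class: str
    therapeutic_context: str
    effect_on_therapy: str
    evidence_level: str


@dataclass
class VariantRecord:
    # Hypothetical container grouping the three categories for one variant.
    descriptive: AlleleDescriptive
    interpretive: AlleleInterpretive
    somatic: SomaticInterpretive

    def to_flat_dict(self) -> dict:
        """Flatten to 'category.field' keys, e.g. for exchange as one table row."""
        flat = {}
        for category, fields in asdict(self).items():
            for name, value in fields.items():
                flat[f"{category}.{name}"] = value
        return flat


# Illustrative placeholder values only; not taken from any database.
example = VariantRecord(
    descriptive=AlleleDescriptive(
        genomic_position="chr7:example-position",
        protein_change="p.Val600Glu",
    ),
    interpretive=AlleleInterpretive(
        variant_type="missense",
        consequence="amino acid substitution",
    ),
    somatic=SomaticInterpretive(
        cancer_type="melanoma",
        biomarker_class="predictive",
        therapeutic_context="example targeted therapy",
        effect_on_therapy="sensitive",
        evidence_level="tier 1 (placeholder label)",
    ),
)

if __name__ == "__main__":
    for key, value in example.to_flat_dict().items():
        print(f"{key}: {value}")
```

Keeping the three categories as separate types mirrors the grouping described in the article, while the flattened form shows how two labs with different internal pipelines might still compare records field by field.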

The process of consensus building to finalize the MVLD set took a long time and was completed this summer, "but we are very happy we did it that way because then the adoption becomes much easier," Madhavan said.

The next step is to help clinical labs and databases adopt the MVLD framework. "Our hope is that with this consensus standard, we can take the data in these siloed databases, map them to these standards, and make them available in a de-identified form in larger data systems, so that all of us could benefit, in a pre-competitive fashion, to do novel hypothesis generation and also to drive clinical decision support," she said.

Her own organization, Georgetown University Medical Center, like many others, has a tumor board, for example, that looks at the available evidence for associating a patient's tumor variants with therapies, and "having such a standardized database would help each one of these tumor boards to improve our clinical decision support process," she said.

Companies dealing with genomics data might also benefit from the standards. "At MolecularMatch, ClinGen standards help our technology process thousands of queries over very short periods of time," said Caleb Davis, a bioinformatics scientist at MolecularMatch, a startup focusing on matching cancer patients with treatment options based on molecular and genetic profiles, and a member of the ClinGen working group. "We anticipate the number will scale as the clinical utility of genomic profiling continues to improve."

The ClinGen group has already started to work with software teams at ClinVar and CIViC to integrate MVLD into their respective curation interfaces.

For the past six months, for example, it has collaborated with ClinVar to help modify its software to fit the MVLD standards. Currently, 98 percent of all variants deposited in ClinVar are germline variants, but the group hopes to change that by working with NIH-funded tumor sequencing projects, for example under the Clinical Sequencing Exploratory Research (CSER) program, The Cancer Genome Atlas, or the NCI-MATCH trial.

Specifically, it plans to map two datasets — one from a CSER project, the other a childhood cancer project from Baylor College of Medicine — to MVLD and load them into ClinVar. This will achieve two goals, Madhavan said: testing the standard and increasing the number of somatic variants in ClinVar.

Another plan is to establish expert panels for curating somatic variants in ClinVar — something ClinVar already has instituted for germline variants. This could be one or several expert panels, and different panels could specialize in different genes, she said.

Furthermore, the ClinGen group is working with CAP to adopt the MVLD standards as part of its guidelines. In addition, Madhavan and her colleagues are spreading the word about MVLD abroad, presenting it, for example, at the CIViC Hackathon and Curation Jamboree at the Netherlands Cancer Institute in Amsterdam this week. The plan is to involve other international groups in the future, she said, including other European cancer institutes and the European Bioinformatics Institute.

Over time, the standards will be further developed and updated as the ClinGen group and other users gain experience with them. "This is just the first version of MVLD," Madhavan said.