Among the public, there is a common misconception that a DNA test can provide diagnostic certainty: the results are either positive, meaning a diagnosis is confirmed, or negative, meaning a diagnosis is ruled out. The reality of genetic testing is much more complex. In fact, the biggest challenge facing clinical genetics is arguably the problem of interpretation and classification of genomic data.
Imagine a young woman who has recently undergone genetic testing after her father’s sudden cardiac death in his forties. Her first hope is obviously that her genetic risk of sudden cardiac death will be assessed at near zero. Failing that, she hopes for clarity about her future health: a definitive yes or no, or a high-confidence, quantitative estimate of her individual risk along with a plan of action to manage it. Instead, she is likely to be presented with an opaque and confusing result: a variant of unknown significance, abbreviated VUS. Various medical professionals, including genetics professionals, may be able to provide some additional context or clear up some confusion for her, but after the visit she is still left in limbo. Did this DNA variant kill her father? Will it kill her and, if so, when? Can she even do anything about it?
This predicament is all too common, and it stems from a fundamental tension at the heart of our genomic age. The (haploid) human genome contains roughly three billion base pairs of DNA, and every individual harbors millions of genetic variants. Most are harmless, some are well-understood, but an enormous fraction reside in a gray zone. These so-called variants of unknown significance (VUS) call attention to the chasm between our technological capability to detect genetic variation and our ability to understand the biological importance of it.
When a VUS inevitably finds its way into a patient’s lab results, what are reporting laboratories to do? How should this information be communicated to patients? Suffice it to say, even the professionals come to different conclusions on various facets of VUS reporting. Whether to report a VUS at all is a longstanding debate in clinical genetics. I will spare readers the details here (though I can return to the debate another time if there is interest). Personally, I like to err on the side of transparency, but I recognize that more information is often only beneficial to those who are equipped to process it. So instead of a digression on the ethical and epistemic quandaries in genetics, let’s turn to some of the emerging strategies for solving the problem altogether.
When a VUS is identified, it means that a sequencing assay has detected a coding difference in a protein-coding gene, typically a missense variant (a DNA substitution that causes one amino acid to be substituted for another). Despite the best efforts of genetics professionals to evaluate the effect of that coding difference, there simply isn’t enough evidence to make a confident call (see the visual for the evidence thresholds below). The usual reason is a lack of information. A given VUS may be so rare that it has never been observed before, and it is hard to say anything about a complete unknown. Alternatively, it may have been observed in a few individuals during the DNA sequencing of a large population, but there is no literature on its effects. Or, most frustrating of all, there may be conflicting reports on the variant, with some providing evidence that it is disease-causing (pathogenic) and others providing evidence that it is not (benign).
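For readers who think in code, the definition of a missense variant can be made concrete with a small sketch. It assumes the standard genetic code and looks at a single codon in isolation; the function name `classify_substitution` and the category labels are illustrative choices, not taken from any particular annotation tool.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# "*" marks a stop codon.
BASES = "TCAG"
CODON_ORDER_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), CODON_ORDER_AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base change within one codon (illustrative only)."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if ref_codon not in CODON_TABLE or alt_codon not in CODON_TABLE:
        raise ValueError("codons must be three DNA bases (A, C, G, T)")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("expected exactly one base to differ")

    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid, protein unchanged
    if alt_aa == "*":
        return "nonsense"     # premature stop codon
    if ref_aa == "*":
        return "stop-loss"    # original stop codon is lost
    return "missense"         # one amino acid swapped for another


if __name__ == "__main__":
    print(classify_substitution("GAG", "GTG"))  # Glu -> Val: missense
    print(classify_substitution("GAG", "GAA"))  # Glu -> Glu: synonymous
    print(classify_substitution("GAG", "TAG"))  # Glu -> stop: nonsense
```

Note what the sketch does not do: it says nothing about whether a missense change is harmful. That gap between detecting a change and knowing its consequence is exactly where a VUS lives.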

All of the above VUS scenarios can usually be resolved with more data. For instance, if classical family studies show that the variant of interest is associated with the disease phenotype in a patient and segregates with that disease in the patient’s family, this evidence would likely push the classification from VUS to pathogenic. Evidence of this kind is unlikely to be available for a given variant, especially when the variant is itself rare. Consequently, we have to rely on prediction tools or functional studies. I won’t describe all the in silico predictors out there; there are a lot! Although the predictors are improving, including with some help from developments in the field of artificial intelligence, they are still not reliable enough for us to rely on them alone to make diagnostic classifications. This leaves functional evidence as the last resort of the clinical geneticist.
Functional evidence is unfortunately expensive and sometimes arduous to collect and publish. It isn’t foolproof as a final arbiter of deleteriousness either. However, methodological advancements have recently enabled researchers to collect functional data on a massive number of gene variants simultaneously. This experimental approach is referred to as a multiplexed assay of variant effect (MAVE) or sometimes as deep mutational scanning (DMS). Put simply, these assays try to generate every possible missense variant in a gene, introduce them into a functional assay, and then efficiently measure and visualize the impact of these variants.
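To get a feel for the scale of "every possible missense variant," here is a minimal sketch that enumerates all single amino acid substitutions for a protein sequence. It assumes the variant library is designed at the amino acid level (19 alternatives per position); the function name, the toy sequence, and the shorthand labels such as `M1A` are hypothetical and used only for illustration.

```python
# The 20 standard amino acids, in one-letter code.
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_single_substitutions(protein: str) -> list[str]:
    """List every single amino acid substitution for a protein sequence.

    Labels use an informal shorthand: reference residue, 1-based position,
    alternate residue (for example "M1A").
    """
    variants = []
    for position, ref in enumerate(protein.upper(), start=1):
        if ref not in STANDARD_AMINO_ACIDS:
            raise ValueError(f"unexpected residue {ref!r} at position {position}")
        for alt in STANDARD_AMINO_ACIDS:
            if alt != ref:
                variants.append(f"{ref}{position}{alt}")
    return variants


if __name__ == "__main__":
    toy_protein = "MKTAY"  # hypothetical five-residue sequence
    library = all_single_substitutions(toy_protein)
    print(len(library))    # 5 positions x 19 alternatives = 95
    print(library[:3])
```

The count grows linearly with protein length, at 19 variants per residue, so a protein of a few hundred amino acids already implies several thousand variants to build and measure. That is why assaying them one at a time is impractical and multiplexing matters.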
One of the exciting things about MAVEs is that they can leverage some of the same technology that has powered our genomic era, next-generation sequencing (NGS), which makes generating functional data more efficient and less expensive (the cost of NGS has fallen at a rapid pace). Data have been accumulating since the first low-throughput MAVEs were carried out in the late 2000s. Currently, the database of record for MAVE data (MaveDB) includes ~2,000 datasets comprising ~7 million variant effect measurements. These data already inform VUS reclassification efforts. Across twelve previously published studies, a total of 937 of 1,711 (55%) VUS have been reclassified due to MAVE-derived findings (summarized in the figure below).
The prospects for MAVE work informing variant classification are bright. Not only are the functional data themselves immediately informative, but they can also be used to refine the output of in silico predictors of pathogenicity and to train new and better predictors. Given the low cost of generating predictions, the quickest path from MAVE work to ending the VUS problem may be mediated by improved prediction tools.
An immediate solution for the VUS problem is not on the horizon, but at least we can see some of the vague outlines of a solution. While we wait, there will be minor tweaks and short-term patches that will hopefully provide some remedy. These will likely include the risk tiering of VUS, where high-risk VUSes are reported while low- and intermediate-risk VUSes are withheld from reports. This re-opens the reporting debate I alluded to earlier, but it likely represents some improvement over throwing the overwhelming majority of variants observed during genetic testing into the gray zone. Let’s have the debate and take action fast!
I hope clinical genetics is able to reach the National Human Genome Research Institute’s optimistic goal of solving the VUS problem by 2030. Regardless, we know MAVE data will play an important role. We cannot rest on our laurels. Biology is a field that needs more data! Although MAVE progress has been substantial and is already affecting variant classification, only a fraction of the protein-coding genome has been assessed in a high-quality, comprehensive way and an even smaller fraction of the non-coding genome has been evaluated. This needs to change. There is a long road ahead.