Researchers Make Progress in Efforts to Harmonize PD-L1 Tests, But Challenges Remain


NEW YORK (360Dx) – With the advent of new cancer immunotherapies, and older such therapies being used for new indications, the use of PD-L1 testing to determine which patients may best respond to them has moved into the forefront of cancer care. 

But for clinicians facing a mottled molecular diagnostic landscape, deciding on the right test is hardly simple. The different tests may not be cross-compatible or interchangeable, and clinicians are forced to forage through complicated data and considerations for what they say should be a straightforward choice. 

A recent study, though, suggests progress is being made in standardizing PD-L1 testing, even as challenges remain. Publishing their results in the journal Histopathology, researchers from a cohort of German laboratories confirmed that three of the four first-launched commercial PD-L1 assays can be significantly harmonized. The newer laboratory-developed tests can also be used reproducibly if they are properly validated.

Although a number of antibodies have been developed alongside specific therapies, only one assay, Dako's 22C3 test, is currently required by the US Food and Drug Administration, for use when Merck's anti-PD-1 drug Keytruda (pembrolizumab) is prescribed for non-small cell lung cancer.

Other commercial assays include Agilent/Dako's 28-8 assay, which is registered as a complementary (not required) diagnostic to Bristol-Myers Squibb's Opdivo (nivolumab), and two assays from Roche Diagnostics: the Ventana SP142 assay, which is a complementary test to Genentech's Tecentriq (atezolizumab), and the firm's SP263 assay, which the FDA approved as a complement to AstraZeneca's Imfinzi (durvalumab) this spring.

A fifth test, Dako's PD-L1 IHC 73-10 assay, is being developed as a companion to the Pfizer/Merck KGaA immunotherapy Bavencio (avelumab).

Meanwhile, groups are now also validating and using their own laboratory-developed IHC tests, using Cell Signaling Technology's E1L3N, for example.

The issue at hand is that, from a clinical or patient perspective, any one of these assays should be able to answer the question posed in regard to any one of these different drugs: whether a patient has high enough expression of PD-L1 to be a likely responder.

It's neither practical nor efficient for a pathology lab to have to do four different assays to answer a single question, clinicians have argued, and being asked to score these different assays in completely different manners is "unprecedented in pathology," Yale pathologist David Rimm has said.

Further complicating things, not only do the various assays employ different antibodies and IHC platforms, but different indications also require different cutoff points in determining whether a tumor specimen is 'positive' or 'negative.'

Keytruda, for example, is FDA approved for the first-line treatment of patients with metastatic NSCLC whose tumors have more than 50 percent PD-L1 expression, as determined by the Dako test. In the second-line setting or later, samples need only exhibit more than 1 percent staining.
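
The two cutoffs described above amount to a simple threshold rule. The following is a minimal Python sketch of that rule, for illustration only and not clinical guidance. The function name, the `line_of_therapy` parameter, and the strict "more than" comparison are assumptions based on the wording in this article, not on the drug label or the Dako scoring manual.

```python
def meets_keytruda_nsclc_cutoff(pd_l1_percent: float, line_of_therapy: int) -> bool:
    """Return True if PD-L1 staining clears the cutoff described in the article.

    pd_l1_percent: percentage of tumor cells staining for PD-L1 (0 to 100).
    line_of_therapy: 1 for first-line, 2 or more for later lines (assumed encoding).
    """
    if not 0 <= pd_l1_percent <= 100:
        raise ValueError("pd_l1_percent must be between 0 and 100")
    if line_of_therapy < 1:
        raise ValueError("line_of_therapy must be 1 or greater")

    # First-line: more than 50 percent; second-line or later: more than 1 percent.
    cutoff = 50.0 if line_of_therapy == 1 else 1.0
    return pd_l1_percent > cutoff


if __name__ == "__main__":
    # A hypothetical tumor with 30 percent staining.
    print(meets_keytruda_nsclc_cutoff(30, line_of_therapy=1))
    print(meets_keytruda_nsclc_cutoff(30, line_of_therapy=2))
```

The same specimen can therefore count as 'negative' for one indication and 'positive' for another, which is part of what makes the scoring hard to harmonize.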

In light of this complexity, various efforts have emerged to try to harmonize or standardize PD-L1 testing. These include one by the National Comprehensive Cancer Network, cochaired by Ignacio Wistuba of the University of Texas MD Anderson Cancer Center and Rimm from the Yale Cancer Center.

Another is being spearheaded by the International Association for the Study of Lung Cancer, which recently released its Atlas of PD-L1 Immunohistochemistry Testing. The atlas collates characteristics of commercially available PD-L1 assays, and illustrates how they can and can't be cross-compared.

The IASLC is leading a project called BluePrint, in which researchers are working with a consortium of pharmaceutical and diagnostic companies to quantify the discordances between the various commercial assays, and work toward creating resources that can help simplify the testing landscape for clinicians.

The group presented data on the first phase of the project at the annual meeting of the American Association for Cancer Research in 2016, published them this February, and is now preparing for a Phase II follow-up that will validate and extend the results, and to which it has added the Dako 73-10 assay.

Andreas Scheel, senior author of the German LDT study last month, said that his team's effort was a collaboration among 10 pathology institutions in Germany and several pharmaceutical companies that manufacture PD-1 / PD-L1 inhibitors.

The same team of researchers previously investigated the interobserver concordance of the first four commercial PD-L1 assays released by Dako and Ventana. Though they studied only a small number of samples, they concluded that by using a six-step scoring system that integrates all the cutoffs of the four separate assays, they could harmonize scoring from one pathologist to another.

In their follow-up this week, the group set out to expand their analysis to newer laboratory-developed tests, using a larger array of patient samples.

Using a tissue microarray containing 21 pulmonary carcinoma specimens, the group compared staining results across 10 sites, using the 28-8, 22C3, SP263, and SP142 assays, as well as 11 LDTs.

Scheel wrote in an email that an increasing number of validated LDT protocols have now been published for various staining platforms, so the field is faced not only with commercial assays but also with the challenge of making sure new lab-developed tests are hewing closely enough to the assays that were validated as companions or complements alongside approved therapies.

The clinical utility of each assay — its predictive ability — rests on data from each of the clinical trial programs for its associated drug, so ostensibly, a home-brewed PD-L1 test shouldn't be assumed to be able to predict patient response. Likewise, the assay developed with one drug can't be assumed to be able to predict response to another.

But results like those from the German lab group suggest that commercial IHC assays for PD-L1 can yield compatible and transferable results if handled appropriately. LDTs can be accurate, regardless of the IHC platform they are performed on, as long as they are calibrated and validated properly against the appropriate commercial benchmark.

In agreement with previous studies, the 22C3, 28-8, and SP263 assays showed similar staining patterns in the German study, while SP142 remained distinct. Among the LDTs, six of 11 protocols showed staining patterns similar to assays 22C3 and 28-8.

Differences from one LDT assay to another can't be traced to a single parameter, such as the choice of primary antibody, detection system, or staining platform, Scheel wrote. Rather, a combination of those factors appears to contribute to an LDT being successful or not.

"Any of the five tested primary antibodies can yield successful staining, but using an antibody from a clinical assay doesn’t guarantee success," the authors wrote. Both correct staining and scoring are important; however, based on analysis of the study results, LDT staining seems to be a bigger source of variance than scoring, Scheel added.

Yale's Rimm, who is a co-leader of the NCCN PD-L1 harmonization program, said in an interview this week that there are lots of questions regarding standardization that a comparison like the Scheel team's doesn't answer.

He also said that a study using such a small number of patient samples should be taken with a grain of salt.

"Based on the percentages of patients that are positive, you can calculate how many you need for statistical significance, and for us, that was 90," he said.

Rimm and colleagues published a study in JAMA Oncology this year comparing four assays (the commercial 22C3, 28-8, and SP142 tests, and an E1L3N LDT) in 90 non-small cell lung cancer patient samples, enough, he said, for the analysis to be appropriately powered.
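
Rimm does not say which calculation his team used to arrive at 90. As a rough illustration of how the expected proportion of positive patients drives sample size, the Python sketch below uses one common textbook formula: the number of samples needed to estimate a proportion within a chosen margin of error, n = z² · p(1 − p) / d². The function name and the input values are hypothetical, and the formula is an assumption, not the method behind the JAMA Oncology study.

```python
import math
from statistics import NormalDist


def samples_for_proportion(expected_positive: float,
                           margin_of_error: float,
                           confidence: float = 0.95) -> int:
    """Samples needed to estimate a proportion within +/- margin_of_error.

    Uses the normal approximation n = z^2 * p * (1 - p) / d^2, rounded up.
    """
    if not 0 < expected_positive < 1:
        raise ValueError("expected_positive must be strictly between 0 and 1")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")

    # Two-sided critical value, e.g. about 1.96 for 95 percent confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z ** 2) * expected_positive * (1 - expected_positive) / margin_of_error ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs: 30 percent of patients positive, +/- 10 points.
    print(samples_for_proportion(0.30, 0.10))
```

Under this formula the required number of samples grows as the expected positive rate approaches 50 percent and as the acceptable margin shrinks, which is why a 21-specimen microarray supports weaker conclusions than a 90-sample cohort.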

This group found, as other data has indicated, that three of the four assays were largely equivalent, but the other (SP142) identified only about 50 percent of patients who were positive for PD-L1 expression per the first three tests.

Dako's 22C3 test also showed slightly lower staining than either 28-8 or E1L3N, a difference that was statistically significant, the authors wrote. However, this was only apparent when averaging the scores of 13 different pathologists.

The group also found that pathologists' ability to score the various assays was consistent when looking at tumor cells, but not when looking at immune cells, which some assays — for example, Dako's 22C3, which was approved as a required companion diagnostic for Merck's Keytruda in gastric cancer earlier this month — require.

At AACR's annual meeting this April, Rimm also discussed work by his team that strips away some of the complexity of the multiassay landscape. Using a baking analogy, he said that the antibodies used in the various assays can be thought of as "eggs," and the assays themselves as "cakes."

If you look at the antibodies independently of the assay protocols "the eggs are pretty much the same. The cakes are not," Rimm said. In other words, when he and his team compared stained sections using different antibodies outside of their assays, including the divergent SP142, they were clearly concordant.

"Though they might have slightly different affinities, the way they respond in terms of binding to lung cancer in cell lines is virtually identical," he explained.

Rimm's group previously developed a quantitative immunofluorescence method that they are continuing to study as a potentially better tool for estimating PD-L1 positivity than standard IHC, especially in immune cells.

Toward true standardization of PD-L1 IHC, Rimm said that he and his colleagues are also developing a standardization array, using cell lines, which would allow labs to check their assay, whether commercial or LDT, against known limits of detection, quantification, and saturation.
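
The idea behind such an array can be sketched in a few lines of Python, assuming each spot is a cell line with a known PD-L1 level and the lab records only whether the spot stained. The spot names, the data layout, and the pass rule are hypothetical illustrations and are not taken from Rimm's group.

```python
def assay_passes_array(expected: dict[str, bool], observed: dict[str, bool]) -> bool:
    """Return True if every array spot stained (or did not stain) as expected.

    expected: spot name -> whether a working assay should detect it.
    observed: spot name -> whether the lab's assay actually detected it.
    """
    missing = expected.keys() - observed.keys()
    if missing:
        raise ValueError(f"no reading recorded for spots: {sorted(missing)}")

    # The assay passes only if it detects the spots it should and no others.
    return all(observed[spot] == should_stain
               for spot, should_stain in expected.items())


if __name__ == "__main__":
    # Hypothetical array: one spot below the limit of detection, two above it.
    expected = {"below_detection_limit": False, "low_expresser": True, "high_expresser": True}
    observed = {"below_detection_limit": False, "low_expresser": True, "high_expresser": True}
    print(assay_passes_array(expected, observed))
```

A check like this says nothing about which antibody or platform produced the stain, which is the point: any assay that reproduces the expected pattern would be treated as equivalent.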

"With that, regardless of what assay or antibody you are using, you should be OK to say you are equivalent, because if you detect these spots [but] not these, your assay is working. Now, you just have to estimate the percentage that is positive in the patient sample," he explained.

A separate issue for PD-L1 tests is that they are based on what looks like an imperfect biomarker. Patients who express PD-L1 appear to do better on these drugs than those who don't. And evidence has shown that those with higher expression respond more than those with lower expression. But there are also responders who are PD-L1-negative.

Rimm and his colleagues have shared results combining PD-L1 IHC with measures of T-cell activation as a way of boosting their ability to predict patient responses to immunotherapy.  Though they've only completed a small pilot so far, the team is moving forward to validate the approach further in patients from the "Stand Up 2 Cancer" lung cohort.

Various academic groups and companies have also pinpointed microsatellite instability, mismatch repair deficiency, and tumor mutational burden as potential predictors of response. That work led to the FDA's accelerated approval this summer of Keytruda for patients whose cancers are microsatellite instability-high (MSI-H) or mismatch repair-deficient (dMMR), the first time the agency has approved a treatment indication based on the genomic features of cancer patients' tumors, instead of where the tumor occurs in their bodies.

Another ongoing effort to identify responders to immunotherapy that PD-L1 may miss is being spearheaded by NanoString Technologies, which has been working with pharma partners like Merck to develop a multiplex gene-expression assay combining PD-L1 and other markers.

According to Rimm, the data on that assay presented at meetings doesn't seem to show it doing significantly better than IHC so far.

"The chance of that replacing IHC is lower than I would have predicted a year ago," he said this week.

However, he added, the chance of IHC being replaced or augmented by something like tumor mutational burden seems much higher in the wake of the FDA approval of Keytruda in MSI-H and dMMR patients, and on the back of increased marketing by Foundation Medicine, which said this month that it is advancing a blood-based version of its tumor mutational burden test as a companion diagnostic to Roche/Genentech’s Tecentriq in first-line treatment of non-small cell lung cancer.

"It doesn’t predict better than PD-L1, but it [picks out] a different group of patients … so I kind of agree with them that if you are PD-L1-negative but TMB-high, you should get the drug," Rimm said. "They haven’t proved that, but I am sure they are working very hard on it."