rationalskepticism.org

Posted: **Nov 22, 2018 2:30 am**

So earlier this week, Nature published a paper that I co-first-authored, looking at liquid biopsies for cancer detection and classification. An open-access read-only PDF link to the paper is available here.

A brief explanation of the context of the work and the findings themselves follows ...

Basically, for a long time, we have known that cancers differ from normal cells in several ways: they grow when they shouldn’t, and can do so indefinitely under the right conditions; they invade other tissues; and they cause mayhem wherever they go. If cancers are caught at an early stage, before they have invaded other organs, surgery alone can get them, but the question is how we can catch them that early.

What makes a cancer?

The reason cancers are so different from normal cells is that they often contain mutations (changes in DNA sequence) in genes that affect how cells grow and divide, and also a multitude of changes to DNA that do not involve the sequence, but rather how DNA is chemically modified or wrapped up (epigenetic changes). These epigenetic changes are often much more numerous in cancer than mutations are: think tens of thousands of changes as opposed to hundreds at most. Moreover, lots of epigenetic changes are shared across patients with a certain type of cancer, say breast cancer or colon cancer, in a way that most mutations in a person’s tumour are not. When cells die, they release the DNA they contained into the liquid portion of blood, called plasma. In technical parlance, these bits of DNA are called cell-free DNA (i.e., they are outside cells).

Of needles and haystacks...

Till now, efforts to catch cancer early using blood tests have tended to rely on getting DNA out of plasma and looking for the occurrence of certain mutations. This is challenging for multiple reasons. One needs to design panels to find mutations in limited amounts of plasma, and this means we have only been able to look at a few genes which are commonly mutated across different people’s cancers.

The other problem is that the genes that meet these criteria also tend to be mutated in cancers from different tissue types, so they say little about where a cancer comes from. Finally, people have started to find mutations that show up in cancer in normal tissues too (you need the right combinations of mutations to make a cancer; in isolation they are not sufficient). Therefore, the whole scenario is akin to looking for a needle in a haystack with other bits being the mirage of a needle.

In the paper, the first thing we did was use simulations to calculate what would happen if we only needed to find one needle and there were a lot more needles that could be found in the proverbial haystack. Unsurprisingly, we found that the more needles we have, the easier the task becomes, and we can find needles in bigger haystacks with more ease.

Here, each box shows the percent of cell-free DNA that comes from cancer, and the number of "needles" in that haystack that indicate cancer. As the number of "needles" increases, the curves all move upwards, showing greater odds of spotting at least one with fewer searches. Given that changes in DNA methylation, a kind of epigenetic change, are widespread, well-known, and very specific to different types of tissues (and cancer), we sought to use these as a way to diagnose and identify cancer types.
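The intuition behind that simulation can be sketched in a few lines of Python. This is not the simulation from the paper, just a toy model under a stated assumption: every fragment sampled at a marker site comes from the tumour independently with probability `tumour_fraction`. The function name and its parameters are made up for illustration.

```python
def prob_at_least_one(tumour_fraction, n_markers, fragments_per_marker):
    """Chance of seeing at least one tumour-derived fragment across all markers.

    Toy assumption: each sampled fragment is tumour-derived independently
    with probability `tumour_fraction`.
    """
    total_draws = n_markers * fragments_per_marker
    # Probability that every single draw misses, subtracted from 1.
    return 1.0 - (1.0 - tumour_fraction) ** total_draws


if __name__ == "__main__":
    # Same haystack (0.1% tumour DNA, 10 fragments per site), more needles.
    for n_markers in (1, 10, 100, 1000):
        p = prob_at_least_one(0.001, n_markers, fragments_per_marker=10)
        print(f"{n_markers:>5} markers -> P(detect) = {p:.3f}")
```

Even in this crude version, the detection probability climbs quickly as the number of marker sites grows while the tumour fraction stays fixed, which is the qualitative point of the figure.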

Learning to exploit DNA methylation to detect cancer

Looking at methylation is often done through a method called bisulfite sequencing, but this is inefficient (85-90% of the sample is lost while it is being processed). The approach also uses DNA sequencing (something that maps bits of DNA to where in our genomes, i.e. which chromosomes, they come from) to examine every fragment it finds for methylation changes, including regions that may not have any DNA methylation whatsoever. However, there is a cheap method called MeDIP which only recovers bits of methylated DNA from samples. The novelty of our work was to make plasma samples compatible with MeDIP by padding the plasma DNA with DNA from viruses that infect bacteria so that it could go through processing with minimal loss. Since this viral DNA is very different from human DNA, when we sequenced the sample, we could simply discount it and map the bits that come from human DNA.
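The "discount the filler" step can be pictured with a minimal Python sketch. It assumes the reads were aligned against a combined human plus phage reference and that each alignment has been reduced to a `(read_id, contig)` pair. The contig name `phage_filler` and the function `split_alignments` are hypothetical, not taken from the paper's pipeline.

```python
from collections import Counter

# Hypothetical contig name for the spiked-in phage "filler" DNA.
FILLER_CONTIGS = {"phage_filler"}


def split_alignments(alignments, filler_contigs=FILLER_CONTIGS):
    """Keep human alignments and count how many reads were filler.

    `alignments` is an iterable of (read_id, contig) pairs.
    """
    human = []
    counts = Counter()
    for read_id, contig in alignments:
        if contig in filler_contigs:
            counts["filler"] += 1  # discarded: not informative about the patient
        else:
            counts["human"] += 1
            human.append((read_id, contig))
    return human, counts


if __name__ == "__main__":
    reads = [("r1", "chr1"), ("r2", "phage_filler"), ("r3", "chr7")]
    human, counts = split_alignments(reads)
    print(human, dict(counts))
```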

We then tested it on plasma samples from pancreatic cancer, and for these cancers, we had methylation measurements both from the original tumours (these were taken out at Stage I, and plasma was obtained before surgery) and normal tissue. Further, we also had plasma from healthy people and methylation measurements from blood cells (which contribute the bulk of cell-free DNA).

We found thousands of changes in the plasma of pancreatic cancer patients with early stage cancers compared to healthy controls, as seen in the red dots in panel b, and by the different patterns of the orange and blue cells in panel c (orange = more methylation, blue = less methylation). Moreover, we compared the differences we saw using the plasma vs the differences we saw in the original tumour compared to normal tissue and to blood cells and found more agreement than expected by chance (shown by red and green dots in panels d and e).
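One standard way to ask whether an overlap like this is "more than expected by chance" is a hypergeometric tail probability. The sketch below illustrates that idea with made-up numbers; it is not necessarily the statistic used in the paper, and the function and parameter names are assumptions for the example.

```python
from math import comb


def overlap_p_value(n_regions, n_plasma, n_tumour, n_shared):
    """P(overlap >= n_shared) if the two sets of regions were unrelated.

    n_regions: total regions tested
    n_plasma:  regions called different in plasma
    n_tumour:  regions called different in tumour tissue
    n_shared:  regions called in both
    """
    upper = min(n_plasma, n_tumour)
    total = comb(n_regions, n_plasma)
    # Sum the hypergeometric probabilities for every overlap >= n_shared.
    tail = sum(
        comb(n_tumour, k) * comb(n_regions - n_tumour, n_plasma - k)
        for k in range(n_shared, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up numbers: chance alone would predict an overlap of about 10.
    print(overlap_p_value(n_regions=1000, n_plasma=100, n_tumour=100, n_shared=30))
```

A very small value means an overlap that large would rarely arise if the plasma and tumour changes had nothing to do with each other.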

This had us very excited, so we went on to examine if this was also true for seven other cancer types. Unsurprisingly, similar patterns were observed for most of the other cancer types, suggesting that our method could recover this information across a wide range of cancers, as you can see from there being red dots galore in the figure below, especially for A and C (a non-random association of methylation changes in the direction we’d expect when looking at methylation profiles from tumours vs those from the plasma).

Just how reliable is it anyway?

However, the challenge is figuring out how good these are at predicting cancer. So we did a lot of machine learning analyses. We took the 189-sample dataset we had established, built a model on 80% of it, and tested that model on the 20% of samples that weren’t used for building it. We repeated this hundreds of times over, using about 500,000 measurements of methylation per sample. As the plots with the boxes and dots in panel b below show, we had reasonable performance for the vast majority of sample types (it goes from 0-1 bottom to top, 1 = perfect performance, and 0.5 is what you expect by chance). We also got measurements from another 199 samples, and using the collection of models trained on the original dataset, we were able to see very good performance (over 90% accuracy) in each case (panel c, where a perfect model would give you an inverted L).
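For readers who want to see what "build on 80%, test on 20%, hundreds of times over" looks like mechanically, here is a minimal Python sketch. Everything in it is a stand-in: the nearest-centroid scorer, the stratified split, the function names and the synthetic data are assumptions for illustration, not the models or data from the paper. The score it reports is the area under the ROC curve, the 0-1 scale where 0.5 is chance.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve: chance that a positive outscores a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def score(row, centre_pos, centre_neg):
    """Higher when `row` is closer to the cancer centroid than the control one."""
    d_pos = sum((a - b) ** 2 for a, b in zip(row, centre_pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(row, centre_neg))
    return d_neg - d_pos


def repeated_holdout(features, labels, n_repeats=100, test_fraction=0.2, seed=0):
    """Repeat: hold out ~20% of each class, train on the rest, record the AUC."""
    rng = random.Random(seed)
    idx_pos = [i for i, y in enumerate(labels) if y == 1]
    idx_neg = [i for i, y in enumerate(labels) if y == 0]
    aucs = []
    for _ in range(n_repeats):
        test = set()
        for group in (idx_pos, idx_neg):  # stratified: both classes get tested
            k = max(1, round(test_fraction * len(group)))
            test.update(rng.sample(group, k))
        train = [i for i in range(len(labels)) if i not in test]
        c_pos = centroid([features[i] for i in train if labels[i] == 1])
        c_neg = centroid([features[i] for i in train if labels[i] == 0])
        held_out = sorted(test)
        scores = [score(features[i], c_pos, c_neg) for i in held_out]
        aucs.append(auc(scores, [labels[i] for i in held_out]))
    return aucs


if __name__ == "__main__":
    rng = random.Random(1)
    labels = [1] * 20 + [0] * 20  # synthetic: 20 "cancer", 20 "control"
    features = [[rng.gauss(1.0 if y else 0.0, 1.0) for _ in range(5)] for y in labels]
    aucs = repeated_holdout(features, labels)
    print(f"median AUC over {len(aucs)} splits: {sorted(aucs)[len(aucs) // 2]:.2f}")
```

The key design point is that the held-out samples never touch the model during training, so each AUC is an honest estimate of how the model does on samples it has not seen.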

Notably, some of these cancers, such as lung and pancreatic cancer, are notoriously hard to catch early, and it is nigh on unprecedented for us to be seeing performance this good. It is not perfect, and we probably need a lot more work on how to make the measurements more reliable and the models better, but those improvements will likely be built with this method at the heart. The fact we have shown that DNA methylation is incredibly useful for this purpose is likely to spur a lot of interest in the field, and who knows what ingenious ways people might come up with to harness this.


Looking for cancer in the plasma using DNA methylation.