July 12, 2020

The Other 99% of Our DNA…Is That Where Cancer Is Hiding?

By: Franklin Wu

Cancer, at its core, is caused by genetic changes to cells. Whether through environmental, biological, or lifestyle factors, the regions that regulate the growth and development of cells are disrupted such that the cell begins to multiply uncontrollably. With time, the group of cells grows into a tumor, which can then spread throughout the body, stealing resources and causing physical harm to healthy tissue.

A simple schematic of how cancer develops.

Taking it a step further, it stands to reason that some mutations confer a selective advantage to cancer cells, allowing them to out-compete normal cells and outlast our immune system. Otherwise, it would be difficult to explain how cancers spread so aggressively. These advantage-conferring mutations are what we call “cancer drivers”.

There is already extensive literature in this field, and scientists have identified numerous genes linked to cancer. However, the vast majority of these discoveries have been made in the protein-coding regions of our DNA. There’s likely a lot more to the picture. After all, the coding region makes up only about 1% of the entire human genome.

Along with Dr. Allen, I will be trying to identify cancer drivers in the non-coding regions of the human genome. The plan is to intersect two existing datasets: one from the international Pan-Cancer Analysis of Whole Genomes (PCAWG) study, and one from Dr. Charlie Gersbach, a faculty member in Duke’s Department of Biomedical Engineering.

The PCAWG study provides a dataset of possible cancer drivers in cancer cell genomes. From over 2,600 whole-genome sequences, the researchers identified thousands of SNPs in non-coding sequences using various computational pipelines and statistical models. However, the study warns that the SNPs they discovered may not all be cancer drivers, since it’s highly unlikely that every identified mutation confers a selective advantage to cancer cells.

This is where Dr. Gersbach’s dataset comes into play. His lab altered specific regions in the non-coding genome and observed the effect each change had on the cell. From this, they compiled a list of “essential regulatory elements”, which are regions in the non-coding genome that are critical for cell growth and development.

By intersecting these two datasets, we hope to find regions of high-density overlap, where the SNPs from the PCAWG study coincide with Dr. Gersbach’s “essential regulatory elements”. If such regions exist, they could provide compelling evidence that the mutations within them are indeed cancer drivers.
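To make “intersect” a little more concrete, here is a minimal sketch of one way the overlap could be computed. It is not the actual pipeline for this project: the function names, the tuple layout, and the toy coordinates are all hypothetical, and it assumes both datasets use the same genome build and half-open intervals. It simply counts how many SNP positions fall inside each regulatory element and divides by the element’s length to get a density.

```python
from bisect import bisect_left
from collections import defaultdict


def count_snps_per_region(snps, regions):
    """Count SNPs falling inside each region.

    snps:    iterable of (chrom, pos) tuples (hypothetical layout)
    regions: iterable of (chrom, start, end) tuples, half-open [start, end)
    Returns a dict mapping each region tuple to its SNP count.
    """
    # Group SNP positions by chromosome and sort them for binary search.
    positions = defaultdict(list)
    for chrom, pos in snps:
        positions[chrom].append(pos)
    for chrom in positions:
        positions[chrom].sort()

    counts = {}
    for chrom, start, end in regions:
        sorted_pos = positions.get(chrom, [])
        lo = bisect_left(sorted_pos, start)  # first position >= start
        hi = bisect_left(sorted_pos, end)    # first position >= end (excluded)
        counts[(chrom, start, end)] = hi - lo
    return counts


def snp_density(counts):
    """Convert per-region counts into SNPs per base pair."""
    return {
        region: n / (region[2] - region[1])
        for region, n in counts.items()
    }


# Toy data, invented purely for illustration.
toy_snps = [("chr1", 105), ("chr1", 150), ("chr1", 300), ("chr2", 50)]
toy_regions = [("chr1", 100, 200), ("chr1", 250, 300), ("chr2", 0, 100)]

toy_counts = count_snps_per_region(toy_snps, toy_regions)
toy_density = snp_density(toy_counts)
```

A raw count like this is only a starting point. Deciding whether a region’s density is unusually high would still require comparing it against some background mutation rate, which this sketch does not attempt.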

Although my project is only the first step on a long road, the hope is that, by discovering more cancer drivers, we can improve our ability to assess an individual’s risk for cancer based on their DNA, which could prove vital for saving lives.

Categories: BSURF 2020

One comment

Ron Grunwald says:

July 13, 2020 at 3:04 pm

Cool. I wonder if you can detail a little about what you mean by “intersect” the two data sets? What methods will allow for a thoughtful way to bring together two different kinds of data? This is always the big challenge of “meta-analysis” – how to blend apples and oranges into a single fruit punch that’s actually better than each on their own. But don’t spill the beans too much! Looking forward to seeing where this goes.
