Category Archives: BSURF 2020

Blogs from Summer 2020

Abstract: Searching for Cancer Driver Mutations within Essential Regulatory Enhancers in Non-Coding DNA

Over the past few decades, many cancer-inducing mutations in the human exome have been identified, including but not limited to the BRCA1, BRCA2, and TP53 gene mutations.[1] With the recent commercialization of whole genome sequencing technologies, studies have pivoted towards exploring the possible existence of cancer drivers in non-coding regions of the human genome as well. In January of 2020, the Pan-Cancer Analysis of Whole Genomes (PCAWG) study was published, which utilized 2,658 whole genome sequences of cancer cell lines and their somatic counterparts to identify millions of potential non-coding cancer drivers. However, the identified cancer drivers have yet to be causally linked to any functionality in cell development and growth. This study seeks to intersect the PCAWG driver dataset with a set of essential enhancer elements in cancer cell lines produced by the Gersbach Lab at Duke University. If a high density of drivers from the PCAWG study is observed within the coordinates of the essential enhancers provided by the Gersbach data, it would lend credence to the identified drivers and provide them with functional contextualization. Confirming the existence of non-coding cancer drivers will allow us to deepen our understanding of cancer genetics and provide new frontiers to combat cancer.


[1] Lalloo, Fiona, et al. “BRCA1, BRCA2 And TP53 Mutations in Very Early-Onset Breast Cancer with Associated Risks to Relatives.” European Journal of Cancer, Pergamon, 27 Apr. 2006.

Words From a Nobel Laureate

Over the past couple of weeks, I’ve had the privilege of attending talks given by Duke faculty and administrators. Because I’m the sole working BSURF Fellow this year, many of our program talks were cancelled, but I was able to sit in on a variety of talks with the Huang Fellows, Data+ program, and Summer Neuroscience Program.

Of the talks that I attended, Dr. Lefkowitz’s was particularly memorable. Dr. Lefkowitz is a Professor of Chemistry and a Professor of Medicine at Duke, and, most notably, he’s also a Nobel laureate. During his talk, he described his journey through science, starting from his childhood all the way until now. As one of the most senior professors at Duke, he has had experiences unlike anything I could have imagined. I remember being struck by how Dr. Lefkowitz seemed to have personally experienced the evolution of modern science. Dr. Lefkowitz talked extensively about his network of mentors, which he called his “science family tree”. When looking closely at the diagram he showed us, I was amazed to see names like Niels Bohr and Erwin Schrödinger, who were both mentors of Dr. Lefkowitz’s mentors. In high school, I had revered Bohr and Schrödinger as some of the fathers of modern chemistry, so it was incredible to see how Dr. Lefkowitz had such a close connection to them.

Additionally, when talking about how his own career unfolded, Dr. Lefkowitz repeatedly emphasized how his journey took many unanticipated twists and turns. Although he went on to win the Nobel Prize in Chemistry, he was adamant at the beginning of his career that he wouldn’t go into research. When he did finally begin conducting formal research, it was well after he had graduated college, and after that, he faced many hurdles in his research. It was both reassuring and inspiring to hear that someone as accomplished as he is also faced so many challenges throughout his journey.

Overall, I really enjoyed the faculty talks this summer. It was amazing to get insight into how some of the most successful people at Duke reached their destinations, and it definitely gave me confidence to pursue my own goals as well.

Until next time, readers!

Access Granted? Not So Fast…

We’ve officially hit the midway point of the BSURF program! One thing that I’ve noticed is that time seems to flow differently when you’re sitting at home. Each day seems to pass slowly, but when looking back, it really feels as though the past 4 weeks have flown by.

For my project, the largest hurdle that we’ve encountered so far is securing data access. Because the PCAWG dataset we’re aiming to use is from a large international study and also contains personal information of patients, there’s a multitude of legal checkpoints that have to be passed in order to gain access to the data. For example, I’ve had to complete multiple CITI trainings and fill out a bunch of legal forms to be approved by Duke as an undergraduate researcher. On Dr. Allen’s end, he’s had to deal with even more forms and applications, and having him navigate the process with me has been hugely helpful. To date, the application process for the data has taken almost two months. The good news is that we’re almost at the end of this long road — the ICGC has informed us that their decision will be released within the next week or so!

In the meantime, I’ve been able to take up some side work that’ll hopefully come in handy when we’re granted data access. First and foremost, I’ve been able to dive deeper into the literature, which has come with its own challenges, but has been extremely rewarding overall. Additionally, I’ve been able to start exploring some of the computational tools I’ll eventually need to work with the data. From Dr. Gersbach’s side, I’ve been able to play around with some of his VCF files, so I’ve gotten more familiar with how to work with genomic data files. Hopefully, these skills will translate into working with the PCAWG data as well, whenever we get it.

Although my project has hit an unexpected delay, it’s very exciting that the end is near. It’s been long awaited, and I’m eager to finally dive into the thick of the project. Until next time!

Dr. Allen’s Journey Through Science

To say that Dr. Andrew Allen’s journey in science has been a whirlwind would be an understatement. 

Dr. Allen began his undergraduate career as a prospective electrical or biomedical engineer. However, in the span of one semester, he realized that his interests lay elsewhere. To him, studying to be an engineer was too regimented. As he puts it, he felt that his studies were only training him to follow certain protocols, rather than challenging him to discover new things and think creatively.

So, he decided to pursue a math major instead, and graduated with a degree in Mathematics. Following undergrad, Dr. Allen went to graduate school for a Ph.D. in Mathematics as well. However, as he approached the start of his dissertation, he began to realize that the field of mathematics was already saturated. As he remembers, one job opening that he saw had upwards of 2,000 applicants!

With that, Dr. Allen decided to discontinue his Ph.D. studies and explore working in industry instead. He first worked as a hydrologist. However, he felt that his work was constrained by corporate needs, rather than helping the scientific community. So, Dr. Allen decided to apply for a biostatistician role at a medical school. Interestingly enough, he had never had much formal training in biostatistics. As Dr. Allen vividly remembers, he read every single relevant book he could find, which helped him land the job.

Motivated by this, he went to Emory University and obtained a Ph.D. in Biostatistics in 2001. Since then, Dr. Allen has settled down at Duke, where he teaches graduate-level courses in biostatistics while also conducting research with his lab.

Dr. Allen divides his research into three categories: discovery genetics, developing statistical methods for estimation of regulatory effects due to genetic variation, and population genetics. Because of his focus on data analysis, Dr. Allen rarely conducts wet lab research. His specific expertise in developing and improving statistical models has allowed him to collaborate with many other labs, which is something that Dr. Allen really enjoys. Elaborating further, he talked about how he is driven by the thrill of discovering new things, which motivates him to continue to explore the unknown.

When asked what he cherishes most about his job, he responded that he enjoys how being in academia offers constant opportunities to educate himself. In his own words, he loves being around people who make him feel dumb: not in a harmful way, but in a way that pushes him to keep learning new things.

Finally, Dr. Allen offered me two pieces of advice. First, he encouraged me to stay flexible in my interests and to take advantage of as many opportunities as I can. Second, he recommended that I try to maximize my exposure to all things computational, since computation is becoming increasingly important in every scientific field.

Dr. Allen’s story not only inspires me to pursue my interests, but also reassures me that, if those interests change, it’s okay to start over and make changes. I’m grateful to have a mentor with such a wealth of experience, and I’m excited to forge ahead with whatever the future may hold.

The Other 99% of Our DNA…Is That Where Cancer Is Hiding?

Cancer, at its core, is caused by genetic changes to cells. Whether by environmental, biological, or lifestyle factors, the regions that regulate the growth and development of cells are disrupted such that the cell begins to multiply uncontrollably. With time, the group of cells grows into a tumor, which can then spread throughout the body, stealing resources and causing physical harm to healthy tissue.

A simple schematic of how cancer develops.

Taking it a step further, it stands to reason that some mutations confer a selective advantage to cancer cells, allowing them to out-compete normal cells and outlast our immune system. Otherwise, it would be difficult to explain how cancers spread so aggressively. These mutations are what we call “cancer drivers”, or in other words, mutations that drive cancer.

There is already extensive literature in this field. Scientists have identified numerous genes linked to cancer. However, the vast majority of these discoveries have been made in the protein-coding regions of our DNA. There’s likely a lot more to the picture. After all, the coding region comprises only about 1% of the entire human genome.

Along with Dr. Allen, I will be trying to identify cancer drivers in the non-coding regions of the human genome. The plan is to intersect two existing datasets: one from the international Pan-Cancer Analysis of Whole Genomes (PCAWG) study, and one from Dr. Charlie Gersbach, who is a Duke faculty member in the Department of Biomedical Engineering.

The PCAWG study provides a dataset of possible cancer drivers in cancer cell genomes. From over 2,600 whole genome sequences, they were able to identify thousands of SNPs in non-coding sequences by using various computational pipelines and statistical models. However, the study warns that the SNPs they discovered may not all be cancer drivers, since it’s highly unlikely that every identified mutation confers a selective advantage to cancer cells.

This is where Dr. Gersbach’s dataset comes into play. His lab was able to alter specific regions in the non-coding genome to observe the effect it had on the cell. From this, they compiled a list of “essential regulatory elements” in cells, which are regions in the non-coding genome that are critical for cell growth and development. 

The hope in intersecting these two datasets is to find regions of high-density overlap, where the SNPs from the PCAWG study coincide with Dr. Gersbach’s “essential regulatory elements”. If such regions exist, they could provide compelling evidence that the mutations within them are indeed cancer drivers.
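To make the idea of "intersecting" concrete, here is a minimal Python sketch of counting how many variant positions fall inside each regulatory element. It is only an illustration under stated assumptions, not the actual pipeline used for this project: the function name `count_variants_per_element`, the toy coordinates, and the choice of 1-based inclusive intervals are all hypothetical. Real files use differing coordinate conventions (VCF positions are 1-based, while BED intervals are 0-based and half-open), so a real analysis would need to reconcile them first.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def count_variants_per_element(variants, elements):
    """Count the variants that fall inside each element.

    variants: iterable of (chrom, pos) tuples, positions assumed 1-based.
    elements: iterable of (chrom, start, end) tuples, assumed 1-based inclusive.
    Returns a list of (chrom, start, end, count, variants_per_kb) tuples.
    """
    # Group variant positions by chromosome and sort them once,
    # so each element can be answered with two binary searches.
    positions = defaultdict(list)
    for chrom, pos in variants:
        positions[chrom].append(pos)
    for chrom in positions:
        positions[chrom].sort()

    results = []
    for chrom, start, end in elements:
        pos_list = positions.get(chrom, [])
        # Number of sorted positions p with start <= p <= end.
        count = bisect_right(pos_list, end) - bisect_left(pos_list, start)
        length = end - start + 1
        results.append((chrom, start, end, count, 1000 * count / length))
    return results


# Hypothetical toy data, not taken from PCAWG or the Gersbach Lab.
variants = [("chr1", 105), ("chr1", 150), ("chr1", 900), ("chr2", 50)]
elements = [("chr1", 100, 199), ("chr1", 500, 599), ("chr2", 1, 100)]

for row in count_variants_per_element(variants, elements):
    print(row)
```

Elements with an unusually high variants-per-kilobase value would be the "high-density overlap" candidates described above. Deciding what counts as unusually high would still require a statistical comparison against a background mutation rate, which this sketch does not attempt. In practice, established tools such as `bedtools intersect` perform this kind of interval overlap at genome scale.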

Although my project is only the first step in a long road, the hope is that, by discovering more cancer drivers, we can improve our capabilities to assess an individual’s risk for cancer based on their DNA, which could prove vital for saving lives. 

Let’s Science From Home!

If things were normal, this would be Week 5 of BSURF. But, a pandemic and a summer class later, here I am, sitting at home, typing up my first blog post. Crazy, isn’t it?

This year’s BSURF program will be drastically different from previous years. Instead of being on campus with the other BSURF fellows, I’ll be working remotely for the next two months, going to Zoom meetings instead of going into the lab. Not all of it is bad news though. There are certain luxuries that come with working from home, such as setting your own schedule, spending time with family, and being able to eat (!) while working. 

Before I go further, a little about my project: under the guidance of Dr. Allen, my PI and mentor, I’ll be deep-diving into the datasets of two separate studies. The first is the Pan-Cancer Analysis of Whole Genomes (PCAWG) study, which is an international collaboration that yielded groundbreaking research on cancer drivers in non-coding regions of the human genome. In essence, they took over 2,600 whole genome sequences of various cancer types and located potential cancer drivers (read: mutations that cause cancer) in non-coding regions. The second study is by Dr. Gersbach, who is a professor at Duke. His lab recently produced a dataset of “essential elements” in the human genome that regulate normal cell growth. My goal is to intersect these two datasets to see if there are any common hits between them. To put it simply, we’re looking for cause and effect. If, for example, a certain essential element contains many cancer drivers, that would be the first step toward confirming the validity of those cancer drivers. Eventually, the hope is to map non-coding cancer drivers to a phenotypic impact.

For the rest of this summer, I have two goals in mind. First, I want to become comfortable with research and the adventure it represents. In more objective terms, I hope to become familiar with the computational knowledge and techniques required to do research in the field of genetics. More importantly, I want to be able to embrace the daunting challenges and uncertainties that come with doing research. Second, I hope to build meaningful and long-lasting relationships with mentors and peers alike. I’m excited to be open-minded and have conversations, whether that’s about research projects, or about anything, really.

Already, research has proven to be an eye-opening experience. Whether it’s deep-diving into literature or emailing for help, I’ve quickly learned research is by no means a linear process. In reading my first paper, I had to read up on three other ones just to understand what was going on. And that’s what I’m coming to really enjoy. There are endless paths to choose from and avenues to explore, and while there will certainly be challenges to embrace, I’ll be ready to adapt, readjust, and push ahead as readily as ever.

Stay tuned, and welcome to the blog!

My workspace from home!