Genomic epidemiology to combat virus outbreaks

By Nathan Grubaugh, PhD (@NathanGrubaugh)

Post-doctoral sequencing ninja in the Andersen Lab, The Scripps Research Institute, La Jolla, CA

We are constantly bombarded with news headlines about some deadly virus lurking in our backyard, ready to spring at a moment’s notice. Certainly some of this is just fear-mongering serving as clickbait, but many of these headlines are justified. And it is mostly our own fault. Sure, viruses can mutate, changing the way that they behave, but that is not what really makes them suddenly emerge. It is our modern societies, encroaching on new territories, developing large urban centers, and connecting distant parts of the world, that are creating the perfect recipe for pandemics. Since these activities will likely increase, severe disease outbreaks caused by viruses such as influenza, Ebola, and Zika will stay as fixtures in the news. It is up to scientists to develop new tactics to make them obscure.

Epidemiologists are at the front lines of our battles with infectious diseases and must often employ the latest technology to improve outbreak response. While going door-to-door is still essential for contact tracing, many scientists also use sophisticated models to estimate some of the unknown factors influencing virus transmission. Now epidemiologists are adding genomics to their array of tools.

The virus’ history is written in its genome. Sequencing many virus genomes from an outbreak and determining their relationships can reveal the pathways of transmission. Dudas et al. provide an excellent example. By analyzing >1600 Ebola virus genomes, the largest dataset ever analyzed from an outbreak, they reconstructed the pathways that the virus followed during the 2013–2016 West African epidemic (see video). If that is not impressive enough, Worobey et al. sequenced virus genomes to uncover how HIV-1 entered North America in the 1970s and proved that the so-called “Patient 0” was not the cause. This field of study has been coined genomic epidemiology, and it is critical for our future outbreak control efforts.

Time lapse of the Ebola epidemic in West Africa. Left panel shows local transmission intensity (circle sizes) and virus spread (vectors). Top right is the phylogenetic tree colored by country and bottom right are the case numbers. Guinea = green, Sierra Leone = blue, Liberia = red. Created by Gytis Dudas (@evogytis).


I consider myself a virologist, not a genomic epidemiologist. And actually, I am not even sure if there is anyone who could be defined as a true “genomic epidemiologist”. The field itself consists of scientists who specialize in different areas, but come together for a common cause. You need clinicians and epidemiologists reporting on the ground, in some cases entomologists collecting mosquitoes, molecular biologists sequencing the viruses, computational biologists crunching the numbers, and geneticists connecting the dots. On top of that, there are often many people with grey-area skill sets bridging the gaps. The outstanding Dudas et al. paper had >90 authors. So to say the least, this field would not exist without incredible cooperation and collaboration.

Novel and important scientific findings are rarely made in isolation, especially in modern times. Human activities, such as urbanization and globalization, are generating conditions for explosive outbreaks. In response, we must combine ideas, expertise, resources, and technology to have the greatest impact. Genomic epidemiology provides a unique example of how a field can swiftly change competitors into collaborators. Not only is this the right thing to do for science and society, but it is a certain benefit to your career. Here is the secret: My current advisor, Kristian Andersen, is an open-data-sharing junkie. He has shown me that making data and ideas publicly available before publication 1) gets the important information disseminated immediately and 2) gets the attention of other interested scientists. Sometimes this can lead to your data getting “scooped” (i.e. someone else publishes first), but 9 out of 10 times it makes smart people who can do something awesome with your data want to collaborate. This makes your projects better, exposes you to new fields of science, and expands your professional network. I like those odds.

My first genomic epidemiology project was to investigate the Zika virus outbreak in Florida. The crux of the project was being able to sequence the virus’ genome, and for Zika virus, this was not easy. The main obstacle was that there just is not much viral RNA present in human blood to sequence, so the workhorse of molecular biologists, RNAseq (sequencing all of the RNA present without bias), did not produce much usable data. Something targeted needed to be developed. This is where I first encountered the true collaborative spirit of the field. Nick Loman and Josh Quick, two sequencing gurus and field pioneers, generated a brilliantly easy and efficient protocol, and shared it with the world. Even though we only really knew Nick and Josh through Twitter, we teamed up to adapt the protocol to the two most popular sequencing machines, the Oxford Nanopore MinION and Illumina MiSeq. Armed with a new tool, teams around the world were now rapidly generating Zika virus data and, for the most part, openly sharing their data for the larger community to use. All available Zika sequencing data is displayed on Nextstrain, which recently won the Open Science Prize.

Honestly, this project would not have happened without Sharon Isern and Scott Michael at Florida Gulf Coast University. They were our connections to the outbreak in Florida, retrieved samples from the Department of Health, tested mosquitoes for Zika virus, and shared all of this with us. Once we started generating and sharing our Zika virus genetic data, another wave of collaboration began. Groups were willing to share their case investigations (e.g. infection locations), transmission models (e.g. infection risk), and travel data (e.g. number of passengers) so that we could collectively make the greatest impact on public health.

We also discovered through a forum for sharing virus genetic data during outbreaks that other groups were independently sequencing Zika virus from the Florida outbreak. While competition in science can be beneficial in some respects, it is mostly harmful when accurate information is urgently needed, like during an outbreak. Therefore, we partnered with Jason Ladner and Gus Palacios from the US Army Medical Research Institute of Infectious Diseases and Pardis Sabeti at the Broad Institute to put together a more comprehensive data set. Our manuscript was recently published in Nature.

Central to our story was determining the relationships among the Zika virus genomes that we sequenced from Florida and those sequenced from other places in the Americas. Those relationships are displayed in the form of a phylogenetic tree (see Figure 1, Zika virus from Florida is shown in red). The points where the tree branches merge indicate when the virus arrived, and the number of distinct red Florida groups (called clades) indicates how many times it arrived. With this basic information, we determined that Zika virus transmission started in Florida 2-3 months before it was first detected and that at least 4 separate introductions (but as many as 40 based on our models) may have contributed to the outbreak. The Zika virus genomes that sit closest to the Florida genomes in the tree are mostly from the Caribbean. Combining the sequence data with the facts that Miami is a major travel hub and that intense Zika virus outbreaks were occurring in many popular vacation hot spots, we hypothesized that the Caribbean Islands are a significant source of Zika virus introductions into Florida. Once transmission started, we discovered that the outbreak within Miami was likely more widespread than was previously thought. On a positive note, we found evidence that the intense mosquito control programs helped to end the Zika virus outbreak.



Figure 1. Time-resolved phylogenetic tree of Zika virus in the Americas. The tips are positioned on the sample collection date (x-axis) and the nodes (where branches converge back in time) represent the most recent common ancestor. The ancestors can indicate when a virus emerged into a new region. Image was created using Nextstrain.
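For readers who like to see the logic spelled out, here is a toy sketch of how distinct Florida clades in a time-resolved tree translate into a minimum number of introductions. It is only an illustration, not the method used in our paper: the `Node` class, the `region_clades` helper, and every date and region in `toy_tree` are made up for this example, and real analyses rely on statistical phylogenetic models rather than a simple count.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node in a toy time-resolved tree (hypothetical structure)."""
    date: float                     # decimal year of the sample or inferred ancestor
    region: Optional[str] = None    # set for tips only
    children: List["Node"] = field(default_factory=list)


def tip_regions(node):
    """Return the set of regions found among the tips below this node."""
    if not node.children:
        return {node.region}
    regions = set()
    for child in node.children:
        regions |= tip_regions(child)
    return regions


def region_clades(node, region):
    """Return the roots of the largest subtrees whose tips all come from `region`."""
    if tip_regions(node) == {region}:
        return [node]
    clades = []
    for child in node.children:
        clades.extend(region_clades(child, region))
    return clades


# Made-up tree: two separate Florida groups nested among other regions.
toy_tree = Node(2015.0, children=[
    Node(2015.5, children=[
        Node(2016.1, region="Caribbean"),
        Node(2016.2, children=[               # Florida clade with two tips
            Node(2016.5, region="Florida"),
            Node(2016.6, region="Florida"),
        ]),
    ]),
    Node(2015.8, children=[
        Node(2016.3, region="Brazil"),
        Node(2016.7, region="Florida"),       # Florida clade with a single tip
    ]),
])

florida = region_clades(toy_tree, "Florida")
print(len(florida))                # 2 distinct clades, so at least 2 introductions
print([c.date for c in florida])   # [2016.2, 2016.7]
```

Each clade counts as at least one introduction, and for a clade with several tips the date of its root gives a rough idea of when that lineage was already circulating locally. With sparse sampling the count is only a lower bound, which is why the estimate from our models is so much higher than the number of clades we could see directly.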

We also joined up with two other teams investigating the spread of Zika virus in the Americas. Again, instead of competing for the spotlight, we co-submitted our work so that the collective story would have a greater impact. Metsky et al. generated >100 new Zika virus genomes to analyze the timing and patterns of introductions across the Americas. Faria et al. used a mobile genomics lab to sequence Zika virus from Brazil and discovered that transmission occurred unnoticed for more than a year prior to detection. The importance of all of these papers being published together was to show how the Zika virus epidemic expanded in the Americas. It was not just a uniform wave, spreading from one bordering country to the next, but a series of large jumps followed by intense local transmission (Figure 2).


Figure 2. Zika virus spread in the Americas based on the phylogenetic tree presented in Figure 1. The color scheme is from Figure 1. The circle diameter represents the number of Zika virus genomes collected from a country. Arrowed lines indicate directionality of spread. Under-sampling of Zika virus genomes prevents us from deciphering more precise movements. Image was created using Nextstrain.

Our Zika virus studies demonstrated how vulnerable we are to epidemics of unexpected viruses. By the time Zika virus was first discovered in Brazil, we estimated that it had already spread to most of the Americas. At that point, no control efforts in Brazil would have stopped the epidemic. However, the patterns that we discovered using genomic epidemiology can help to inform policy and direct control efforts. For example, we now know that if we want to prevent future Zika virus outbreaks in Florida, we need to devote resources to combating the outbreaks in the Caribbean. Moreover, we provided evidence that local transmission can be reduced with intense mosquito control campaigns. This is tangible data that can be acted upon, only made possible by many friends, colleagues, and total strangers finding a common ground. While there are many other groups freely open to collaboration and data sharing (e.g. the Zika experimental science team), much of the Zika research has been a race to publish something first (see the story of antibody-dependent enhancement). The problem is that the first is not always the best and our literature is now muddied with incomplete findings. I am becoming increasingly worried that this sort of guarded, competitive research will undermine our response efforts. How well we are able to come together, in my opinion, will ultimately decide the fate of this epidemic and the next.

