EXTRACTING TRANSMISSION NETWORKS FROM PHYLOGEOGRAPHIC DATA FOR EPIDEMIC AND ENDEMIC DISEASES: EBOLA VIRUS IN SIERRA LEONE, 2009 H1N1 PANDEMIC INFLUENZA AND POLIO IN NIGERIA

March 1, 2015

Abstract: 

Background: Phylogeography improves our understanding of spatial epidemiology. However, application to practical problems requires choices among computational tools to balance statistical rigor, computational complexity, sensitivity to sampling strategy and interpretability.

Methods: We introduce a fast, heuristic algorithm to reconstruct partially-observed transmission networks (POTN) that combines features of phylogenetic and transmission tree approaches. We compare the transmission networks generated by POTN with those generated by existing methods (BEAST and SeqTrack), and discuss the benefits and challenges of phylogeographic analysis using examples of epidemic and endemic diseases: Ebola virus, H1N1 pandemic influenza and polio.

Results: For the 2014 Sierra Leone Ebola virus outbreak and the 2009 H1N1 outbreak, all three methods provide similarly plausible transmission histories but differ in detail. For polio in northern Nigeria, we discuss performance trade-offs between POTN and discrete phylogeography in BEAST and conclude that spatial history reconstruction is limited by under-sampling.

Figure 2

Three methods for phylogeographic reconstruction of the initial phase of the 2014 Ebola virus outbreak in Sierra Leone. (A) Maximum clade credibility phylogenetic tree: cases (tips) are colored by location, as shown on the map in panel B; inset: histogram of pairwise genetic distances. The earliest cases were in Kissi Teng, Kailahun (red), and the majority of cases prior to 19 June 2014 were in Jawie, Kailahun (blue). (B) Chiefdom color map of Sierra Leone; the locations of the cases analysed in panel A are shown in the corresponding colors. (C) Partially-observed transmission network (POTN): cases are colored by location; gray lines indicate POTN links between case pairs; thick gray lines indicate the parsimonious transmission tree, a single consistent ancestry obtained by pruning the network to keep, for each case, only the ancestral link with the shortest duration. (D) SeqTrack minimum spanning tree: cases are colored by location; gray lines indicate POTN links between case pairs. (E) BEAST discrete phylogeography, maximum clade credibility tree projected as a transmission network: cases are colored by location, with internal nodes colored by highest-posterior-probability location; gray lines indicate POTN links between case pairs.
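The pruning step described for panel C can be illustrated with a minimal sketch. It is not the authors' implementation: the `Link` record, the case identifiers, the reading of "duration" as elapsed days between two cases, and the tie-breaking rule are all assumptions made for illustration. How the candidate POTN links are generated in the first place is not specified here and is taken as given.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """A candidate ancestral link (hypothetical structure, for illustration)."""
    ancestor: str         # identifier of the putative source case
    descendant: str       # identifier of the putative recipient case
    duration_days: float  # assumed meaning: time elapsed between the two cases


def prune_to_tree(links):
    """For each descendant, keep only the incoming link with the shortest duration.

    Ties are broken by keeping the link encountered first (an assumption).
    If every link points forward in time, the result has at most one parent
    per case and therefore forms a tree or forest.
    """
    best = {}
    for link in links:
        current = best.get(link.descendant)
        if current is None or link.duration_days < current.duration_days:
            best[link.descendant] = link
    # Sort by descendant only to make the output order deterministic.
    return sorted(best.values(), key=lambda link: link.descendant)


# Made-up candidate links among four hypothetical cases.
links = [
    Link("case_A", "case_B", 5.0),
    Link("case_A", "case_C", 12.0),
    Link("case_B", "case_C", 4.0),
    Link("case_A", "case_D", 9.0),
    Link("case_C", "case_D", 9.0),  # tie with the previous link; the first is kept
]

for link in prune_to_tree(links):
    print(f"{link.ancestor} -> {link.descendant} ({link.duration_days} days)")
```

In this toy input, case_C has two candidate ancestors and keeps the link from case_B because it is shorter. Each case ends up with at most one ancestor, which is what makes the pruned result a single consistent ancestry.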

Conclusions: POTN is complementary to available tools on densely-sampled data, fails gracefully on under-sampled data and is scalable to accommodate larger datasets. We provide further evidence for the utility of phylogeography for understanding transmission networks of rapidly evolving epidemics. We propose simple heuristic criteria to identify how sampling rates and disease dynamics interact to determine fundamental limitations of phylogeographic inference.