In summary, the low mutation rate of M. tuberculosis means that even at the highest resolution provided by whole genome sequencing it is still difficult to confidently affirm the inferences of transmission made by traditional epidemiological techniques. This means it is very difficult to determine transmission inclusively. However, whole genome sequencing does in some cases allow us to exclude direct transmission, by using the phylogenetic context provided by other strains. Not only does whole genome sequencing provide the inter-cluster differentiation provided by current typing methods, but it also achieves intra-cluster resolution which can be used to inform epidemiological investigation.
At the cluster level we were able to estimate an average mutation rate of 0.3 SNPs per genome per year, which is remarkably close to estimates made by Ford et al. . This confirms the extremely low rate of accumulation of variation that characterizes M. tuberculosis, which is approximately 3 times and 44 times slower than that observed in Escherichia coli and Staphylococcus aureus, respectively . It is possible that lack of coverage and mapping to repetitive regions and members of PE and PPE gene families may have resulted in a lower estimated mutation rate. However, it is worth noting that only 5% of the genome on average was unrepresented in this analysis so any effects on the estimated mutation rate are likely to be negligible.
We were unable to detect a clock-like signal at a larger phylogenetic scale (at the lineage level), probably reflecting the different processes of fixation and substitution having variable influences on different parts of the evolutionary history. However, at the intra-cluster level and between the epidemiologically linked pairs, we observed a large level of variation around the mean, which is in contrast to observations of some other bacteria [38, 39]. There are a variety of factors that may have contributed to this variability. Latency is common in tuberculosis infection and could result in considerable discrepancies in the apparent rate of mutation over time. However, Ford et al. recently showed that the mutation rate during latency in macaques was similar to that during active infection . Within host selection for factors such as drug resistance may also result in variation in the accumulation of SNPs over time, as observed for a single patient . However, we propose that the most important factor is probably the low mutation rate itself, meaning over short time scales only a weak signal of a molecular clock can be detected.
This variability means that although a molecular clock may be detectable over longer time frames, it is only an aggregate measure and should be used with extreme caution when applying it to infer local transmission or date recent evolutionary events. Furthermore, while mutation rate can be used to strengthen or exclude epidemiological links, it cannot be used alone to infer direct transmission, particularly due its slow rate. We found no evidence of hyper-mutation in our dataset, and this has not been reported in clinical M. tuberculosis isolates to our knowledge. However it is possible that treatment may impose selection pressures on isolates that could affect the observed rate of fixation, and this should be considered.
In a recent investigation of tuberculosis transmission chains in the UK , an estimate of 0.5 SNPs per genome per year was derived using whole genome sequences from both within-patient and within-household longitudinal sampling. This estimate was based on the number of SNPs accumulated between the first and last sequenced isolate from each patient or transmission chain. Interestingly, the estimate for within-patient mutation rate was identical to our estimate of 0.3 SNPs per genome per year. The higher between-patient rate they estimated may reflect the multiple sources of error when using the inferred epidemiological links to derive mutation rates, which we observed in our dataset. They used this estimate to set a cut-off of ≤5 SNPs for cases less than three years apart. Our more in-depth analysis of the molecular clock reveals a high level of variation which suggests that using a simple cut-off may not be entirely accurate for confirming transmission, but that the phylogenetic context provided by deep sampling of clusters may be more informative.