Mutational dynamics of the SARS coronavirus in cell culture and human populations isolated in 2003
© Vega et al; licensee BioMed Central Ltd. 2004
Received: 18 May 2004
Accepted: 06 September 2004
Published: 06 September 2004
The SARS coronavirus is the etiologic agent of the epidemic of Severe Acute Respiratory Syndrome. The recent emergence of this new pathogen, the careful tracing of its transmission patterns, and the ability to propagate it in culture allow the exploration of the mutational dynamics of the SARS-CoV in human populations.
We sequenced complete SARS-CoV genomes taken from primary human tissues (SIN3408, SIN3725V, SIN3765V), cultured isolates (SIN848, SIN846, SIN842, SIN845, SIN847, SIN849, SIN850, SIN852, SIN3408L), and five consecutive Vero cell passages (SIN2774_P1, SIN2774_P2, SIN2774_P3, SIN2774_P4, SIN2774_P5) arising from the SIN2774 isolate. These represented individual patient samples, serial in vitro passages in cell culture, and paired human and cell culture isolates. Using a refined mutation filtering scheme and a constant mutation rate model, we estimated the mutation rates and calculated the possible date of emergence. Phylogenetic analysis was used to uncover molecular relationships between the isolates.
We examined the whole-genome sequences of 54 SARS-CoV isolates identified before 14 October 2003, including 22 from patients in Singapore, and characterized the mutations arising during human-to-Vero and Vero-to-human transmission and across multiple Vero cell passages in order to refine our analysis of human-to-human transmission. Though co-infection by different quasispecies in individual tissue samples was observed, the in vitro mutation rate of the SARS-CoV in Vero cell passage is negligible. The in vivo mutation rate, however, is consistent with estimates for other RNA viruses at approximately 5.7 × 10⁻⁶ nucleotide substitutions per site per day (0.17 mutations per genome per day), or two mutations per human passage (adjusted R-square = 0.4014). Using the immediate Hotel M contact isolates as roots, we observed that the SARS epidemic has generated four major genetic groups that are geographically associated: two Singapore isolates, one Taiwan isolate, and one North China isolate which appears most closely related to the putative SARS-CoV isolated from a palm civet. Non-synonymous mutations are centered in non-essential ORFs, especially in structural and antigenic genes such as the S and M proteins, but these mutations did not distinguish the geographical groupings. However, no non-synonymous mutations were found in the 3CLpro and the polymerase genes.
Our results show that the SARS-CoV is well adapted to growth in culture and did not appear to undergo specific selection in human populations. We further estimated that the putative origin of the SARS epidemic was in late October 2002, which is consistent with a recent estimate using cases from China. The greater sequence divergence in the structural and antigenic proteins and consistent deletions in the 3'-most portion of the viral genome suggest that certain selection pressures are interacting with the functional nature of these validated and putative ORFs.
The Severe Acute Respiratory Syndrome (SARS) was first reported in November 2002 and rapidly spread to a number of distant global regions by early 2003. A new coronavirus, the SARS-CoV, was identified as the cause of SARS [1, 2] and was rapidly sequenced and characterized [3, 4]. SARS-CoV is an enveloped, positive strand RNA virus with a wide host range. Recombination and mutation rates of RNA viruses are high, several orders of magnitude higher than in DNA-based microbes and eukaryotes, and have been the cause of rapid changes in antigenicity, virulence, and drug sensitivity. Thus, a direct estimate of the mutation rates of the SARS-CoV in human populations and an analysis of the mutational spectrum would aid in developing strategies for monitoring and therapy.
Previously, our analysis of 14 SARS sequences (five of which originated from Singapore) in May 2003 indicated that there were two different genotypes circulating in the world [5]. Recently, there has been a substantial increase in the number of SARS-CoV genomes sequenced. A total of 54 SARS-CoV genomic sequences (37 from the public database prior to October 14, 2003 and 17 sequenced within our institute) are used in our current analysis. This large dataset, coupled with the availability of clinical data for cases related to Singapore patients and our molecular observations during in vitro cell passage, presents an opportunity for a comprehensive analysis of the SARS-CoV mutational behavior.
Viral RNA genome isolation and sequencing
SARS-CoV from the primary patient tissues was isolated by homogenizing the tissues in PBS buffer followed by a low speed centrifugation to obtain the supernatant containing the viral particles. The virus-containing samples were also inoculated into Vero E6 cells. The cells were maintained at 37°C using the usual viral cell culture media, and repassaged after 7 days of incubation. The virus-containing supernatants of the homogenate or of the different Vero E6 cell passages showing CPE were centrifuged at 23,000 RCF for 2.5 hours to pellet the viral particles, followed by RNA extraction using the QIAamp viral RNA mini kit (Qiagen, http://www.qiagen.com). The RNA genome templates were converted into double strand cDNA and sequenced as previously described [5]. The processing of raw sequence reads (base calling, assembly, and editing) was done using PHRED/PHRAP/CONSED (University of Washington, Seattle, WA, USA, http://www.phrap.org).
Genotype determination using MassArray technology
A number of single nucleotide variations (SNVs) were further confirmed using a sensitive mass spectrometry based genotyping assay that was developed within our institute [6]. The RNA of the virus was first isolated using the QIAamp viral RNA mini kit and then reverse-transcribed into cDNA (using the RNA as template, the SuperScript kit from Invitrogen, and sequence specific primers), which was further purified. Primer extension assays were carried out for the SNVs of interest. The extension products were then detected in the MassARRAY (from Sequenom) to determine the genotypes.
Data and statistical analysis
We aligned the 54 SARS-CoV genomes using CLUSTALW [7]. To minimize the effect of sequencing errors and other artefacts on our analysis, we employed a filtering scheme in which only SNVs shared by more than two different isolates are kept. The phylogenetic trees were reconstructed using the filtered variations. The reconstruction was done using PAUP* [8] with the Maximum Likelihood criterion, keeping the other parameters at their defaults.
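The filtering rule can be illustrated with a minimal sketch. The text does not spell out how a "shared" SNV is scored, so this sketch assumes a variant is any base that differs from the majority base of its alignment column, and that it passes the filter when more than two isolates carry it. The function name `filter_shared_snvs` and the toy alignment are hypothetical and are not data from this study.

```python
from collections import Counter

def filter_shared_snvs(aligned, min_isolates=3):
    """Keep variant bases carried by at least `min_isolates` isolates.

    `aligned` maps isolate name -> aligned sequence (all of equal length).
    A variant is defined here, as an assumption, as a base that differs
    from the majority base in its alignment column.
    """
    sequences = list(aligned.values())
    length = len(sequences[0])
    kept = {}
    for pos in range(length):
        # Count unambiguous bases only; gaps and IUPAC codes are ignored.
        counts = Counter(seq[pos] for seq in sequences if seq[pos] in "ACGT")
        if not counts:
            continue
        majority_base = counts.most_common(1)[0][0]
        shared = sorted(base for base, n in counts.items()
                        if base != majority_base and n >= min_isolates)
        if shared:
            kept[pos] = shared
    return kept

# Hypothetical toy alignment of eight isolates (not real SARS-CoV sequence).
toy_alignment = {
    "isoA": "ACGTA", "isoB": "ACGTA", "isoC": "ACGTA", "isoD": "ACGTA",
    "isoE": "ACGTA", "isoF": "ATGTA", "isoG": "ATGTA", "isoH": "ATGTC",
}
toy_kept = filter_shared_snvs(toy_alignment)
```

In the toy alignment, the T at position 1 (0-based) is carried by three isolates and is kept, while the C at position 4 occurs in a single isolate and is discarded as possible noise.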
The significance of the variations that pass the proposed mutation filter (where only mutations shared by more than 2 out of 54 isolates are considered real) can be assessed by calculating the probability that random noise would meet the filtering criterion. The null hypothesis is that the noisy variations are generated independently between genomes. Let q be the rate of noisy mutation in a genome (based on our findings, as reported earlier in the text, we conservatively set $q = 2/30000$, i.e. about two per SARS-CoV genome). The probability that, at a given nucleotide, a noisy mutation is shared by exactly i out of m different isolates is $\binom{m}{i} q^{i} (1-q)^{m-i}$. Thus, the probability that a given nucleotide position has an erroneous mutation shared by more than k isolates is $s(k,m) = \sum_{i=k+1}^{m} \binom{m}{i} q^{i} (1-q)^{m-i}$. In a genome with n bases, applying the Bonferroni inequalities, the probability that at least one position is corrupted by noise more than k times is $p(k,n,m) \le n \times s(k,m)$. In the case of the 54 SARS-CoV genomes analyzed in this paper, $q = 2/30000$, m = 54, k = 2, n = 30000, and hence the p-value of mutations that satisfy the filter is $\le 2.2 \times 10^{-4}$.
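The bound can be checked numerically with a short sketch. It assumes a per-site noise rate of q = 2/30000 (two noisy mutations per 30000-base genome, as the "about two per SARS-CoV genome" remark suggests) and a binomial model for the number of isolates sharing a noisy mutation at a site. The function names are illustrative only.

```python
from math import comb

def shared_noise_probability(k, m, q):
    """s(k, m): probability that, at one site, a noisy mutation is
    shared by more than k of m isolates under a binomial model."""
    return sum(comb(m, i) * q**i * (1 - q)**(m - i)
               for i in range(k + 1, m + 1))

def genome_wide_bound(k, n, m, q):
    """Bonferroni upper bound n * s(k, m) on the probability that at
    least one of n sites is corrupted by noise more than k times."""
    return n * shared_noise_probability(k, m, q)

# Values stated in the text; q = 2/30000 is the assumed noise rate.
p_bound = genome_wide_bound(k=2, n=30000, m=54, q=2 / 30000)
print(f"p(k, n, m) <= {p_bound:.2e}")
```

The sum is dominated by the i = 3 term, so the bound is governed mainly by the number of ways to choose three isolates out of 54 and by the cube of q.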
Another pertinent question in the analysis of SARS-CoV evolution is the prediction of the possible date of origin of the human SARS-CoV. Based on the animal-origin hypothesis of SARS-CoV, we took the SARS-CoV isolated from palm civet cat as the putative principal isolate that infected the human population. Adhering to the constant mutation rate model, we fit the following model: $d_x = d_0 + kx$, where $k$ is the daily rate of mutation, $x$ is the sampling date measured relative to 1st November 2002, and $d_x$ is the number of mutations, as compared to the civet cat isolate, of the isolate sampled at date $x$. Twelve data points were calculated and used to fit the model. The date of origin can be obtained by solving $d_x = 0$ for $x$, which gives $x = -d_0/k$. The goodness-of-fit was measured using the adjusted R-square statistic.
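A minimal sketch of such a fit is given here, assuming ordinary least squares (the text does not name the fitting procedure). The data points are hypothetical placeholders chosen only to exercise the code; they are not the twelve points used in this study.

```python
from datetime import date, timedelta

REFERENCE_DATE = date(2002, 11, 1)  # x = 0 in the model d_x = d_0 + k*x

def fit_constant_rate(x, d):
    """Ordinary least-squares fit of d = d0 + k*x.

    Returns (k, d0, adjusted R-square) for a single-predictor model.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_d = sum(d) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxd = sum((xi - mean_x) * (di - mean_d) for xi, di in zip(x, d))
    k = sxd / sxx
    d0 = mean_d - k * mean_x
    ss_res = sum((di - (d0 + k * xi)) ** 2 for xi, di in zip(x, d))
    ss_tot = sum((di - mean_d) ** 2 for di in d)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return k, d0, adj_r2

# Hypothetical sampling days (relative to 1 Nov 2002) and mutation counts.
toy_days = [100, 110, 120, 130, 140]
toy_mutations = [12, 13, 14, 15, 16]

k, d0, adj_r2 = fit_constant_rate(toy_days, toy_mutations)
x_origin = -d0 / k  # day on which the fitted line reaches zero mutations
origin_date = REFERENCE_DATE + timedelta(days=round(x_origin))
```

Because the origin date is an extrapolation of the fitted line back to zero mutations, it is sensitive to both the slope and the intercept, which is one reason the goodness-of-fit is reported alongside it.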
Results and discussion
SARS-CoV mutations in vitro
Table 1: Single nucleotide heterogeneity (SNH) observed in the six passages of Vero cell culture. Along the six Vero cell passages, nucleotide heterogeneity was observed (initially through capillary sequencing and confirmed using MassARRAY genotyping) at nucleotide position 18356. The presence of single nucleotide heterogeneities (SNHs) indicates the coexistence of multiple SARS-CoV isolates in the Vero cell culture.
Genomic Location (Based on SIN2774)
135aa of nuclease ExoN homolog [R/G]
Mutations associated with human-to-Vero and Vero-to-human transition
Quasispecies fluctuations and mutations during the transition from human tissue to Vero cell culture. Nucleotide variations observed between primary human tissue isolates and their respective subsequent Vero cell cultures. Both quasispecies selection (SIN3275V → SIN849M) and new emergence (AS → HSR1) are observed during the transmission of SARS-CoV from human tissue samples into Vero cell culture.
Source of viral sequence: Human Tissue → Passage to Vero cell culture | ORF (based on NC_004718.3)
Y [t/C] → T | 95aa of Leader protein (I, T → I)
Y [T/C] → C | 131aa of NSP10 (SILENT)
G → R [G/A] | 637aa of sars6 (D → D, N)
Singapore encountered an unusual incident in which a stable laboratory SARS-CoV isolate commonly used for in vitro experimentation accidentally infected a laboratory worker [10]. We sequenced both the originating laboratory isolate (SIN_WNV; see Table S1) and the viral sample taken directly from the patient's sputum (SIN0409; see Table S2) and found no sequence difference between the two viruses. This confirms that virus from a single point source accumulates mutations at a low rate when expanded during human infection.
Sequence variation filter
Molecular history of the viral isolates
Estimation of the mutation rate of the SARS-CoV
We obtained the precise dates of symptom onset of 13 Singaporean cases (Table S1). Using the common mutations identified through application of the mutation filter, we employed the constant mutation rate model and estimated the mutation rate of the SARS-CoV during this recent epidemic. We estimated the mutation rate to be 0.1722 nucleotides per day, or 5.7 × 10⁻⁶ nucleotide substitutions per site per day (adjusted R-square value of the fitted model = 0.4014). The rates for synonymous and non-synonymous mutations were similar, at 2.5 × 10⁻⁶ and 3.2 × 10⁻⁶ nucleotide substitutions per site per day respectively. Using the Singapore isolates with known date of onset, and using the SZ3 and SZ16 genomes isolated from palm civet cat [12] as the putative "original" SARS-CoV that jumped from animal to human, we calculated the daily substitution rate to be 0.1303 nucleotides per day, or 4.3 × 10⁻⁶ nucleotide substitutions per site per day (adjusted R-square = 0.5880), and the estimated possible "date" of SZ3/SZ16 emergence was Oct 21, 2002. Overall, the mutation rate of SARS-CoV appears to be consistent with the reported rates of other viruses [13, 14].
The focus of this investigation was to measure the mutational frequency and dynamics of the SARS-CoV both in vitro and in vivo. Our findings suggest that the overall rate of mutation of the SARS-CoV in culture is low. Inoculation of human SARS-CoV into Vero cells introduced, on average, less than one nucleotide mutation. Subsequent culturing of SARS-CoV-infected Vero cells induced less than one nucleotide mutation over the five consecutive Vero cell passages. Likewise, no mutations were observed when SARS-CoV cultured in Vero cells infected a human. This would be consistent with the notion that the SARS-CoV isolates from the patients that gave rise to the in vitro lines are well adapted for in vitro growth.
Nucleotide variations during the transition from Vero cell culture to human. No nucleotide variations were observed between the isolate from the laboratory-acquired SARS patient and its infection source.
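The unit conversion behind these figures can be reproduced with a small sketch. It assumes the genome length is taken as the rounded value of 30000 bases used for n in the filter calculation; the names below are illustrative.

```python
from datetime import date

GENOME_LENGTH = 30000  # assumption: rounded SARS-CoV genome size (n in the filter bound)

def per_site_rate(per_genome_per_day, genome_length=GENOME_LENGTH):
    """Convert substitutions per genome per day to per site per day."""
    return per_genome_per_day / genome_length

# Per-genome daily rates reported in the text.
rate_filtered = per_site_rate(0.1722)      # fit on filtered common mutations
rate_civet_rooted = per_site_rate(0.1303)  # fit rooted on the SZ3/SZ16 civet genomes

# Offset of the estimated emergence date from the model's reference date.
days_before_reference = (date(2002, 11, 1) - date(2002, 10, 21)).days

print(f"{rate_filtered:.1e} and {rate_civet_rooted:.1e} substitutions per site per day")
print(f"emergence {days_before_reference} days before 1 Nov 2002")
```

Under this assumption the two per-genome rates round to the per-site values quoted above, and the reported emergence date of Oct 21, 2002 corresponds to x = -11 in the model's time scale.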
Source of viral sequence: Vero cell culture
Source of viral sequence: Passage to Human Tissue
Number of stable nucleotide substitutions
The authors wish to express their appreciation to Mr. Thoreau Herve, Mr. Landri Lim, Ms. Carine Bonnard, Mr. Meah Wee Yang, and Ms. Lin Su for providing technical support, and Mr. Chia Jer Ming for assisting the ORF analysis. This study was supported by the Agency for Science, Technology, and Research of Singapore, and the Biomedical Research Council of Singapore.
- Ksiazek TG, Erdman D, Goldsmith CS, Zaki SR, Peret T, Emery S, Tong S, Urbani C, Comer JA, Lim W, Rollin PE, Dowell SF, Ling AE, Humphrey CD, Shieh WJ, Guarner J, Paddock CD, Rota P, Fields B, DeRisi J, Yang JY, Cox N, Hughes JM, LeDuc JW, Bellini WJ, Anderson LJ, SARS Working Group: A Novel Coronavirus Associated with Severe Acute Respiratory Syndrome. New England Journal of Medicine. 2003, 348: 1953-1966. 10.1056/NEJMoa030781.
- Drosten C, Gunther S, Preiser W, van der Werf S, Brodt HR, Becker S, Rabenau H, Panning M, Kolesnikova L, Fouchier RA, Berger A, Burguiere AM, Cinatl J, Eickmann M, Escriou N, Grywna K, Kramme S, Manuguerra JC, Muller S, Rickerts V, Sturmer M, Vieth S, Klenk HD, Osterhaus AD, Schmitz H, Doerr HW: Identification of a Novel Coronavirus in Patients with Severe Acute Respiratory Syndrome. New England Journal of Medicine. 2003, 348: 1967-1976. 10.1056/NEJMoa030747.
- Marra MA, Jones SJ, Astell CR, Holt RA, Brooks-Wilson A, Butterfield YS, Khattra J, Asano JK, Barber SA, Chan SY, Cloutier A, Coughlin SM, Freeman D, Girn N, Griffith OL, Leach SR, Mayo M, McDonald H, Montgomery SB, Pandoh PK, Petrescu AS, Robertson AG, Schein JE, Siddiqui A, Smailus DE, Stott JM, Yang GS, Plummer F, Andonov A, Artsob H, Bastien N, Bernard K, Booth TF, Bowness D, Czub M, Drebot M, Fernando L, Flick R, Garbutt M, Gray M, Grolla A, Jones S, Feldmann H, Meyers A, Kabani A, Li Y, Normand S, Stroher U, Tipples GA, Tyler S, Vogrig R, Ward D, Watson B, Brunham RC, Krajden M, Petric M, Skowronski DM, Upton C, Roper RL: The Genome Sequence of the SARS-Associated Coronavirus. Science. 2003, 300 (5624): 1399-1404. 10.1126/science.1085953.
- Rota PA, Oberste MS, Monroe SS, Nix WA, Campagnoli R, Icenogle JP, Penaranda S, Bankamp B, Maher K, Chen MH, Tong S, Tamin A, Lowe L, Frace M, DeRisi JL, Chen Q, Wang D, Erdman DD, Peret TC, Burns C, Ksiazek TG, Rollin PE, Sanchez A, Liffick S, Holloway B, Limor J, McCaustland K, Olsen-Rasmussen M, Fouchier R, Gunther S, Osterhaus AD, Drosten C, Pallansch MA, Anderson LJ, Bellini WJ: Characterization of a Novel Coronavirus Associated with Severe Acute Respiratory Syndrome. Science. 2003, 300 (5624): 1394-1399. 10.1126/science.1085952.
- Ruan YJ, Wei CL, Ee AL, Vega VB, Thoreau H, Su ST, Chia JM, Ng P, Chiu KP, Lim L, Zhang T, Peng CK, Lin EO, Lee NM, Yee SL, Ng LF, Chee RE, Stanton LW, Long PM, Liu ET: Comparative Full-length Genome Sequence Analysis of 14 SARS Coronavirus Isolates and Common Mutations Associated with Putative Origins of Infection. Lancet. 2003, 361 (9371): 1779-1785. 10.1016/S0140-6736(03)13414-9.
- Liu JJ, Lim SL, Ruan Y, Ling A, Drosten C, Liu ET, Stanton LW, Hibberd ML: SARS-CoV Transmission Epidemiology Revealed by MALDI-TOF Mass Spectrometry-based Viral Genotyping. 2004, submitted.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL-W: Improving the Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting, Position-specific Gap Penalties and Weight Matrix Choice. Nucleic Acids Research. 1994, 22: 4673-4680.
- Swofford DL: PAUP: Phylogenetic Analysis Using Parsimony (and Other Methods) Version 4. 2003, Sunderland, Massachusetts: Sinauer Associates.
- Nei M, Kumar S: Molecular Evolution and Phylogenetics. 2000, Oxford University Press, Oxford.
- Lim PL, Kurup A, Gopalakrishna G, Chan KP, Wong CW, Ng LC, Se-Thoe SY, Oon L, Xinlai Bai X, Stanton LW, Ruan Y, Miller LD, Vega VB, James L, Ooi PL, Kai CS, Olsen SJ, Ang B, Leo YS: Laboratory-acquired Severe Acute Respiratory Syndrome (SARS) – Singapore. New England Journal of Medicine. 2004.
- Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Research. 1998, 8 (3): 186-194.
- Guan Y, Zheng BJ, He YQ, Liu XL, Zhuang ZX, Cheung CL, Luo SW, Li PH, Zhang LJ, Guan YJ, Butt KM, Wong KL, Chan KW, Lim W, Shortridge KF, Yuen KY, Peiris JS, Poon LL: Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China. Science. 2003, 302 (5643): 276-8. 10.1126/science.1087139.
- Drake JW, Holland JJ: Mutation Rates Among RNA Viruses. PNAS. 1999, 96: 13910-13913. 10.1073/pnas.96.24.13910.
- Li WH, Tanimura M, Sharp PM: Rates and Dates of Divergence Between AIDS Virus Nucleotide Sequences. Molecular Biology and Evolution. 1988, 5 (4): 313-330.
- Yeh SH, Wang HY, Tsai CY, Kao CL, Yang JY, Liu HW, Su IJ, Tsai SF, Chen DS, Chen PJ, Chen DS, Lee YT, Teng CM, Yang PC, Ho HN, Chen PJ, Chang MF, Wang JT, Chang SC, Kao CL, Wang WK, Hsiao CH, Hsueh PR: Characterization of Severe Acute Respiratory Syndrome Coronavirus Genomes in Taiwan: Molecular Epidemiology and Genome Evolution. PNAS. 2004, 101: 2542-2547. 10.1073/pnas.0307904100.
- The Chinese SARS Molecular Epidemiology Consortium: Molecular Evolution of the SARS Coronavirus During the Course of the SARS Epidemic in China. ScienceExpress. 2004, 10.1126-
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2334/4/32/prepub
This article is published under license to BioMed Central Ltd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.