Skip to main content

Near complete genome sequences from Southern Vietnam revealed local features of genetic diversity and intergenerational changes in SARS- CoV-2 variants in 2020–2021

A Correction to this article was published on 21 December 2023

This article has been updated



Since its beginnings in 2019, the COVID-19 pandemic is still a problem of global medical concern. Southern Vietnam is one of the country's vast regions, including 20 provinces and the densely populated metropolis Ho Chi Minh City. A randomized retrospective study was performed to investigate the epidemiology and genetic diversity of COVID-19. Whole-genome sequencing of 126 SARS-CoV-2 samples collected from Southern Vietnam between January 2020 and December 2021 revealed the main circulating variants and their distribution.


Epidemiological data were obtained from the Department of Preventive Medicine of the Vietnamese Ministry of Health. To identify circulating variants, RNA, extracted from 126 nasopharyngeal swabs of patients with suspected COVID-19 were sequenced on Illunina MiSeq to obtain near complete genomes SARS-CoV-2.


Due to the effectiveness of restrictive measures in Vietnam, it was possible to keep incidence at a low level. The partial relaxation of restrictive measures, and the spread of Delta lineages, contributed to the beginning of a logarithmic increase in incidence. Lineages 20A-H circulated in Southern Vietnam during 2020. Spread of the Delta lineage in Southern Vietnam began in March 2021, causing a logarithmic rise in the number of COVID-19 cases.


Pandemic dynamics in Southern Vietnam feature specific variations in incidence, and these reflect the success of the restrictive measures put in place during the early stages of the pandemic.

Peer Review reports


In late December 2019, a group of patients with pneumonia of unknown etiology was reported in Wuhan, Hubei province, China [1]. From there, it spread massively to all 34 provinces of China. The number of confirmed cases increased rapidly, and thousands of new cases were identified daily. The highest average newly-confirmed cases per day reached 3,000 [2]. Amidst this situation, the WHO declared the novel coronavirus outbreak a 'public health emergency of international concern' [3]. On February 11, the International Committee on Taxonomy of Viruses designated the novel coronavirus ‘SARS-CoV-2', and associated illness was designated 'COVID-19' [4]. In some cases, SARS-CoV-2 infection (COVID-19) can lead to a severe chronic respiratory syndrome with pneumonia and/or a myriad of other complications.

After applying many COVID-19 control measures (shutting down Wuhan on the 23rd of January, blocking all travel to and from the city), the daily number of cases in China subsequently decreased. In contrast, large clusters were reported in an increasing number other countries.

In Vietnam, the first case was identified on January 23, 2020 [5]. Since the beginning of 2020, Vietnam has experienced four waves of the COVID-19 epidemic: the first wave (January 23—July 24, 2020) with 415 cases, including 106 domestic cases and 309 imported cases; the second wave (July 25, 2020—January 27, 2021) with 1,136 cases, including 554 domestic and 582 imported; the third wave (January 28—April 26, 2021) with 1,301 cases, including 901 domestic and 391 imported; and a fourth wave (starting on April 27, 2021) which is still ongoing [6]. By May 4, 2023, the national COVID-19 caseload had reached 10,433,530 with 43,057 fatalities [7]. From March 2020 to December 2021, the strategy of COVID-19 control was tracing, quarantine, and PCR analysis of every single person who had been in contact with test-positive individuals. All people entering Vietnam were obligated to quarantine and underwent COVID-19 diagnostics (RT-PCR).

SARS-CoV-2 is an enveloped, positive-sense single-stranded RNA virus. Its 30 kilobase genome features a rapid mutation rate, and nucleotide changes accumulate over time. Identifying and following mutations helps to minimize adverse public health impacts. Southern Vietnam is one of three geographic regions which includes 20 provinces and a principal megalopolis, Ho Chi Minh, with a population of around 9 million people. Here, we studied the epidemiology of COVID-19, including whole-genome sequencing of SARS-CoV-2 samples from Southern Vietnam, during the period from January 2020 to December 2021. Identification of the main circulating variants during the pandemic in Vietnam and their succession is valuable information needed by scientists, policy makers, and health professionals.

Materials and methods

Epidemiological data and study materials

All epidemiological data were obtained periodically from the Department of Preventive Medicine, Vietnamese Ministry of Health [7, 8]. During the period from January 2020 to December 2021, nasopharyngeal swabs of patients with suspected COVID-19 from twenty southern provinces of Vietnam (Bac lieu, An Giang, Bạc Liêu, Bac ninh, Binh Duong, Cà Mau, Da nang, Đồng Nai, Hau Giang, Ho Chi Minh City, Hung yen, Kiên Giang, Kom tum, Lam Dong, Long An, Quang Ninh, Tây Ninh, Tien giang, Tây Ninh, Vinh long), and nine imported cases, were collected and delivered to the Pasteur Institute in Ho Chi Minh City. Swabs were collected in 500 µL of viral transport medium, transported in tripled sealed containers, and stored under ultralow temperature conditions until further analysis. Viral RNA was extracted, and samples were tested for the presence of SARS-CoV-2 by real-time polymerase chain reaction (PCR) using a standard procedure at the Pasteur Institute in Ho Chi Minh City.

RNA purification, Real-time PCR

Total RNA from nasopharyngeal swabs samples were obtained by extraction and purification using the QIAamp Viral RNA Extraction Kit (QIAGEN, Germany) with the QIAcube Connect automatic station (QIAGEN, Germany) according to the manufacturers recommendations. Samples were eluted and stored at -70° C until further analysis. For SARS-CoV-2 detection and to assess viral load, swabs were thoroughly analyzed using LightMix® Modular Wuhan CoV E-gene reagents (Roche SAP) according to manufacturer’s recommendations. SARS-CoV-2-positive samples featuring Ct values of 25 or less were selected and studied further.

SARS-CoV-2 genome enrichment

To create NGS libraries, viral RNA was subjected to reverse transcription and subsequent PCR enrichment. Reverse transcription was performed using random hexamers with the Reverta L Kit (AmpliSens, Russia) following the manufacturer's protocol. Samples (cDNA) were stored at -20°C until amplification. To obtain a near complete genome sequences of SARS-CoV-2 (excluding the 5′ and 3′ ends), a total of 138 primer pairs covering the entire genome were applied according to a previously described multiplex PCR protocol [9]. Reactions were performed in a total volume of 25 μl containing 2 μl cDNA, 0.1 μM of each primer, and 12.5 μl 2 × BioMaster HS-Taq PCR mix (Biolabmix, Novosibirsk, Russia). PCR were performed with following parameters: 95 °C for 3 min; 40 cycles (93 °C for 10 s, 57 °C for 30 s, 72 °C for 30 s); a final extension (72 °C for 5 min). Reactions were performed in a C1000 Touch thermal cycler (Bio-Rad, USA). The products were analyzed by electrophoresis on a 2.0% agarose gel in the presence of ethidium bromide. Reaction products were purified using the AMPure XP purification kit (Beckman Coulter, UK) in 1:1 ratio according to the manufacturer's instructions and equimolary pooled. The concentration of the PCR-fragment mixture was measured using the dsDNA HS Assay Kit (Invitrogen, USA) with a Qubit 4.0 Fluorimeter (Invitrogen, USA) and used for library preparation.

Library preparation and sequencing

Library preparation was performed using the TruSeq DNA CD Indexes Kit (Illumina Inc., USA) according to the Illumina TruSeq Nano DNA Kit protocol. Sequencing was performed on a MiSeq instrument using MiSeq V3 chemistry, generating 2 × 200 bp reads.

Genome assembly

The quality of the Illumina reads was assessed using the FastQC program [10]. Sequencing, trimming and adapter removal were done using Trimmomatic-0.39 [11]. Trimmed reads were aligned to the SARS-CoV-2 Wuhan-1 reference genome (NC_045512.2) using the native Bowtie2 aligner [12]. Genomic variants were then called using the GATK pipeline [13]; variants were filtered by minimal quality (QUAL < 400). Consensus sequences for each sample were generated using bcftools consensus [14].

Availability of data

Sequences were uploaded to GISAID under the following IDs for year 2020: EPI_ISL_812922, EPI_ISL_760247, EPI_ISL_17454567, EPI_ISL_17436319, EPI_ISL_17454569, EPI_ISL_654872-EPI_ISL_654874, EPI_ISL_17433648-EPI_ISL_17433661, EPI_ISL_654866-EPI_ISL_654870, EPI_ISL_654886-EPI_ISL_654889, and EPI_ISL_654875-EPI_ISL_654884. For 2021, the IDs were: EPI_ISL_16034578, EPI_ISL_16034555-EPI_ISL_16034573, EPI_ISL_16034575-EPI_ISL_16060862, and EPI_ISL_17433647.

Variant annotation and phylogenetic tree reconstruction

Variant calling files (.vcf) were processed for the effect of SNP variation with the SnpEff tool ( Unique S gene genotypes were extracted from the set of annotated.gvcf files and used to make a variant map for each strain using the “Variant frequency plot” tool from Galaxy EU ( A global phylogenetic tree of SARS-CoV-2 variants was constructed using the tools implemented in Nextclade [15].


Epidemiology of SARS-CoV-2 in Southern Vietnam, 2020–2021

The first COVID-19 cases were confirmed in Southern Vietnam on January 23, 2020, when two Chinese individuals arrived in Ho Chi Minh City and tested positive for the virus [16]. Viral dissemination in the community was prevented through effective public health strategies, which included mandatory quarantine for travelers from other countries, isolation of people with confirmed COVID-19, and isolation of contacts (those in contact with a confirmed COVID-19 case). The Vietnamese authorities took measures to close air borders and cancel all international flights in March 2020. Since then, only repatriates, foreign specialists, and highly skilled workers have been allowed entry into the country under the strictest quarantine conditions. In the context of the measures taken, incidence in Southern Vietnam in 2020 was sporadic. The incidence rate did not exceed 0.2 per 100,000 population (Fig. 1).

Fig. 1
figure 1

COVID-19 incidence rate in Southern Vietnamese regions in 2020. Due to restrictions on movement and border crossing under strict quarantine, the incidence in 2020 in Southern Vietnam was sporadic. The incidence did not exceed 0.2 per 100,000 population

Considering the situation in twenty provinces of Southern Vietnam, a total of 365 cases of COVID-19 were registered in 2020. Five provinces did not detect a single case: An Giang, Binh Phuoc, Hau Giang, Lam Dong, and Soc Trang. Almost half of all cases occurred in Ho Chi Minh City, Southern Vietnam's largest city (Table 1).

Table 1 COVID-19 case numbers by province in 2020

In January 2021, there was a relaxation of the ban on air travel, but only under certain conditions: if there was evidence of having completed a full course of COVID-19 vaccination (no later than 14 days before the intended visit to the country) as well as a negative test performed not more than 72 h earlier. During the first five months of 2021, the incidence rate remained at a consistently low level. In May 2021, the incidence rate was already 0.8 per 100,000 population. Despite previously successful public health measures, there has been a dramatic increase in cases since June 2021. The incidence rate rose from 16.8 in June to 851.9 by December 2021 (Fig. 2). The largest increase in COVID-19 cases was observed in Ho Chi Minh City; the total number of reported cases reached 504,558 in 2021 (Table 2).

Fig. 2
figure 2

COVID-19 incidence rate in Southern Vietnamese regions in 2021. During the first five months of 2021, the incidence rate remained at a consistently low level. In May 2021, the incidence rate increased to 0.8 per 100,000 population. Since June 2021, there has been an increase in the number of cases from 16.8 in June to 851.9 by December 2021

Table 2 COVID-19 case numbers by province in 2021

The maximum increase in the number of COVID-19 cases in the second half of 2021 was observed in the three provinces closest to Ho Chi Minh City. Binh Duong province recorded 287,282 total cases in 2021. The southern districts of Binh Duong province are very urbanized and are within one of the districts of Ho Chi Minh City. The population is about 2.5 million people. Currently, Binh Duong is a zone of ecotourism, alongside a focus on historical and cultural relics. In the provinces of Dong Nai and Tay Ninh, 96,762 and 83,488 cases were detected, respectively. Due to their large populations, proximity to Ho Chi Minh City, and natural mobility of the population, these provinces experienced a high rate of increase in COVID-19 incidence (Table 2).

SARS-CoV-2 genetic diversity in Southern Vietnam, January 2020 to December 2021

Whole‐genome sequencing was performed for SARS‐CoV‐2 isolated from 9 imported cases and 117 domestic cases from 14 provinces in Southern Vietnam from January 2020 to December 2021. Several sequences from Northern Vietnam were analyzed as well. Genomic analysis results based on Nextclade SARS‐CoV‐2 Clade Assigner showed 11 clades including the Alpha, Beta, and Delta VOCs (Fig. 3, Table 3). The maximum number of mutations among non-structural genes was noted in ORF1ab. Among structural genes, the spike protein gene was foremost (Table 4). One of the first imported cases of SARS‐CoV‐2 (from China to Ho Chi Minh City) belonged to clade 19A.

Fig. 3
figure 3

SARS-CoV-2 phylogenetic tree reconstruction based on Nextclade tools. Branches are labeled by lineage according to Nextstrain nomenclature (legend top left). Sequences obtained in this study are colored red. The scale bar at the bottom indicates the number of nucleotide differences between each sample and the Wuhan-Hu-1/2019 reference sequence (GenBank: MN908947)

Table 3 Geographic distribution of successfully sequenced samples
Table 4 Number of SNPs in each ORF in each lineage

Mutations in the SARS-CoV-2 spike protein gene in samples from Vietnam

Of 126 genomes that underwent sequencing, 55 S gene sequence variants (SGSV) were identified based on SNP pattern. SGSV-1 was identical to the Wuhan-Hu-1/2019 (MN908947) reference strain without any SNPs or indels. The remaining 54 SGSVs had at least one SNP compared to the reference (Figs. 4 and 5). Within clade 19B, no mutations in the S gene were identified; all studied samples belonged to SGSV-1. In clade 19A, one strain had a non-synonymous SNP in the spike gene (S939F) and was attributed to SGSV-2. The other four strains of the clade had S gene sequences identical to that of the reference (Wuhan-Hu-1/2019). In clade 20A, apart from the defining D614G substitution, there was a unique non-synonymous A1078S substitution. Based on S gene SNP patterns, strains of this clade belonged to four different SGSVs (Fig. 4). Clade 20B samples had only one defining mutation (D614G) and three unique non-synonymous variations: A27S, L189F, and T478I.

Four different SGSVs were identified within the analyzed strains. Strains from clade 20C were identical by S gene sequence with only the single defining mutation (D614G), and they carried SGSV-9. Clade 20D samples had a single defining mutation (D614G) and a synonymous 23,731 C > T SNP. Clade 20E (EU1), apart from the defining A222V variant, had a synonymous 23,683 C > T (Fig. 4). Samples belonging to the Alpha VOC lineage (clade 20I), apart from defining mutations, gained a 24,109 C > T synonymous SNP. Beta VOC (20H) samples contained no defining D215G or E484K mutations, and no specific mutations.

Fig. 4
figure 4

Phylogenetic comparison and heatmap analysis of variations observed in clades 19A-B and 20A-H. The figure shows the phylogenetic diversity and analysis of variations observed in lineages 19AB and 20A-H based on S gene sequence. Mutations were found relative to the Wuhan-Hu-1/2019 (GenBank: MN908947) reference strain. SGSV – S gene sequence variants. Clades in which this pattern occurs are indicated in parentheses. The number of strains of the specified variant in the studied group is indicated by a colon

Fig. 5
figure 5

Phylogenetic comparison and heatmap analysis of variations observed in Delta clades. In 2021, most strains from domestic and imported cases were in the Delta VOC, major lineage 21I, and minor lineages 21A and 21J. The figure shows the phylogenetic diversity and analysis of variations based on S gene sequence. Mutations were found relative to the Wuhan-Hu-1/2019 (GenBank: MN908947) reference strain. SGSV – Sgene sequence variants. Clades in which this pattern occurs are indicated in parentheses. The number of strains of the specified variant in the studied group is indicated by a colon

In 2021, the majority of samples from domestic and imported cases belonged to the Delta VOC, major clade 21I, and minor clades (21A, 21 J). Altogether, forty SGSVs within the Delta clade were identified (Fig. 5). Apart from defining variations, Delta variant sequences contained some non-defining substitutions: L5F, V70F, T95I, and V1264L. According to Nextclade CoVariants, these mutations occur in Delta variants (21A, 21I, 21 J) with different frequencies: L5F (1.06—1.5%), V70F (0.97%), T95I (occurs in 21 J with frequency of 49.09%), and V1264L (1.5—15%). There are also unique non-synonymous substitutions: S12C, T20I, P26S, V36F, S98F, N148T, G181V, I472V, A623T, T1117K, P1162S, P1162L, and C1235F. Synonymous mutations include 21,658 C > T, 21,742 C > T, 21,979 A > G, 22,225 G > C, 22,456 A > G, 22,714 T > C, 23,596 T > C, 24,118 A > T, 24,745 C > T, 24,559 C > T, 24,943 T > C, and 24,898 A > G (Fig. 5).


COVID-19 continues to be a pressing public health problem. Different countries have had varying levels of success in combating the COVID-19 pandemic, with some countries among the most successful in the world at containing the pandemic and others in serious jeopardy. In the East Asia and Pacific regions, the most successful are considered Singapore, New Zealand, South Korea, China, and Vietnam [17]. Due to strict anti-COVID-19 measures and other actions of the authorities, the epidemic in Vietnam was somewhat under control, with a minimal number of registered cases spanning 1.5 years until July 2021. In early 2020, all passengers arriving on inbound flights to Vietnam underwent testing using real-time reverse transcription polymerase chain reaction (RT-PCR) (requirement expired 15 May 2022), alongside a mandatory 14-day quarantine in centralized facilities (requirement expired 1 January 2022) [18].

Genomic surveillance for SARS-CoV-2 was announced by the WHO as a necessary measure for pandemic control. In June 2020, the WHO Virus Evolution Working Group was established with a specific focus on SARS-CoV-2 variants, their phenotype, and their impact on countermeasures [19]. Genomic surveillance became the international standard for combating COVID-19 [20]. The scale and strategy of SARS-CoV-2 sequencing in different countries were implemented differently, which led to uneven data coverage across regions in the GISAID database and GenBank.

The Pasteur Institute in Ho Chi Minh City has shown vigilance and readiness in terms of monitoring SARS-CoV-2 lineages using NGS methods. According to the data obtained, it can be said that the first imported cases were from China and belonged to 19A (lineage B). Variants of lineage 19B (lineage A) circulated in Southern Vietnam from February to March 2020. In China, lineages A and B circulated together for a short time and were quickly totally replaced by lineage B [21]. Strains of lineage 19A (lineage B) were detected in Southern Vietnam until June 2020 and were further replaced by lineage 20 (A-E). Alpha and Beta (20I and 20H) variants from the VOCs list circulated in Southern Vietnam throughout 2020 without causing rises in the incidence or deterioration of the epidemiological situation. In neighboring ASEAN countries (Cambodia, Malaysia, the Philippines, Singapore, Indonesia, Thailand), lineage B variants widely replicated in 2020.

In addition to multiple introduced variants in 2020 [22], some countries have reported differences in the distribution and spread of local genetic variants. Indeed, local lineages have been reported in neighboring ASEAN countries [23,24,25,26].

The beginning of the pandemic was marked by two major lineages, denoted ‘A’ and ‘B’, which are probably the result of two different spillover events. The first zoonotic transmission likely involved lineage B viruses and occurred from late-November to early-December 2019 (no earlier than early November 2019). Introduction of lineage A likely occurred within weeks of the first event [27]. All published sequences from environmental samples taken at the Huanan Market also fall in lineage B. The earliest lineage A genomes had no direct epidemiological connection with the Huanan market [28].

At the beginning of 2021, the Alpha and Beta variants (20I, 20H) continued circulation in Southern Vietnam. Like global COVID-19 pandemic trends, they were completely replaced by Delta lineages, followed by a sudden increase in incidence marking the beginning of the 4th pandemic wave in Vietnam. The Delta variant was represented by three phylogenetic lineages (21A, 21I, 21J), with the 21I lineage dominating. Intragenomic variability within the Delta lineage was observed as a set of synonymous and non-synonymous substitutions in addition to the defining ones.

The first Delta variant sequences available in GISAID for ASEAN countries came from Malaysia [24] and Singapore in February 2021 [22]. From early 2021, Delta became the predominant variant in the Philippines, as well as in Singapore, Indonesia, Malaysia [25] and Thailand, causing a new wave of morbidity [29]. During 2021, the Delta variant, which included the original B.1.617.2 Pango lineage and descendants of the AY lineage, became widespread in Malaysia. Lineage B.1.617.2 was identified as an early dominant strain circulating throughout the country, but was later superseded by lines AY.59 and AY.79 [30]. In Southern Vietnam according to Pango classification, lineage AY.57 was dominating. The limitation of the study is that we observed and described only part of the genetic variation among SARS-CoV-2 lineages in Southern Vietnam in 2020–2021. The primary diversity, associated with multiple independent imports in 2020, was superseded by the Delta variant by mid-2021. Delta then became the main circulating variant in Vietnam, causing a new wave of incidence. Similar dynamics were observed in neighboring Asian countries. Watching subsequent waves of growth in case numbers associated with the easing of anti-COVID restrictions demonstrates the effectiveness and necessity of containment measures. Monitoring of circulating and emerging SARS-CoV-2 strains is essential to understanding the mechanisms of the pandemic and developing effective responses.


Pandemic dynamics in Southern Vietnam feature specific variations in incidence, and these largely reflect the success of the substantial, ongoing restrictive measures put in place during the early stages of the pandemic. Tracking of circulating lineages revealed major variants from the list of variants-of-concern, including intragenomic variability within circulating lineages. Further evaluation of epidemiological features and the circulation of SARS-CoV-2 variants is an essential part of COVID-19 surveillance in Vietnam.

Availability of data and materials

Genomic consensus sequences obtained in this study are deposited in GISAID ( The accession numbers for each sample submitted to and approved by CDC by IBX are publicly available in GISAID database. Raw data are available upon reasonable request from the corresponding author. The datasets used in the present study are available from the corresponding author on reasonable request.

Change history



Severe acute respiratory syndrome coronavirus 2


Coronavirus disease 2019


World Health Organization


Next generation sequencing


Polymerase chain reaction


Real time polymerase chain reaction


Ribonucleic acid


Variants of concern


Single nucleotide polymorphism


Association of South East Asian Nations


  1. World Health Organization, Geneva (Switzerland): World Health Organization; 2020. Situation Report 1 2020 (World Health Organization. Novel coronavirus (2019-nCoV), situation report-1. 21 January 2020. Available from: [Accessed 1 June 2023].

  2. Park SE. Epidemiology, virology, and clinical features of severe acute respiratory syndrome -coronavirus-2 (SARS-CoV-2; Coronavirus Disease-19). Clin Exp Pediatr. 2020;63(4):119–124. doi: Epub 2020. PMID: 32252141; PMCID: PMC7170784.

  3. World Health Organization . Geneva (Switzerland): World Health Organization; 2020. WHO Director-General’s remarks at the media briefing on 2019-nCoV on 11 February 2020. Available from: Accessed 1 June 2023.

  4. Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Nat Microbiol. 2020;5(4):536–544. doi: Epub 2020 Mar 2. PMID: 32123347; PMCID: PMC7095448.

  5. Phan LT, Nguyen TV, Luong QC, Nguyen TV, Nguyen HT, Le HQ, Nguyen TT, Cao TM, Pham QD. Importation and Human-to-Human Transmission of a Novel Coronavirus in Vietnam. N Engl J Med. 2020;382(9):872–874. Epub 2020. PMID: 31991079; PMCID: PMC7121428.

  6. Four COVID-19 waves in Vietnam in 2021, 2022 Accessed on 1 June 2023.

  7. Vietnam COVID-19 data.  1 June 2023.

  8. Ministry of health. Accessed 4 May 2023.

  9. Gladkikh A, Dedkov V, Sharova A, Klyuchnikova E, Sbarzaglia V, Arbuzova T, Forghani M, Ramsay E, Dolgova A, Shabalina A, Tsyganova N, Totolian A. Uninvited Guest: Arrival and Dissemination of Omicron Lineage SARS-CoV-2 in St. Petersburg, Russia. Microorganisms. 2022;10(8):1676.

  10. Andrews, S. FastQC: A Quality Control Tool for High Throughput Sequence Data. 2010. Available online: [accessed on 1 June 2023].

  11. Bolger AM, Lohse M, Usadel B. Trimmomatic: A flexible trimmer forIllumina Sequence Data. Bioinformatics. 2014;30:2114–20.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Van der Auwera GA & O'Connor BD. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra (1st Edition). O'Reilly Media. 2020.

  14. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, et al. Twelve years of SAMtools and BCFtools. GigaScience 2021, 10, giab008.

  15. Aksamentov I, Roemer C, Hodcroft EB, Neher RA. Nextclade: clade assignment, mutation calling and quality control for viral genomes. J Open Source Software. 2021;6(67);3773,

  16. Coleman J. Vietnam reports first coronavirus cases. The Hill. Archived from the original on 18 February 2020. Accessed 1 June 2023.

  17. Post LA, Lin JS, Moss CB, Murphy RL, Ison MG, Achenbach CJ, Resnick D, Singh LN, White J, Boctor MJ, Welch SB, Oehmke JF. SARS-CoV-2 Wave Two Surveillance in East Asia and the Pacific: Longitudinal Trend Analysis. J Med Internet Res. 2021;23(2): e25454.

    Article  PubMed  PubMed Central  Google Scholar 

  18. Nguyen TV, Luong QC, Pham QD. In the interest of public safety: rapid response to the COVID-19 epidemic in Vietnam. BMJ Glob Health. 2021;6(1): e004100.;PMCID:PMC7839307.

    Article  PubMed  Google Scholar 

  19. WHO. Tracking SARS-CoV-2 variants. Accessed 8 May 2023.

  20. Robishaw JD, Alter SM, Solano JJ, Shih RD, DeMets DL, Maki DG, et al. Genomic surveillance to combat COVID-19: challenges and opportunities. Lancet Microbe. 2021;2(9):e481–4.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Forster P, Forster L, Renfrew C, et al. Phylogenetic network analysis of SARS-CoV-2 genomes. Proc Natl Acad Sci U S A. 2020;117:9241–3.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Hoan NX, Pallerla SR, Huy PX, Krämer H, My TN, Tung TT, Hoan PQ, Toan NL, Song LH, Velavan TP. SARS-CoV-2 viral dynamics of the first 1000 sequences from Vietnam and neighbouring ASEAN countries. IJID Regions. 2022;2:175–9.

    Article  PubMed  PubMed Central  Google Scholar 

  23. Su YCF, Ma JZJ, Ou TP, Pum L, Krang S, Raftery P, Kinzer MH, Bohl J, Ieng V, Kab V, Patel S, Sar B, Ying WF, Jayakumar J, Horm VS, Boukli N, Yann S, Troupin C, Heang V, Garcia-Rivera JA, Sengdoeurn Y, Heng S, Lay S, Chea S, Darapheak C, Savuth C, Khalakdina A, Ly S, Baril L, Manning JE, Simone-Loriere E, Duong V, Dussart P, Sovann L, Smith GJD, Karlsson EA. Genomic epidemiology of SARS-CoV-2 in Cambodia, January 2020 to February 2021. Virus Evol. 2022;9(1):veac121. PMID: 36654682; PMCID: PMC9838690.

  24. Tan KK, Tan JY, Wong JE, Teoh BT, Tiong V, Abd-Jamil J, Nor'e SS, Khor CS, Johari J, Yaacob CN, Zulkifli MM, CheMatSeri A, Mahfodz NH, Azizan NS, AbuBakar S. Emergence of B.1.524(G) SARS-CoV-2 in Malaysia during the third COVID-19 epidemic wave. Sci Rep. 2021 Nov 11;11(1):22105. doi: PMID: 34764315; PMCID: PMC8586159.

  25. Joonlasak K, Batty EM, Kochakarn T, Panthan B, Kümpornsin K, Jiaranai P, Wangwiwatsin A, Huang A, Kotanan N, Jaru-Ampornpan P, Manasatienkij W, Watthanachockchai T, Rakmanee K, Jones AR, Fernandez S, Sensorn I, Sungkanuparph S, Pasomsub E, Klungthong C, Chookajorn T, Chantratita W. Genomic surveillance of SARS-CoV-2 in Thailand reveals mixed imported populations, a local lineage expansion and a virus with truncated ORF7a. Virus Research. 2021;292.

  26. Chong YM, Sam IC, Chong J, Kahar Bador M, Ponnampalavanar S, Syed Omar SF, Kamarulzaman A, Munusamy V, Wong CK, Jamaluddin FH, Chan YF. SARS-CoV-2 lineage B.6 was the major contributor to early pandemic transmission in Malaysia. PLoS Negl Trop Dis. 2020;14(11):e0008744. PMID: 33253226; PMCID: PMC7728384.

  27. Pekar JE, et al. The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2. Science. 2022;377:960–6.

    Article  CAS  PubMed  Google Scholar 

  28. Worobey M. Dissecting the early COVID-19 cases in Wuhan. Science. 2021;374(6572):1202–4. (Epub 2021 Nov 18 PMID: 34793199).

    Article  CAS  PubMed  Google Scholar 

  29. Suphanchaimat R, Teekasap P, Nittayasoot N, Phaiyarom M, Cetthakrikul N. Forecasted Trends of the New COVID-19 Epidemic Due to the Omicron Variant in Thailand, 2022. Vaccines (Basel). 2022;10(7):1024.;PMCID:PMC9320113.

    Article  CAS  PubMed  Google Scholar 

  30. Azami NAM, Perera D, Thayan R, AbuBakar S, Sam IC, Salleh MZ, Isa MNM, Ab Mutalib NS, Aik WK, Suppiah J, Tan KK, Chan YF, Teh LK, Azzam G, Rasheed ZBM, Chan JCJ, Kamel KA, Tan JY, Khalilur Rahman O, Lim WF, Johari NA, Ishak M, Yunos RIM, Anasir MI, Wong JE, Fu JYL, Noorizhab MNF, Sapian IS, Mokhtar MFM, Md Shahri NAA, Ghafar K, Yusuf SNHM, Noor YM, Jamal R; Malaysia COVID-19 Genomics Surveillance Consortium. SARS-CoV-2 genomic surveillance in Malaysia: displacement of B.1.617.2 with AY lineages as the dominant Delta variants and the introduction of Omicron during the fourth epidemic wave. Int J Infect Dis. 2022;125:216–226. Epub 2022. PMID: 36336246; PMCID: PMC9632236.

Download references


The authors would like to thank: Pham Thi Thu Hang, Vu Pham Hong Nhung, Nguyen Viet Thinh and Luong Chan Quang for their technical and epidemiological data assistance; and the US CDC for reagent support.


The study was funded within the framework of Russian-Vietnamese cooperation in accordance with the Decree of the Government of the Russian Federation of July 13, 2019 (No. 1536-r).

Author information

Authors and Affiliations



Conceptualization: ASG, TMC, VGD; investigation and methodology: ASG, TMC, EOK, MHD, AAS, VDM, MRP, TVA, VAS, NAT; formal analysis: ASG, TMC, EOK, MHD, AAS, VDM; resources: TMC, VGD; supervision: TMC, VGD; writing – original draft, ASG, TMC, EOK; writing – review & editing, ER. All authors have read and agreed to the final version of the manuscript.

Corresponding author

Correspondence to Ekaterina O. Klyuchnikova.

Ethics declarations

Ethics approval and consent to participate

The study was evaluated and approved by the local Ethics Committee of the St. Petersburg Pasteur Institute (St. Petersburg, Russia, № 063–03) and the Ethics Committee of the Pasteur Institute in Ho Chi Minh City (17/CN-HDDD). Research was performed according to the principles of Declaration of Helsinki for medical research. Informed consent was obtained from all subjects and/or their legal guardian(s).

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article has been updated to correct figure 4 & 5.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Gladkikh, A.S., Cao, T.M., Klyuchnikova, E.O. et al. Near complete genome sequences from Southern Vietnam revealed local features of genetic diversity and intergenerational changes in SARS- CoV-2 variants in 2020–2021. BMC Infect Dis 23, 806 (2023).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: