Molecular characterization of hepatitis B virus in Vietnam

Background: Hepatitis B virus (HBV) infection is a major public health problem globally. HBV genotypes and subgenotypes influence disease transmission, progression, and treatment outcome. A study was conducted among treatment naïve chronic HBV patients in southern Vietnam to determine the genotypes and subgenotypes of HBV.

Methods: A prospective, exploratory study was conducted among treatment naïve chronic HBV patients attending the Hospital for Tropical Diseases in Ho Chi Minh City, Vietnam, during 2012, 2014 and 2016. HBV DNA positive samples (systematically selected from 2% of all treatment naïve chronic patients during 2012 and 2014, and 8% of all treatment naïve chronic patients during 2016) were subjected to whole genome sequencing (WGS) by either Sanger or Illumina sequencing. WGS was used to define genotype, subgenotype, recombination, and the prevalence of drug resistance and virulence-associated mutations.

Results: One hundred and thirty-five treatment naïve chronic HBV patients were enrolled, including 18 from 2012, 24 from 2014, and 93 from 2016. Of the 135 sequenced viruses, 72.6% and 27.4% were genotypes B and C respectively. Among genotype B isolates, 87.8% and 12.2% were subgenotypes B4 and B2 respectively. A G1896A mutation in the precore gene was present in 30.6% of genotype B isolates. The genotype C isolates were all subgenotype C1, and 78.4% (29/37) of them had at least one basal core promoter (BCP) mutation. The A1762T and G1764T mutations, and the double mutation (A1762T and G1764T), in the BCP region were significantly more frequent in genotype C1 isolates (p < 0.001).

Conclusion: HBV genotype B, including subgenotype B4, is predominant in southern Vietnam. However, one fourth of the chronic HBV infections were caused by subgenotype C1.

Electronic supplementary material: The online version of this article (10.1186/s12879-017-2697-x) contains supplementary material, which is available to authorized users.


Background
Worldwide, an estimated 2 billion people have been infected with the hepatitis B virus (HBV), and of these 250 million suffer from chronic HBV infection [1]. HBV infection leads either to spontaneous recovery or to chronic HBV, which causes chronic liver disease, including liver cirrhosis (LC) and hepatocellular carcinoma (HCC) [2].
HBV is a small circular DNA virus (~3.2 kb in length) that contains 4 genes with partially overlapping open reading frames (ORFs). These overlapping ORFs encode the polymerase protein, the surface antigen, the core antigen and the X protein. HBV is highly heterogeneous and is composed of genomes that are closely related but not identical; hence, it is considered a viral quasispecies within an infected individual [3]. Viral replication is rapid, with up to 10^11 virions generated each day in an infected individual. Due to the high reverse transcription error rate of the polymerase (1 error/10^7 bases) during active infection, 10^7 base-pairing errors can be generated over the 3200-bp genome per day [4]. While most of these new sequences are within nonviable viruses, they provide a starting point for the emergence of mutants under selective pressure.
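The daily error load quoted above follows from three figures in the text: virions produced per day, genome length, and the polymerase error rate. The sketch below is only an order-of-magnitude check; the constant and function names are illustrative and are not taken from the cited reference.

```python
# Order-of-magnitude check of the daily error load quoted in the text.
# All three inputs are the figures stated in the paragraph above.
VIRIONS_PER_DAY = 10**11   # upper bound on virions generated per day
GENOME_LENGTH_BP = 3200    # approximate HBV genome length in bases
BASES_PER_ERROR = 10**7    # polymerase makes about 1 error per 10^7 bases


def daily_errors(virions=VIRIONS_PER_DAY,
                 genome_bp=GENOME_LENGTH_BP,
                 bases_per_error=BASES_PER_ERROR):
    """Expected number of misincorporated bases per day across all new genomes."""
    return virions * genome_bp // bases_per_error


print(daily_errors())  # 32000000, i.e. on the order of 10^7
```

The product is 3.2 x 10^7, which agrees with the stated figure of roughly 10^7 errors per day.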
HBV can be categorized into 10 different genotypes (A-J; segregated by >7.5% genomic sequence diversity) and 40 different subgenotypes (separated by >4% genomic sequence diversity) [5]. HBV subgenotyping has caused controversy in the past due to misclassifications and incorrect interpretations arising from different genotyping methods (whole genome sequence versus S gene sequence). Criteria for assigning a new subgenotype have been proposed recently [5]. They include: i) analysis of the full-length genome, ii) adherence to intra-genotypic nucleotide divergence (>4.5% and <7.5%), iii) bootstrap values greater than 75%, iv) exclusion of recombinant strains from analysis, v) identification of specific nucleotide and amino acid motifs, vi) a minimum of three purported novel strains, and vii) evolutionary and phylogenetic analysis of all available subgenotype strains belonging to the same genotype.
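The divergence thresholds above can be illustrated with a pairwise comparison of two aligned genomes. This is a minimal sketch under stated assumptions: the function names are hypothetical, the distance is a simple uncorrected percentage of differing sites, and the toy sequences are not HBV data. Actual subgenotype assignment also relies on phylogeny and bootstrap support, which this sketch does not attempt.

```python
def p_distance(seq_a, seq_b):
    """Percent of differing sites between two aligned sequences.

    Columns containing a gap ('-') or an ambiguous base ('N') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differing / compared


def classify_divergence(percent, subgenotype_min=4.5, genotype_min=7.5):
    """Map a whole-genome divergence (%) onto the thresholds quoted in the text."""
    if percent > genotype_min:
        return "different genotype"
    if percent > subgenotype_min:
        return "different subgenotype"
    return "same subgenotype"


# Toy 10-base alignment with one mismatch (illustrative only).
d = p_distance("ACGTACGTAC", "ACGTACGTAA")
print(d, classify_divergence(d))
```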
HBV genotypes and subgenotypes differ considerably with respect to geographical distribution, transmission routes, disease progression, responses to antiviral therapy, and clinical outcome, e.g. LC and HCC [6]. The clinical course of infection depends on the host's age at infection, genetic factors, and the genomic variability of the virus, including genotypes, subgenotypes and virulence associated mutations (preS1, preS2 and S gene mutations, and basal core promoter (BCP), precore (PC) and core mutations) [7][8][9]. Earlier studies have shown that mutations in the preS1 and preS2 genes are associated with progression to HCC [9]. Amino acid (aa) positions 99 to 169 of the S gene form the major hydrophobic region (MHR), and the "a" determinant region (aa 124 to 147) is located within the MHR. Mutations in the "a" determinant region cause conformational changes in the S protein that can affect the antigenicity of HBsAg and can generate immune escape mutants [10]. Mutations in BCP, PC and core have been shown to be associated with HCC [11].
Vietnam is classified as a high burden country regarding hepatitis, and the prevalence of chronic HBV infection is 8-20% and 31-54% among the general and the urban high risk populations respectively [12]. Projection and modeling studies have predicted approximately 8 million chronic HBV cases and 58,600 HBV related LC cases in Vietnam by 2025, and that the estimated annual HBV related mortality will be 40,000/year by 2025 [13]. Limited research has been conducted on molecular characterization and determination of virulence associated properties of HBV in Vietnam. Representative data on genotype and subgenotype analysis using whole genome sequence (WGS) [14][15][16], as well as information on the prevalence of virulence associated mutations in different genes, recombination and drug resistant mutations in treatment naïve patients are limited.
We conducted a prospective exploratory study using systematically and randomly collected HBV isolates from treatment naïve patients attending a tertiary care hospital in southern Vietnam in 2012, 2014 and 2016. We used WGS to investigate the prevalence of i) genotypes and subgenotypes, ii) recombinants, iii) primary, secondary and potential nucleos(t)ide analogue (NA) resistance (NAr) mutations, and iv) mutations in the preS1, preS2, S gene, BCP/PC and core gene.

Methods
The study was conducted at the Hospital for Tropical Diseases (HTD), Ho Chi Minh City, Vietnam, from June to December 2012, January to June 2014 and January to December 2016. HTD is a 650-bed tertiary care hospital for infectious diseases and a designated referral center for hepatitis patients for the southern provinces of Vietnam. All treatment naïve chronic HBV patients attending the hepatitis outpatient department for viral load assays were eligible for the study. Systematically selected residual diagnostic samples from 2% of the patients (samples from every 50th patient) from 2012 and 2014 and 8% of the patients from 2016 were included in this study. Serum samples from selected patients were stored at -86°C until further analysis. Patient address (province, district, city, and ward), clinical chemistry and viral load data were collected from the hospital database. The geolocations of the patients were mapped with QGIS software version 2.18. The study was approved by the Hospital for Tropical Diseases' ethical review committee (Approval No: SC/ND/12/14).

Viral DNA was extracted from 200 μL of plasma using the QIAamp viral DNA kit (QIAgen GmbH, Hilden, Germany) and eluted in 50 μL TBE. The HBV genome was amplified in 4 overlapping fragments (800 bp to 1. The PCR was performed for 35 cycles at 94°C for 1 min, 58°C for 1 min, and 72°C for 1 min in a thermal cycler (ABI 9800) [18]. The PCR products were visualized by 1% agarose gel electrophoresis and stained with Nancy 520 DNA gel stain. The PCR product was purified using the QIAamp PCR product purification kit (QIAgen GmbH, Hilden, Germany). The eluted DNA was quantified by a fluorescence-based dsDNA quantification method using the Quant-iT dsDNA Assay Kit in a Qubit fluorometer (Invitrogen) and was sequenced either on an ABI 3100 system after the cycle sequencing reaction or on an Illumina MiSeq system.
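The systematic selection described earlier in this section (every 50th patient, giving a 2% sample) can be sketched as follows. The patient identifiers and the function name are hypothetical; the source does not state the sampling interval used for the 8% sample in 2016, so only the stated interval of 50 is shown.

```python
def systematic_sample(items, interval=50):
    """Return every `interval`-th item (the 50th, 100th, ...) in arrival order."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return items[interval - 1::interval]


# Hypothetical arrival-ordered patient identifiers, for illustration only.
patient_ids = list(range(1, 1001))
selected = systematic_sample(patient_ids)

print(len(selected), selected[:3])  # 20 [50, 100, 150]
```

With 1,000 hypothetical patients, an interval of 50 yields 20 samples, which is the 2% fraction quoted in the text.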
For the ABI 3100 system, DNA sequencing was done from both ends and the consensus sequence was used to construct the whole genome from the overlapping fragments. For Illumina sequencing, the amplified fragments were pooled with an equal quantity of each individual PCR amplicon. One nanogram of pooled DNA from individual samples was subjected to library preparation using the Nextera XT DNA sample preparation kit (Illumina, San Diego, CA, USA), in which each sample was assigned a unique barcode sequence using the Nextera XT Index Kit (Illumina). Sequencing of the prepared library was carried out using the MiSeq reagent kit v2 (300 cycles, Illumina) on an Illumina MiSeq platform.
The Illumina fastq sequence files were assembled using the Geneious 8.0.5 software package (Biomatters Ltd, Auckland, New Zealand), utilizing a reference-based mapping tool after primer sequence clipping (i.e. the consensus sequence was obtained by mapping the individual reads of each sample to a reference sequence). Finally, screening of minor (sub-consensus) variants was performed using the SNP detection tool available in Geneious. A minimum variant frequency of 5% and 500-fold coverage were chosen as cut-off values.
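The two cut-off values for minor variants can be expressed as a simple filter. This is a sketch only: it does not reproduce the Geneious SNP detection tool, the `Variant` record and its fields are hypothetical, the positions and read counts are invented for illustration, and "500-fold coverage" is assumed here to mean a minimum read depth of 500 at the variant site.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record for one candidate sub-consensus variant."""
    position: int   # genome coordinate of the variant
    ref: str        # consensus base
    alt: str        # minor base
    alt_reads: int  # reads supporting the minor base
    depth: int      # total reads covering the position

    @property
    def frequency(self):
        return self.alt_reads / self.depth if self.depth else 0.0


def passes_cutoffs(variant, min_frequency=0.05, min_depth=500):
    """Keep variants at >=5% frequency with >=500-fold coverage."""
    return variant.depth >= min_depth and variant.frequency >= min_frequency


# Invented example calls (not study data).
calls = [
    Variant(100, "G", "A", alt_reads=120, depth=2000),  # 6.0% at 2000x
    Variant(200, "A", "T", alt_reads=30, depth=2000),   # 1.5% at 2000x
    Variant(300, "G", "T", alt_reads=100, depth=400),   # 25% but only 400x
]
kept = [v.position for v in calls if passes_cutoffs(v)]
print(kept)  # [100]
```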
Seventy well characterized HBV whole genome sequences representing all genotypes and subgenotypes were downloaded from GenBank and, together with the HBV whole genome sequences from the current study, were subjected to phylogenetic analysis. All complete genome sequences were aligned with MUSCLE from the Geneious package. The sequence alignments were then subjected to jModelTest to identify the best model for phylogenetic analysis [19]. The suggested nucleotide substitution model (GTR + G + I) was subsequently used for phylogenetic analysis with RAxML v7.2.8 (available in the Geneious package). To confirm the reliability of the phylogenetic tree, bootstrap resampling and reconstruction were carried out 100 times.
All sequences were analyzed for possible recombination with RDP4 v4.85 software [20]. Any recombination event detected by at least 5 of the 7 programs (RDP, Geneconv, Bootscan, Maxchi, Chimaera, Siscan and Topol) was considered a true recombination event. RDP4 v4.85 standard default settings were used, except for Bootscan and Siscan, where a window size of 300 bp and a step size of 30 were used. The prevalence of recombination, the recombination breakpoints (start and end points), the length of the recombinant fragments and the locations of the recombination events were determined.
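The acceptance rule for recombination events (support from at least 5 of the 7 detection programs) amounts to a simple vote count. The sketch below does not call RDP4; it assumes the per-event list of supporting methods has already been exported, and the function name is hypothetical. Method names are written as they appear in the text.

```python
# Detection methods as named in the text above.
METHODS = ("RDP", "Geneconv", "Bootscan", "Maxchi", "Chimaera", "Siscan", "Topol")


def is_true_recombination(detected_by, min_methods=5):
    """Accept an event only if enough distinct, recognised methods detected it."""
    supporting = set(detected_by) & set(METHODS)
    return len(supporting) >= min_methods


# Illustrative events (not study data).
print(is_true_recombination(["RDP", "Geneconv", "Bootscan", "Maxchi", "Chimaera"]))  # True
print(is_true_recombination(["RDP", "Geneconv", "Bootscan", "Maxchi"]))              # False
```

Using a set means a method reported twice for the same event is counted once.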
All data (sociodemographic, biochemical and virological) were recorded and analyzed with the Statistical Package for the Social Sciences (IBM SPSS version 23, NY, USA). Fisher's exact test was used to compare nominal scale variables and the Mann-Whitney U test to compare ordinal scale variables. A P value <0.05 was considered statistically significant.

Results
The geolocations of the 135 patients enrolled in the study were mapped; 87 districts were represented from the 26 southern provinces of Vietnam. Approximately 58% of the patients were from six provinces: Ho Chi Minh City (28.1%; 38/135), Dong Nai (6.67%; 9/135), Long An (6.67%; 9/135), Binh Duong (5.93%; 8/135), Dong Thap (5.19%; 7/135) and Tien Giang (5.19%; 7/135). The rest of the patients were from 46 districts of the remaining 20 southern provinces of Vietnam (Additional file 1). The gender, demographic information, liver enzyme levels, viral load, and genotype and subgenotype distribution of the 135 patients are presented in Table 1. Approximately half of the patients were male and the mean age of the patients was 32 years. Of the patients, 11.1% (15/135) had a blood ALT concentration of ≥5 times the upper normal limit (UNL; the reference range is <37 IU/l for females and <40 IU/l for males).
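The provincial shares reported above can be recomputed directly from the patient counts given in the text. The sketch below uses only those counts; the variable and function names are illustrative, and percentages are rounded to two decimals.

```python
TOTAL_PATIENTS = 135

# Patient counts for the six most represented provinces, as given in the text.
province_counts = {
    "Ho Chi Minh City": 38,
    "Dong Nai": 9,
    "Long An": 9,
    "Binh Duong": 8,
    "Dong Thap": 7,
    "Tien Giang": 7,
}


def share(count, total=TOTAL_PATIENTS):
    """Percentage of all enrolled patients, rounded to two decimals."""
    return round(100 * count / total, 2)


for province, count in province_counts.items():
    print(f"{province}: {share(count)}% ({count}/{TOTAL_PATIENTS})")

top_six = sum(province_counts.values())
print(top_six, round(100 * top_six / TOTAL_PATIENTS, 1))  # 78 57.8
```

The six provinces account for 78 of 135 patients, or 57.8%, which matches the "approximately 58%" stated in the text.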

The sequence length was 3215 bp for all 135 isolates. Phylogenetic analysis of the whole genome sequences showed that the Vietnamese HBV sequences clustered with genotype B and C reference sequences (Fig. 1). Most HBV isolates belonged to genotype B (72.6%; 98/135), comprising subgenotypes B2 (12.2%; 12/98) and B4 (87.8%; 86/98). The remaining 27.4% (37/135) of the isolates were genotype C, and all genotype C isolates belonged to subgenotype C1. The subgenotype C1 isolates formed a closely related cluster with a high bootstrap value (99.99). There were no significant differences in genotype distribution by i) year of isolation (2012, 2014 or 2016), ii) sequencing method (Sanger or Illumina), iii) liver enzyme (AST and ALT) level, iv) viral load or v) geolocation of the patients. All sequences have been deposited in GenBank under accession numbers MF621878 and MF674382-MF674515.
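Comparisons of genotype across categorical groupings, such as those summarized above, were made with Fisher's exact test according to the Methods. The study used SPSS; the sketch below is a plain-Python stand-in for a 2x2 table, with a hypothetical function name, and it is demonstrated on the classic textbook "tea tasting" table rather than on study data, because the per-group counts are not given in this section.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, p_value)


# Textbook tea-tasting table (not study data): 3 of 4 correct in each row.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p, 4), p < 0.05)
```

The small relative tolerance in the comparison keeps tables that are exactly as likely as the observed one from being dropped because of floating-point rounding.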

Discussion
Determination of the genotype and subgenotype of HBV is important as they are associated with clinical presentation, transmission, response to therapy and treatment outcome [2]. Considering the large pool of treatment naïve chronic HBV patients and the limited availability of genotyping facilities in Vietnam, genotype and subgenotype data representative of the population are important for clinical decision making, including empirical therapy, disease modeling, and health resource allocation for the management of chronic HBV patients [13]. However, one of the key criteria for assigning an isolate to a particular subgenotype is that the assignment should be based on analysis of the WGS. In our study we analyzed the WGS of isolates for genotype and subgenotype determination. We used a criterion of 7.5% diversity across the WGS to define an isolate as a particular genotype, and 4.5% to 7.5% intragenotypic nucleotide divergence to assign a sequence to a subgenotype [5]. We collected samples from 2% (2012 and 2014) and 8% (2016) of the treatment naïve chronic HBV patients attending a tertiary care and hepatitis referral hospital for southern Vietnam in order to ensure the representativeness of our data. In addition, the geolocation analysis indicates that the patients enrolled in our study came from 26 provinces, including 87 districts, of southern Vietnam.
Our data indicate that in southern Vietnam HBV genotype B is dominant, followed by genotype C. This is in agreement with earlier data published from Vietnam and southeast Asia using a partial S gene sequencing approach [15]. Although HBV genotypes and subgenotypes have distinct geographical distributions, we could not identify such distributions in our study population. This might be because both genotypes B and C are prevalent in southern Vietnam and the subgenotype diversity is limited (i.e. only three subgenotypes are circulating in southern Vietnam).
Recombination analysis revealed that two thirds of the isolates, including 90% of the genotype B isolates, are genotype B/C recombinants. This is not surprising, as it has been reported that HBV genotype B/Ba (B2-B5) isolates from Vietnam, China, Hong Kong, Indonesia, and Thailand have undergone recombination with HBV/C in the core promoter/precore/core genomic region [22]. It is also interesting to note that the HBV genotype B1 isolates (also called Bj) from Japan are non-recombinant