Serum samples from randomly selected 140 patients with chronic HBV monoinfection and 59 patients with chronic HBV/HDV coinfection were collected in 2018–2019 in Yakutia. Written informed consent was obtained from all participants. The whole study was conducted in accordance with the principles expressed in the World Medical Association Declaration of Helsinki regarding ethical medical research involving human subjects. The study design was approved by the Ethics Committee of the Mechnikov Research Institute for Vaccines and Sera in Moscow, Russia (Approval #1 dated February 28, 2018).
All serum samples were transported to lab at the Mechnikov Research Institute for Vaccines and Sera using a cold chain and stored in aliquots at − 70 °C before testing.
HBV and HDV amplification, sequencing, and genotyping
Viral nucleic acids were extracted from the patients’ sera using the QIAamp Viral RNA Mini Kit (QIAGEN, Hilden, Germany). HDV complete genome sequences were amplified with primers, resulting in two overlapping fragments described by Celik et al. . using the Transcriptor First Strand cDNA Synthesis Kit and Fast Start High Fidelity PCR System (Roche Applied Science, Mannheim, Germany), according to the manufacturers’ protocols. All amplicons were excised from an agarose gel and subjected to nucleic acid isolation using a QIAquick Gel Extraction Kit (QIAGEN, Hilden, Germany) according to the manufacturer’s protocol, and sequenced on the Ion Torrent S5 platform. Libraries were prepared using the Ion 510 & Ion 520 & Ion 530 Kit-Chef (Thermo Fisher Scientific, Waltham, USA) for up to 200 base-read libraries. All samples were read with a coverage of at least 500×, and quality not lower than Q30. The read assembly was performed by BWA-MEM. HDV sequences from this study were deposited in GenBank under the following accession numbers: OK142820–OK142858, OK142920–OK142929, OK142938, and OK142939. Additionally, HDV R0 fragment was amplified using protocol describe elsewhere  from 22 patient samples in which the complete genome amplification failed. These 22 R0 sequences were sequenced using the 3500 Genetic Analyzer (ABI, Foster City, USA) and BigDye Terminator v 3.1 Cycle Sequencing Kit according to the manufacturer’s protocol and deposited in GenBank under the following accession numbers: OL875352-OL875373. The HDV genotype was determined based on phylogenetic analysis with the reference HDV sequences recommended in the ICTV 2018 classification  using the maximum likelihood (ML) method in the MEGA 7.0.18 software .
HBV amplification was performed in two stages. First, HBV DNA testing was performed in an in-house PCR assay with nested primers for overlapping regions of S and P genes adapted from Basuni and Carman  with a detection limit of about 50 IU/ml based on the results of serial dilution testing for PCR standards with known viral load. The resulting 713 bp amplicons were extracted from the gel as described above and subjected to direct Sanger sequencing on the 3130 Genetic Analyzer (ABI, Foster City, USA) automatic sequencer using the BigDye Terminator v3.1 Cycle Sequencing Kit according to the manufacturer’s protocol.
Next, we amplified complete HBV genomic sequences with primers using a protocol described elsewhere  for 13 HBV-monoinfected patients whose serum samples had an available remaining volume > 500 ml for DNA extraction using MagNA Pure Compact Nucleic Acid Isolation Kit I Large Volume (Roche Applied Science, Mannheim, Germany). Amplification was performed using the Expand™ Long Template PCR System (Roche Applied Science, Mannheim, Germany). Libraries were prepared from amplicons using Qiagen FX DNA Library kit (Qiagen, Hilden, Germany) according to the manufacturer’s protocol and were sequenced on the Illumina Miseq 250PE platform. Complete HBV genomic sequences have been deposited in GenBank under the following accession numbers: OK143470, OK143474, OK143477, OK143478, OK143480, OK143482, OK143485, OK143489, OK143490, OK143494-OK143496, and OK143498.
The HBV genotype was determined based on phylogenetic analysis of a 650 nucleotide fragment of the S-gene (nucleotide positions 157–806 by reference sequence NC_003977.2) with the references recommended by the ICTV  and suggested for HBV sub-genotype designation  using the ML method in the MEGA 7.0.18 software .
As the majority of patients with HBV/HDV coinfection did not have detectable HBV viremia, the HBV genotype was predicted for these 59 patients based on the results of HBsAg serotyping in the ELISA kit (Vector-Best, Novosibirsk, Russia) according to the manufacturer’s protocol. Briefly, HBsAg testing was performed in parallel with three different monoclonal antibody-based conjugates allowing differentiation between the following HBV serotypes/genotypes: ayw2 or ayw3/D, adw2/A, adrq+/C.
Time-scaled phylogenetic analysis
Complete HDV genomic sequences from this study were supplemented with 327 sequences from GenBank for which the year and country of isolation were known, and which had a divergence of at least 1%. HBV analysis incorporated complete genomic sequences from this study and an additional 321 sequences from GenBank, including ancient sequences obtained from mummies. Several filters were applied for both datasets (skipredundant with a cutoff of 1%) to remove overabundant sequences from each location. In resulting datasets, the ratio between Russian isolates and sequences from databases was at least 1:1.5, and for sequences from Yakutia this ratio was at least 1:5.
The alignment for both HBV and HDV datasets was performed using the ClustalW algorithm in the MEGA 7 software . The most appropriate substitution model was estimated using Jmodeltest-2.1.10. The most suitable models were determined to be SYM+G for the HDV dataset and GTR+G for HBV, which were chosen because they had the lowest Bayesian information criterion (BIC) value. The presence of a positive correlation between genetic divergence and sampling time for HBV and HDV was checked using TempEst v.1.5 software. This analysis also made it possible to calculate the initial clock rates for HBV and HDV (Additional file 1: Fig. S1). The phylogenetic trees were built using the PhyML-3.1 software.
Bayesian analysis was performed using the BEAST v1.10.4 software package. After trial runs to determine the optimal parameters, the SRD06 nucleotide substitution model with a strict clock was selected as the optimal model for HDV. The size was constant as the coalescent tree prior was used. All sequences were checked to ensure reading frame accuracy; no stop codons were identified. The base frequencies were estimated. The initial clock rate 2.74*10−3 subs./site/year was used to estimate a clock rate. The Markov Chain Monte Carlo (MCMC) method was run for 50 million generations and sampled every 5000 steps in two repetitions.
Similar work was carried out for HBV, but the parameters of the final runs were as follows: Yang96 nucleotide substitution model, lognormal relaxed clock, Bayesian Skyline Plot (BSP) 5 group. The initial clock rate 1.18*10−5 subs./site/year was used to estimate a clock rate . Likewise, the MCMC method was run for 50 million generations and sampled every 5000 steps in two repetitions.
For both viruses the two parallel runs were combined using LogCombiner v1.10. 4. Tracer v1.6 was used to check convergence. The effective sample size was > 500 in both cases. Trees were annotated with TreeAnnotator v.1.10.4 using a burn-in of 1000 trees, and visualized with FigTree v.1.4.3.
Skyline analysis for reconstruction of HBV and HDV population dynamics
The Skyline methods were used to extract data on the population dynamics of genotypes HBV-A and HBV-D, and HDV-1 and HDV-2 in Yakutia from phylogenetic trees. For this purpose, we used trees that were built only on the sequences that were collected in Yakutia. The vast majority of HBV and HDV sequences from Yakutia that were available from previous studies were restricted to genome fragments, representing the R0 region for HDV and the S-gene for HBV. Consequently, we limited the Skyline analysis to sequences, corresponding to a 379 nucleotide fragment of HDV R0 region (nucleotide positions 899–1279 by reference sequence M21012) and a 650 nucleotide fragment of HBV S-gene (nucleotide positions 157–806 by reference sequence NC_003977.2), respectively. For this purpose, 15 HDV sequences and 73 HBV sequences isolated in Yakutia between 1997 and 2016 by various researchers were added to sequences obtained in this study (Additional file 1: Table S1). The resulting datasets comprised 29 sequences for HBV-A, 57 for HBV-D, 44 for HDV-1, and 44 for HDV-2.
Two different methods, Bayesian SkyGrid reconstruction and Birth–Death Skyline analysis, were applied to four datasets, representing HBV-A, HBV-D, HDV-1, and HDV-2, respectively. For all datasets, the main parameters for both models were taken from calculations based on primary trees built with all reference sequences. The Bayesian SkyGrid reconstruction was performed using BEAST v1.10.4 software. In all cases, the Bayesian SkyGrid coalescent model was used with the Tree Prior parameter defined as equal to 50 and the final time point of 100 years before the most recent sampling. The MCMC method was run for 100 million generations and sampled every 1000 steps. Tracer v1.7.2 was used for visualization.
A Birth–Death Skyline analysis was performed to calculate the reproduction number (Re) separately for HBV-A and HBV-D genotypes, and HDV-1 and HDV-2 genotypes, respectively, using the BEAST2 software. The length of the MCMC was set to 100 million generations. R-package bdskytools was used to visualize the results and construct plots.
Data analysis was performed using graphpad.com. Statistical analysis included assessment of the significance of differences in values between groups using the Fisher exact test (significance threshold p < 0.05). Differences for quantitative values were assessed using Student’s t-test (significance threshold p < 0.05).