Bridging epidemiology with population genetics in a low incidence MSM-driven HIV-1 subtype B epidemic in Central Europe

Background The HIV-1 epidemic in Slovenia, a small Central European country, has some characteristics that make it an ideal model to study HIV-1 transmission. The epidemic is predominantly affecting men who have sex with men infected with subtype B (89% of all patients), has a low prevalence (less than 1/1000) and is growing slowly. The aim of the present study was to analyze in detail the evolutionary history and the determinants of transmission. Methods A total of 223 partial pol gene sequences from therapy naïve individuals were included, representing 52% of all patients newly diagnosed in 13 years (2000–2012) and analyzed together with genetically similar worldwide sequences, selected in a BLAST search. Results Combined analysis (maximum likelihood and Bayesian) of HIV-1 transmission chains revealed 8 major clusters (n ≥ 10 patients), 1 group of 4 patients, 2 trios and 12 transmission pairs, thus leaving only 43 (19.3%) Slovenian patients infected with subtype B without a local epidemiological link, indicating a predominance of local transmission of HIV-1 infection. Bayesian analysis performed on a full set of sequences estimated several introductions of HIV-1 into Slovenia, with the most recent common ancestor (tMRCA) of the earliest Slovenian cluster dated to the late 1980s, although tMRCAs obtained from separate independent analysis of each cluster showed considerably more recent estimates. These findings indicate inconsistencies in molecular clock estimation, which we further explored. We hypothesize that these inconsistent dating estimates across the tree could be caused by an evolutionary rate acceleration of HIV-1 after entering the Slovenia epidemic that is not taken into account by the molecular clock model. It could be caused by a lower transmission rate in this setting, as demonstrated by the low epidemic growth rate estimated by Bayesian skyline plot analysis. Conclusions HIV-1 subtype B was introduced into Slovenia at several time points from the late 80s onward. The majority of patients had a local transmission link, indicating a closed HIV community. The observed slower epidemic rate suggests that individuals with a long-lasting infection are the driving force of the epidemic in this region.


Background
Viral sequence data, particularly in the case of HIV-1, which has extremely high mutation rates, can help characterize the evolutionary history and population genetics of an epidemic (reviewed in Pybus and Rambaut 2009). By linking HIV evolution to the dynamics and transmission of the infection, phylogenetic tree reconstruction and the coalescent can help to indicate the nature of transmission events that happened in the past. By combining it with molecular clock models, it can help to time the origin of epidemics and reconstruct changes in pathogen population size over time [1].
Extensive studies have been undertaken on HIV-1 subtype B sequences with the aim of understanding local epidemic patterns. Lewis et al. (2008) analyzed subtype B sequences obtained from patients (predominantly men who have sex with men (MSM)) attending a clinic in London between 1997 and 2003. The dated phylogenies indicated that the majority of transmissions within clusters occurred between 1995 and 2000. Furthermore, 25% of transmissions happened within 6 months after infection, thus showing an episodic pattern of the HIV epidemic among MSM in London [2]. A similar study was conducted in 2011 on a vast set of 14,560 sequences from London, in which 52% of individuals were determined to have a transmission link to at least one other individual. A transmission interval of less than 6 months was estimated among 16% of individuals, whereas the median internode interval of the entire set was 22 months. This confirmed an important role of recent infection in transmitting the infection forward in this predominantly MSM group [3]. A study examining subtype B sequences of the MSM epidemic in Italy, Slovenia's neighbor, revealed several introductions of the virus into the country. Half of the characterized clusters originated in the early 1990s, about 20 years later than in other Western European countries. Demographic analysis showed a rapid exponential increase in the effective population, starting in the early 1980s and reaching a plateau 10 years later [4]. On the other hand, a study focusing on the HIV subtype B epidemic in the Balkan area observed multiple introductions of HIV and, through dated phylogeny, suggested that the epidemiological network did not exist before the early 1970s. One of the first clades proposed to emerge at this time was of Slovenian origin. Slovenia was also found to be the probable entry point of HIV subtype B into the Balkans, since it was frequently determined as the most recent common ancestor (MRCA) location of Balkan clades [5].
Slovenia is a small Central European country with a modest burden of HIV disease, with less than 1 per 1000 inhabitants infected. The first AIDS case in the country was registered in June 1986 and, by the end of 2012, 592 HIV cumulative cases had been reported [6,7]. The HIV epidemic mostly affects MSM and subtype B is the most prevalent subtype, with 84% of patients diagnosed in the period 1996-2005 and 89% in 2005-2010 [8,9]. The characteristics of the subtype B epidemic in Slovenia were initially investigated in a previous study conducted in 2006, examining a 10-year period from 1996 to 2005. Phylogenetic analysis determined 14 potentially significant clusters, comprising 5, 4, or 3 patients and 11 transmission pairs, mostly among MSM (79%). Factors found significantly associated with clustering were: infection during or after 2003, diagnosis of primary HIV infection, higher CD4 cell count, and the infection being acquired in Slovenia [10].
The aim of this study was to determine the key properties of the subtype B epidemic in Slovenia, the traits associated with the clustering of individuals and to characterize the start of the HIV local epidemic and its course over time. Given the high proportion of subtype B in Slovenia (88.9%), only sequences of this subtype were included in our study, which still amounted to more than half of all patients diagnosed in the whole country.

Methods
For the purpose of this study, data and sequences were gathered from 3 previous studies conducted in Slovenia examining the prevalence of transmitted drug resistance among therapy naive HIV-1 positive patients diagnosed  [9,11,12]. The following epidemiological and laboratory data were available for statistical analysis: gender, age at time of diagnosis, year of diagnosis, country of origin, acute retroviral syndrome (ARS), Centers for Disease Control and Prevention (CDC) class, AIDS defining illnesses, other sexually transmitted diseases, most probable route of HIV infection, relationship with the source of HIV infection (sex with anonymous person or stable relationship), country where the infection most probably occurred, viral load and CD4 cell count at the time of diagnosis and presence of surveillance for drug resistance mutations (SDRMs).
For the purpose of the incidence estimate, patients were estimated as recently infected when diagnosed and sampled within 155 days after infection, or as having a long-standing infection (LSI) when diagnosed after 155 days following infection. In brief, patients with a baseline CD4+ cell count of fewer than 200 cells/mm 3 and/or with a baseline viral load (VL) of fewer than 400 copies/ml were first automatically characterized as having a LSI. For the remainder, the Aware™ BED™ EIA HIV-1 Incidence Test (BED test) (Calypte Biomedical Corporation, Portland, Oregon) was employed on a sample taken within 3 months of diagnosis, according to the manufacturer's instructions. Briefly, the principle of the BED test is as follows: patients' plasma samples were first diluted 1:101 and added to a goat-anti-human IgG coated microplate, capturing anti-HIV IgG and nonanti-HIV IgG from the samples. The amount of specific anti-HIV-1 antibodies was proportional to the optical density (OD) values obtained by spectrometric analysis. Normalized optical density was calculated (OD n = plasma sample OD/calibrator OD) and, on the basis of the defined OD cut-off values of the assay, the patients were classified as having a LSI or RI, with the suggested window period of infection being 155 days [13]. This determination was possible for 213/223 Slovenian patients included in the study (for the remaining 10 patients a plasma sample taken within 3 months after HIV-1 diagnosis was not available).
Sequences were re-analyzed for subtype determination by employing the REGA HIV-1 Automated Subtyping Tool, version 2 [14]. Only subtype B sequences were selected for this study, a total of 223 partial pol gene sequences, exhibiting an inclusion of 53% ± 15% of newly diagnosed patients in the study per year.
Alignments were made using ClustalW, available in the BioEdit package, and edited and trimmed to 953 base pairs [15]. A quick neighbor joining (NJ) tree was created using Seaview with 100 bootstrap replicates [16]. A subset of Slovenian sequences was selected from all parts of the NJ tree by visual inspection, to represent the complete diversity of the epidemic. Depending on the size of the cluster, at least three to five sequences per transmission cluster were included. These sequences were then used to search the GenBank public database for other closely related sequences (controls), using the BLAST search tool [17]. The 10 most similar sequences per Slovenian sequence were retrieved from GenBank. Since population genetics analysis is only possible using sequences that contain sampling time information, only sequences with an available sampling time were selected for controls. The majority of Slovenian sequences gave similar BLAST results, thus a fair amount of control sequences were found to be repeated. After removing these duplicate sequences, only 84 control sequences remained, together with Slovenian sequences forming a dataset of 307 subtype B sequences for further analysis.
jModeltest software was employed for selection of the best fitted evolutionary model to be used on the selected dataset, with an additional three subtype A1 sequences (accession numbers AB098332, AB253421, AB285785), selected to root the obtained maximum likelihood (ML) tree [18,19]. Using jModelTest, TVM + I + G was determined as the best fitted evolutionary model for the data. Since this evolutionary model is not available in most phylogenetic tree construction software, the selection of the closest model, based on the hierarchy of evolutionary models provided by jModeltest developers, was used instead. Thus, all additional analyses (maximum likelihood and Bayesian probability) were run using the next simpler model, HKY + I + G. This model, when combined with 2 codon partitions (1 + 2 codon, 3 codon), has been previously determined as performing best for most protein viral datasets [20]. The ML phylogenetic tree was constructed using a sub-tree pruning and regrafting + nearest neighbor interchange (SPR + NNI) search criterion, as implemented in PhyML 3.0 software [21]. The obtained phylogeny was visualized using FigTree v1.3.1 [22]. Transmission clusters were at this point identified according to the approximate likelihood ratio test (aLRT) branch support values obtained (>0.90) by the ML method.
The temporal signal of the dataset was assessed using Path-O-Gen, which showed an r-squared value of 0.53 with the best-fitting root (date range 29 years) [23].
The Bayesian analysis was performed on all Slovenian subtype B sequences and control sequences (full analysis). Additionally, separate analyses were executed on all major Slovenian clusters (≥10 Slovenian sequences) for two reasons: 1) to compare the tMRCA values obtained in the independent clusters analysis with the tMRCA values obtained based on the full data set analysis and 2) to obtain the population growth rate of each cluster. The tMRCAs of the full dataset and of major clusters were determined by the Monte Carlo Markov chain (MCMC) method available in the BEAST package v1.7.1, using a relaxed clock model with uncorrelated lognormal distribution and the Bayesian skyline coalescent model [24]. The analysis was run using a HKY + I + G substitution model and 2 codon partitions (1 + 2 codon, 3 codon) were added. The output results of BEAST were examined in Tracer v1. 5 [25] and the MCMC chain was run until the effective sample size (ESS) values for all parameters exceeded 200. Convergence was achieved after 500,000,000 and 100,000,000 generations, for the full analysis and for analyses based only on cluster strains, respectively. All the analyses were repeated and the obtained duplicate results then combined using LogCombiner v1.6.1, available in the BEAST package [24]. TreeAnnotator v1.7.1 (BEAST package) was employed to remove a burn-in of 10% of all sampled trees. A Maximum Clade Credibility Tree was finally visualized and annotated using FigTree v1.3.1 [22].
The final clusters were defined by reviewing clusters previously identified in the ML analysis (aLRT > 0.90) according to posterior probability values obtained in Bayesian analysis. Only clusters with posterior probability >0.990 were then selected. Thus, cluster definition relied on both aLRT (>0.90) and posterior probability (>0.990) values. Moreover, a sensitivity analysis was conducted by altering aLRT and posterior probability cut-offs. When a higher cut-off was applied, namely, aLRT >0.95 and posterior probability >0.999, all the defined clusters remained. On the other hand, when lowering the cut-offs to aLRT >0.85 and >0.80 one of the clusters had 3 additional Slovenian sequences and one additional Slovenian cluster was identified. Regarding posterior probability, no differences were observed when lowering the cut-off, since the values on the outer nodes were much lower than 0.9, giving credibility to the selected cluster definition.
The obtained tMRCA values were compared to epidemiological data (e.g., time of infection) gathered from questionnaires, confirmed epidemiological links and compared to the results of an HIV-1 incidence algorithm.
Statistical analyses were conducted using the on-line statistical package Epi Info™ Version 3.5.3 and P ≤ 0.05 values were considered to be significant [26].

Most Slovenian subtype B infected patients belong to large local MSM transmission clusters
The majority of patients included in the study were males (94.2%), with a mean age of 37.9 years and reporting homosexual contact (83.4%) and sex with anonymous person (52.5%) as the most probable route of HIV acquisition. Mean viral load of 4.99 (±0.979) log copies/ ml and CD4 cell count of 325 (±218) cells/mm 3 were measured at the time of diagnosis and 26.9% of patients were determined as recently infected by incidence algorithm.
According to the maximum likelihood phylogenetic tree, 8 clusters with ≥10 Slovenian sequences, 3 trios and 11 pairs of patients were identified with high support (aLRT values >0.90). Bayesian analysis was repeated in duplicate on the same set of sequences. Seven out of the 8 clusters obtained by the ML method were confirmed by the Bayesian analysis, according to posterior probability values (≥0.990). The cluster not confirmed in the further Bayesian analysis was Cluster 5, which was broken down into two smaller clusters harboring 10 and 4 patients. Branch support values for the 8 major clusters obtained in ML and Bayesian analysis can be seen in Table 1, together with maximum pairwise genetic distances (a criterion not employed for cluster identification). Alternating to different cluster cut-offs (aLRT from 0.8-0.95 and posterior probability from 0.800-0.999) did not drastically change the cluster composition. In addition, the observed transmission pairs and trios of Slovenian patients were also confirmed, with the exception of one trio, which did not reach an adequate posterior probability (0.8455). Eight major clusters (n ≥ 10 patients), 1 group of 4 patients, 2 trios and 12 transmission pairs were therefore finally identified by combining the results of maximum likelihood and the Bayesian method. Among 223 included individuals, 146 (65.5%) patients thus belonged to large transmission clusters comprising 10 or more individuals and 34 (15.2%) patients to small clusters of 2-4 patients. Thus, for only 43 (19.3%) of the Slovenian patients infected with subtype B no local epidemiological link was observed by phylogenetic inference (Table 2).
Statistical analysis examining the characteristics of patients found in large clusters (≥10 patients) revealed that significantly fewer patients were diagnosed prior to 2005 (p = 0.0388), significantly more patients reported Slovenia as the country in which the infection occurred (p < 0.0001) Table 1 Genetic distance, aLRT and posterior probability of 8 major clusters of Slovenian sequences and substitution rate (*10 -2 ) obtained by employing Bayesian analysis on two different sets of sequences; set comprised of all Slovenian subtype B sequences and separate sets of only clustered sequences The only characteristic significantly associated with patients having a determined local transmission link after employing a Bonferroni correction (at a significance level of <0.0033) was found to be Slovenia as the country in which the infection occurred.
The network of patients analyzed can also be seen in the obtained combined and annotated (10% burn-in) Bayesian phylogenetic trees (Figure 1). It can be seen that 6 out of 8 large clusters contained at least one patient reporting that he or she was infected abroad, which was also true for almost half of small clusters. All identified clusters harbored  GenBank control sequences were mostly paraphyletic to Slovenian clusters, indicating a potential geographical origin for those clusters (Figure 1). Sequences found most closely related to large clusters of Slovenian sequences were derived from USA, Germany, Belgium, Canada and France (data not shown).

The Slovenian HIV-1 epidemic started in the 1980s
For an estimation of the time to the most recent common ancestor (tMRCA) of Slovenian sequences, Bayesian analysis was performed on different datasets, always in duplicate. The 2 runs were combined and the obtained Bayesian phylogenetic trees are seen in Figure 1 and the corresponding tMRCA values of 8 major clusters in Table 3. According to mean values, the earliest dated cluster of Slovenian sequences is Cluster 6, dating the start of local transmission of the Slovenian HIV-1 epidemic to 1986 (95% highest posterior density (95% HPD): 1981.4-1990.5) ( Table 3). Separate MCMC runs were executed for each of the eight identified Slovenian transmission clusters (including the newly defined Cluster 5 with 10 individuals) for two reasons: 1) to compare the tMRCA values obtained in the independent clusters analysis with the tMRCA values obtained based on the full analysis previously described and 2) to obtain the population growth rate of each cluster. As seen in Table 3, the analysis reached convergence only for Clusters 1 and 2.
Additionally, substitution rates obtained in the full analysis were compared to rates obtained in independent analyses based on cluster strains (Table 1).

tMRCAs estimated by Bayesian analysis generally precede the estimated time of infection determined by the BED assay
Additional information of a reported transmission link was available for some patients included in this study. Together with information of recent infection provided by the incidence algorithm, it allows a fairly reliable estimate of the time of infection. The mentioned data were available for 2 pairs of patients, in which at least one of the two patients was determined as recently infected and one trio of patients, all infected recently ( Table 4). The BED assay, used to determine whether an infection was acquired recently, employs a 155-day time interval following infection for characterization of a recent infection. For this reason the time of infection for these individuals defined as being recently infected was thus estimated to be within a 155-day interval prior to the time of sampling. Since all the patients were part of previously identified clusters, tMRCA results were

Discussion
To the best of our knowledge, this is the first nationwide phylodynamics study examining the characteristics  of an HIV-1 subtype B driven epidemic on a population consisting of over half of all HIV-positive patients diagnosed in the 13-year study period. The HIV-1 epidemic in Slovenia is mainly driven by homosexual intercourse and through local transmission, where eighty percent of patients have an identified local transmission link (meaning either being infected by someone or infecting someone locally). Several introductions of HIV-1 to Slovenia have been observed after the year 1986. Interestingly, faster substitution rate of Slovenian sequences was seen, which suggests low transmission rates and a slowly growing epidemic.
The HIV-1 epidemic in Slovenia results from several introductions after 1986 which then propagated mainly through local transmissions In order to understand the dynamics of HIV-1 subtype B transmission in Slovenia, we analyzed a dataset including fifty-two percent (223/427) of all newly diagnosed patients over the 13-year time span between 2000 and 2012. Since MSM are the most vulnerable population for HIV infection in Slovenia, male gender was overrepresented (94% of all included patients). Indeed, a significant association between male gender and subtype B had already been determined in a previous study conducted in Slovenia [9]. The majority of infected patients belong to large transmission clusters: 65.5% of patients belonged to a transmission cluster together with 9 or more individuals. When including also smaller transmission clusters of 2-4 individuals, 80.7% Slovenian patients infected with a subtype B virus have a local transmission link. This is also the reason why few non-Slovenian sequences could be found when using the BLAST search tool to select for additional control sequences to be included in the analysis. This supports the findings that in Europe, infections are acquired predominantly from patients of the same country [27].
All the identified clusters were nested within subtype B control sequences selected from GenBank, indicating that the HIV epidemic in Slovenia is comprised of several introductions of HIV into the country. tMRCA values obtained from the analysis of the complete dataset indicate that HIV-1 subtype B was introduced into Slovenia at different time points from the late 80s onward. The earliest dated two clusters of Slovenian sequences show 1986 as the year in which HIV was introduced into the country. This was also the year when the first AIDS patient was diagnosed and reported in Slovenia, indicating consistency between epidemiological data and Bayesian tMRCA estimates. Even then, most infections were found among MSM, as they are today; this vulnerable risk group thus probably imported HIV into Slovenia at different time points.

The HIV-1 epidemic in Slovenia is mainly driven by homosexual intercourse and through local transmission
In the current study, three factors were found to be significantly associated with patients belonging to a large cluster: the fact that the patient reported Slovenia as the country in which HIV infection occurred, diagnosis of HIV after 2004 and no DRM detected. The first was expected, is consistent with the findings of large local transmission clusters as mentioned above and is in line with the study of Frentz et al. (2013) [27]. A possible explanation for the second is that the period around 2004 was when a substantial pool of HIV infected patients had formed in Slovenia, thus making HIV acquisition within the country, as opposed to abroad, more probable. On the other hand, more recent infections within clusters can be explained by the process of cluster identification. The definition of a transmission cluster is still a topic of discussion and no consensus has been reached. In any case, since divergence is quickly accumulating within a host after infection, and given the current proposed definitions of a transmission cluster, transmission clusters with a higher number of recent infections will be preferentially selected in detriment of the ones where infections were acquired over a longer time period. In such recent infections, less divergence has accumulated and therefore evolutionary distances and eventually statistical supports of the cluster (aLRT, bootstrap, posterior probability) will be higher. In this study, since evolutionary distances were not considered as a criterion for the definition of transmission clusters, we checked the age of the cluster by investigating its depth in years. Our mean (and median) depth in years were 18.6 (20.1), which is quite comparable with the mean (or median) depth in years found in other studies, ranging between 7 and 35 years [2,4,[28][29][30][31][32][33][34][35][36].
The third finding showing statistical significance was that most patients with DRMs were not included in transmission clusters, nor did they have a transmission link. This could imply that most transmitted drug resistance (TDR) is imported, confirming the particular importance of HIV-1 drug resistance testing of newlydiagnosed treatment-naive patients infected abroad, as is the current national strategy. Alternatively, it could indicate a lower fitness of viruses harboring TDR, with lower transmissibility of such strains.
Interestingly, 40% of patients reporting that infection happened abroad did not have a transmission link, compared to only 15% among those reporting Slovenia as their country of infection. This may be due to sampling bias, since sampling in other countries, where the infection was acquired, might not be as substantial. However, this leaves 60% with a local transmission link found, even though the infection was supposedly acquired abroad. This could mean that a large proportion of patients infected abroad further transmit the disease locally. On the other hand, it is possible that these patients were not certain of their source of infection and had actually been infected in Slovenia. Uncertainty in patients' reports of the most probable source of transmission, especially in populations that are sexually more active, has been observed in a study by Resik et al. (2007) examining two transmission networks in Cuba [37].
More than half of the large clusters were comprised of male individuals only and, furthermore, 2 clusters had individuals reporting only homosexual contact as their route of infection. However, the other 3 male-only clusters had reported heterosexual contact and, since no female individuals were found in these clusters, this could be explained by missing transmission links due to sampling or by individuals not disclosing their true sexual orientation due to stigma. The latter was described recently in a study by Hue et al. (2014), where a 1-11% misclassification of homosexuals as reported heterosexuals was noted [38]. Most of the heterosexual transmissions seem to be limited to small clusters, without further spread of the infection, with only a few in large clusters (possibly women infected by bisexual partners). All things considered, our results show that local HIV transmission in Slovenia is mainly driven by homosexual transmission clusters.
Discrepancies in the results obtained analyzing the complete dataset and smaller separate cluster datasets With the intention of testing tMRCA estimates based on Bayesian methods tMRCA for a particular clade was obtained by using two different calibration strategies. Firstly, the complete set of Slovenian sequences with corresponding control sequences drawn from GenBank were used for calibrating the dates; and, secondly, only cluster strains were used for calibration. For two of the clusters, the analysis using the entire dataset showed consistently earlier tMRCA estimates compared to the analysis using cluster strains only. For all other clusters, the analysis based on cluster sequences only did not reach convergence. The discrepancies in tMRCA found for the three clusters with both estimates may be caused by dramatic fluctuations of evolutionary rate, for example because of bottleneck events or the introduction of HIV in populations with different epidemic characteristics. It has been noted before that even the relaxed clock model with uncorrelated lognormal distribution, as was used here, cannot cope with extreme differences in rates across branches. Wertheim et al. conducted an analysis on more divergent data (HIV group M), examining tMRCAs of different HIV subtypes and concluded that heterotachy was responsible for this discrepancy. Relaxed clock analysis was able to detect rate changes in a separate clade, although the molecular clock model considerably underestimated the impact of this change. Using different approaches in order to clarify these obtained discrepancies, the authors concluded that none of the approaches was able to resolve these differences [39]. Our results further corroborate such findings. It was interesting to observe wider 95% HPD intervals of the obtained clusters' tMRCAs and substitution rates in the full analysis than in the analyses with only the cluster strains, since generally one would expect that more data generates more confident results.
For 95.5% of patients included in the study, results of an incidence algorithm characterizing patients as having a recent infection (RI) or a long-standing infection (LSI) were available. A window period of 155 days was set as a cut-off for differentiating RI from LSI, meaning that when a patient is characterized as having a RI, infection was acquired in an interval of 155 days before sampling. When comparing these "BED" estimated times of infection with the tMRCAs obtained from Bayesian analysis, we found that the obtained tMRCAs were estimated earlier in time in the full analysis, as well as in the separate analysis based on cluster sequences. One explanation for these earlier estimates relies on the definition of MRCA: the Most Recent Common Ancestor corresponds to the strain that gave origin to the infections of that cluster. Therefore, this strain should have originated before the estimated time of infection, inside the body of the patient who transmitted that strain to one of the patients in the cluster. As such, it is normal and to be expected that the estimated tMRCA will be earlier than the time of infection estimated by the BED algorithm. It therefore shows that tMRCA estimations by Bayesian methods for these clusters were credible. However, these results should be interpreted with caution, since the BED test used in the incidence algorithm does not allow individual determination of the timing of infection, due to variations in immune system response, so the stated intervals of the supposed timing of infection based on this data have limitations.
All in all, these results open discussion about the accuracy and interpretation of tMRCA estimates when analyzing large datasets that represent different epidemic settings, including bottleneck events and different transmission dynamics.
The faster substitution rate of Slovenian sequences suggests low transmission rates, a slowly growing epidemic consistent with the small HIV-1 effective population size in Slovenian clusters In the study of Abecasis et al. (2009), the substitution rate of the pol region of subtype B was estimated at 0.001 substitutions/site/year. When comparing this to the rate obtained for major clusters of Slovenian sequences, a more than 10-fold faster substitution rate of Slovenian sequences is seen [40]. This, together with the findings of a low epidemic growth rate, is in line with the observation that the evolutionary rate of HIV-1 slows down when the epidemic rate increases, e.g., in a slow epidemic with small numbers of transmissions, the evolutionary rate of the virus will be faster. The HIV epidemic is indeed still small in Slovenia, thus explaining and corroborating the finding of such a fast substitution rate in the Slovenian population of HIV-1 infected patients. It has been observed that if transmissions have occurred mostly from individuals in the early stage of infection, there is no major impact of the host immune system, so less selective pressure is applied and fewer mutations accumulate in the virus transmitted [41]. Taking this into account, this suggests that the Slovenian epidemic is probably therefore driven mostly by individuals with an established long-lasting infection.
In order to determine trends in the HIV epidemic in Slovenia, analyses reconstructing population growth were additionally carried out on the two major clusters of Slovenian HIV sequences. When observing the effective population size from the Cluster 1 analysis, a rise in the numbers was seen in 2003. Interestingly, in a previous analysis of the epidemic in 2006, sequences found in clusters were predominantly obtained from individuals infected during or after 2003. It indicates that this period was important for the spread of HIV within the country [10]. As already mentioned, this was confirmed in the present study, since patients diagnosed with HIV after 2004 were found significantly more often in a large cluster. The HIV incidence in Slovenia in 2005, in fact, was more than 3-fold higher than in 2003 (35 vs. 11 newly diagnosed patients per 1,000,000 inhabitants; P < 0.05) [42].

Conclusions
In conclusion, this is the first nation-wide phylodynamics study of a mainly HIV-1 subtype B driven epidemic including over half of all HIV-positive patients diagnosed in the 13-year study period. Our study is also unique since we compare the phylodynamic characteristics with incidence data based on HIV-1 diagnosis over time in the same population. We found various introductions of subtype B into the country from the late 80s onward. The majority of patients had a local transmission link, indicating a closed community. With respect to the faster evolutionary rate of Slovenian sequences, and its associated slower epidemic rate, we propose this is because individuals with a long-lasting infection but unaware of their status may be the driving force of the epidemic.