Characteristics and spread to the native population of HIV-1 non-B subtypes in two European countries with high migration rate

Background Non-B subtypes account for at least 50 % of HIV-1 infections diagnosed in Belgium and Luxembourg. They are considered to be acquired through heterosexual contacts and infect primarily individuals of foreign origin. Information on the extent to which non-B subtypes spread to the local population is incomplete. Methods Pol and env gene sequences were collected from 410 non-subtype B infections. Profound subtyping was performed using 5 subtyping tools and sequences of both pol and env. Demographic information, disease markers (viral load, CD4 count) and viral characteristics (co-receptor tropism) were compared between subtypes. Maximum likelihood phylogenetic trees were constructed and examined for clustering. Results The majority of non-B infections were diagnosed in patients originating from Africa (55.8 %), individuals born in Western Europe represented 30.5 %. Heterosexual transmission was the most frequently reported transmission route (79.9 %), MSM transmission accounted for 12.2 % and was significantly more frequently reported for Western Europeans (25.7 % versus 4.3 % for individuals originating from other regions; p < 0.001). Subtypes A and C and the circulating recombinant forms CRF01_AE and CRF02_AG were the most represented and were included in the comparative analysis. Native Western Europeans were underrepresented for subtype A (14.5 %) and overrepresented for CRF01_AE (38.6 %). The frequency of MSM transmission was the highest for CRF01_AE (18.2 %) and the lowest for subtype A (0 %). No differences in age, gender, viral load or CD4 count were observed. Prevalence of CXCR4-use differed between subtypes but largely depended on the tropism prediction algorithm applied. Indications for novel intersubtype recombinants were found in 20 patients (6.3 %). Phylogenetic analysis revealed only few and small clusters of local transmission but could document one cluster of CRF02_AG transmission among Belgian MSM. Conclusions The extent to which non-B subtypes spread in the native Belgian-Luxembourg population is higher than expected, with 30.5 % of the non-B infections diagnosed in native Western Europeans. These infections resulted from hetero- as well as homosexual transmission. Introduction of non-B variants in the local high at risk population of MSM may lead to new sub-epidemics and/or increased genetic variability and is an evolution that needs to be closely monitored.


Background
High virus replication rates and error prone reverse transcription resulted in the genetic diversification of the main HIV-1 group M into 9 subtypes: A, B, C, D, F, G, H, J and K [1]. The possibility of recombination further fuels genetic variability. Amongst the constantly increasing number of circulating recombinant forms (CRF), CRF01_AE and CRF02_AG have been the most successful (http://www.hiv.lanl.gov/content/sequence/HIV/CRFs/ CRFs.html; last accessed on December 4 th 2014) [2]. Between 2004 and 2007, the subtypes C and A, together with the recombinants CRF01_AE and CRF02_AG were responsible for 73 % of all infections worldwide [2]. Since the beginning of the HIV epidemic, non-B infections accounted for up to 50 % of all new diagnoses in Belgium and Luxembourg [3,4]. Today, the proportion of non-B infections in Luxembourg may reach 62 % [5]. Non-B infections are mainly diagnosed in individuals of foreign origin and in general acquired through heterosexual contact. This contrasts with the epidemic in the local population that is almost exclusively driven by men having sex with men (MSM) infected with subtype B virus. Recent analysis of transmission dynamics in Luxembourg however provided evidence for an increasing number of established non-B transmissions in the country [5]. Subtype specific differences in disease progression, transmission efficiency or susceptibility to antiretroviral drugs have been reported but there still is a lack of consistency in these data [6,7].
Adequate subtyping is essential when studying the distribution and characteristics of the different non-B subtypes. While the initial classification of HIV-1 into subtypes was based on full length sequences most epidemiological and clinical studies today, for practical reasons, use partial sequences for subtyping. Polymerase (pol) gene sequences are preferably used for that purpose because they are widely available through routine drug resistance screening and are considered to contain sufficient phylogenetic signal for subtype assignment [8]. Subtype classification based on one genomic region however, may miss intersubtype recombination [9]. Single region subtyping therefore can introduce bias, especially when studying viral characteristics not defined by the pol gene such as co-receptor use. The preference of HIV-1 for one of the two co-receptors, CCR5 and CXCR4, is determined by a small variable region (V3) of the envelope (env) gene. The prevalence of CXCR4-use seems to differ between subtypes with low CXCR4-use reported for subtype C and high CXCR4-use for subtype D and CRF01_AE [10][11][12][13][14][15][16][17][18]. In subtype B, the ability to use CXCR4 has been associated with faster disease progression [19,20]. For the other subtypes, the potential association between co-receptor use and disease progression is less well documented.
The study presented here is a retrospective analysis comparing the characteristics of non-B HIV-1 infections in two countries with low HIV prevalence but with a high diversity of subtypes [3,4,21]. Extended subtyping of 410 non-B infections was performed and patient demographics, HIV transmission route, co-receptor use, viral load and CD4 count were compared between subtypes. The extent to which the non-B subtypes spread in the native population was defined.

Patients and sequences
HIV-1 pol sequences (857 nucleotides (nt) long concatenated sequence fragments comprising codon 4 to 99 of the Protease gene and codon 30 to 226 of the Reverse transcriptase gene), were collected from 410 non-B subtype HIV-1 infected individuals diagnosed at two sites in Belgium (Ghent, 277 patients; Antwerp, 82 patients) and one site in Luxembourg (51 patients) between 2000 and 2012. The pol sequences were obtained as part of routine clinical care for baseline or pretreatment resistance analysis. Initial identification of non-B infection was based on the fast subtyping tool implemented in the Smartgene IDNS database system (Smartgene IDNS, Zug, Switzerland). Selected patients had to be treatment naïve and have sufficient left over plasma available for env sequencing. Env sequences were generated following the procedures for RNA extraction, amplification and Sanger sequencing as described before [22]. This procedure generates an amplicon of about 1106 nt long comprising V1 to V4. The short V3 sequence, with a length of 105 nt, could be generated for all 410 samples but sequencing of a longer env fragment (of at least 300 nt) was successful for only 337 samples due to high sequence variability and the abundant presence of indels in this region. Short V3 sequences were used for co-receptor tropism prediction only, the longer sequences were also used for subtype assignment. Demographic, epidemiological and clinical data were collected from the patients' records, including gender, age, origin, most probable route of transmission, CD4 count and viral load.

Ethics statement
The study was reviewed and approved by the Ethics Committee of Ghent University Hospital (as central committee; study number 2011/162), the Institute of Tropical Medicine Antwerp and the Comité National d'Ethique pour la Recherche in Luxembourg.

Subtyping
Subtyping was performed using a combination of 5 subtyping tools: Rega v3, http://regatools.med.kuleuven.be/ typing/v3/hiv/typingtool (Rega Institute for Medical Research, Leuven, Belgium) [23]; Comet, http://comet.retrovirology.lu/ (Laboratory of Retrovirology, Luxembourg Institute of Health, Luxembourg) [21]; SCUEAL, http:// www.datamonkey.org/dataupload_scueal.php (University of California San Diego, La Jolla, California, US) [24]; jpHMM, http://jphmm.gobics.de/submission_hiv (Institute of Microbiology and Genetics, University of Göttingen, Germany) [25] and NCBI, http://www.ncbi.nlm.nih.gov/projects/genotyping/formpage.cgi (National Center for Biotechnology Information, Bethesda, MD) [26]. To classify pol sequences, Comet and Rega were used initially and the subtype was attributed in case of concordant results. Sequences with discordant results or for which the subtype remained unassigned by at least one of the two tools were additionally submitted to SCUEAL and jpHMM. The final subtype was attributed in case of concordance for at least 3 tools. All other sequences were considered as unassigned. The same strategy was used for env subtyping but, given that the SCUEAL tool can only handle pol sequences it was replaced by NCBI.

Phylogenetic analysis
Pol sequences and env sequences >300 bp, were aligned in the BioEdit Sequence Alignment Editor Version 7.0.9. For subtype verification, Los Alamos 2010 subtype and CRF reference sequences with exclusion of the subtype B and group P references (http://www.hiv.lanl.gov/cgi-bin/ NEWALIGN/align.cgi; last accessed on December 5 th 2014) were added to the alignment. For examination of origin-specific and transmission-route specific clustering the subtype references were excluded. The HXB2 sequence was used as outgroup in all trees. Maximum likelihood trees were generated using the 'best fit' evolutionary model selected by jModeltest version 2.1.7 [31] and the PhyML software package with approximate likelihood ratio test (aLRT) for branch support [32]. Tree visualization and editing was done with Itol v2.2.2 [33]. Clusters of presumed recent transmission were identified with the automated cluster selection tool developed recently by Ragonnet-Cronin et al. [34] using as thresholds aLRT >0.9 and mean genetic distance < 0.045.

Statistical analyses
Statistical analyses were performed using SPSS 22.0 software (IBM Corp. Released 2013. IBM SPSS Statistics for Windows, Version 22.0. Armonk, NY: IBM Corp). Bivariate analyses used the chi-square test or Fisher exact test (if more than 20 % of the cells had expected counts less than 5) for categorical variables and the Mann-Whitney U nonparametric test for continuous variables. The level of significance was set at p < 0.05. P-values were calculated to assess whether the analyzed parameters showed subtype specific differences. Characteristics of individual subtypes were compared to the pool of all other subtypes.

Phylogenetic and cluster analysis
Manual examination of the pol phylogenetic tree (Fig. 1) showed marked subtype specific clustering. A large number of the pol sequences for which the subtyping tools failed were localized in the subtype A cluster or in the CRF02_AG cluster (respectively 16 and 9 of the 49). Sequences of individuals with Belgian or Luxembourg origin were scattered over the tree. They were however in the majority in the small subtype F cluster. An important representation of Western Europeans, mainly originating from Portugal was also observed for subtype G. In the large and very branched subtype A cluster, a marked clustering (aLRT = 1.0) of sequences from Eastern Europeans, many of whom reporting IVDU transmission, was identified ( Fig. 1, cluster 1).
Using stringent cluster selection criteria (aLRT >0.90, mean genetic distance <0.045 and cluster composed of at least 3 individuals) the pol tree was further examined for indications of recent transmission events. Seven transmission clusters were identified; 5 clusters of 3 individuals, 1 of 4 individuals and 1 of 6 individuals. Five of these 7 clusters were also identified in the env tree (Fig. 2) although for some the env cluster size was smaller because not all individuals had an env sequence of >300 bp available. Six of the 7 transmission clusters resulted from heterosexual transmission. They were clusters of CRF01_AE (cluster 2 and 3 in Figs. 1 and 2), subtype A (cluster 4), CRF06_cpx (cluster 5), novel A/B recombinant (cluster 6) and unassigned sequences (cluster 7). The largest cluster (cluster 8) was a CRF02_AG cluster of 6 individuals of whom 5 reported MSM as transmission risk and 5 were of Western European origin.
Subtype related differences in patient demographics, viral load, CD4 count, co-receptor tropism and transmission route To ensure sufficient sample numbers for statistical analysis, only the subtypes A and C and the CRF01_AE and CRF02_AG were considered as separate groups, other subtypes and CRF were pooled and sequences with unassigned subtype were excluded. The characteristics of the patients grouped after pol subtyping are shown in Table 2. No differences were observed in age or gender distribution, viral load or CD4 counts. Significant differences in the origin of patients were recorded: patients originating from Central and Western Africa and from Eastern Europe were more frequently infected with subtype A, individuals from Western Africa with CRF02_AG, individuals from Asia with CRF01_AE and individuals     from Central, East and South Africa with subtype C. Native Western Europeans accounted for between 16.2 % and 47.0 % of all non-B infections. They were underrepresented in subtype A (p = 0.008) and overrepresented in CRF01_AE (p = 0.001) and in the pooled group of other subtypes and CRFs (p = 0.016). MSM as route of transmission was the most frequently recorded for CRF01_AE (18.2 %; p = 0.011) but was never reported for subtype A (0 %; p = 0.003). IVDU transmission was only reported by patients originating from Eastern Europe and all but one were subtype A infections. CXCR4-using viruses were significantly more prevalent in CRF01_AE (p < 0.001) and less prevalent in subtype C (p = 0.014) when using G2P or PSSM but not when using Phenoseq. The Phenoseq prediction system on the other hand reported a significantly lower CXCR4-use prevalence in subtype A (p = 0.022). The analysis presented in Table 2 was repeated after grouping the patients based on env subtyping. The outcome was comparable (results not shown).

Characteristics of the non-B infections in the native population
The majority (91.6 %) of the Western-Europeans in the cohort were of Belgian or Luxembourg origin. The subtype distribution amongst this native population was CRF01_AE

Discussion
Several Western European countries, especially those with large sub-Saharan African immigrant communities, have a high burden of HIV-1 non-B infections. The vast majority of these infections are acquired through heterosexual contacts. This contrasts with the subtype B epidemic in the same countries, that is driven predominantly by MSM and affects primarily young white men [35,36]. Increasing frequencies of non-B infections over time have been reported for Belgium [4], Spain [37], Italy [38], France [39], Sweden [40,41] and Luxembourg [21]. Although information on the extent to which the subtype B and non-B epidemics intermingle remains limited, ascending prevalence's of non-B subtypes in native Europeans have been reported [21,37,41].
For this study we selected 3 sites in Belgium and Luxembourg, two small countries with a high migration rate, for investigation of the non-B infections. Although the study is biased by the random selection of non-B infections and the unequal distribution of patients across the territory and can therefore not be seen as representative for the general epidemic in these countries, it does provide some interesting insights in the dynamics of the non-B infections. Important strengths of the study are the large sample size, the high subtype and CRF heterogeneity and the availability of both pol and env sequences for subtype assignment.
A higher than expected percentage of patients with non-B infections were of Western European origin (30.7 %; 24.5 % from Belgium or Luxembourg and 5.6 % from other Western European countries). While heterosexual transmission accounted for the majority (79.3 %) of infections a significant proportion of the patients reported MSM transmission (13.3 %). MSM infections were more frequent in Western Europeans (28.6 % versus 5.1 % non-Western Europeans; p < 0.001) providing proof for the spread of non-B subtypes to the native high-at-risk MSM population. One MSM transmission cluster of CRF02_AG was identified but this was the only clear phylogenetic indication of local non-B transmission in MSM. CRF01_AE was the most represented non-B subtype in Western Europeans (38.6 %), followed by CRF02_AG (27.7 %), C (19.3 %) and A (14.5 %). The higher prevalence of CRF01_AE in individuals born in Western Europe can have several reasons. The Philippines and Thailand, where CRF01_AE is the major subtype [42] are in the top 5 of bride-donorcountries of Belgium (http://www.esf-agentschap.be/sites/ default/files/attachments/articles/eindrapport_partnermigratie.pdf ). Sex tourism to South-East Asia may also have triggered the presence of CRF01_AE, in heterosexuals as well as MSM. MSM were significantly more represented amongst the CRF01_AE infections compared to the other subtypes. The profound intermingling of the CRF01_AE sequences from native Western Europeans and Asians and the lack of clear indications for onward local CRF01_AE transmission apart from some paired transmissions suggest that most CRF01_AE infections are imported infections.
Infection through IVDU was reported infrequently and only in Eastern-European individuals. The phylogenetic analysis revealed a very close genetic relationship between the viruses isolated from these Eastern-European IVDU but no indications for a spread of these particular strains to the local population.
Weakening of the association between subtypes and patient origin as well as between subtypes and exposure group has been reported before for the UK, Spain, Italy and Sweden [41,[43][44][45][46]. Future surveillance of this evolution will be important because this tendency can have implications on many facets of HIV care and prevention. Introduction of non-B subtypes in native high-at-risk populations may increase the overall genetic heterogeneity and the frequency of intersubtype recombination. Recombination may occasionally result in viral variants with increased pathogenesis as demonstrated recently in Cuba [47]. In our study, indications for novel intersubtype recombination were found in 6.3 % of the patients and a high contribution of non-B/B recombinants was noticed (5 of the 20 recombinants). This number of novel non-B/B recombinants is almost certainly an underestimation as the initial selection of non-B infections was based on the pol subtype and excluded infections classified as subtype B in pol. High percentages of novel recombinants have been reported for Spain (13.4 %) [48], Italy (9.4) [38] and UK (9.9 %) [49]. In these studies the classification of novel recombination is based only on the pol sequence. In our study however, 25 of the 49 pol sequences considered as having a complex mosaic genetic pattern, were assigned A-like in env and classified as subtype A by pol phylogenetic analysis. Adequate classification of subtype A and CRF02_AG pol sequences seems particularly challenging because of the overall high genetic variability within these subtypes. This apparently may lead to a false identification of novel recombinants and may partly explain the important differences in frequency of novel recombination between studies, a hypothesis that needs to be confirmed. Local spread of these novel recombinants was still limited, only one small transmission cluster of a novel A/B recombinant was observed.
The patients infected with the most represented subtypes, A, C, CRF01_AE and CRF02_AG, did not differ in age, gender or CD4 count. The significant differences in distribution of patient origin followed the geographic distribution pattern of the respective subtypes [2].
The higher prevalence of CXCR4-use in CRF01_AE when using the G2P or PSSM prediction algorithms and the lower prevalence of CXCR4-use in subtype C confirmed previous findings [10,11,14,50,51]. It is known however that G2P and PSSM lack accuracy for non-B subtypes [14,52,53]. Recently, an algorithm has been developed that claims high sensitivity and specificity for the subtypes A, B, C, D, CRF01_AE and CRF02_AG [30]. When applying this algorithm the prevalences of CXCR4use for the subtypes A and CRF01_AE differed considerably from the ones recorded with G2P and PSSM. These observations show that problems with co-receptor tropism prediction of some non-B subtypes are still unresolved and that urgent actions for improvements are needed.
We were unable to define subtype related differences in viral load or CD4 count but have to acknowledge that it is a shortcoming of this study that viral load and CD4 data were based on single measurement and could not be corrected for time of infection.

Conclusion
A higher than expected proportion of the non-B infections sampled in Belgium and Luxembourg were diagnosed in individuals of Western European origin. The spread of African and Asian subtypes to the local population resulted from heterosexual as well as MSM transmission.
Local spread remains limited. Subtype related differences in patient origin, infection route and CXCR4-use were noticed but no differences in gender, age, CD4 count or viral load were found. Future monitoring of the introduction of non-B infections in the local population is warranted both from an epidemiological and a prevention perspective.

Availability of supporting data
Sequences were submitted to the National Center for Biotechnology Information (NCBI) Genbank and are available under accession numbers KT863537 to KT863 946 (protease sequences), KT863947 to KT864356 (reverse transcriptase sequences) and KT864357 to KT864 674 (envelope sequences).

Competing interests
The authors declare that they have no competing interests.