Codon pairs of the HIV-1 vif gene correlate with CD4+ T cell count

Background The activity of the human APOBEC3G (A3G) protein is associated with innate immunity against HIV-1: it induces high rates of guanosine-to-adenosine (G-to-A) mutations (viz., hypermutation) in the viral DNA. If hypermutation is not enough to disrupt the reading frames of viral genes, it may instead increase HIV-1 diversity. To counteract host innate immunity, HIV-1 encodes the Vif protein, which binds the A3G protein and forms complexes that are degraded by cellular proteolysis. Methods Here we studied the pattern of substitutions in the vif gene and its association with the clinical status of HIV-1 infected individuals. To perform the study, unique vif gene sequences were generated from 400 antiretroviral-naïve individuals. Results The codon pairs 78–154, 85–154, 101–157, 105–157, and 105–176 of the vif gene were associated with CD4+ T cell counts lower than 500 cells per mm3. Some of these codons were located in the 81LGQGVSIEW89 region and within the BC-Box. We also identified codons under positive selection clustered in the N-terminal region of the Vif protein, between the 21WKSLVK26 and 40YRHHY44 regions (i.e., 31, 33, 37, 39), within the BC-Box (i.e., 155, 159) and the Cullin5-Box (i.e., 168) of the vif gene. All these regions are involved in the Vif-induced degradation of A3G/F complexes, and the N-terminal of the Vif protein binds to viral and cellular RNA. Conclusions Adaptive evolution of the vif gene mostly served to optimize viral RNA binding and A3G/F recognition. Additionally, since no fully resolved structure of the Vif protein is available, codon pairs associated with CD4+ T cell count may elucidate key regions that interact with host cell factors. Here we identified and discriminated codons under positive selection and codons under functional constraint in the vif gene of HIV-1.


Background
The APOBEC (apolipoprotein B mRNA-editing catalytic polypeptide) gene family includes several members, APOBEC1, APOBEC2, APOBEC3 and APOBEC4 that have cytidine deaminase activity [1][2][3]. Notably, two genes of APOBEC3 (APOBEC3G and APOBEC3F) have been linked to innate immunity, and their ability to restrain retroviral infections has been widely recognized [4,5]. APOBEC3G (A3G) induces cytidine deamination (C→U) in the negative strand of HIV-1 during reverse transcription, hence inducing substitutions of guanosines for adenosines (G→A) in the positive strand of the viral DNA. This mechanism is known as hypermutation and may cause the appearance of stop codons followed by the complete loss of reading frames of the viral genes. HIV-1 counteracts A3G activity through ubiquitination of this host protein through the activity of Vif proteins. Vif proteins assemble with viral-specific E3 ubiquitin ligase through its interaction with cellular Cullin5 (Cul5)-ElonginB-ElonginC proteins, inducing ubiquitination of A3G and consequent degradation by the proteasomal complex [6][7][8][9][10]. Hypermutation is not enough to curb HIV-1 infection because proviruses with varying amounts of G→A mutations are commonly observed in the host genome [11][12][13]. Furthermore, the polymorphisms in the vif gene have been associated with more or less efficacy to neutralize A3G [5]. For these reasons, it has been hypothesized that when G→A mutations are ineffective in neutralizing viral genomes, the side effect is that A3G can actually promote HIV-1 diversification [14,15].
The interaction between the cellular A3G and the vif gene of HIV likely emerged from a process of coadaptation due to recurrent retroviral infections during the evolutionary history of primates [2,16]. Specifically, the repeated encounters with retroviruses probably promoted the fixation of the allelic variants in the genes of the human family of the APOBEC [17][18][19][20].
Recently, we found that A3G polymorphisms are mostly unrelated to CD4+ T cell counts of HIV-1 infected Brazilians [21]. Thus, population-based studies may provide conflicting results regarding the overall effect of A3G-vif interactions [5,21,22,23]. To gain more insight into the A3G-HIV interaction, we used codon-based approaches to determine the function of amino acid substitutions in the Vif protein. The study was based on the analysis of 400 vif gene sequences obtained from HIV-1 infected drug-naïve individuals.

HIV-1 infected individuals
This study was approved by the Ethics Committee of the Federal University of São Paulo and by the Brazilian Ministry of Health; all biological samples were obtained in full accordance with signed informed consent forms.

DNA samples
Proviral DNA was extracted from heparinized peripheral blood obtained from 400 HIV-1 infected individuals who were drug-naïve (not receiving any antiretroviral therapy) and asymptomatic when samples were collected. One unique HIV-1 vif gene sequence was generated from each patient, so our study focused on the diversity of the virus at the population level. The study group represented almost equally the male (55.5%) and female (45.5%) populations and was composed of three ethnic groups: white (49.2%), mulatto (41.6%) and black (9.2%) individuals. The CD4 counts (cells/mm3) ranged from 20 to 5362 and the virus load ranged from 80 to 7.8 × 10^7 (RNA copies/ml of plasma). HIV-1-infected individuals sampled from São Paulo city between 1989 and 2006 comprised our target population. These individuals were enrolled in the AIDS program of the Brazilian Ministry of Health.

PCR and sequencing of the vif gene of HIV-1
The vif sequence was amplified by nested PCR. The primers were designed to cover the entire vif gene, according to the reference sequence HXB2 (HIV Sequence Database). The first round was performed with the primers, Platinum Taq DNA Polymerase, 10X Reaction Buffer, MgCl2 (Invitrogen, USA) and deoxyribonucleotide triphosphates (dNTP; GE Healthcare, USA). The second-round PCR was carried out using 5 μl of the first-round product and internal primers. Amplified vif DNA was purified and then sequenced using the BigDye Terminator kit, version 3.1 (Applied Biosystems/Perkin Elmer, Foster City, CA). The samples were electrophoresed on an ABI 3130 genetic analyzer, and the sequencing data were analyzed using the ABI Sequencing Analysis Software.
Sequence analysis. Nucleotide and protein sequence analyses and edits were performed using the Sequencher DNA Sequence Assembly Software (Gene Codes Corporation, USA).

Hypermutation detection in the integrase gene of HIV-1
We used a previously described approach to detect the presence of hypermutation in the PCR products of the integrase gene of HIV-1 [24]. Briefly, the PCR products were initially analyzed on 1% agarose gels to confirm amplification. After that, a second electrophoresis was performed with HA yellow (9 μL/mL) incorporated into the agarose gel solution at 65°C and pH 7.5. The electrophoresis was performed at 80 V in 0.5× Tris-borate-EDTA (TBE) for 150 min. The HA yellow gel was visualized after immersion in a solution of ethidium bromide, using the Geldoc-it TS Imaging Systems BioImaging (UVP, Cambridge, CA, USA). HA yellow is a compound consisting of the DNA ligand bisbenzamide covalently linked to polyethylene glycol (PEG) (Resolve-It Kit, Vector Laboratories, Burlingame, CA, USA). Bisbenzamide binds preferentially to AT-rich regions in the DNA and, when coupled to PEG, retards DNA mobility during gel electrophoresis according to the AT content. We used three distinct samples that independently amplified as negative (no hypermutation) and positive (hypermutated) controls. The hypermutation statuses of the controls were confirmed by bacterial cloning followed by sequencing.

Sequence alignment and phylogenetic inference
Initially, the sequences of the vif gene of HIV-1 were aligned using the ClustalX program [25]. Sequences with stop codons and hypermutations were excluded from the analyses; hypermutation was assessed with the Hypermut software (http://www.hiv.lanl.gov/content/sequence/HYPERMUT/hypermut.html).
After this editing process, the sequences were manually aligned using the SE-AL program, version 2.0 (Department of Zoology, Oxford University; http://evolve.zoo.ox.ac.uk/software/). To construct maximum likelihood (ML) trees, we used the HKY model [26], as implemented in the PhyML software [27]. These trees were used mainly for the analysis of the selective regime.
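The screen for G-to-A changes described above can be illustrated with a minimal sketch. It is not the Hypermut algorithm: it only counts G-to-A differences between an aligned reference and query, split by whether the reference G is followed by G or A (the dinucleotide contexts usually attributed to A3G and A3F). The function name, the context rule and the toy sequences are assumptions made for illustration.

```python
def count_g_to_a(reference: str, query: str) -> dict:
    """Count G->A changes in an aligned reference/query pair.

    Hypothetical helper (not part of Hypermut):
      'in_context' = reference G followed by G or A in the reference
      'other'      = all remaining G->A changes
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    ref, qry = reference.upper(), query.upper()
    counts = {"in_context": 0, "other": 0}
    for i, (r, q) in enumerate(zip(ref, qry)):
        if r == "G" and q == "A":
            # Downstream base taken from the reference; empty at the last site.
            nxt = ref[i + 1] if i + 1 < len(ref) else ""
            key = "in_context" if nxt in ("G", "A") else "other"
            counts[key] += 1
    return counts


# Toy aligned sequences (illustrative only, not study data).
toy_counts = count_g_to_a("GGACGTGA", "AGACATAA")
```

A sequence whose G-to-A changes fall mostly in the `in_context` class would be a candidate for exclusion; a real screen would add a statistical comparison against control contexts, as Hypermut does.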

Association of HIV vif gene and CD4+ cell counts
We investigated whether individual codons or pairs of codons in the vif gene were associated with CD4+ T cell counts. To do so, we used linear regression and permutation tests. The log-transformed CD4 counts were regressed on the amino acids or amino acid pairs. To account for multiplicity, we generated 1000 sets of samples under the null hypothesis of no association by permuting the CD4+ T cell counts. The p-values obtained from the log likelihood ratio statistics were contrasted with the null distribution of minimum p-values among amino acid positions with SNPs and pairs of these positions.
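The permutation correction described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: instead of the minimum p-value it uses the maximum likelihood ratio statistic across sites as the null reference, which is a common stand-in but is only equivalent when all tests have the same degrees of freedom. All names and the default of 1000 permutations in the signature are assumptions for illustration.

```python
import math
import random
from collections import defaultdict


def lr_statistic(labels, y):
    """Likelihood ratio statistic n*ln(RSS0/RSS1) for a linear regression
    of y on a categorical factor (one mean per label vs. a single mean)."""
    n = len(y)
    grand_mean = sum(y) / n
    rss0 = sum((v - grand_mean) ** 2 for v in y)
    groups = defaultdict(list)
    for lab, v in zip(labels, y):
        groups[lab].append(v)
    rss1 = 0.0
    for vals in groups.values():
        m = sum(vals) / len(vals)
        rss1 += sum((v - m) ** 2 for v in vals)
    if rss0 == 0.0:
        return 0.0          # no variation in y at all
    if rss1 == 0.0:
        return math.inf     # factor explains y perfectly
    return n * math.log(rss0 / rss1)


def permutation_adjusted_p(factors, log_cd4, n_perm=1000, seed=0):
    """factors: dict mapping a site (or site-pair) name to its label list.
    Returns multiplicity-adjusted p-values from the max-statistic null."""
    rng = random.Random(seed)
    observed = {k: lr_statistic(v, log_cd4) for k, v in factors.items()}
    shuffled = list(log_cd4)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any association with the labels
        null_max.append(max(lr_statistic(v, shuffled) for v in factors.values()))
    return {
        k: (1 + sum(m >= s for m in null_max)) / (n_perm + 1)
        for k, s in observed.items()
    }
```

A pair of codons can be tested by passing the zipped labels of the two sites, for example `list(zip(site_i, site_j))`, as one factor, so that every observed amino acid combination gets its own mean.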

Covariation among codons based on phylogenies
A Bayesian graphical model (BGM) method was used to explore covariation among amino acids in codons of the vif gene, taking into account the phylogenetic information of the sequences [28]. BGM therefore accounts for the potential bias due to the founder effect and relaxes the assumption of pairwise associations. BGM reconstructs the maximum likelihood evolutionary history of the extant sequences and then analyzes the joint probability distribution of substitution events among sites in the sequences through a Bayesian graph model. The method was used to detect co-evolving sites in vif. The analyses were performed assuming a GTR model [29], and sites with a marginal posterior probability of 0.85 were considered to be under epistasis. The analysis was performed on the Datamonkey web server (http://www.datamonkey.org).

Detection of selective pressure
We used a codon-based maximum likelihood method to estimate the selection pressures on the vif sequences. This approach estimates the likelihood of distinct models of codon evolution and computes the ratio (dN/dS = ω) of nonsynonymous (dN) to synonymous (dS) substitution rates among sites, considering the phylogenetic relationships of the sequences.
The nonsynonymous/synonymous rate ratio (ω) measures selective pressure at the protein level. When nonsynonymous mutations have no effect on fitness (neutral evolution), nonsynonymous and synonymous mutations are fixed at the same rate (dN = dS).
When nonsynonymous mutations are deleterious, negative (purifying) selection reduces their rate of fixation (dN < dS). If nonsynonymous mutations raise fitness, their rate of fixation is increased by positive selection (dN > dS).
We used the following codon models. The one-ratio model (M0) assumes a single ω for all sites in the alignment and is the simplest model. The neutral model (M1) allows for two classes of sites, conserved sites (ω0 = 0) and neutral sites (ω1 = 1), whose proportions are estimated from the data. Model M1 is the null hypothesis to test for positive selection. The selection model (M2) extends M1 and incorporates an additional class of sites with ω ratios higher than one (ω2 > 1). Significant evidence for positive selection is provided if M2 significantly rejects the null hypotheses, M0 and M1, and if the favored model contains a class of codons with ω > 1. The models can be compared using a standard likelihood ratio test (LRT). These models are implemented in the CODEML program from the PAML v.4 package (http://abacus.gene.ucl.ac.uk/software/paml.html) [30].
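The likelihood ratio test between the nested models M1 and M2 can be sketched as below. The sketch assumes two degrees of freedom (M2 adds one site-class proportion and ω2 to M1), for which the chi-square survival function has the closed form exp(-x/2). The function name and the log-likelihood values are hypothetical and are not results from this study.

```python
import math


def lrt_m1_vs_m2(lnl_m1: float, lnl_m2: float):
    """Likelihood ratio test of M2 (selection) against M1 (neutral).

    Assumes 2 degrees of freedom; for a chi-square distribution with
    2 df the upper-tail probability is exp(-statistic / 2).
    """
    statistic = max(2.0 * (lnl_m2 - lnl_m1), 0.0)  # guard against tiny negatives
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods as they might be read from CODEML output.
stat, p = lrt_m1_vs_m2(lnl_m1=-1000.0, lnl_m2=-990.0)
```

With these made-up values the statistic is 20 and M1 would be rejected at the 5% level; the conclusion of positive selection would still require that the fitted M2 contains a site class with ω > 1.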

Diversity of vif gene
To characterize the sequences of the HIV-1 vif gene from Brazilians, we analyzed the nucleotide and amino acid substitutions on a site-by-site basis. The overall nucleotide distance of 235 subtype B isolates in the alignment of 581 nucleotides was 0.031±0.004. The translated Vif sequence of 192 amino acids identified 22 singletons, 53 conserved and 138 variable sites. In general, the amino acid composition was relatively conserved among subtypes in Brazil, and all regions with biological functions were equally conserved. The genetic diversity was estimated assuming the HKY85 model, and the analyses were performed using the Mega 4.0 software [31].

Pairs of codons in vif gene associated with CD4+ cells
The regression analysis indicated that no single amino acid position in the vif gene was significantly associated with the CD4+ T cell counts. However, when we analyzed the impact of pairs of codons on the levels of CD4+ T cells, the epistatic effects of five pairs of amino acids (i.e., 78-154, 85-154, 101-157, 105-157, and 105-176 pairs) were detected at a 5% significance level after correction for multiplicity (orange dashed lines in Figure 1). Notably, most combinations of amino acids at these epistatic sites tend to be associated with CD4+ T cell counts below 500 cells per mm3 (see Figure 2 for a detailed description of pairs of residues and their correlation with CD4 counts). We used a proposed three-dimensional computational model of Vif [32] (PDB: 1VZF) to show the location of the pairs of epistatic codons on the structure of this viral protein (Figure 3).

Coevolving sites in the vif gene
Using a posterior probability threshold of 0.85, the BGM analysis detected three pairs of codons (i.e., 80-83, 80-86 and 83-144) where amino acids were coevolving in the vif gene. These sites were not the same as the epistatic sites identified by regression/permutation, although they were concentrated in a specific region of vif between sites 78 and 86 and within the BC-box (see blue dashed lines in Figure 1 and magenta dotted lines in Figure 3). However, when we reduced the posterior probability threshold to 0.5, various other sites were indicated to be under epistasis, including those identified by the permutation analysis.

Adaptive mutations in the vif gene
To explore the selective regime acting on the vif gene of the subtype B lineage of Brazilian isolates, we used a codon-based model to estimate the dN/dS ratio. Recombination affects the reliability of the likelihood ratio test (LRT) to discriminate models of positive selection [33], so we decided to analyze selective forces in vif sequences that have not recombined. To do that, we used a Bayesian approach [34] to identify recombination-free sequences. These sequences were compiled in a data set composed of seventy-one (n=71) recombination-free isolates that were edited in order to exclude sequences with  Figure 1). In addition, a dataset of vif sequences (n=33) from hypermutated viruses, based on the integrase gene, was analyzed to explore the selective regime. According to M2, most sites (53.6%) evolved under purifying selection with ω=0.035, 37.2% were neutral (ω=1) codons, and 9.2% were under strong positive selection with ω=4.063. These sites under positive selection were exactly the same as those detected in the recombination-free data (open diamonds in Figure 3). Mapping positively selected sites onto the Vif protein sequence and the 3D structure of a computational model of the Vif protein [32] (PDB: 1VZF) revealed that they were concentrated between the 21WKSLVK26 and 40YRHHY44 motifs (i.e., 31, 33, 37, 39 and 47), both important for Vif-induced degradation of A3G/F complexes. It is important to mention that the N-terminal region of the Vif protein binds selectively to HIV-1 genomic RNA [35] and to the mRNA of A3G [36]. In addition, within the BC-Box and Cullin5-Box, we also detected codons under high selective pressure (i.e., 155, 159 and 168) (Figure 1 and Figure 4). Positively selected sites may indicate adaptive substitutions that usually evolve as a consequence of the host immune response against viral proteins.
According to the Los Alamos HIV immunology database, cytotoxic T-lymphocyte (CTL) epitopes have been previously identified in nearly all regions of the Vif protein (http://www.hiv.lanl.gov/content/immunology/maps/maps.html/). We then concatenated these CTL epitopes to show them in the consensus sequence of the vif gene (dotted line in Figure 1). Our results indicate that the codons under positive selection were not always located within the CTL epitopes, whereas other CTL regions lacked positively selected codons. Indeed, signatures of positive selection by CTL or antibody immune pressure are host-specific and are rarely identified by population-based analyses [37][38][39].

Discussion
While hypermutation induced by A3G activity is a natural barrier against retroviruses, it is not enough to restrain HIV-1 infection. Sometimes, A3G activity can actually increase HIV-1 diversification [14,15] because G-to-A hypermutation does not always neutralize all viral genomes within a specific host. Our results suggest that the diversity in the HIV-1 vif gene is highly associated with adaptation to the host proteins, mainly to increase interaction with cellular components (i.e., elongins, A3G and A3F) to induce APOBEC3 proteasomal degradation. In particular, codons under positive selection were concentrated in a region between the 21WKSLVK26 and 40YRHHY44 motifs (i.e., 31, 33, 37 and 39). Interestingly, the N-terminal region of the Vif protein binds selectively to HIV-1 genomic RNA [1], and sites in this region have DNA/RNA binding properties and also interact with A3G/F [35,36]. Additionally, a study showed that the charge of amino acids located between the 21WKSLVK26 and 40YRHHY44 motifs is essential for maintaining the ability of Vif to bind A3G [40]. Furthermore, it has been shown that the N-terminal region of the Vif protein is highly structured, in contrast to the unstructured and flexible C-terminal [41][42][43]. The organized N-terminal structure of Vif likely functions as a connector that binds to A3G/F proteins and DNA/RNA molecules. On the other hand, positively selected sites detected in the C-terminal region of the Vif protein were more dispersed. They were found within the BC-Box and Cullin-Box (i.e., 127), which both assemble with cellular components to induce A3G proteasomal degradation [6,43,44,45]. Positive selection was also detected in the vicinity of the PPLP motif (i.e., 159), which controls multimerization of Vif proteins [46]. We also found one codon under positive selection (i.e., 168) in a region of the Vif protein involved in the interaction with Gag, NCp7 and the cellular membrane [47].
It is worth noting that in the N-terminal region of the Vif protein, sites under positive selection are clustered between the 21WKSLVK26 and 40YRHHY44 motifs, whereas in the C-terminal they tend to be dispersed (see Figure 1). Since the Vif protein is highly structured at the N-terminal region, in contrast to the unstructured C-terminal [41,42], the N-terminal portion of the Vif protein tends to be more protected, while the C-terminal is solvent exposed and prone to immune recognition. Consequently, adaptive evolution in the vif gene could be related to host immune surveillance against viral proteins. However, there are various positively selected sites outside CTL epitope regions. Additionally, wide vif sequence intervals in which many CTL epitopes have been empirically detected show no evidence of positive selection. Furthermore, selection driven by antibody evasion or host cell adaptation is rarely detected by population-based analysis [38,48,49]. Indeed, the distinct distribution of positively selected sites between the N and C terminals of the Vif protein mirrors the structural organization of this viral protein. For these reasons, positive selection detected in vif codons likely emerged as an adaptive response to optimize HIV-1 RNA recognition and neutralization of A3G/F in the population.
The comparison of amino acids of vif sequences revealed limited variability in regions related to A3G/F activity, such as the regions 14DRMR17 and 40YRHHY44, which are important for Vif-induced degradation of A3G [40,50]. This conservation of vif motifs may indicate a significant evolutionary constraint that has been operating on this viral gene even among distinct lineages. Indeed, we found that most codons (60%) of the vif gene are predominantly under purifying selection, and perhaps this pattern is needed to preserve its biological function during the viral life cycle. Likewise, the HIV-1 nef gene is under strong purifying selection [37,51]. Nevertheless, Nef is a multifunctional protein, and this feature is reflected in its plasticity, represented by extensive polymorphism and amino acid length variations that can be detected both in population samples and in the viral population within a single individual.
In addition, an attempt was made to establish the influence of the patients' clinical status on the selective regime of HIV-1. In previous population-level studies, we observed that CD4+ T cell counts higher than 200 cells/μl were associated with increased dN/dS values in the env gene of HIV-1 subtype B [48,49,52]. For this reason, we measured the intensity of positive selection in the vif gene from datasets categorized into three distinct levels of CD4 counts (<200, 200-400 and >400). We found no difference in the mean dN/dS among these data sets (3.65, 3.85 and 3.09, respectively).
Perhaps our most remarkable finding was the identification of five pairs of codons (i.e., 78-154, 85-154, 101-157, 105-157, and 105-176 pairs) in the vif gene and their association with CD4+ cell levels lower than 500 cells per mm3. In each pair of epistatic codons, distinct amino acid combinations were associated with distinct levels of CD4+ cells (see Figure 2 for details). Notably, these pairs of codons were located mainly in the C-terminal of the Vif protein (see Figure 1). It has been shown that the mutation 105QLI107 to 105AAV107 reduces the infectivity of HIV by 2% [53]. The amino acids between the 154th and 157th positions of the vif gene comprise the BC-box, the region that binds cellular elongin B and C to form complexes that trigger the ubiquitination and proteasomal degradation of the A3G proteins [44]. Since codons 154 and 157 are located in the alpha-helix of the BC-box, it is likely that certain amino acids at these sites affect the interaction with the cellular elongin B and C complex and thereby the efficacy of Vif-induced A3G proteasomal degradation. In addition, the 161PPLP164 motif is fundamental to Vif multimerization and interaction with cellular proteins [41,44,46]. Remarkably, proline-to-alanine substitutions in the 161PPLP164 motif have no effect on the Vif structure, although they decrease the ability of this protein to form oligomers [41]. These findings suggest that domains in the C-terminal of the Vif protein fold independently of each other and that the flexibility of these domains is required to interact directly with distinct cellular counterparts [41,42]. Thus, we postulate that the epistatic effects observed in pairs of codons indicate electrostatic interactions between certain pairs of amino acids that are required for Vif activity.
The presence of co-evolving sites was further investigated using a Bayesian graph model that explores associations between codon sites and accounts for the phylogenetic signal of the sequences. The results indicated that amino acids at sites 80-83, 80-86 and 83-144 of vif co-evolve in phylogenies constructed with the vif gene of HIV-1. Interestingly, although the two methods did not indicate the same sites, these results corroborate the identification of a region between sites 78 and 86 of the HIV-1 vif gene that has many sites co-evolving with codons located within the BC-box.

Conclusion
The host-virus interaction between A3G and vif is likely to affect AIDS in many instances. In turn, the adaptive evolution in the HIV-1 vif gene is mainly explained by a response optimized to neutralize A3G activity. Co-evolution detected in some codons suggests that regions of the Vif protein are highly constrained and may have important functions in viral activity. Here, we identified and discriminated codons under positive selection and codons under functional constraint in the vif gene of HIV-1.