Skip to main content
  • Research article
  • Open access
  • Published:

Computational MHC-I epitope predictor identifies 95% of experimentally mapped HIV-1 clade A and D epitopes in a Ugandan cohort



Identifying immunogens that induce HIV-1-specific immune responses is a lengthy process that can benefit from computational methods, which predict T-cell epitopes for various HLA types.


We tested the performance of the NetMHCpan4.0 computational neural network in re-identifying 93 T-cell epitopes that had been previously independently mapped using the whole proteome IFN-γ ELISPOT assays in 6 HLA class I typed Ugandan individuals infected with HIV-1 subtypes A1 and D. To provide a benchmark we compared the predictions for NetMHCpan4.0 to MHCflurry1.2.0 and NetCTL1.2.


NetMHCpan4.0 performed best correctly predicting 88 of the 93 experimentally mapped epitopes for a set length of 9-mer and matched HLA class I alleles. Receiver Operator Characteristic (ROC) analysis gave an area under the curve (AUC) of 0.928. Setting NetMHCpan4.0 to predict 11-14mer length did not improve the prediction (37–79 of 93 peptides) with an inverse correlation between the number of predictions and length set. Late time point peptides were significantly stronger binders than early peptides (Wilcoxon signed rank test: p = 0.0000005). MHCflurry1.2.0 similarly predicted all but 2 of the peptides that NetMHCpan4.0 predicted and NetCTL1.2 predicted only 14 of the 93 experimental peptides.


NetMHCpan4.0 class I epitope predictions covered 95% of the epitope responses identified in six HIV-1 infected individuals, and would have reduced the number of experimental confirmatory tests by > 80%. Algorithmic epitope prediction in conjunction with HLA allele frequency information can cost-effectively assist immunogen design through minimizing the experimental effort.

Peer Review reports


Computational algorithms are increasingly utilised in biological modelling and offer the potential to reduce the time and expense of immunological assays. Computational algorithms were initially demonstrated as useful tools for predicting potential epitopes that might elicit quality T-cell responses [1, 2]. Computational algorithms that predict potential HLA binding T-cell epitopes can facilitate the design of vaccines capable of inducing T-cell immunity against HIV-1. The high variability of HIV-1 and the extensive genetic polymorphism of HLA molecules can be managed in silico, allowing immunogen optimisation to increase breadth and magnitude of T cell responses in respect of HLA allele frequencies and circulating virus strains in different populations. Bioinformatics approaches were previously applied as proof of concept for an HIV-1 peptide-based vaccine for the env and gag genes [3] in cynomolgus macaques for a broad spectrum of HIV-1 clades. Computational optimisation of immunogens facilitates the development of the multivalent and mosaic vaccines [4] necessary to control recombinant HIV-1 strains, an increasingly common occurrence in the epidemic in Uganda [5]. Computational approaches aim to identify optimal epitopes relevant to vaccine development and are not isolated to HIV-1 only, but a wide range of pathogens, including Ebola virus [6], therefore various statistical validation approaches have been applied for evaluation of these methods [7,8,9,10].

For HIV-1 vaccine design purposes an important consideration for the suitability of a computational algorithm is the breadth of discrete number of T-cell epitopes it generates that could reach particular levels of coverage [11] of circulating viruses. The higher the number of epitope variants the more the reduction in their requirements to attain optimum coverage levels for any epidemic. Previous data has shown that breadth of T-cell response is associated to viral set point in chronic HIV-1 infection [12,13,14,15,16,17]. In order to translate the computational epitope prediction into vaccine design, the number of discrete epitopes computationally generated from particular HIV-1 proteins is an important metric for further investigation [11].

A reliable pan-HLA-specific algorithm NetMHCpan4.0 [18,19,20] that has been improved by advances in HLA binding data, covers 172 MHC class I molecules from human (HLA-A, B, C, E), mouse (H-2), cattle (BoLA), primates (Patr, Mamu, Gogo) and swine (SLA) [20, 21], and can also predict binding to alleles devoid of experimental data basing on similarity to known binders and non-binders [22, 23]. This is an artificial neural network (ANN) algorithm for predictions of 8-14aa and capable of predicting epitopes for other HLA alleles using data for similar alleles by positional similarity of residues in their binding motifs. NetMHCpan4.0 is considered to be the tool of choice for such predictions considering the benchmarking done against other related tools [24]. Nevertheless to have a conclusive outcome of the computational performance we compared NetMHCpan4.0 to both an older and recent tool, NetCTL1.2 [25,26,27] and MHCflurry1.2.0 [28] respectively. The binding of CTL epitopes to MHC class I molecules is linear, anchoring at residues 2 and 9; hence the interface between ligand and CTL can be determined computationally [29]. Validation of such computational applications can be done by comparing their predictions with suitable experimental data. Despite the paucity of data validating the performance of computational methods relative to wet laboratory experiments, a few have documented them to achieve an area under the curve AUC of over 90% [18, 19, 30, 31] by isolated experimental data. We have not come across a wet experiment that evaluated computational predictors to achieve a robust AUC using a single set of wet laboratory experimental data. The previously reported 90% AUC is largely based on positional specific scoring algorithms (PSSM) for the collective isolated experiments alongside probability models used to establish affinity or binding scores. One study that explored the reliability of in-silico approaches in epitope prediction and its application for vaccine design reported a meagre 22, 44%, and relatively higher 78% match for three computational tools namely YFPEITHI, CTLPRED and IEDB respectively [32]. Using experimental epitope mapping data generated from 757 peptides tested on cells of 6 early HIV-1 infected individuals at paired time points, we show that NetMHCpan4.0 can be useful for markedly reducing pooled peptide experiments as demonstrated by the 95% experimental and computational concordance.


Experimental binder data

The data used was from an independent study that did not include this analysis in its objectives. Experimental data of peptides previously mapped for HIV-1 epitope recognition of 6 individuals for a separate study (Table 1) at 2 time points each was used for comparison with the computationally predicted binders. These were from a Ugandan early HIV-1 serodiscordant couple cohort approved by the Uganda Virus Research Institute (UVRI), Research and Ethics review board and the Uganda National Council of Science and Technology (UNCST). All participants provided informed consent. Six (6) participants whose experimental epitope recognition profile we evaluated were early HIV-1 infections (Table 1), enrolled under the following criteria: (i) detection of HIV-1 P24 antigen with a simultaneous negative HIV-1 antibody ELISA (2 participants) or documented HIV-1 sero-negative test in the previous 12 months (4 participants); (ii) HAART naïve (all). Early infection was determined following the Fiebig Staging criteria [33] as described elsewhere by Obuku A.E. [34]..

Table 1 Participant characteristics, HIV-1 infecting clade, Fiebig stage and HLA class I haplotypes

The experimentally tested peptides totalled 757 (Fig. 1), were 17aa long, overlapping by 11aa and spanning the HIV-1 proteome consensus for subtypes A1 and D. Cultured ELISPOT assays using 200,000 cells/well as previously documented by Obuku AE. [34]. and ex-vivo IFN-γ ELISPOT assay using 100,000 cells/well were used for testing peptide pools and epitope mapping respectively. Experimental positive pools were 3 times the background wells and at least 600 spot forming units per million cells. “Deconvolute This” software [35] was used to identify possible responding individual peptides from the pools or where it was not possible all the peptides in a pool were tested as single peptides.

Fig. 1
figure 1

ELISPOT peptide consort; the experimental peptide mapping data was generated by culture ELISPOT of multiple peptide pools tested in duplicate wells per time point, followed by ex-vivo ELISPOT of potential candidate epitopes. To experimentally map a single time point required at least 541 assay wells

HLA typing

High resolution reference strand conformation analysis HLA class I tissue typing for the early infected subjects was done using methods described elsewhere [36].

HIV-1 subtyping

HIV-1 subtyping determination was performed on the gag gene [37, 38] using Sanger method generated sequences. The sequences were input into the REGA HIV-1 automated subtyping tool to determine the HIV-1 clade [39, 40].

Computation epitope prediction

HIV-1 subtypes A1 and D consensus sequences were used as inputs for the computational epitope prediction. These peptide sequences were all for the year 2004 downloaded from the Los Alamos database ( The web version of NetMHCpan4.0 [19] ( was configured to predict 9mer through 14mer epitopes for 22 HLA class I alleles (Table 1) that were expressed by the 6 HIV-1 infected donors. Linux version MHCflurry1.2.0 [28] was used to predict 9mer epitopes and an earlier tool NetCTL1.2 was also used to predict 9mer epitopes for the 22 HLA class I alleles expressed by the 6 study individuals. Perl version 5.26.2 was used to extract the binders from all the NetMHCpan4.0 predictions and also to compare the computational binders to the 93 mapped experimental 17aa peptides for 9mer through 14mer hits using a sliding window. An experimental peptide was considered a hit if any of the computational 9mer through 14mer sequence was contained in the 17 amino acid experimental peptide sequence as well as any of the HLA-A, B or C expressed by the individual matched the NetMHCpan4.0 HLA class I type(s). If multiple computational epitope predictions were contained in a single 17mer experimental peptide they were counted as a single hit. These were determined by a BLAST search of the computational binders against the derivative experimental peptides to determine computational predictions from the same test peptide. The accession numbers of the sequences used to determine the HIV-1 subtypes for 5 of the 6 study subjects are; KT825896, KT825897, KT825898, KT825899, KT825900, KT825901, KT825902, KT825903, KT825904, KT825905, KT825906, KT825907, KT825908, KT825909, KT825910, KT825911 and KT82512.

Data analysis

Statistics computations and plots were generated using SPSS version The NetMHCpan4.0 computational performance was evaluated using a confusion matrix to classify true positives, true negatives, false positives and false negatives that were used for the Receiver Operator Characteristic (ROC) plot. The hit rate (sensitivity) and false hit rate (specificity) of binder predictions as determined by the NetMHCpan4.0 threshold of peptides within the top 2% (with a score of 2 or less) were calculated and the strength of the model was determined by calculating the area under the curve, AUC of the ROC plot [41,42,43]. Pearson’s correlation coefficient was used to evaluate the relationship between the number of epitopes with various HIV-1 genes. To evaluate if there were any differences in the early versus late time point peptides for the binding ranking of the experimentally mapped peptides as predicted by the computational score the Wilcoxon signed rank test was used. To evaluate if HIV-1 subtypes A1 and D affected the number of computational predictions generated, Fisher Exact Test was used. To determine whether multiple computationally predicted epitope sequences were derived from the same experimental peptide sequence, a local blast database was set up using Geneious version 9.0.5. Both HIV-1 clades A1 and D experimental consensus sequences were used separately each as a reference sequence for the blast. The computational peptide sequences were then aligned against the consensuses to evaluate those derived from a single 17 amino acid experimental peptide sequence. Where an experimental peptide was predicted by multiple or overlapping computational peptides, the average NetMHCpan4.0 score was assigned as the computational score for this peptide. This score was also used during the generation of the ROC curve and the confusion matrix. To compare the association between ELISPOT spot forming units and NetMHCpan4.0 scores or MHCflurry1.2.0 affinities and also the association between the values for the 2 computational tools, Pearson’s correlation coefficient was used.


Number of experimental assays compared to computationally guided prediction assay projections

To experimentally determine epitopes for 757 peptides spanning the whole HIV-1 proteome for clades A and D as well as both time points of the 6 individuals required a total of 4230 test assay wells. For each test subject these included 9 antigen proliferation wells, 384 culture ELISPOT wells and an average of 164 epitope mapping ELISPOT wells (Range; 148–186 test wells). Using the 22 HLA alleles represented in the study subjects we were able to computationally predict 95% of the experimentally mapped epitopes. This approach could have reduced the test assays by eliminating all the T-cell antigen proliferation and culture ELISPOT steps totalling to 3258 assay wells (77%) and leaving only 972 (23%) epitope mapping assays required. Applying a pooling strategy to the computational predictions similar to that used in the experimental pooling where each pool contained approximately 20 peptides with a coverage of 3 per peptide pool, the 923 potential peptides (95% of experimental peptides for epitope mapping ELISPOT derived from the 972 (23%) eligible epitope mapping peptides) would make at most 46 pools. Consequently the computational prediction approach could have reduced the experimental assays by at least 80%.

Magnitude of epitope predictions are variable across HLA alleles, HIV-1 proteins and clades

The input HIV-1 subtypes A1 and D consensus whole proteome sequences evaluated for potential 9, 10, 11, 12, 13 and 14-mer binders to the 22 HLA alleles represented in the six patients, varied in the distribution of predicted binders across HIV-1 genes and HLA alleles. All the peptide hits predicted for 10 through 14-mer were also all predicted in the 9-mer set except for two 14-mer peptides. An expected positive correlation for HIV-1 protein length with number of epitopes predicted was observed as illustrated by Spearman’s rank order correlation; rs = 0.88 (Fig. 2, a and b). NetMHCpan4.0 predicted 95% (88/93) (Table 2) of the experimentally mapped peptides as binders and missed 5% (5 out of 93) (Table 3) for the 12-time points of the 6 participants. MHCflurry predicted 91% (85/93) of the experimental peptides and had a lot of similarity to NetMHCpan4.0 for the predicted HLA. NetCTL was the least performing tool with only 15% (14/93) predicted experimental peptides (Table 2).

Fig. 2
figure 2

NetMHCpan Binder Predictions. a Using our experimental peptide sequences as inputs into NetMHCpan4.0 to predict epitopes for 22 HLA types represented in the 6 HIV-1 Infected people, a heatmap showing absolute counts of computationally predicted 9-mer binders against HIV-1 genes was constructed. The dendrogram shows the nearest similarity for the number of predicted counts across HLA types; b the length of the HIV-1 protein sequence plotted against the absolute number of NetMHCpan4.0 predicted 9mer binders showing a positive correlation (Spearman’s correlation coefficient, rs = 0.88). The number of distinct predictions is dependent on the length of the HIV-1 sequence; c comparison of HIV-1 clade A and D absolute number of NetMHCpan4.0 predicted 9mer binders per HIV-1 gene for the wet experiment test peptide sequences. The algorithm predicted more binders for clade D than clade A

Table 2 Experimentally Mapped Peptides and Computationally Predicted Epitopes
Table 3 Peptides not predicted

Comparison of the various epitope prediction length set showed that the 9mer setting was ideal for NetMHCpan4.0. The number of predictions were 88, 79, 55, 39, 39 and 37 hits out of 93 for 9, 10, 11, 12, 13 and 14-mer epitopes respectively. Increasing the prediction length from 9mer through 14mer resulted in a smaller number of predicted binders as illustrated in Fig. 3. Since we held the assumption that our wet experimental data was the gold standard we evaluated the sensitivity and specificity of NetMHCpan4.0.The computational predictor had more predicted binders than those determined by the experimental mapping as presented in the confusion matrix in Table 4. The experimental positive’s count also shown in Table 2 under column “Hit No” shows the test peptide count (1through 88) that contained the computational 9-mer sequence. Multiple computational epitopes may be contained in a single experimental peptide, as shown in the column “NetMHCpan4.0 9-mer Epitope Prediction” in Table 2. Overall HIV-1 Clade A 9-mer predictions were fewer in number than clade D (Fig. 2, c) though the difference did not approach statistical significance.

Fig. 3
figure 3

Computational epitope prediction. NetMHCpan4.0 set length plotted against the number of predicted binders per HLA type shows that the number of predictions reduces as the input set length increases. The dotted line is the trend line, whereas the solid line is the line of best fit. The core 9mer epitope sequence was similar across 9mer through 14mer set length except for one 14-mer peptide (hit 72 in Table 2)

Table 4 Experimental and computational 9mer peptide confusion matrix

Comparison of experimentally mapped epitopes with in-silico prediction

The experimental peptide mapping data was derived from a baseline time point corresponding to HIV-1 Fiebig stages IV, V and VI (Table 1) and a later time point. Ninety-three (n = 93) epitopes were experimentally mapped of which 12 were recognized at both baseline and later time points, 34 only at baseline and 54 only at the later time point. Comparison of the ranked computational score for Netmhcpan4.0 binders of early (n = 34) versus later peptides showed that the later time point predictions were stronger binders reaching statistical significance (Wilcoxon signed rank p-value = 0.0000005) (Fig. 4). NetMHCpan4.0 ranked binders as those predicted to be in the top 2% and assigned a score of 0.2 or below. Any binder within the top 0.5% and assigned a score of 0.05 or below was ranked as a strong binder. Considering only the 9-mer computational predictions, peptides that were derived from the same 17-mer experimental peptide were determined by a BLAST mapping to their derivative sequences. The 17-mer peptides were then classified into a confusion matrix (Table 4) as true positives, false positives, true negatives or false negatives. From the classification the true positive rate (sensitivity) was plotted against the false positive rate (1-specificity) using an ROC curve and the AUC attained reached 0.928 (Fig. 5). Only 9-mer length epitopes were considered in the ROC analysis as increasing the length to 10-mer through 14mer NetMHCpan4.0 predictions neither raised the number of predicted binders nor improved the hit rate as all their predictions contained the sequence already predicted in the 9-mer set except 1, 14-mer peptide (hit 72 in Table 2). Comparison of the ELISPOT magnitude of response (spot forming units) did not show any association to either NetMHCpan4.0 scores or MHCflurry1.2.0 affinity values. Similarly a comparison of the latter 2 computational predictors did not show any association between their assigned “affinity” values. NetMHCpan4.0 registered the highest concordance to the wet experiments followed by MHCflurry1.2.0.

Fig. 4
figure 4

Early versus Late Peptides. Experimentally mapped peptides at baseline (n = 34) and at least 12 months later (n = 34) were compared using the 9-mer computational NetMHCpan4.0 scores of the hits. The lower the computational score the stronger the predicted binding. Late peptides were significantly stronger binders than early peptides (Wilcoxon signed rank test, p = 0.0000005)

Fig. 5
figure 5

ROC plot. False versus true positive rate for all 9-mer and a single 14-mer test peptides across the 22 test HLA class I types. The diagonal line shows the random guess whereas the red curve shows the observed experimentally mapped epitopes versus the NetMHCpan4.0 expected predictions


In this analysis we showed that the computational method NetMHCpan4.0 predicted 95% of previously experimentally mapped HIV-1 epitopes in 6 HIV-1 infected individuals expressing a total of 22 different HLA class I alleles. In our IFN-γ ELISPOT assays we evaluated 757 17mer peptides overlapping by 11 amino acids and covering the whole HIV-1 subtype A1 and D consensus proteomes. Out of the 5 experimentally determined epitopes missed by the algorithm (Table 3), 4 were actually computationally predicted as binders but were not included for lack of concordance with the participant’s HLA alleles. About one third (37) of 125 total positive predictions were not experimentally supported in our tests. These do not necessarily represent false positives, as ELISPOT detection depends on the frequency of specific T cells in the participant’s repertoire, and we observed changes in dominant T cell specificities within a given participant between early and later time points after HIV-1 infection. A formal ROC evaluation of the score generated by NetMHCpan4.0 as a classifier for peptides recognised/not recognised by PBMC in IFN-g ELISPOT assays, produced an AUC of 0.928. Thus experimental confirmatory tests cannot be dropped altogether, however the NetMHCpan4.0 algorithm could provide a considerable saving of time and resources in verifying just the predicted epitopes.

As the participants had been enrolled in the acute/early phase of HIV-1 infection and we had observed intra-participant changes in epitope recognition between early and late time points after infection, we compared the binding scores of confirmed epitopes at these time points and found a statistically significant change towards recognition of higher binding peptides as the infection entered the chronic phase. This might represent better support of the T-cell response directed at more stable HLA/peptide complexes as the infection progresses into chronicity.

The NetMHCpan4.0 algorithm, which is based on binding affinity and integrates data on eluted naturally processed ligands, reflected optimal HLA class I binding for 9-mers, producing a decreasing number of predictions when the peptide size was increased from 9 to 11 amino acids. With a single exception, predicted binders between 11 and 14 amino acids included at least one 9mer predicted to bind on its own, suggesting a destabilizing effect of the extra amino acids beyond the canonical HLA class I binding pockets at positions 2 and 9 could account for fewer predictions.

Important limitations are the lack of predictions of HLA class II restricted epitopes, which might have contributed to a fraction of IFN-γ ELISPOT responses. Approximately 5% of the computational predictions may be false positives that only increase the size of planned wet experiments and approximately 1% of true positives may also be missed.


In this analysis, using NetMHCpan4.0, MHCflurry and NetCTL to predict previously experimentally mapped epitopes, we demonstrate that the computational methods reliably predict an acceptable portion of binder epitopes. We recommend the use of such computational methods to reduce the size of experiments required cost associated.

Availability of data and materials

Most of the relevant data to support the manuscript has been included in the write-up. If any addition data is required will be availed once requested.



Area under the curve


Cytotoxic T lymphocytes


Human immunodeficiency virus type 1


Human leucocyte antigen


Receiver operator characteristic


  1. Tambunan US, Sipahutar FR, Parikesit AA, Kerami D. Vaccine design for H5N1 based on B- and T-cell epitope predictions. Bioinformat Biol Insights. 2016;10:27–35.

    Article  CAS  Google Scholar 

  2. Yu K, Petrovsky N, Schonbach C, Koh JY, Brusic V. Methods for prediction of peptide binding to MHC molecules: a comparative study. Mol Med. 2002;8(3):137–48.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Azizi A, Anderson DE, Torres JV, Ogrel A, Ghorbani M, Soare C, Sandstrom P, Fournier J, Diaz-Mitoma F. Induction of broad cross-subtype-specific HIV-1 immune responses by a novel multivalent HIV-1 peptide vaccine in cynomolgus macaques. J Immunol. 2008;180(4):2174–86.

    Article  CAS  PubMed  Google Scholar 

  4. Baden LR, Walsh SR, Seaman MS, Cohen YZ, Johnson JA, Licona JH, Filter RD, Kleinjan JA, Gothing JA, Jennings J, et al. First-in-human randomized controlled trial of mosaic HIV-1 Immunogens delivered via a modified Vaccinia Ankara vector. J Infect Dis. 2018;218(4):633–44.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Yebra G, Ragonnet-Cronin M, Ssemwanga D, Parry CM, Logue CH, Cane PA, Kaleebu P, Brown AJ. Analysis of the history and spread of HIV-1 in Uganda using phylodynamics. J Gen Virol. 2015;96(Pt 7):1890–8.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Lim WC, Khan AM. Mapping HLA-A2, −A3 and -B7 supertype-restricted T-cell epitopes in the ebolavirus proteome. BMC Genomics. 2018;19(Suppl 1):42.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  7. Goodswen SJ, Kennedy PJ, Ellis JT. Enhancing in silico protein-based vaccine discovery for eukaryotic pathogens using predicted peptide-MHC binding and peptide conservation scores. PLoS One. 2014;9(12):e115745.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  8. Goodswen SJ, Kennedy PJ, Ellis JT. Discovering a vaccine against neosporosis using computers: is it feasible? Trends Parasitol. 2014;30(8):401–11.

    Article  PubMed  Google Scholar 

  9. Goodswen SJ, Kennedy PJ, Ellis JT. Vacceed: a high-throughput in silico vaccine candidate discovery pipeline for eukaryotic pathogens based on reverse vaccinology. Bioinformatics. 2014;30(16):2381–3.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Pedersen LE, Rasmussen M, Harndahl M, Nielsen M, Buus S, Jungersen G. A combined prediction strategy increases identification of peptides bound with high affinity and stability to porcine MHC class I molecules SLA-1*04:01, SLA-2*04:01, and SLA-3*04:01. Immunogenetics. 2016;68(2):157–65.

    Article  CAS  PubMed  Google Scholar 

  11. Currier JR, Robb ML, Michael NL, Marovich MA. Defining epitope coverage requirements for T cell-based HIV vaccines: theoretical considerations and practical applications. J Transl Med. 2011;9:212.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Frahm N, Kiepiela P, Adams S, Linde CH, Hewitt HS, Sango K, Feeney ME, Addo MM, Lichterfeld M, Lahaie MP, et al. Control of human immunodeficiency virus replication by cytotoxic T lymphocytes targeting subdominant epitopes. Nat Immunol. 2006;7(2):173–8.

    Article  CAS  PubMed  Google Scholar 

  13. Frahm N, Korber BT, Adams CM, Szinger JJ, Draenert R, Addo MM, Feeney ME, Yusim K, Sango K, Brown NV, et al. Consistent cytotoxic-T-lymphocyte targeting of immunodominant regions in human immunodeficiency virus across multiple ethnicities. J Virol. 2004;78(5):2187–200.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Geldmacher C, Currier JR, Herrmann E, Haule A, Kuta E, McCutchan F, Njovu L, Geis S, Hoffmann O, Maboko L, et al. CD8 T-cell recognition of multiple epitopes within specific gag regions is associated with maintenance of a low steady-state viremia in human immunodeficiency virus type 1-seropositive patients. J Virol. 2007;81(5):2440–8.

    Article  CAS  PubMed  Google Scholar 

  15. Goonetilleke N, Moore S, Dally L, Winstone N, Cebere I, Mahmoud A, Pinheiro S, Gillespie G, Brown D, Loach V, et al. Induction of multifunctional human immunodeficiency virus type 1 (HIV-1)-specific T cells capable of proliferation in healthy subjects by using a prime-boost regimen of DNA- and modified vaccinia virus Ankara-vectored vaccines expressing HIV-1 gag coupled to CD8+ T-cell epitopes. J Virol. 2006;80(10):4717–28.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Liu Y, McNevin JP, Holte S, McElrath MJ, Mullins JI. Dynamics of viral evolution and CTL responses in HIV-1 infection. PLoS One. 2011;6(1):e15639.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Rolland M, Heckerman D, Deng W, Rousseau CM, Coovadia H, Bishop K, Goulder PJ, Walker BD, Brander C, Mullins JI. Broad and gag-biased HIV-1 epitope repertoires are associated with lower viral loads. PLoS One. 2008;3(1):e1424.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  18. Hoof I, Peters B, Sidney J, Pedersen LE, Sette A, Lund O, Buus S, Nielsen M. NetMHCpan, a method for MHC class I binding prediction beyond humans. Immunogenetics. 2009;61(1):1–13.

    Article  CAS  PubMed  Google Scholar 

  19. Jurtz V, Paul S, Andreatta M, Marcatili P, Peters B, Nielsen M. NetMHCpan-4.0: improved peptide-MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data. J Immunol. 2017;199(9):3360–8.

    Article  CAS  PubMed  Google Scholar 

  20. Zhang H, Lundegaard C, Nielsen M. Pan-specific MHC class I predictors: a benchmark of HLA class I pan-specific prediction methods. Bioinformatics. 2009;25(1):83–9.

    Article  PubMed  CAS  Google Scholar 

  21. Peters B, Bui HH, Frankild S, Nielson M, Lundegaard C, Kostem E, Basch D, Lamberth K, Harndahl M, Fleri W, et al. A community resource benchmarking predictions of peptide binding to MHC-I molecules. PLoS Comput Biol. 2006;2(6):e65.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  22. Jacob L, Vert JP. Efficient peptide-MHC-I binding prediction for alleles with few known binders. Bioinformatics. 2008;24(3):358–66.

    Article  CAS  PubMed  Google Scholar 

  23. Zhang GL, Khan AM, Srinivasan KN, August JT, Brusic V. MULTIPRED: a computational system for prediction of promiscuous HLA binding peptides. Nucleic acids research. 2005;33(Web Server issue):W172–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Trolle T, Metushi IG, Greenbaum JA, Kim Y, Sidney J, Lund O, Sette A, Peters B, Nielsen M. Automated benchmarking of peptide-MHC class I binding predictions. Bioinformatics. 2015;31(13):2174–81.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Larsen MV, Lundegaard C, Lamberth K, Buus S, Brunak S, Lund O, Nielsen M. An integrative approach to CTL epitope prediction: a combined algorithm integrating MHC class I binding, TAP transport efficiency, and proteasomal cleavage predictions. Eur J Immunol. 2005;35(8):2295–303.

    Article  CAS  PubMed  Google Scholar 

  26. Larsen MV, Lundegaard C, Lamberth K, Buus S, Lund O, Nielsen M. Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction. BMC bioinformatics. 2007;8:424.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  27. Nielsen M, Lundegaard C, Worning P, Lauemoller SL, Lamberth K, Buus S, Brunak S, Lund O. Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci. 2003;12(5):1007–17.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. O'Donnell TJ, Rubinsteyn A, Bonsack M, Riemer AB, Laserson U, Hammerbacher J. MHCflurry: open-source class I MHC binding affinity prediction. Cell Syst. 2018;7(1):129–32 e124.

    Article  CAS  PubMed  Google Scholar 

  29. DeLisi C, Berzofsky JA. T-cell antigenic sites tend to be amphipathic structures. Proc Natl Acad Sci U S A. 1985;82(20):7048–52.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. He Y, Rappuoli R, De Groot AS, Chen RT. Emerging vaccine informatics. J Biomed Biotechnol. 2010;2010:218590.

    PubMed  Google Scholar 

  31. Reche PA, Reinherz EL. Prediction of peptide-MHC binding using profiles. Methods Mol Biol. 2007;409:185–200.

    Article  CAS  PubMed  Google Scholar 

  32. Roider J, Meissner T, Kraut F, Vollbrecht T, Stirner R, Bogner JR, Draenert R. Comparison of experimental fine-mapping to in silico prediction results of HIV-1 epitopes reveals ongoing need for mapping experiments. Immunology. 2014;143(2):193–201.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Fiebig EW, Wright DJ, Rawal BD, Garrett PE, Schumacher RT, Peddada L, Heldebrant C, Smith R, Conrad A, Kleinman SH, et al. Dynamics of HIV viremia and antibody seroconversion in plasma donors: implications for diagnosis and staging of primary HIV infection. Aids. 2003;17(13):1871–9.

    Article  PubMed  Google Scholar 

  34. Obuku AE, Bugembe DL, Musinguzi K, Watera C, Serwanga J, Ndembi N, Levin J, Kaleebu P, Pala P. Macrophage inflammatory Protein-1 Beta and Interferon gamma responses in Ugandans with HIV-1 acute/early infections. AIDS Res Hum Retrovir. 2016;32(3):237–46.

    Article  CAS  PubMed  Google Scholar 

  35. Roederer M, Koup RA. Optimized determination of T cell epitope responses. J Immunol Methods. 2003;274(1–2):221–8.

    Article  CAS  PubMed  Google Scholar 

  36. Arguello JR, Little AM, Bohan E, Goldman JM, Marsh SG, Madrigal JA. High resolution HLA class I typing by reference strand mediated conformation analysis (RSCA). Tissue Antigens. 1998;52(1):57–66.

    Article  CAS  PubMed  Google Scholar 

  37. Serwanga J, Mugaba S, Pimego E, Nanteza B, Lyagoba F, Nakubulwa S, Heath L, Nsubuga RN, Ndembi N, Gotch F, et al. Profile of T cell recognition of HIV type 1 consensus group M gag and Nef peptides in a clade A1- and D-infected Ugandan population. AIDS Res Hum Retrovir. 2012;28(4):384–92.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Serwanga J, Nakiboneka R, Mugaba S, Magambo B, Ndembi N, Gotch F, Kaleebu P. Frequencies of gag-restricted T-cell escape "footprints" differ across HIV-1 clades A1 and D chronically infected Ugandans irrespective of host HLA B alleles. Vaccine. 2015;33(14):1664–72.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Alcantara LC, Cassol S, Libin P, Deforche K, Pybus OG, Van Ranst M, Galvao-Castro B, Vandamme AM, de Oliveira T. A standardized framework for accurate, high-throughput genotyping of recombinant and non-recombinant viral sequences. Nucleic Acids Res. 2009;37(Web Server issue):W634–42.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. de Oliveira T, Deforche K, Cassol S, Salminen M, Paraskevis D, Seebregts C, Snoeck J, van Rensburg EJ, Wensing AM, van de Vijver DA, et al. An automated genotyping system for analysis of HIV-1 and other microbial sequences. Bioinformatics. 2005;21(19):3797–800.

    Article  PubMed  CAS  Google Scholar 

  41. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36.

    Article  CAS  PubMed  Google Scholar 

  42. Park SH, Goo JM, Jo CH. Receiver operating characteristic (ROC) curve: practical review for radiologists. Korean J Radiol. 2004;5(1):11–8.

    Article  PubMed  PubMed Central  Google Scholar 

  43. Hanczar B, Hua J, Sima C, Weinstein J, Bittner M, Dougherty ER. Small-sample precision of ROC-related estimates. Bioinformatics. 2010;26(6):822–30.

    Article  CAS  PubMed  Google Scholar 

Download references


We thank the study participants who provided specimen for the wet laboratory experiments, the AIDS Information Centre Clinic Kampala, Uganda that steered the Rubicon discordant couple cohort study, the late Anthony Kebba who initiated the Rubicon Cohort and Christine Watera who coordinated the recruitment clinic activities.

We thank Ruhena Sargeant for HLA typing, the late Harr F. Njai for HIV-1 ELISA and Western blot assays and Deogratius Ssemwanga for help with GenBank submissions.


This research is jointly funded by the UK Medical Research Council (MRC) and the UK Department for International Development (DFID) under the MRC/DFID Concordat agreement, the Wellcome Trust (grant WT078927MA), and EDCTP (project code: TA_05_40200_40203).

Author information

Authors and Affiliations



DBL performed the peptide mapping experiments. NN provided HIV-1 subtyping. DBL analysed the data together with PP. DBL wrote the manuscript with major contributions from PP, AEO, PK and JS. All authors reviewed the manuscript and or provided useful contributions as well as approved the final manuscript.

Corresponding author

Correspondence to Daniel Lule Bugembe.

Ethics declarations

Ethics approval and consent to participate

The Rubicon study, from which we derived the experimental data, was reviewed and approved by the Uganda Virus Research Institute, Research and Ethics Committee (UVRI-REC) and the Uganda National Council of Science and Technology (UNCT). All participants provided informed signed consent accepting to freely participate in this study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Bugembe, D.L., Ekii, A.O., Ndembi, N. et al. Computational MHC-I epitope predictor identifies 95% of experimentally mapped HIV-1 clade A and D epitopes in a Ugandan cohort. BMC Infect Dis 20, 172 (2020).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: