Comparing human papillomavirus prevalences in women with normal cytology or invasive cervical cancer to rank genotypes according to their oncogenic potential: a meta-analysis of observational studies

Background: Mucosal human papillomavirus (HPV) infection is a necessary cause of cervical cancer. Vaccine and non-vaccine genotype prevalences may change after vaccine introduction. Therefore, it appears essential to rank HPV genotypes according to their oncogenic potential for invasive cervical cancer, independently of their respective prevalences.
Methods: We performed meta-analyses of published observational studies and estimated pooled odds ratios with random-effects models for 32 HPV genotypes, using HPV-16 as the reference.
Results: Twenty-seven studies yielded 9,252 HPV-infected women: 2,902 diagnosed with invasive cervical cancer and 6,350 with normal cytology. Expressed as (odds ratio [95% confidence interval]), HPV-18 (0.63 [0.51, 0.78]) ranked closest to HPV-16, while other genotypes showed continuously decreasing relative oncogenic potentials: HPV-45 (0.35 [0.22, 0.55]), HPV-69 (0.28 [0.09, 0.92]), HPV-58 (0.24 [0.15, 0.38]), HPV-31 (0.22 [0.14, 0.35]), HPV-33 (0.22 [0.12, 0.38]), HPV-34 (0.21 [0.06, 0.80]), HPV-67 (0.21 [0.06, 0.67]), HPV-39 (0.17 [0.09, 0.30]), HPV-59 (0.17 [0.09, 0.31]), HPV-73 (0.16 [0.06, 0.41]), and HPV-52 (0.16 [0.11, 0.23]).
Conclusions: Our results support the markedly higher oncogenic potentials of HPV-16 and -18, followed by HPV-31, -33, -39, -45, -52, -58 and -59, and highlight the need for further investigation of HPV-34, -67, -69 and -73. Overall, these findings could have important implications for the prevention of cervical cancer.


Background
Invasive cervical cancer (ICC) is the third most common cancer among women worldwide, with an estimated incidence of 553,119 new cases and 288,109 deaths in 2010 [1]. Persistent infection with one of the oncogenic human papillomavirus (HPV) genotypes is required to cause ICC [2][3][4][5]. More than 150 HPV genotypes have been identified and about 40 are known to infect the genital tract [6,7].
Two vaccines targeting HPV-16 and -18, which account for 70% of cervical cancers worldwide [12,13], are currently available. The impact of vaccination on cervical cancer incidence remains uncertain, especially because genotype-specific prevalences of vaccine and non-vaccine genotypes might change after vaccine introduction through vaccine-induced cross-protection or genotype replacement [14][15][16]. The number of ICC cases associated with a given HPV genotype depends both on its prevalence in the general population and its oncogenic potential, which can be defined as the inherent and differential abilities of each genotype to trigger malignant transformation and induce cervical cancer [17]. Within categories of IARC-classified carcinogenic HPV genotypes, the risk of progression to ICC might differ by HPV genotype. Therefore, ranking the oncogenic potentials of HPV genotypes, independently of their respective prevalences, is challenging but essential to guide the formulation of second-generation polyvalent HPV vaccines and HPV-DNA-based screening tests.
This study was undertaken to rank HPV genotypes as causal agents of ICC according to their relative oncogenic potentials assessed by means of meta-analyses of published observational data. Oncogenic potentials herein are expressed using HPV-16 as the reference, since it has been recognized as the most carcinogenic HPV genotype [8,11,18].
First, article titles and abstracts were screened, then full texts were read to check inclusion criteria. The relevance of references cited in the retrieved articles, reviews and meta-analyses was also evaluated for potential inclusions. When necessary, authors were contacted for confirmation of inclusion criteria or results.
Unvaccinated women of any age were considered for this meta-analysis. We defined the following three inclusion criteria: prevalence data for at least one HPV genotype other than HPV-16 and -18; inclusion of ≥20 HPV-infected women with ICC and ≥20 HPV-infected women with normal cytology; and HPV-prevalence data for women with ICC presented separately from those with normal cytology.
One author (EB) conducted the eligibility assessment and problematic papers were resolved by collegial discussion (MPS, ACMT). BibDesk 1.5.4 software was used to manage references.

Data extraction
For each study, one author (EB) extracted the following data, entering them into a predefined Excel spreadsheet: study characteristics (first author, year of publication, journal, country and continent, design and funding source), characteristics of included subjects (number of cases [total and by histologic type, if available] and controls, age data), methods (cytologic or histologic cervical specimen, identification and typing method, primers used [if any] and number of HPV genotypes detectable) and results (numbers of HPV genotypes actually identified, and simple and multiple infections, genotype-specific HPV-prevalence data for cases and controls). For multiple infections with ≥2 HPV genotypes, no weighting was used and each HPV genotype was counted equally. Infection with an uncharacterized HPV genotype is denoted HPV-X.
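The equal-count rule for multiple infections can be illustrated with a minimal Python sketch. The record layout, the function name `tally_genotypes` and the example data are hypothetical and are not taken from the extraction spreadsheet; the sketch only shows that each genotype found in a woman adds one to that genotype's count, with no weighting.

```python
from collections import Counter

def tally_genotypes(records):
    """Count genotype-specific infections among cases and controls.

    records: iterable of (status, genotypes) pairs, where status is
    "case" or "control" and genotypes lists the HPV types detected in
    one woman. Each genotype in a multiple infection counts once,
    with no weighting.
    """
    counts = {"case": Counter(), "control": Counter()}
    for status, genotypes in records:
        for genotype in set(genotypes):  # guard against repeated labels
            counts[status][genotype] += 1
    return counts

# Hypothetical records: the first case is coinfected with HPV-16 and HPV-18.
example = [
    ("case", ["16", "18"]),
    ("case", ["16"]),
    ("control", ["6"]),
    ("control", ["X"]),  # uncharacterized genotype, denoted HPV-X
]
counts = tally_genotypes(example)
```

Under this rule a coinfected woman contributes to several genotype counts at once, so genotype counts can sum to more than the number of women.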
Furthermore, study quality was assessed with a list of specifically defined criteria, inspired by some of the STROBE checklist items [19]. The following items were coded yes, no, unclear or missing: comparability of cases and controls for geographic origin, age, sample type and methods used to detect and genotype HPV; blinded assessment; numbers of individuals reported at each study stage; and genotype prevalences reported for multiple infections. Data extraction was double checked by two authors (MPS, ACMT).

Statistical analyses
A meta-analysis was performed for each HPV genotype, after excluding studies that did not seek or report the genotype under consideration and those that sought it but reported zero cases and controls. Therefore, the number of studies analyzed varied from one genotype to another. We did not consider genotypes for which all but one study reported zero controls. For each study and each available genotype, an odds ratio (OR) and its 95% confidence interval (CI) were computed from the reported numbers of case and control infections, considering HPV-16 infections as the reference group. Then, each HPV genotype was subjected to meta-analysis across the corresponding number of studies by combining the studies' ORs using DerSimonian and Laird's random-effects model [20]. If a count equalled zero when cross-tabulating case-control status and infection with a given HPV genotype or HPV-16 (usually no controls infected with the HPV genotype under consideration), we applied a continuity correction (CC) by adding 0.5 to each cell [21,22].
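The two steps described above (study-level ORs against HPV-16 with the 0.5 continuity correction, then DerSimonian and Laird pooling) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Stata code used for the analyses: the function names and the example counts are hypothetical, and the standard inverse-variance formulation of the DerSimonian and Laird estimator is assumed.

```python
import math

def log_odds_ratio(a, b, c, d, cc=0.5):
    """Log OR and its variance for one study.

    a: cases with the genotype of interest, b: controls with it,
    c: cases with HPV-16 (reference),       d: controls with HPV-16.
    If any cell is zero, cc is added to every cell.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + cc for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance

def dersimonian_laird(effects, variances):
    """Pool log ORs with the DerSimonian-Laird random-effects estimator."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    k = len(effects)
    scale = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / scale) if scale > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2

# Hypothetical counts for three studies (not data from the included studies).
tables = [(10, 40, 50, 60), (5, 30, 40, 45), (0, 12, 30, 25)]
effects, variances = zip(*(log_odds_ratio(*t) for t in tables))
pooled, se, tau2 = dersimonian_laird(effects, variances)
pooled_or = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
```

Pooling is done on the log scale and back-transformed, so the resulting CI is asymmetric around the pooled OR. With `cc=0`, a table containing a zero cell cannot be evaluated in this sketch.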
For each HPV genotype, heterogeneity of the estimated oncogenic potentials relative to HPV-16 was assessed graphically with forest plots, and quantitatively using Cochran's Q-test and the I² statistic [23,24]. When Cochran's Q-test was statistically significant at the 10% level or the I² statistic ≥50%, we examined potential sources of heterogeneity by performing subgroup analyses according to five prespecified factors: study design (case-control versus cross-sectional), year of publication (before and after the median, ≤2005 versus >2005), comparability of case and control ages (similar versus unbalanced distribution or unclear information), geographic area (Asia [18,25] versus all other continents), and HPV-detection level among cases (<90% versus ≥90% threshold). For each HPV genotype, we assessed publication bias (or other potential sources of bias) by examining the funnel plots for asymmetry and running the Egger test [26].
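The heterogeneity statistics named above can be sketched as follows, assuming the usual definitions: Q is the weighted sum of squared deviations from the inverse-variance pooled log OR, and I² = max(0, (Q − df)/Q) × 100. The function name and the example values are hypothetical. The P value of the Q-test is not computed here, because it requires a chi-square distribution with df degrees of freedom that the Python standard library does not provide.

```python
def cochran_q_i2(effects, variances):
    """Return Cochran's Q, its degrees of freedom, and I² (in percent)."""
    w = [1 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2

# Hypothetical log ORs and variances for two studies.
q, df, i2 = cochran_q_i2([0.0, 2.0], [0.5, 0.5])
explore_subgroups = i2 >= 50  # one of the two triggers described above
```

In this sketch only the I² criterion is checked; the second trigger (Q-test significant at the 10% level) would be added once a chi-square P value is available.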
Sensitivity analyses were performed using a fixed-effect model with Peto's method [27]; CC = 0.25 or 0; and HPV-negative subjects as the reference group.
All statistical analyses were computed using the "Meta-analysis in Stata" package in Stata/SE v11.0 [28,29].
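For the first sensitivity analysis, a minimal sketch of a Peto-type fixed-effect pooled OR is given below. It assumes the standard one-step formulation, in which each study contributes the observed minus expected number of genotype-positive cases and a hypergeometric variance. The function name and the example counts are hypothetical, and the sketch is not the Stata routine used in the study.

```python
import math

def peto_pooled_or(tables):
    """Peto one-step pooled OR with a 95% CI.

    Each table is (a, b, c, d):
    a: cases with the genotype of interest, b: controls with it,
    c: cases with HPV-16 (reference),       d: controls with HPV-16.
    """
    sum_o_minus_e = 0.0
    sum_v = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        expected = (a + b) * (a + c) / n
        variance = (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
        sum_o_minus_e += a - expected
        sum_v += variance
    log_or = sum_o_minus_e / sum_v
    se = 1 / math.sqrt(sum_v)
    return (math.exp(log_or),
            math.exp(log_or - 1.96 * se),
            math.exp(log_or + 1.96 * se))

# Hypothetical counts; the third table has a zero cell and needs no correction here.
peto_or, peto_low, peto_high = peto_pooled_or(
    [(10, 40, 50, 60), (5, 30, 40, 45), (0, 12, 30, 25)]
)
```

Because this estimator works from observed and expected counts, it stays defined when a single cell is zero, provided the table margins are not.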

Study identification and description
The Medline and Embase database searches provided, respectively, 757 and 182 references, while additional searches identified 11 studies, yielding, after deleting 55 duplicates, a total of 895 references (Figure 1). Among them, 794 were excluded based on their titles and abstracts. The full texts of the remaining 101 references were read and 27 studies fully satisfying the inclusion criteria were finally retained: 17 case-control and 10 cross-sectional studies, published between 1997 and 2011, all but one (Spanish [30]) written in English. They had been conducted on four continents: Asia (12 studies), Europe (six studies), South and Central America (five studies), and Africa (four studies). A total of 3,191 women with ICC (cases) and 29,623 with normal cytology (controls) were included (Table 1).
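The selection counts reported above are internally consistent, as this short arithmetic check shows. The variable names are illustrative; all numbers are those given in the paragraph.

```python
# Reference flow
medline, embase, additional, duplicates = 757, 182, 11, 55
screened = medline + embase + additional - duplicates
full_text = screened - 794  # excluded on titles and abstracts

# Retained studies, by design and by continent
by_design = 17 + 10          # case-control + cross-sectional
by_continent = 12 + 6 + 5 + 4  # Asia, Europe, South/Central America, Africa

assert screened == 895
assert full_text == 101
assert by_design == by_continent == 27
```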
When available, mean age ranges were 44-56 years for cases and 32-52 years for controls. Case and control ages were comparable in five studies but not in eight others, and this information was unclear or missing in the 14 other papers. The cervical specimens used to detect HPV infection were usually cytologic for controls (23 studies) and histologic for cases (13 studies), with specimen type being similar for cases and controls in 14 studies. All studies used polymerase chain reaction (PCR) to detect HPV infection and 4-48 HPV genotypes could be identified in each study (4-20 in cases, 4-36 in controls). HPV infection was detected in 73-100% of cases (squamous cell carcinoma: 80-100%; adenocarcinoma: 50-100%) and 5-76% of controls (except in [33] which included only 24 controls, all HPV-positive).
Available data enabled assessment of the relative oncogenic potentials of 32 HPV genotypes (Table 2). Each meta-analysis included two (HPV-74) to 27 (HPV-18) studies (forest plots in Additional file 2). All pooled ORs were statistically significantly <1, except for HPV-74 (two-sided P = 0.20, calculated from two studies, one of which provided only one case, Figure 2

Heterogeneity and bias assessment
Cochran's Q-test suggested heterogeneity for six HPV genotypes: HPV-31, -33, -45, -51, -58 and -74; and the I² statistic for four among them: HPV-31, -33, -58 and -74 (Table 2). This heterogeneity was not clearly explained by any of the factors considered (study design, year of publication, geographic area, age-distribution balance between cases and controls, or HPV-detection rate among cases). For example, the I² statistic was smaller for cross-sectional studies than case-control studies for HPV-33, -51 and -58, but higher for HPV-31 and -45.
For HPV-74, subgroup analyses could not be completed because too few studies were included. Moreover, no evidence of publication bias was found. No obvious asymmetry of the funnel plots was observed, except for HPV-18, -31 and -35, with slightly more small studies reporting higher ORs (Additional file 3). The Egger test was borderline or statistically significant only for HPV-6 (P = 0.091), HPV-35 (P = 0.040) and HPV-62 (P = 0.065).
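For readers unfamiliar with the Egger test reported above, a minimal sketch follows. It assumes the classical formulation: the standardized effect (log OR divided by its standard error) is regressed on precision (1/standard error) by ordinary least squares, and the intercept measures funnel-plot asymmetry. The function name and the example values are hypothetical; the P value is omitted because it requires a t distribution with k − 2 degrees of freedom.

```python
import math

def egger_regression(effects, std_errors):
    """Return (intercept, slope, se_intercept, df) of Egger's regression.

    Needs at least three studies with differing standard errors.
    """
    x = [1 / s for s in std_errors]                    # precision
    z = [y / s for y, s in zip(effects, std_errors)]   # standardized effect
    k = len(x)
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    residual_ss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    df = k - 2
    s2 = residual_ss / df
    se_intercept = math.sqrt(s2 * (1 / k + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept, df

# Hypothetical log ORs and standard errors for three studies.
intercept, slope, se_intercept, df = egger_regression(
    [1.0, 1.0, 2 / 3], [1.0, 0.5, 1 / 3]
)
```

The ratio of the intercept to its standard error would then be compared with a t distribution on `df` degrees of freedom to obtain P values such as those quoted above.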

Sensitivity analyses
When Peto's method for fixed-effect models was applied rather than DerSimonian and Laird's random-effects models, the first six HPV genotypes ranked in the exact same order according to their   Table 2, columns 5 and 6.
Notably, our results provide insights into the oncogenic potentials of several genotypes currently IARC-classified as probably oncogenic in humans. Our meta-analytic assessment of the oncogenic potentials of HPV-69 and -82 (both α-5 species), -30 (α-6), -67 (α-9), and -34 and -73 (α-11) was based on small numbers of cases, which yielded particularly wide CIs. However, they ranked among carcinogenic HPV genotypes, which could suggest stronger oncogenic potentials than assumed so far. To date, evidence for HPV-30, -34 and -69 has relied on their phylogenetic analogy to other HPV genotypes, while HPV-67, -73 and -82 were positively associated with cancer but lack strong mechanistic evidence [8,11]. In contrast, HPV-53, -66 and -70, also placed in the probably carcinogenic subgroup [8,11], had lower relative ORs in our analyses. Hence, overall, our analysis of available epidemiologic data provided more discrepant results for the probably carcinogenic genotype distribution.
Conversely, little to no mechanistic evidence supports that HPV-6 and -11 (both α-10 species), which commonly cause benign genital warts, can contribute to carcinogenesis and they remain unclassifiable as to their carcinogenicity in humans [8,11]. Our meta-analyses consistently ranked both at the end of the distribution with estimated pooled ORs ≤0.15. We should mention that our HPV-16 reference model did not allow us to disentangle less oncogenic from non-oncogenic genotypes.
Finally, no epidemiologic evidence suggests cervical oncogenicity for HPV-40 and -44 [11]. In phylogenetic terms, these genotypes belong rather to non-oncogenic species (α-8 and -10, respectively) [58] and have been considered "low-risk" genotypes [10]. In our main analysis, these two genotypes ranked before HPV-6 and -11. However, their estimated ORs were based on limited data and their classification was not robust in the sensitivity analyses (Additional file 4). Taken together, our results do not support HPV-40 and -44 oncogenic potentials.
Strengths of our study derive from methodologic choices. To date, the assessment of the HPV-genotype-specific oncogenic potential in cervical cancer has mainly been based on HPV-genotype-prevalence data among cases [13,57,59,60]. However, that knowledge alone may be insufficient to fully appreciate each genotype's oncogenic potential. For a given HPV genotype, low frequency in ICC (corresponding to a small etiologic fraction) could reflect low prevalence in the general population or low oncogenic potential. In our study, HPV-genotype ranking according to their prevalences in cases visibly differed from that according to their estimated relative oncogenic potentials. For example, HPV-39 and -59 (both α-7), about four times less prevalent than HPV-52 (α-9), had higher oncogenic potentials estimated by their pooled ORs (Table 2); yet all three genotypes are IARC-classified as carcinogenic.
To our knowledge, the risks associated with the different HPV genotypes have rarely been assessed and HPV-negative, rather than HPV-16-positive, subjects served as the reference group to calculate ORs [4,10] with at least one exception [61]. Our similar third sensitivity analysis found lower OR estimates of the same order of magnitude as those previously published [10], e.g., respectively, 136.7 versus 281.9 for HPV-16 and 99.1 versus 222.5 for HPV-18. Notably, that third analysis showed no clear break point between HPV-18 and the other genotypes, with ORs decreasing progressively from HPV-16 to the end, unlike our main analysis. The choice of this reference group may be questioned because it takes uninfected cases into account for OR calculation, even though it is currently accepted that persistent HPV infection is required to cause ICC [2][3][4]. With few or no HPV-negative cases of cervical cancer expected, estimation of ORs and their CIs may become problematic. Therefore, we chose the unusual approach of using HPV-16-infected subjects as the reference category. HPV-16's high oncogenic potential is well documented [8,11], and this genotype is highly prevalent in cases [13,57] and often identified in women with normal cytology [62,63]. In our opinion, considering HPV-16-infected women as the reference group seemed more consistent with the natural history of cervical cancer and could be more appropriate for estimating HPV-genotype oncogenic potentials, regardless of their prevalence. However, the control group's baseline risk of developing ICC cannot be considered low, meaning that ORs cannot be directly interpreted as an accurate estimate of the relative risk, even though they can be used to rank genotypes. Alternatively, estimating ORs relative to an established low-risk genotype, e.g., HPV-6, was limited by the small, if not nonexistent, numbers of ICC cases positive for such a genotype.
Herein, we combined study ORs using the random-effects model, as sometimes recommended to perform meta-analyses of published data [64]. This approach implies wider CIs than in a fixed-effect model because, in addition to random fluctuations, the random-effects model allows for variability of the real risk. However, sensitivity analyses showed our results to be consistent with those obtained using Peto's method (Additional file 4), thereby indicating that the wide CIs mostly reflected the scarcity of epidemiologic data, rather than the choice of statistical models.
Some authors questioned the use of CC in the random-effects model, when the underlying risk varies among studies [65]. Our sensitivity analyses with a halved CC factor differed only slightly from our main results. In contrast, applying no CC raised estimation difficulties preventing the calculation of two pooled ORs (Additional file 4). Nevertheless, our choice is supported by the consistencies, both external (with the literature) and internal (across other sensitivity analyses), of our findings after correction.
Our meta-analysis has several limitations that warrant being mentioned. First, we applied stringent selection criteria, including only studies with sufficient numbers of HPV-positive cases and controls. That choice rendered the several large investigations conducted in North America ineligible [66,67], which is consistent with 85% of ICC cases occurring in developing countries [1], and HPV-vaccine trials being conducted more frequently in Asia-Pacific, Europe or Latin America than North America [68,69]. Nevertheless, although the distributions of HPV genotypes vary across populations [18,57,59,63], no evidence indicates that HPV-type-specific oncogenic potential could differ according to geographic area. Moreover, the continent did not explain heterogeneity in our meta-analyses.
Second, basing this study on summary data meant we could not control for age, despite its being a critical variable, closely associated with HPV infection, clearance, persistence and progression. Age information was frequently missing and rarely available for HPV-positive cases and controls specifically. Controls tended to be 10 years younger than cases on average, possibly reflecting different stages in the natural history of cervical cancer. The peak prevalence of cervical HPV infection coincides closely with first-time sexual intercourse, at around 20 years of age, while that of ICC occurs at 40-50 years [70]. It was reassuring that the comparability of age distributions between cases and controls did not clearly explain heterogeneity in our meta-analyses.
Third, the small number of cases infected with some HPV genotypes hindered precise estimations of their oncogenic potentials. This paucity is partly due to our strict definition of cases as having ICC. This choice was motivated by the natural history of cervical cancer, according to which precancerous lesions, even high-grade cervical intraepithelial neoplasia, may regress in a substantial proportion of cases [71]. Previous studies [17,72] might have been more permissive, assimilating high-grade lesions and ICC cases, especially longitudinal studies, often limited by the low numbers of ICC during the follow-up. Moreover, clinical management guidelines also recommend the excision of precancerous lesions, and will continue to do so as long as it cannot be foreseen whether these lesions will regress or progress [73].
Fourth, we did not distinguish between ICC histologic types, even though HPV-18 could be more prevalent in adenocarcinomas than squamous cell carcinomas [13]. However, the HPV-genotype-specific distribution according to histologic type was seldom reported in selected studies. When histologic type was reported, most were squamous cell carcinomas, which is the most common histologic cervical cancer type [74].
Fifth, our analysis was limited by the variety of sample types and HPV assays, as in previously reported meta-analyses of HPV-genotype-specific prevalences [57,59,60]. Although all HPV-detection methods were PCR-based, sensitivity and specificity of PCR protocols varied across studies and numerous HPV genotypes were not detected by some of them. However, each study used the same HPV-typing method for cases and controls, so it is unlikely that the differences among studies affected our estimates. Moreover, the heterogeneity in our meta-analyses was not explained by the HPV-detection threshold for cases.
Finally, because the components of multiple infections were seldom available, the oncogenic potential of each HPV genotype was assessed without distinguishing between single or coinfection. Thus, the oncogenic potentials of some HPV genotypes might have been overestimated in our meta-analysis if they had been coinfection partners with established high-risk genotypes, e.g., HPV-16 or -18, and were wrongly accorded equal weight in cancerous lesions even though the high-risk genotype was solely responsible for the lesions [11]. That possibility could explain HPV-11's unexpectedly higher oncogenic potential. Moreover, for studies that did not report coinfection, misattribution of the causal HPV genotype could bias the estimated oncogenic potentials of coinfecting HPV genotypes either way [75]. A new generation of molecular studies involving lesion microdissection and HPV-E6/E7 expression could provide valuable information to assess more specifically each HPV genotype's oncogenic potential [9,76].

Conclusions
Our results provide further evidence reinforcing the high oncogenic potentials of genotypes HPV-18, -31, -33, -45, -52 and -58, already classified as high-risk for ICC. They also highlight the need to include in detection kits HPV-34, -67, -69 and -73, for which epidemiologic data are currently lacking, and to further examine their possibly underestimated oncogenic potentials. Moreover, although HPV-39 and -59 belong to the same α-7 species as HPV-18, they are not, at present, included in a future nonavalent anti-HPV vaccine (HPV-6, -11, -16, -18, -31, -33, -45, -52 and -58) [77]. Those genotypes may deserve further consideration, owing to accumulating evidence (relatively precise estimates) and their classification among the 10 most oncogenic genotypes after HPV-16 in our meta-analyses. Pooling individual data from presently available and future studies investigating these genotypes would allow more robust estimates, especially if controlled for age. Overall, such findings may have important implications for the prevention of cervical cancer and could help guide HPV-based-screening programs [78] and the composition of the second-generation anti-HPV vaccines [79].

Additional files
Additional file 1: Search strategies.
Additional file 2: Meta-analyses assessing the relative oncogenic potential of each human papillomavirus (HPV) genotype (forest plots). Studies are listed in alphabetical order. Each study is represented by a black cross, which corresponds to the odds ratio (OR) point estimate; a grey square, whose area reflects the weight each study contributes in the meta-analysis; and a horizontal line, which spans the 95% confidence interval (CI). The diamond at the bottom of the graph represents the combined OR and its 95% CI. The solid vertical line is an oncogenic potential equal to that of HPV-16 (OR 1.0) and the dotted vertical line indicates the value of the combined ORs from the random-effects model. The graphs were generated by Stata command metan (adapted from [26] pp 14 and 33).
Additional file 3: Bias assessment for each meta-analysis (funnel plots). Each dot represents one study. The solid vertical line is the pooled odds ratio (OR). Diagonal dashed lines represent the pseudo 95% confidence limits around the pooled OR for each standard error of the ordinate vertical axis values, defining a funnel within which 95% of the studies should lie in the absence of heterogeneity or selection biases. The yellow line is the fitted linear-regression line of the OR plotted against its standard error (both on natural logarithm scales) and corresponds to Egger's test for funnel-plot asymmetry. The graphs were generated by the Stata command metafunnel (adapted from [29] pp 113 and 115).
Additional file 4: Sensitivity analyses of human papillomavirus genotype ranking. *Analyses using a fixed-effect model [27]. †Analyses using DerSimonian and Laird's random-effects model [20] with a continuity correction (CC) = 0.25. ‡Analyses using DerSimonian and Laird's random-effects model [20] with CC = 0. HPV-62 and -69 ORs could not be calculated. §Analyses using DerSimonian and Laird's random-effects model [20] (CC = 0.5), with HPV-negative subjects as the reference group. In this model, unlike the preceding ones, the pooled OR for HPV-16 could be estimated. Abbreviations: HPV, human papillomavirus; OR, odds ratio; CI, confidence interval.