The study was carried out in two steps. First, a systematic literature review was conducted to collect HBeAg prevalence data; second, an epidemiological model was applied to estimate age- and sex-specific HBeAg prevalence for the 21 world regions (Additional file 1: web annex 1).
HBeAg prevalence information was identified from a previously conducted and described systematic review on hepatitis prevalence. In brief, articles published between 1980 and 2007 and reporting prevalence of hepatitis virus infection were systematically searched. A total of 6064 English-language citations were found (3273 Medline, 2283 Embase, 508 CINAHL). Review articles, outbreak investigations and national infectious disease notification reports were excluded. Data reported in an article had to be reasonably representative of the general population; studies conducted among a special high-risk group (e.g. injecting drug users, HIV-positive individuals) or among a population selected on the basis of a risk factor for viral hepatitis or a condition associated with hepatitis infection were excluded.
After manual de-duplication and application of the exclusion criteria to the abstracts, 1233 articles on hepatitis B and C prevalence were obtained (references to studies: web annex 1 of reference ), of which 582 included sero-markers of HBV infection. Most of these articles were excluded because they reported HBV markers other than HBeAg or were conducted among high-risk groups.
Extracted data on HBeAg prevalence among HBsAg-positive individuals were grouped according to the 21 Global Burden of Disease regions, and criteria such as study population size, sampling procedure and representativeness were taken into consideration to rate the quality of each study.
Using the extracted seroprevalence data, prevalence of HBeAg was modeled using Dismod III v3.0, a generic disease modeling system that models multiple disease parameters, including incidence, prevalence, remission, and mortality, in order to ensure consistency among the parameters. Data on each of these parameters are synthesized using a hierarchical empirical Bayesian model to make estimates for the 21 world regions based on observed data in each modeled region, data observed in other regions, and data from other time periods (by estimating a time trend). Briefly, Dismod III first fits an empirical prior estimate separately for each disease parameter (e.g., prevalence and incidence). The empirical prior has the following elements: a geographic hierarchy, in which estimates for each region are informed by data from the same region and (to a lesser extent) data from other regions; a flexible age pattern; a linear time trend; and an offset for data on males. Second, for each time period (1980–1997 and 1997–present), sex, and region, Dismod III fits a Bayesian model using all data in that time-sex-region group and the empirical priors for all epidemiological parameters, generating posterior estimates of incidence, prevalence, remission, and mortality that are internally consistent. In our model, the empirical priors for incidence, remission, and mortality were uninformative; thus the posterior was informed only by prevalence data. Like the empirical prior, the posterior models also incorporate linear time trends, flexible age patterns, and offsets for data on males.
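The geographic hierarchy described above can be illustrated with a deliberately simplified sketch. This is not Dismod III and does not reproduce its age patterns, time trends, or sex offsets; it only shows the general idea of partial pooling, in which a regional estimate is pulled toward the all-region prevalence and the pull is stronger where regional data are sparse. The class `RegionData`, the function `shrunken_prevalence`, the parameter `prior_strength`, and all counts are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RegionData:
    """Hypothetical pooled study counts for one region."""
    name: str
    positives: int  # HBeAg-positive individuals among those tested
    tested: int     # HBsAg-positive individuals tested for HBeAg


def pooled_prevalence(regions: List[RegionData]) -> float:
    """Crude all-region prevalence, used here as the prior mean."""
    total_pos = sum(r.positives for r in regions)
    total_n = sum(r.tested for r in regions)
    return total_pos / total_n


def shrunken_prevalence(
    regions: List[RegionData], prior_strength: float = 50.0
) -> Dict[str, float]:
    """Beta-binomial posterior mean for each region.

    The prior is Beta(a, b) with mean equal to the pooled prevalence and
    a + b = prior_strength (an assumed pseudo-sample size, not a value
    taken from the study). Regions with few observations are shrunk
    strongly toward the pooled value; regions with many are barely moved.
    """
    prior_mean = pooled_prevalence(regions)
    a = prior_strength * prior_mean
    return {
        r.name: (r.positives + a) / (r.tested + prior_strength)
        for r in regions
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    example = [
        RegionData("Region A", positives=30, tested=100),
        RegionData("Region B", positives=6, tested=10),
        RegionData("Region C", positives=400, tested=2000),
    ]
    for name, estimate in shrunken_prevalence(example).items():
        print(f"{name}: {estimate:.3f}")
```

In this sketch the sparsely observed region (10 individuals tested) moves most of the way toward the pooled value, while the region with 2000 individuals tested stays close to its own observed prevalence, which mirrors the statement that regional estimates are informed mainly by data from the same region and, to a lesser extent, by data from other regions.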