Diagnostic accuracy of tests to detect hepatitis B surface antigen: a systematic review of the literature and meta-analysis

Background: Chronic hepatitis B virus (HBV) infection is characterised by the persistence of hepatitis B surface antigen (HBsAg). Expanding HBV diagnosis and treatment programmes into low resource settings will require high quality but inexpensive rapid diagnostic tests (RDTs) in addition to laboratory-based enzyme immunoassays (EIAs) to detect HBsAg. The purpose of this review is to assess the clinical accuracy of available diagnostic tests to detect HBsAg to inform recommendations on testing strategies in 2017 WHO hepatitis testing guidelines.
Methods: The systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines using 9 databases. Two reviewers independently extracted data according to a pre-specified plan and evaluated study quality. Meta-analysis was performed. HBsAg diagnostic accuracy of RDTs was compared to EIA and nucleic-acid test (NAT) reference standards. Subanalyses were performed to determine accuracy among brands, HIV-status and specimen type.
Results: Of the 40 studies that met the inclusion criteria, 33 compared RDTs and/or EIAs against EIAs and 7 against NATs as reference standards. Thirty studies assessed diagnostic accuracy of 33 brands of RDTs in 23,716 individuals from 23 countries using EIA as the reference standard. The pooled sensitivity and specificity were 90.0% (95% CI: 89.1, 90.8) and 99.5% (95% CI: 99.4, 99.5) respectively, but accuracy varied widely among brands. Accuracy did not differ significantly whether serum, plasma, venous or capillary whole blood was used. Pooled sensitivity of RDTs in 5 studies of HIV-positive persons was lower at 72.3% (95% CI: 67.9, 76.4) compared to that in HIV-negative persons, but specificity remained high. Five studies evaluated 8 EIAs against a chemiluminescence immunoassay reference standard with a pooled sensitivity and specificity of 88.9% (95% CI: 87.0, 90.6) and 98.4% (95% CI: 97.8, 98.8), respectively. Accuracy of both RDTs and EIAs using a NAT reference was generally lower, especially amongst HIV-positive cohorts.
Conclusions: HBsAg RDTs have good sensitivity and excellent specificity compared to laboratory immunoassays as a reference standard. Sensitivity of HBsAg RDTs may be lower in HIV-infected individuals.
Electronic supplementary material: The online version of this article (10.1186/s12879-017-2772-3) contains supplementary material, which is available to authorized users.


Background
An estimated 257 million individuals worldwide are chronically infected with hepatitis B virus (HBV), of whom 2.7 million are co-infected with HIV [1]. Globally, between 20 and 30% of patients with chronic HBV infection will develop cirrhosis or hepatocellular carcinoma [2], accounting for the majority of the attributable 686,000 deaths [3] and 21 million disability-adjusted life-years annually [4]. Most individuals with chronic HBV infection, however, are not aware of their serostatus. Delayed diagnosis means that many may progress to long-term complications and present only with advanced disease [5]. Expanded access to testing for HBV is critically important in order to increase the number of infected individuals who are aware of their status and can be linked to care, as well as to identify candidates for HBV vaccination and facilitate prevention and control efforts.
In March 2015 the World Health Organization (WHO) published the first global guidelines for the prevention, care, and treatment of individuals with chronic HBV infection [5]. These guidelines focused on assessment for treatment eligibility, initiation of first and second-line therapies, and monitoring. They did not include testing recommendations, in particular on which tests to use. Given the large burden of HBV in low- and middle-income settings where there are limited or no existing HBV testing guidelines, the development of HBV testing guidelines is a priority.
Advances in HBV detection technology have created new opportunities for testing, referral, and treatment. Chronic HBV infection is defined as persistence of hepatitis B surface antigen (HBsAg) for at least six months, and the testing strategy involves an initial serological test to detect HBsAg followed by a nucleic-acid amplification test (NAT) for detection of HBV DNA viral load to help guide treatment decisions [5]. HBsAg can be detected using rapid diagnostic tests (RDTs) in lateral flow, flow through or simple agglutination assay formats. Laboratory-based immunoassays to detect HBsAg include traditional radioimmunoassays (RIA) and enzyme immunoassays (EIA), as well as newer technologies such as electrochemiluminescence immunoassays (ECLIA), microparticle enzyme immunoassays (MEIA) and chemiluminescent microparticle immunoassays (CMIA), which use signal amplification to give quantitative measurements.
Previous systematic reviews on HBV infection have focused on effectiveness of immune responses to HBV vaccination [6], surveillance of cirrhosis [7], and evaluation of treatment effectiveness [8]. Prior reviews on hepatitis B testing [9][10][11] focused only on the performance of tests that can be used at the point of care. They also included evaluations with unclear reference standards and studies that used serum panels to evaluate test performance, which are inappropriate for assessing clinical or operational diagnostic accuracy in the field. This review aimed to assess the diagnostic accuracy of assays used to detect HBsAg in order to inform WHO and other guidelines on hepatitis testing [12]. This was the first study exclusively comparing the clinical performance of both RDTs and laboratory-based immunoassays, in addition to addressing the question of accuracy in the context of HIV status. The accuracy of HBsAg assays against a NAT reference standard was also assessed, given the importance of reducing transmission during the seroconversion period and in the diagnosis of occult hepatitis B where HBsAg may not be detectable, which is more common with HIV co-infection. The purpose of this review was to provide quantitative evidence of the accuracy of available diagnostics to detect HBsAg in order to inform global guidelines.

Search strategy and identification of studies
We conducted a systematic review and meta-analysis on the diagnostic accuracy of HBsAg tests. The review was registered in PROSPERO (CRD42015020313) and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) checklist. We utilised standardised methods for systematic reviews on diagnostics, including an a priori protocol (Additional file 1).
Literature search strategies were developed by a medical librarian with expertise in systematic review searching, using a search algorithm consisting of terms for: hepatitis B, diagnostic tests, and diagnostic accuracy. We searched MEDLINE, EMBASE, the Cochrane Central Register of Controlled Trials, Science Citation Index Expanded, SCOPUS, Literatura Latino-Americana e do Caribe em Ciências da Saúde (LILACS), WHO Global Index Medicus, WHO's International Clinical Trials Registry and the Web of Science. We also contacted researchers, experts and authors of major trials, with no relevant manuscripts in preparation identified. Additional pertinent citations were identified through bibliographies of retrieved studies.
Abstracts were screened by reviewers AA and HK according to standard inclusion and exclusion criteria. All studies identified for full manuscript review were assessed independently by two reviewers (AA and OV) against inclusion criteria. Papers were accepted or rejected, with reasons for exclusion specified. Discrepancies were resolved by discussion between review authors and, when required, a third independent reviewer (RP).

Selection criteria
Inclusion criteria were: case-control, cross-sectional, cohort studies or randomized trials published between 1996 and May 2015; primary purpose of evaluating HBsAg test accuracy; commercially available laboratory immunoassays or NAT as reference standard; any clinical specimen type. We excluded: articles in languages other than English; conference abstracts, comments or review papers; studies only reporting sensitivity or specificity without reference standards; studies using commercially prepared reference panels.
We included studies reporting original data from patient specimens in all age groups, settings, countries and specimen types. We performed a sub-analysis comparing test accuracy in studies published before 2005 with that in more recent studies published between 2005 and 2015, as the accuracy of reference standard immunoassays has improved over time. This cut-off was chosen as it was 10 years prior to the literature search, matched a similar meta-analysis on hepatitis C tests (Ref Paper 11), and was around the time of the last WHO review of HBsAg assay operational characteristics [13]. Studies comparing the accuracy of laboratory-based immunoassays were only included if they used CMIAs as the reference standard; most excluded studies using other platforms included reference panels, while five specifically used non-CMIA reference assays. Given the association between false negatives and a low optical density to cut-off ratio (OD/CO), it was reasonable to presume that sensitivity is reduced at low HBsAg levels. CMIA has excellent analytical sensitivity (0.05 IU/ml) [14][15][16], and can be used to quantitate HBsAg levels in clinical specimens [17]. These platforms are the most widely used in clinical practice [18] given automation and high throughput, with data on kinetics and sensitivity in HIV-HBV co-infection.

Data extraction and quality assessment
Two authors (AA and OV) independently extracted data and reached agreement on the following variables: study author and year; study location and design; specimens tested; eligibility criteria; index test and reference standard, including manufacturer; raw cell numbers (true positives, false negatives, false positives, true negatives); HIV co-infection; sources of funding and reported conflict of interest.
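The raw cell numbers extracted for each index test determine its sensitivity and specificity directly. The following is a minimal sketch of that calculation, not the review's own code: the `TwoByTwo` class, the `wilson_interval` helper and the example counts are hypothetical, and the Wilson score interval is one common choice of confidence interval that the review does not state it used.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Tuple


@dataclass(frozen=True)
class TwoByTwo:
    """Raw cell counts for one index test against one reference standard."""
    tp: int  # index test positive, reference positive
    fn: int  # index test negative, reference positive
    fp: int  # index test positive, reference negative
    tn: int  # index test negative, reference negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Hypothetical counts for illustration only; not taken from any included study.
example = TwoByTwo(tp=90, fn=10, fp=5, tn=895)
sens_lo, sens_hi = wilson_interval(example.tp, example.tp + example.fn)
print(f"sensitivity = {example.sensitivity():.3f} ({sens_lo:.3f}, {sens_hi:.3f})")
print(f"specificity = {example.specificity():.3f}")
```

Keeping the four cells rather than only the derived percentages is what allows studies to be pooled later, since the weights in a meta-analysis depend on the counts.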
Study quality was evaluated using the QUADAS-2 tool [19], which evaluates risk of bias (patient selection, index test, reference standard, and patient flow through) and applicability concerns (patient selection, index test, reference standard).

Data analysis and synthesis
We conducted meta-analysis pooling data using the DerSimonian-Laird bivariate random effects model (REM) to calculate pooled sensitivity and specificity with 95% confidence intervals (CI), which were used to estimate positive and negative likelihood ratios (PLR, NLR).
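The likelihood ratios follow from sensitivity and specificity by the standard definitions PLR = sensitivity / (1 - specificity) and NLR = (1 - sensitivity) / specificity. A minimal sketch is given below; the function name is hypothetical, and the inputs are the pooled RDT point estimates quoted in the abstract. Values computed this way from rounded point estimates need not match the likelihood ratios reported in the review's tables.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (PLR, NLR) from sensitivity and specificity given as proportions."""
    plr = sensitivity / (1 - specificity)   # how much a positive result raises the odds
    nlr = (1 - sensitivity) / specificity   # how much a negative result lowers the odds
    return plr, nlr


# Pooled RDT estimates against an EIA reference, as quoted in the abstract.
plr, nlr = likelihood_ratios(0.900, 0.995)
print(f"PLR = {plr:.0f}, NLR = {nlr:.2f}")
```

With these inputs the PLR is about 180 and the NLR about 0.10: a positive RDT result is strong evidence of HBsAg positivity, whereas a negative result lowers the odds of infection roughly tenfold rather than excluding it.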

Study selection and characteristics
A total of 11,589 citations were identified and 293 full-text articles examined, of which 40 studies met the pre-defined criteria (Fig. 1). Of the included studies, 33 compared RDTs [14,18, and/or EIAs [14,47-50] against an immunoassay reference standard, of which five focused on accuracy in HIV-positive individuals [26,44-47]. Seven studies compared RDTs [51-53] and/or EIAs [53-57] against a NAT reference standard, of which 3 had data from HIV-positive patients [53,56,57]. Studies were all either cross-sectional or case-control, predominantly in the laboratory setting, and performed in a broad range of populations, including healthy volunteers, blood donors, pregnant women, incarcerated adults, HIV and hepatitis patient cohorts with confirmed HBV infection. The prevalence of HBV ranged from 1.9 to 84% in populations tested. A mixture of serum, plasma and whole blood was used for RDTs, while studies of EIAs were performed on serum or plasma samples. Study characteristics are presented in Tables 1, 2 and 3.

Assessment of the quality of the studies
The QUADAS-2 assessment for risk of bias of each study, including sub-studies deriving separate data points, is presented in Fig. 2a, b, with a summary in Fig. 3. Bias in patient selection was generally attributable to a case-control study design (38%), or to enrolment of highly selected populations such as blood donors or those with known hepatitis B virus infection. Risk of bias from the index test was most commonly due to insufficient reporting of blinding or evaluation of RDTs which are no longer commercially available. Although the majority of studies did not specify the exact time interval between performance of the index and reference assays, it was assumed to be at low risk of bias as the assays were performed on the same sample. Applicability was judged to be higher risk for bias predominantly due to inclusion of older studies, those that evaluated tests which are no longer commercially available or studies using a NAT reference.

Diagnostic accuracy compared to a nucleic acid reference standard

Rapid diagnostic tests
Three studies [51-53] evaluated 7 RDTs in samples from 510 patients against a NAT reference standard, although some samples were used for multiple testing episodes with different tests. One study [52] used plasma from Nigerian repeat blood donors. Sensitivities ranged from 38 to 99% and specificities from 94 to 99%. Overall pooled sensitivity and specificity were 93.3% (95% CI: 91.3, 94.9) and 98.1% (95% CI: 97.0, 98.9), respectively, with significant heterogeneity in terms of sensitivity [Fig. 9; Table 3; Additional file 3]. One case-control study [51] evaluating five different tests in 240 Iranian patients had significantly higher sensitivity and specificity compared to the other studies, contributing to the overall statistical heterogeneity (τ² = 5.82).
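The between-study variance τ² quoted above measures how far study-level estimates scatter beyond what sampling error alone would produce. The sketch below illustrates the DerSimonian-Laird estimator in a simplified univariate form, pooling sensitivities on the logit scale. It is an assumption-laden illustration and not the bivariate model used in this review: the function name, the 0.5 continuity correction applied to every study, and the input counts are all hypothetical.

```python
from math import exp, log
from typing import List, Tuple


def dersimonian_laird(events: List[int], totals: List[int]) -> Tuple[float, float]:
    """Univariate DerSimonian-Laird pooling of proportions on the logit scale.

    events[i] is the number of true positives and totals[i] the number of
    reference-positive samples in study i. Returns (pooled proportion, tau^2).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        a, b = e + 0.5, n - e + 0.5      # continuity correction keeps logits finite
        y.append(log(a / b))             # logit of the study sensitivity
        v.append(1 / a + 1 / b)          # approximate within-study variance
    w = [1 / vi for vi in v]             # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))   # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]  # random-effects weights
    pooled_logit = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return 1 / (1 + exp(-pooled_logit)), tau2


# Hypothetical counts spanning a wide range of sensitivities.
pooled, tau2 = dersimonian_laird(events=[38, 95, 99], totals=[100, 100, 100])
print(f"pooled sensitivity = {pooled:.3f}, tau^2 = {tau2:.2f}")
```

When all studies report the same proportion, Q is zero and τ² is truncated at zero, so the random-effects estimate coincides with the fixed-effect one; a single outlying study, as described above, inflates Q and therefore τ².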

Study findings
Our systematic review and meta-analysis shows that both RDTs and EIAs had excellent specificity for the detection of HBsAg when compared to laboratory-based assays. Although the pooled sensitivity of RDTs was only 90% compared to laboratory-based EIAs, the 10% lower sensitivity of RDTs may be an acceptable trade-off for the opportunity to use RDTs to increase access to testing at all levels of the health care system. Significant heterogeneity with a broad range of sensitivity estimates was observed across studies and different brands as well as across studies for the same brand. Accuracy and quality of RDTs should be important considerations in test selection for national programmes. Apart from offering rapid results and ease of use, RDTs can be used with whole blood from a finger prick, whereas EIAs require blood samples to be processed to obtain serum or plasma. Our review showed that accuracy using capillary or venous whole blood was not significantly different from that in studies using plasma or serum, which offers convenient specimen sampling outside of laboratory settings without compromising test accuracy.
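One way to gauge the trade-off described above is to translate pooled sensitivity and specificity into predictive values at a given prevalence. Predictive values are not reported in this review; the sketch below is purely illustrative, uses a hypothetical function name, and takes the pooled RDT point estimates together with the lowest and highest HBV prevalence among the tested populations (1.9% and 84%) as inputs.

```python
from typing import Tuple


def predictive_values(sens: float, spec: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at the given prevalence (all proportions)."""
    tp = sens * prevalence                # expected fraction of true positives
    fn = (1 - sens) * prevalence          # expected fraction of false negatives
    fp = (1 - spec) * (1 - prevalence)    # expected fraction of false positives
    tn = spec * (1 - prevalence)          # expected fraction of true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Pooled RDT estimates (EIA reference) at the extremes of study prevalence.
for prev in (0.019, 0.84):
    ppv, npv = predictive_values(0.900, 0.995, prev)
    print(f"prevalence {prev:.1%}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under these assumptions a negative RDT result remains highly reassuring at low prevalence, while the missed infections implied by 90% sensitivity weigh more heavily on the negative predictive value as prevalence rises.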
None of the RDTs met the minimum requirements for analytical sensitivity (i.e. a limit of detection [LOD] of 0.130 IU/mL) set by regulatory authorities such as the European Union; WHO prequalification assessment studies indicate a 50-100 fold lower LOD for EIAs (0.1 IU/mL) compared to RDTs (2-10 IU/mL) [15]. Clinical sensitivity is, however, unlikely to be greatly reduced, as the majority of chronic HBV is associated with blood HBsAg concentrations well above 10 IU/mL and false-negative HBsAg RDTs are associated with lower HBsAg and viral load, presence of HBsAg mutants, or specific genotypes [15,23,34,47].
Fig. 3 Risk of bias and applicability summary for using (a) laboratory, or (b) nucleic-acid reference standard
We found lower sensitivity of RDTs in HIV-positive individuals; however, there did not appear to be a similar reduction in the single study assessing three different EIAs in this cohort using an EIA reference with neutralisation [47]. The reasons for the apparent lower performance are unclear. Studies quantifying HBsAg found that in the context of co-infection, most false negatives had lower concentrations of HBsAg and generally lower HBV DNA than true positives [46,47]. HIV-reverse transcriptase inhibitors active against HBV can modestly reduce HBsAg levels and therefore detection by RDTs [58][59][60]; patients treated for a median of 47 months demonstrated significantly lower median HBsAg levels compared to untreated patients (3.32 log10 vs 4.23 log10) (p = 0.001), with the most marked reduction in HBeAg-positive patients and those with a more robust improvement of CD4 from nadir on cART [61]. In our review, the two studies with preserved sensitivity were in exclusively ART-naïve patients with median CD4 of 175 cells/uL [26] and 250 cells/uL [44]. Studies with sensitivity less than 80% were in cohorts which included patients on lamivudine-containing ART [46,47] or ART-naïve patients with a higher median CD4 (350 cells/uL) [45]. As most patients in the field will be ART-naïve as part of dual screening programmes, the clinical impact of reduced sensitivity could be less significant as most will have detectable higher HBsAg levels. Another theoretical explanation in the context of ART is that, given overlapping surface and polymerase genes, lamivudine with its low genetic barrier to resistance could promote the emergence of surface genome variants undetectable by standard assays; mutations in the "a" antigenic determinant region of HBsAg can cause conformational changes leading to decreased accuracy of diagnosis [62].
This was, however, only a minor contributor to reduced performance in the single study assessing mutants in HIV-HBV co-infection [47], with reduced analytical sensitivity of assays more important. Further reasons for reduced sensitivity of lateral flow devices in the context of HIV could be either an increased presence of blocking antibodies to HBsAg and immune-complex formation in HIV-associated hypergammaglobulinaemia, or the prozone effect at high antigen concentrations. Assay sensitivity also varies depending on genotypes, and it could be that regions with high HIV co-infection also have a higher proportion of poorly detected genotypes. Finally, as studies were cross-sectional in nature, we cannot assess and compare the true prevalence of chronic HBV in cohorts or the progression of disease; it may be that there is an increased prevalence of acute and/or chronic HBV in HIV cohorts, with RDTs missing low level HBsAg in patients who are in the process of seroconverting from their illness. Further studies following up patients with HIV and full HBV serology are required to ascertain the reasons for and the clinical impact of reduced sensitivity of RDTs.
Accuracy of both HBsAg RDTs and EIAs compared to a NAT reference was generally lower, especially amongst HIV-positive cohorts; sensitivity of RDTs was generally <60%, with one laboratory-based case-control study evaluating six RDTs contributing to potential overestimation of pooled sensitivity [51]. Although NAT assays are not optimal reference standards for HBsAg, given the complex relationship between viral kinetics of HBV DNA and levels of HBsAg, NAT assays are nevertheless useful markers of viremia and disease activity to guide treatment, as well as for the detection of occult hepatitis B. Occult hepatitis B (OHB) is defined as the presence of HBV DNA in serum or liver tissue with undetectable HBsAg [57].
Studies in ART-naïve East [63] and West-African [64] patients found an OHB prevalence of 10-15%, with significantly lower HBV viral loads in these individuals compared to those with detectable HBsAg [47]. Knowledge of HBeAg status and ART regimes is relevant, as dually active ART could successfully suppress HBV viral load and HBsAg detection [58,59]. Now that it is possible to use CMIA to quantitate HBsAg, and levels of HBsAg have been correlated with intrahepatic cccDNA clearance during treatment, further research should explore the use of CMIA to quantitate HBsAg levels as potential markers of disease resolution.
The pooled sensitivity for RDTs in this review is lower than that reported in previous systematic reviews (pooled sensitivities were 97.1% [11], 94.8% [10], and 98.1% [9]). This may be due to the use of different inclusion criteria in the prior reviews. Accuracy estimates tend to be higher when RDTs are evaluated in laboratory settings using archived evaluation panels than when they are evaluated in field settings in patients attending a clinical facility, who may have a variety of underlying conditions or co-infections that affect test performance. In the case of RDTs, the tests may be stored and used in uncontrolled physical environments and performed by users who may never have performed a test before. Data on the clinical performance of these assays are more relevant for developing guideline recommendations.

Sources of heterogeneity
Statistical heterogeneity is observed in most diagnostic accuracy reviews. None of the sub-analyses performed eliminated heterogeneity, which could be due to a number of factors. Variability of assays could result in statistical heterogeneity. This persisted despite subgrouping by brand, although it should be noted that the same brand often undergoes minor product changes and modifications over time, particularly with changes in the manufacturer. Variation in reference standards also contributed to differences in RDT sensitivity. Pooled sensitivity of RDTs was lower when compared to a CMIA reference standard (80.4%) than to a reference including non-CMIA technology (90.0%). ELISA/EIA based assays in particular performed poorly relative to other immunoassays when compared to a CMIA reference [14,49]; different signal to cut-off (S/CO) ratios and use of the 'gray zone' improved sensitivity at the expense of specificity. Accuracy of tests also varies depending on the phase of chronic HBV infection, with reduced sensitivity more common in the inactive carrier state compared to the active replicative phase. In a Gambian field study [18], the majority (94.7%) of false-negative RDT results were from inactive carriers; they were all HBeAg negative with normal ALT levels, more commonly female (p = 0.05) and had lower median quantitative HBsAg levels compared to true positives (1.2 IU/mL vs 875 IU/mL) (p = 0.0002). Of note, RDTs also had a higher limit of detection, and therefore lower analytical sensitivity, in the field (26.5 IU/mL) compared to the laboratory setting (2.8 IU/mL), although the clinical sensitivity was similar, albeit in a study where field staff were all adequately trained. Although inactive carriers often do not warrant treatment, 17% had elevated liver stiffness and were precirrhotic, so would have benefited from antiviral therapy [65]. Further studies are required to assess the clinical impact of reduced RDT sensitivity, particularly those performed in the field.
Finally, the large variability in study design across the literature is a significant source of heterogeneity. Case-control studies with pre-selection of known cases and controls, of which there were a large number, tend to over-estimate accuracy, in part due to the higher quantitative ranges of HBsAg in those with known active disease. Performance in higher income countries tends to be less heterogeneous [11], while reduced accuracy observed in low-resource settings may be due to insufficient training or lack of quality assurance systems [66]. Pooled sensitivity and specificity tend to be lower when the RDTs are used in the field compared to studies where they were performed in laboratory settings [26,37].

Study strengths
Strengths of this review include evaluation of a comprehensive evidence base, use of a pre-specified protocol incorporating numerous major scientific databases, and assessment of additional areas relevant to HBsAg diagnostic testing, notably comparison with NAT and potential impact of occult hepatitis B. We identified 11 additional articles [18,22,25,28,29,35,37,43,45-47] not found in the most recent systematic review assessing the diagnostic accuracy of RDTs [11]. The pooled sensitivity for RDTs in this review is lower than reported in previous systematic reviews (pooled sensitivities of 97.1% [11], 94.8% [10], and 98.1% [9]). Potential reasons include the different inclusion criteria; previous reviews included a mixture of studies of analytical performance using serum panels and clinical studies. As previously explained, accuracy estimates tend to be higher when tests are evaluated in laboratory settings using archived evaluation panels, with estimates less relevant for informing the development of testing or operational guidelines.
We included evaluations of both RDTs and EIAs, in addition to evaluation using a NAT reference, and as such are able to evaluate the effects of different types of HBsAg assays and different types of reference assays.

Limitations
Our study has a number of limitations. Many studies were case-control designs or evaluated cohorts known to over-estimate accuracy. We were unable to assess diagnostic accuracy specifically in field studies as definitions of "in the field" are open to interpretation, with methods poorly described in many papers. Only two studies (1, 2) specifically mention the use of RDTs in the field. Since the purpose of our review was to assess clinical performance, we included papers describing evaluations of test performance in patients in clinical settings and not laboratory-based evaluations using reference panels. Some analyses were based on a small number of patients and few positive samples. We were unable to explore potential sources of heterogeneity due to genotype, stage and severity of infection or other co-infections; genetic variation has long been suspected to impact on diagnostic accuracy [67][68][69][70][71][72], and mutants are rapidly evolving such that prevalence of specific types cannot be determined from historical data. The use of different reference standards makes pooling across studies difficult; this is further complicated by rapid changes in technology and analytical sensitivity combined with suboptimal reporting of LOD in both index tests and reference standards. For studies using NAT as a reference, assays were not standardized, with poor reporting of testing, albeit all were according to the manufacturer's instructions; some used pooled NAT of HBsAg-negative samples [55,57], while others described qPCR methodology in inadequate detail [51,54]. Finally, the natural history of diagnostic markers in chronic hepatitis B is more complex than in most viral infections, with transient low level asynchronous quantitative fluctuations of HBsAg and DNA recognised in uncomplicated chronic HBV [73]. Such cases are clinically less severe and of lower priority than persons with higher levels of viremia, but are likely to impact estimates of sensitivity and specificity.

Implications
The global burden and relative rank of hepatitis B in terms of health loss has increased in the last two decades, unlike most communicable diseases. Implementation of timely and accurate testing strategies in many endemic settings is poor, hindering the linkage to care. Rapid tests are suited to improve the uptake of testing in resource limited settings, particularly amongst remote and vulnerable populations, but evidence is lacking for the impact of testing at the point of care on service delivery and linkage to and uptake of subsequent care. Research is needed on the clinical impact of reduced RDT sensitivity given the association of low quantitative HBsAg missed by testing with inactive carriers and minimal disease progression [74]. Validation of assays in the context of immune escape variants and using less invasive collection methods would support the development of demographic specific testing strategies. Finally, concerns about the low sensitivity of RDTs in HIV positive cohorts warrant particular evaluation, given the growing global challenge posed by co-infection, drug resistance and inadequate approaches to management of HBV and prevention of mother to child transmission in pregnant women [75]. Studies assessing the impact of viral load, CD4 and ART regimen exposure on HBsAg diagnostic accuracy are urgently needed, particularly the potential prudence of repeat HBsAg testing after a certain time in high risk individuals who may have seroconverted or progressed.

Conclusion
In summary, this meta-analysis demonstrates that RDTs to detect HBsAg, performed on either serum, plasma or whole blood, have a pooled sensitivity of >90% and specificity of >98% compared to laboratory methods of HBsAg detection, using EIAs as the reference standard. Sensitivity varies widely overall and within brands of HBsAg tests. Sensitivity of RDTs may be lower in HIV-positive individuals, although possibly less so in ART-naïve individuals who would benefit most from screening using dual HIV-HBV RDTs in settings with limited access to laboratories. Further research is needed to assess the impact of using RDTs in a variety of settings and populations. WHO guidelines currently recommend a role for RDTs in scaling up HBsAg testing in settings with poor access to or lack of existing laboratory infrastructure, such as remote settings or with hard-to-reach populations. Their use may also be appropriate in high-income countries to increase the uptake of hepatitis testing in populations that may be reluctant to test or have poor access to health-care services and in outreach programmes [12].