Field suitability and diagnostic accuracy of the Biocentric® open real-time PCR platform for plasma-based HIV viral load quantification in Swaziland

Background: Viral load (VL) testing is being scaled up in resource-limited settings. However, not all commercially available VL testing methods have been evaluated under field conditions. This study is one of a few to evaluate the Biocentric platform for VL quantification in routine practice in Sub-Saharan Africa.
Methods: Venous blood specimens were obtained from patients eligible for VL testing at two health facilities in Swaziland from October 2016 to March 2017. Samples were centrifuged at two laboratories (LAB-1, LAB-2) to obtain paired plasma specimens for VL quantification with the national reference method and on the Biocentric platform. Agreement (correlation, Bland–Altman) and accuracy (sensitivity, specificity) indicators were calculated at the VL thresholds of 416 (2.62 log10) and 1000 (3.0 log10) copies/mL. Leftover samples from patients with discordant VL results were re-quantified and accuracy indicators recalculated. Logistic regression was used to compare laboratory performance.
Results: A total of 364 paired plasma samples (LAB-1: n = 198; LAB-2: n = 166) were successfully tested using both methods. The correlation was high (R = 0.82, p < 0.01), and the Bland–Altman analysis showed a minimal mean difference (−0.03 log10 copies/mL; 95% CI: −1.15 to 1.08). At the clinical threshold level of 3.0 log10 copies/mL, the sensitivity was 88.6% (95% CI: 78.7 to 94.9) and the specificity was 98.3% (95% CI: 96.1 to 99.4). Sensitivity was higher in LAB-1 (100%; 95% CI: 71.5 to 100) than in LAB-2 (86.4%; 95% CI: 75.0 to 94.0). Most upward (n = 8, 2.2%) and downward (n = 11, 3.0%) misclassifications occurred at the 2.62 log threshold, with LAB-2 having a 16 (95% CI: 2.26 to 113.27; p = 0.006) times higher odds of downward misclassification. After retesting of discordant leftover samples (n = 17), overall sensitivity increased to 93.5% (95% CI: 85.5 to 97.9) and 97.1% (95% CI: 90.1 to 99.7) at the 2.62 and 3.0 thresholds, and specificity increased to 98.6% (95% CI: 96.5 to 99.6) and 99.0% (95% CI: 97.0 to 99.8) respectively.
Conclusions: The test characteristics of the Biocentric platform were overall comparable to the national reference method for VL quantification. One laboratory tended to misclassify VL results downwards, likely owing to unmet training needs and lack of previous hands-on practice.
Electronic supplementary material: The online version of this article (10.1186/s12879-018-3474-1) contains supplementary material, which is available to authorized users.


Background
The World Health Organization (WHO) recommends routine viral load (VL) testing at 6 and 12 months after initiation of antiretroviral therapy (ART) and annually thereafter [1]. Quantifying the patient's VL allows clinicians to monitor the effectiveness of ART, to trigger adherence counselling interventions when VL is elevated above a clinical threshold (e.g. ≥1000 copies/mL), to diagnose virological failure, and to make timely and correct decisions on treatment switching [1][2][3]. Because the WHO recommends immediate initiation of ART at the time of HIV diagnosis irrespective of CD4 cell count and WHO staging criteria [1,4], the number of patients needing routine VL testing will increase in the coming years. Although HIV programmes using routine VL monitoring have shown decreased morbidity and mortality [3], the expansion of VL testing creates clinical and programmatic challenges in resource-limited settings (RLS) [5,6] and access to HIV monitoring services remains suboptimal [7,8].
An important bottleneck is the suboptimal capacity of national laboratories in RLS to perform VL testing at scale. The supply weakness is often due to lack of funding to procure VL testing platforms and consumables, inability to recruit and retain qualified staff, lack of adequate training, and suboptimal servicing and maintenance of equipment [7]. Establishment of multiple laboratories in one country and the deployment of various platforms by different stakeholders (e.g. non-governmental organizations) is one strategy to overcome supply chain shortfalls and stimulate market competition [5]. This approach, however, raises concerns about comparability of VL test results between platforms and laboratories as well as about quality assurance and control.
Swaziland is increasing access to routine VL monitoring. The Ministry of Health performs VL testing using the Roche method, and Médecins Sans Frontières (MSF) has been performing VL quantification using the Biocentric method [9]. In 2015, the decision was taken to perform an in-country assessment of the Biocentric method to determine its suitability for contributing to the expansion of VL testing in Swaziland. We therefore evaluated the performance of the Biocentric platform, using plasma for VL testing under field conditions, against the national reference platform. The findings reported here are part of a larger prospective evaluation study comparing the test characteristics of the Biocentric platform using different sampling and processing procedures (plasma and dried-blood spots [DBS]) for VL testing.

Setting
Swaziland has the highest HIV prevalence in the world (32% in people aged 18-49 years) [10]. HIV care and treatment has been expanded, and close to 150,000 people received ART in 2015 [11]. Swaziland is expanding routine VL monitoring, and several VL platforms have been established. Three Roche platforms are operated, one at the National Reference Laboratory in Mbabane and two at decentralized sites (Manzini, Siteki). Since 2012, the Biocentric platform has been used in Nhlangano Laboratory in southern Swaziland, serving 25 rural primary and secondary healthcare facilities, with approximately 25,000 VL tests performed annually. This laboratory has been enrolled in the External Quality Assurance Program with the US Centers for Disease Control and Prevention (CDC) for proficiency testing. In addition, a second Biocentric platform was established at the National Reference Laboratory in 2016 but had not been used before this study. This study used the more recent Biocentric platform, which was released in 2016. It was upgraded at Nhlangano Laboratory (LAB-1) and newly installed at the National Reference Laboratory in Mbabane (LAB-2).

VL platforms
The reference platform was the quantitative COBAS AmpliPrep/COBAS TaqMan (CAP/CTM) HIV-1 Test, Version 2.0 (Roche Molecular Diagnostics, Indiana, USA), operated at LAB-2 (Mbabane). It is a fully automated, closed system testing 63 samples per run with 5-8 h needed to obtain results. The lower limit of detection is 20 copies/mL (corresponding to 1.3 log 10 copies/mL). Standardized internal quality control samples are provided and the reference laboratory is enrolled with the CDC laboratory external quality assurance program, monitoring the quality of VL testing and reporting twice per year.
The comparator comprised two Biocentric platforms operated at LAB-1 (Nhlangano) and at LAB-2 (Mbabane). This multi-manufacturer open platform consists of an open automated RNA and DNA extractor (Arrow®) and a real-time PCR system (FluoroCycler® 96) for nucleic acid amplification and detection. It uses the Generic HIV Charge Virale assay and test kits, which were developed by the French Agency for Research on AIDS and viral hepatitis (ANRS) and are manufactured and commercialized by Biocentric (Bandol, France) [12]. Internal quality control is provided by standards in the assay. This partly manual system has a time to results of approximately 3 h, with 96 samples per run (82 patient samples, five standards in duplicate, and one positive and one negative control in duplicate). The average limit of detection of HIV RNA at a positivity rate of >95% with 250 μL plasma input volume is 416 (95% CI: 388 to 450) copies/mL [12]. The Biocentric assay received CE certification by a European Notified Body (British Standards Institution) and has been submitted for WHO pre-qualification of in vitro diagnostics. Further details on the method are available elsewhere [13].
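The copies/mL and log10 figures quoted for the two platforms, and the composition of a 96-well Biocentric run, can be cross-checked in a few lines of Python. This is only an illustrative sketch: the function and variable names are hypothetical and are not part of either manufacturer's software.

```python
import math


def to_log10(copies_per_ml: float) -> float:
    """Convert a viral load in copies/mL to log10 copies/mL (2 decimals)."""
    return round(math.log10(copies_per_ml), 2)


# Detection limits and clinical threshold quoted in the text (copies/mL)
roche_lod = to_log10(20)             # reference platform lower limit of detection
biocentric_lod = to_log10(416)       # Biocentric average limit of detection
clinical_threshold = to_log10(1000)  # clinical threshold

# Wells in one Biocentric run, as described in the text
patient_samples = 82
standards = 5 * 2        # five standards, each in duplicate
controls = (1 + 1) * 2   # one positive and one negative control, each in duplicate
wells_per_run = patient_samples + standards + controls
```

The three conversions correspond to the 1.3, 2.62 and 3.0 log10 values used throughout the paper, and the well count matches the 96 samples per run stated above.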

Study sample and procedures
Experienced laboratory technologists at LAB-1 received short refresher training on the Biocentric platform. Most laboratory technologists at LAB-2 had no experience in the Biocentric method and they received training over 3 days as per recommendation of the manufacturer. Figure 1 shows the study flow chart. From 12 October 2016 to 1 March 2017, HIV-infected adults (≥18 years) were recruited at Nhlangano Health Centre and Lobamba Clinic when they were eligible for VL testing according to the local VL testing algorithm (a baseline VL before ART initiation and during ART). During the recruitment phase, Lobamba Clinic introduced universal ART provision (thus many patients were eligible for ART initiation and received a pre-treatment VL test), while most patients in Nhlangano Health Centre were already established on ART (and thus received a follow-up VL test). The nurse obtained written consent, collected baseline information and referred patients for phlebotomy. A phlebotomist obtained one 4 mL venous blood ethylenediaminetetraacetic acid (EDTA) tube from each participant. In addition, a second EDTA tube and DBS cards were prepared as part of the larger study (details and results not reported here). The blood tubes obtained at Nhlangano Health Centre were sent to LAB-1 and those obtained at Lobamba Clinic to LAB-2.
In both laboratories, technologists centrifuged the EDTA tube to obtain two paired plasma specimens of 1 mL, which were stored in two separate sterile tubes at −20 °C before testing. As the reference method was located at the National Reference Laboratory (and co-located with Biocentric LAB-2), deep-frozen plasma samples were shipped (2 h) from LAB-1 to LAB-2 for testing on the reference platform. All testing runs were performed with a plasma input volume of 250 μL on the Biocentric method and 1 mL on the Roche method. Samples with VL results that were discrepant between the two methods at LAB-1 and LAB-2 were retested on the Biocentric method in the same laboratory when leftover plasma was available. The laboratory personnel were blinded to the results of both methods.

Statistical analysis
This study is reported according to the STARD guidelines [14]. Patients without a plasma test result on both platforms were excluded from analysis. Baseline characteristics of the study population were described and summarized in frequency statistics and percentages. To compare baseline characteristics of patients by recruitment site, differences in continuous (e.g. age) and categorical (e.g. sex) data were assessed with the Wilcoxon rank sum test and Pearson's chi-squared test. We regarded the VL results from the reference method (CAP/CTM) as the national gold standard. Because the two assays had different lower and upper detection limits, VL test results were equalized at the common lowest (2.62 log 10 copies/mL) and highest reliable (7.0 log 10 copies/mL) detection limits. We assessed the correlation between the two methods graphically and with Pearson's correlation coefficient for quantifiable VL values ≥2.62 log 10 copies/mL on the two platforms. Then we used Bland-Altman analysis to describe agreement between the two platforms by calculating the mean difference along with 95% limits of agreement [15]. Sensitivity and specificity were calculated using the threshold of 2.62 log 10 copies/mL (lower limit of detection, corresponding to 416 copies/mL) and 3.0 log 10 copies/mL (clinical threshold, corresponding to 1000 copies/mL). The positive and negative predictive values (PPV and NPV) were computed assuming that 10% and 20% of a hypothetical population undergoing VL testing had an elevated VL. All analyses were conducted separately for each laboratory and for both laboratories combined. In sensitivity analyses, to account for prolonged turnaround times from sample collection to freezing of paired plasma samples, diagnostic accuracy estimates (sensitivity, specificity) were recalculated for samples with processing times of ≤4.0 h.
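As a rough illustration of the agreement analysis described above, the sketch below caps paired log10 results at the common detection limits and computes the Bland-Altman mean difference with 95% limits of agreement (mean ± 1.96 SD of the paired differences). The function names, the use of simple clamping to "equalize" results, and the example values are assumptions made for illustration; the paper does not give its analysis code.

```python
from statistics import mean, stdev

# Common detection limits in log10 copies/mL, taken from the text
LOWER, UPPER = 2.62, 7.0


def equalize(log_vl, lower=LOWER, upper=UPPER):
    """Clamp a log10 VL result to the common reportable range (assumed approach)."""
    return min(max(log_vl, lower), upper)


def bland_altman(test, reference):
    """Return (mean difference, lower limit, upper limit) for paired results."""
    diffs = [t - r for t, r in zip(test, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired results (log10 copies/mL); these are NOT study data
biocentric = [equalize(v) for v in [3.0, 4.0, 5.0, 7.4]]
reference = [equalize(v) for v in [3.2, 3.9, 5.2, 7.1]]

bias, loa_low, loa_high = bland_altman(biocentric, reference)
```

Differences are taken here as comparator minus reference, so a negative mean difference would mean that the comparator reads lower on average.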
In addition, misclassified values were described separately at the patient level and accuracy estimates recalculated after re-quantification of discordant VL results. Discordance was defined as VL results which were categorized differently by the Biocentric platform (above or below) compared with the reference test, using a binary VL cut-off at 2.62 and 3.0 log 10 copies/mL. Because LAB-2 appeared to have had higher rates of misclassification, we evaluated a possible association between laboratory (LAB-2 vs LAB-1) and VL result misclassification. Potential confounding factors were identified a priori using directed acyclic graphs (DAGs) [16] and included in multivariable penalized maximum likelihood logistic regression models. All analyses were performed with STATA v14.1 (StataCorp, Texas, USA).
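The accuracy indicators can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' STATA code: results at or above the cut-off are treated as elevated, undetected results are coded as 0.0, the confidence interval is the exact binomial (Clopper-Pearson) interval, and all function names and example values are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, iters=100):
    """Bisection on [0, 1] for g(p) = target, where g is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes out of n (assumed CI method)."""
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


def accuracy(test, reference, threshold):
    """Sensitivity and specificity of `test` against `reference` at a cut-off."""
    pairs = list(zip(test, reference))
    tp = sum(t >= threshold and r >= threshold for t, r in pairs)
    fn = sum(t < threshold and r >= threshold for t, r in pairs)   # downward
    tn = sum(t < threshold and r < threshold for t, r in pairs)
    fp = sum(t >= threshold and r < threshold for t, r in pairs)   # upward
    return tp / (tp + fn), tn / (tn + fp)


def predictive_values(sens, spec, prevalence):
    """PPV and NPV for an assumed prevalence of elevated VL (Bayes' rule)."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (
        spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv


# Hypothetical paired log10 results (0.0 = not detected); NOT study data
test_vl = [0.0, 3.4, 2.9, 4.1, 0.0, 3.1]
ref_vl = [0.0, 3.6, 3.2, 4.0, 2.7, 2.8]
sens, spec = accuracy(test_vl, ref_vl, threshold=3.0)

# PPV/NPV at the overall sensitivity and specificity reported in the abstract
ppv, npv = predictive_values(0.886, 0.983, prevalence=0.10)
ci_low, ci_high = exact_ci(11, 11)
```

As a plausibility check, `exact_ci(11, 11)` gives a lower bound of about 0.715, the same lower bound as the interval of 71.5 to 100 reported for a sensitivity of 100%. The count of 11 is inferred here for illustration and is not stated in the text.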

Diagnostic accuracy
Accuracy was calculated at two threshold levels (2.62 and 3.0 log 10 copies/mL), and findings are presented in Table 2. For the threshold levels of 2.62 and 3.0 log 10 copies/mL, the overall (both laboratories combined) sensitivity was 85.7% (95% CI: 75.9 to 92.6) and 88.6% (78.7 to 94.9) respectively, and the specificity was 97.2% (94.6 to 98.8) and 98.3% (96.1 to 99.4). Although the specificity was similar in both laboratories, ranging from 96.2 to 99.1% at both threshold levels, the sensitivity was lower in LAB-2 at both log thresholds (at 2.62 log 10 copies/mL: 84.4%, 73.1 to 92.1) compared with LAB-1 (at 2.62 log 10 copies/mL: 92.3%, 64.0 to 99.8) (Table 2). While the sensitivity at

Misclassification
At the threshold of 2.62 log 10 copies/mL, 19/364 (5.2%) samples were misclassified: 11/364 (3.0%) samples were misclassified downwards and 8/364 (2.2%) were misclassified upwards (Table 3). Among these, five samples were below the lower detection limit of the reference method but were detected on the Biocentric platform, and 11 samples were quantified on the reference method but not detected on the Biocentric platform. Misclassification occurred across all quantification levels of the reference method: five in the VL range of <1.3 log 10 copies/mL, eight in the range of 1.3 to <3.0 log 10 copies/mL, five in the range of 3.0 to <4.0 log 10 copies/mL, and one at ≥4.0 log 10 copies/mL. Of note, 57.9% (n = 11) of misclassification occurred in LAB-2, of which 10/11 (90.9%) were downward misclassifications. Overall, 18/19 (94.7%) discordant samples differed more than 0.5 log 10 copies/mL at the threshold of 2.62 log 10 copies/mL and 11/13 (84.6%) at the threshold of 3.0 log 10 copies/mL.
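The proportions in this paragraph follow directly from the counts. The short check below recomputes them; the helper `pct` and the variable names are illustrative only and are not part of the study's analysis code.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


total_samples = 364
downward, upward = 11, 8                 # misclassified at the 2.62 threshold
misclassified = downward + upward        # 19 discordant samples

# Discordant samples by reference VL range, as listed in the text
by_range = {"<1.3": 5, "1.3 to <3.0": 8, "3.0 to <4.0": 5, ">=4.0": 1}

overall_pct = pct(misclassified, total_samples)
downward_pct = pct(downward, total_samples)
upward_pct = pct(upward, total_samples)
lab2_share_pct = pct(11, misclassified)  # share of misclassifications in LAB-2
lab2_downward_pct = pct(10, 11)          # downward among LAB-2 misclassifications
over_half_log_262 = pct(18, 19)          # differed by more than 0.5 log10 at 2.62
over_half_log_30 = pct(11, 13)           # differed by more than 0.5 log10 at 3.0
```

The counts per VL range also sum to the 19 discordant samples, which is a quick consistency check on the breakdown.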
After adjustment for potential factors associated with misclassification (see Additional file 1), multivariable analysis showed that LAB-2 had a 15.99 (95% CI: 2.26 to 113.27; p = 0.002) times higher odds of downward misclassification.
Due to rounding, the 3.00 log 10 copies/mL values represent a false-positive test result at the 3.0 log 10 copies/mL threshold but a concordant result according to the non-log 10 values.

Discussion
Improved access to VL monitoring is crucial in RLS to meet the fast growing monitoring needs of large ART cohorts. One strategy is the deployment of multiple platforms by different stakeholders. This study is the first in Swaziland and, to our knowledge, the second internationally [13] to evaluate the utility of the Biocentric platform using plasma for VL quantification under routine conditions in comparison with another method. We showed that the Biocentric platform performs reliably under routine conditions. It had a strong positive correlation with the reference method (R = 0.81, p < 0.01), and the overall agreement between the two methods was high (mean difference − 0.03) at the 3.0 log threshold. Although 5.2% of samples were misclassified at the threshold of 2.62 log 10 copies, most discrepancies were resolved after re-quantification of discordant results, and the sensitivity and specificity increased to 97.1 and 99.0% at the 3.0 log 10 VL threshold. These estimates were similar to those reported previously, where the sensitivity and specificity were 100 and 90% respectively compared with the HIV Amplicor Monitor assay (Roche Diagnostics, Basel, Switzerland) [13]. Misclassification of results occurred across all quantification levels and most of them with an absolute difference of more than 0.5 log 10 copies/mL. This may indicate that misclassifications were due to factors beyond the technical variation of the platforms (e.g. operator differences). This study also showed inter-laboratory differences. Sensitivity was decreased in LAB-2, and LAB-2 emerged as an independent risk factor for downward misclassification (false negative) compared with LAB-1. Differences in quality between laboratories were likely due to manual sample preparation and reagent volume pipetting errors by staff who were less trained and experienced in this method. 
The Biocentric platform was newly established in LAB-2 and the training provided before the evaluation may have been insufficient. Disadvantages of this platform are that it is a manual technique requiring experienced staff, who cannot always be easily found or retained in RLS, and that manual techniques may be more prone to error [5,6]. Therefore, intra- and inter-laboratory quality assurance mechanisms should be established (in addition to the internal controls provided by the assay) to detect suboptimal performance as soon as possible. As a consequence, the National Reference Laboratory decided to provide further formal and hands-on training before the routine use of this platform in LAB-2. Of note, inter-laboratory differences independent of the VL assay and differences between platforms were also reported in other settings [17,18]. Because of the inherent variability between VL platforms, it is recommended that patients be monitored using the same technology platform to ensure correct interpretation of VL changes over time [19].

Context specific considerations
When VL testing is introduced into routine settings, viral (e.g. genetic diversity of HIV strains), programmatic, laboratory-specific and clinical (e.g. definitions of viral failure) factors need to be taken into account to establish a contextualized VL testing strategy. Firstly, a positive aspect of this platform is that it can be implemented in RLS and performs reliably under routine conditions, specifically at the clinical threshold level of 3.0 log 10 copies/mL. In our experience, maintenance requirements of this open platform are minimal and individual elements are interchangeable, such as RNA extraction techniques [20] and previously validated real-time PCR thermal cyclers [21]. Another positive factor is its high throughput. Four of the Biocentric-experienced laboratory technologists were able to perform up to three runs per day (246 tests per day) with four extractors and one thermal cycler under routine conditions. Secondly, the use of plasma for VL quantification limits its use to settings with strong sample transportation systems in place and/or the capacity to prepare and store samples at clinical sites. According to Biocentric, DBS samples can also be used on the platform, requiring less logistical and cold-chain support. VL quantification on DBS cards on Biocentric is being evaluated in Swaziland and will be reported in future. Thirdly, the Biocentric platform is a polyvalent technology, which allows testing for indications other than HIV VL, such as HIV early infant diagnosis (EID) and hepatitis C VL. This is becoming increasingly important for programmes wishing to integrate laboratory services using multi-disease platforms [22]. Fourthly, the Biocentric HIV VL test is priced competitively (ex-works USD 14.9 per test) compared with other well-established VL technologies [23].
Finally, the Biocentric VL reagents, as with other VL technologies, contain guanidine thiocyanate (GTC), which is a toxic chemical compound [24] commonly used for the extraction of DNA and RNA in molecular tests [25]. As GTC can release cyanide gases in contact with bleach and due to its toxicity to aquatic life, it has to be managed as hazardous waste, normally through high-temperature incineration [25]. This can pose logistical challenges in RLS and requires proper planning and budgeting.

Limitations and strengths
A limitation of the study is that discrepant test results were not fully investigated. They were also not re-quantified on both methods owing to insufficient leftover plasma samples, with retesting being performed solely on the Biocentric platform. Although retesting of discordant results is not standard practice in laboratory evaluation studies, retesting was performed to obtain additional information on the nature of discrepant results, assuming that the suboptimal performance of LAB-2 was likely due to less hands-on practice of the laboratory technologists rather than problems with the Biocentric method itself. After retesting, a few samples remained discrepant, for which several explanations exist. Firstly, there is the possibility of false test results on the national reference platforms due to internal quality issues or operator errors. However, internal and external quality control did not indicate quality issues during the study period. Nevertheless, a third VL assay should have been used to resolve discrepant results. Secondly, the two platforms used different plasma input volumes, increasing the likelihood of variations in measurements for values at the detection threshold. Thirdly, transportation and storage conditions may have affected the sample quality, possibly leading to a degradation of RNA. Lastly, we did not test for HIV genotypic diversity. VL assays differ in their ability to quantify genetically diverse HIV strains, largely depending on the design of primers and probes [13,26-29]. The CAP/CTM HIV-1 v2.0 detects HIV-1 groups M, N and O, and Biocentric detects HIV-1 group M (A-H) [28]. Without a panel of samples with genetic diversity, generalizability is limited, specifically to settings where other strains are endemic. However, according to a recent study in Cameroon, Biocentric performed well in that setting, which is characterized by broad HIV genetic variability [30].
Another limitation is that the majority of VL samples were below the detection limit of the Biocentric platform, reducing the sample size for correlation and Bland-Altman analyses. Finally, we did not assess reproducibility. This study focused on field diagnostic accuracy and is not a pure analytical study. Repeat testing under various conditions (intra- and inter-variability) would have been complex to undertake because it would have required more VL samples from patients.
A strength of the study was its conduct under routine real-world conditions; therefore, challenges and constraints are comparable to other RLS in Sub-Saharan Africa. Also, the personnel involved from sample collection (phlebotomist) to VL testing (laboratory technologists) are likely to reflect staff composition of other RLS.

Conclusions
The Biocentric platform using plasma for VL quantification showed results that were comparable overall to the national reference method. This study also revealed inter-laboratory differences in performance, which were likely due to unmet training needs and lack of hands-on practice of technologists in one laboratory, highlighting the need for continuous training of laboratory personnel. In addition to participation in national and international proficiency testing programmes, routine quality control methods should be integrated into laboratories performing at high scale in RLS to detect suboptimal performance as soon as possible. The Biocentric platform is now routinely used in Swaziland to support the expansion of VL testing.