Machine learning pipeline for blood culture outcome prediction using Sysmex XN-2000 blood sample results in Western Australia

Background Bloodstream infections (BSIs) are a significant burden on the global population and represent a key area of focus in the hospital environment. Blood culture (BC) testing is the standard diagnostic test utilised to confirm the presence of a BSI. However, current BC testing practices result in low positive yields and overuse of the diagnostic test. Diagnostic stewardship research regarding BC testing is increasing, and is becoming more important for reducing unnecessary resource expenditure and antimicrobial use, especially as antimicrobial resistance continues to rise. This study aims to establish a machine learning (ML) pipeline for BC outcome prediction using data obtained from routinely analysed blood samples, including complete blood count (CBC), white blood cell differential (DIFF), and cell population data (CPD) produced by Sysmex XN-2000 analysers. Methods ML models were trained using retrospective data produced between 2018 and 2019, from patients at Sir Charles Gairdner Hospital, Nedlands, Western Australia, and processed at Pathwest Laboratory Medicine, Nedlands. Trained ML models were evaluated using stratified 10-fold cross validation.
Results Two ML models, an XGBoost model using CBC/DIFF/CPD features with boruta feature selection (BFS), and a random forest model trained using CBC/DIFF features with BFS, were selected for further validation after obtaining AUC scores of 0.76 ± 0.04 and 0.75 ± 0.04 respectively using stratified 10-fold cross validation. The XGBoost model obtained an AUC score of 0.76 on an internal validation set. The random forest model obtained AUC scores of 0.82 and 0.76 on internal and external validation datasets respectively. Conclusions We have demonstrated the utility of using an ML pipeline combined with CBC/DIFF, and CBC/DIFF/CPD feature spaces for BC outcome prediction. This builds on the growing body of research in the area of BC outcome prediction, and provides opportunity for further research. Supplementary Information The online version contains supplementary material available at 10.1186/s12879-023-08535-y.


Introduction
Bloodstream infections (BSIs) are becoming an increasingly significant burden on the global population. At the local level, BSIs have significant costs to healthcare systems and patients. This is represented by both the economic impact as a result of diagnosis and treatment, and the damage to patients as a result of a BSI. Untreated BSIs can lead to serious health consequences. Sepsis, which is currently defined as a life threatening organ dysfunction due to a dysregulated immune response to infection [1], is one potential result of a BSI. BSIs are the result of infections with pathogenic organisms including bacteria and fungi. The detection of a BSI requires blood culture (BC) testing to identify infections in the bloodstream. The test uses a blood sample from the patient, placed in a medium to promote growth of microorganisms. This is incubated in the laboratory and observed for growth. BC testing is considered the current "gold standard" for diagnosis of BSIs; however, BC testing is generally overused and results in low positive yields [2,3]. This can lead to longer hospital stays, additional unnecessary patient tests, increased costs and resource expenditure, and the unnecessary application of antimicrobials [3][4][5][6]. This, in turn, contributes to the proliferation of antimicrobial resistance (AMR), an increasing burden on the global population with an estimated 1.27 million (0.911-1.71) deaths directly attributable to drug resistance in 2019 [7]. Implementing diagnostic stewardship regarding BC tests has therefore become a significant clinical priority. The aim of diagnostic stewardship is to "select the right test for the right patient, generating accurate, clinically relevant results at the right time to optimally influence clinical care and to conserve health care resources" [8]. In the case of BC testing, it is important to identify when BC tests are unnecessary, in order to support clinicians deciding whether to order BCs [9].
With the increasing amount of data being produced and stored in the clinical laboratory environment, machine learning (ML) algorithms can be utilised for diagnostic stewardship of BSIs. ML solutions are increasingly applied to problems in infection science. In the hospital, ML models are used to assist in patient diagnosis, treatment, and management; and in the clinical laboratory, ML is providing solutions for problems relating to laboratory workflows and testing methodologies. In particular, the analysis of large, multidimensional datasets that are difficult for humans to analyse provides the opportunity for ML based approaches. This paper introduces an ML pipeline for BC outcome prediction using blood sample data produced by Sysmex XN-2000 hematology analysers (Sysmex, Kobe, Japan). The ML models within this pipeline have been trained on retrospective data, and validated on retrospectively collected internal and external datasets. The purpose of this pipeline is to reduce the number of unnecessary BCs, and improve diagnostic stewardship practices of BC testing.

Machine learning lifecycle
We present an ML pipeline for BC outcome prediction which includes data processing, and model development and evaluation. Each of these components is discussed in the following sections.

Data collection and processing
We trained ML models using complete blood count (CBC), white blood cell differential (DIFF), and cell population data (CPD) produced by the Sysmex XN-2000 hematology analysers. CBC and DIFF features are routinely reported in the laboratory environment, while CPD features are not routinely reported, as they are currently only used for research purposes. Three separate datasets were utilised, including training, internal validation, and external validation datasets, all obtained retrospectively. Properties of these datasets are discussed in the following sections. The ML model development process is discussed in the section Machine learning model development. All data was produced between 1 January 2018 and 31 May 2020. CBC, DIFF, and CPD test results were joined with respective microbiological outcome data from the laboratory information system (LIS). Test results and corresponding BC outcomes were included if the blood samples for CBC and BC testing were taken at the same time, therefore sharing a sample identification number. Imputation of missing values was not required as all features that were included during the training phase were complete when tests were performed. Data used throughout this study was managed appropriately based on local research procedures and guidelines. All data was provided in a de-identified form, and additional demographic or clinical outcome data from patients was not used. These datasets have been previously utilised in unpublished research [10]. The datasets are described in the following section, and in Table 1.
Only samples from adult populations (age > 18) were included, and samples were excluded if the CBC test did not have a corresponding BC test with matching sample identification. Samples were also excluded if errors were present during CBC data generation. These samples were automatically flagged by the analyser.
We were unable to determine which isolated organisms were clinically significant and which were contaminants. Therefore, based on a previous study by Nannan Panday et al. [11], we considered Micrococcus species, Bacillus species, Coagulase-negative staphylococci (CoNS), Corynebacterium species, and Propionibacterium acnes as non-significant contaminants. CBC data with a corresponding BC result containing these microorganisms were excluded from our study. This was done to reduce the risk of including incorrectly labelled data in the training dataset.
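The inclusion and exclusion rules above can be illustrated with a minimal sketch. It is not the authors' code: the record layout, the field names (`sample_id`, `age`, `analyser_error`, `organism`), and the name-prefix matching for contaminants are all assumptions made for illustration, and a real LIS export would likely need a more careful organism mapping (for example, CoNS reported at species level).

```python
# Hypothetical contaminant labels, following the literature-based list in the text.
CONTAMINANTS = (
    "micrococcus",
    "bacillus",
    "coagulase-negative staphylococci",
    "corynebacterium",
    "propionibacterium acnes",
)


def is_contaminant(organism):
    """Return True if the organism name starts with a listed contaminant label."""
    name = organism.lower()
    return any(name.startswith(prefix) for prefix in CONTAMINANTS)


def build_dataset(cbc_rows, bc_rows):
    """Join CBC rows to BC outcomes on sample ID and apply the exclusion rules."""
    bc_by_id = {row["sample_id"]: row for row in bc_rows}
    dataset = []
    for cbc in cbc_rows:
        bc = bc_by_id.get(cbc["sample_id"])
        if bc is None:             # no BC sharing the sample identification number
            continue
        if cbc["age"] <= 18:       # adults only (age > 18)
            continue
        if cbc["analyser_error"]:  # sample flagged by the analyser
            continue
        organism = bc.get("organism")
        if organism and is_contaminant(organism):
            continue               # presumed contamination, excluded
        dataset.append({**cbc, "bc_positive": 1 if organism else 0})
    return dataset


# Toy records, invented purely to exercise each rule.
cbc_rows = [
    {"sample_id": "A", "age": 54, "analyser_error": False},
    {"sample_id": "B", "age": 17, "analyser_error": False},
    {"sample_id": "C", "age": 40, "analyser_error": False},
    {"sample_id": "D", "age": 63, "analyser_error": False},
    {"sample_id": "E", "age": 70, "analyser_error": True},
    {"sample_id": "F", "age": 35, "analyser_error": False},
]
bc_rows = [
    {"sample_id": "A", "organism": "Escherichia coli"},
    {"sample_id": "B", "organism": None},
    {"sample_id": "C", "organism": "Micrococcus luteus"},
    {"sample_id": "D", "organism": None},
    {"sample_id": "E", "organism": None},
]
rows = build_dataset(cbc_rows, bc_rows)
```

In this toy input, only samples A (positive) and D (negative) survive the filters.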

Retrospective training dataset
The retrospective training dataset includes results produced between 1 January 2018 and 31 December 2019. Data was generated at Pathwest Laboratory Medicine, Nedlands, Western Australia from patients at Sir Charles Gairdner Hospital (SCGH), a teaching hospital in Nedlands, Western Australia. The training set contains 10965 samples. 10134 of these blood samples were drawn with negative BC results (92.42%), and 831 were drawn with positive BC results (7.58%).

Retrospective internal validation dataset
The retrospective internal validation dataset includes results produced between 1 January 2020 and 31 May 2020. Data was generated at Pathwest Laboratory Medicine, Nedlands, Western Australia from patients at SCGH. This set contains 318 samples. 292 of these blood samples were drawn with negative BC results (91.82%), and 26 were drawn with positive BC results (8.18%).

Retrospective external validation dataset
The retrospective external validation dataset includes results produced between 1 January 2020 and 31 May 2020. Data was generated at Pathwest Laboratory Medicine centres in Western Australia other than the Nedlands centre. Data was extracted from the LIS. This set contains 1245 samples. 1138 of these blood samples were drawn with negative BC results (91.41%), and 107 were drawn with positive BC results (8.59%). For this dataset, a model trained on CBC and DIFF data was evaluated, as CPD could not be obtained from other centres.

Interpretation of features
Hematology data produced by the Sysmex XN-2000 module analysers was used as the input for the ML models, including CBC, DIFF, and CPD features. A CBC is a regularly requested laboratory test that is used to analyse patient blood samples and reports information regarding the cells in the blood including white blood cells/leukocytes (WBC), platelets/thrombocytes (PLT), and red blood cells/erythrocytes (RBC). In addition to a standard CBC, a DIFF, which provides information about the different WBC types, is also often performed. This includes analysis of neutrophils (NEUT), lymphocytes (LYMPH), monocytes (MONO), basophils (BASO), and eosinophils (EO). From DIFF information, it is also possible to derive additional features including neutrophil-to-lymphocyte ratio (NLR), and monocyte-to-lymphocyte ratio (MLR). CPD features are produced as a result of the fluorescent flow cytometry method used by the Sysmex analysers. CPD provides numerical values for side scatter light (SSC), forward scatter light (FSC), and fluorescent light intensity (SFL). These values are often presented graphically on a scattergram along the x-axis, z-axis, and y-axis respectively. SSC represents cellular granularity, FSC represents cell volume and shape, and SFL represents the nucleic acid and protein content of cells [12,13]. Lastly, the Sysmex XN-2000 also generates interpretive program messages (IP flags) based on the outcome of a CBC analysis, and provides warnings for hematological conditions or disorders [14]. The analysers produce these flags for WBC, RBC, and PLT.
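As a small illustration of the derived DIFF features, NLR and MLR are simple ratios of cell counts. The sketch below assumes absolute counts in the same units (for example, 10^9 cells/L); the function name and the handling of a zero lymphocyte count are assumptions, not details taken from the study.

```python
def derive_ratios(neut, lymph, mono):
    """Return (NLR, MLR) from neutrophil, lymphocyte, and monocyte counts.

    Returns (None, None) when the lymphocyte count is not positive, since the
    ratios are undefined in that case (an assumed convention for this sketch).
    """
    if lymph <= 0:
        return None, None
    return neut / lymph, mono / lymph


# Illustrative values only: NEUT = 8.0, LYMPH = 2.0, MONO = 0.5.
nlr, mlr = derive_ratios(8.0, 2.0, 0.5)
```

With these illustrative counts, NLR is 4.0 and MLR is 0.25.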

Feature spaces
Two feature spaces were created and used to train ML models: the CBC and DIFF feature space (CBC/DIFF), and the CBC/DIFF feature space with the addition of CPD (CBC/DIFF/CPD). Separate models were trained on each of these feature spaces with an ML model development pipeline including feature selection and stratified 10-fold cross validation. The CBC and DIFF features and the CPD features are shown in Tables 2 and 3 respectively. NLR and MLR are included as part of the CBC and DIFF features.

Machine learning model development
Three different tree-based methods were evaluated: random forests (RF) [15], decision trees (DT) [16], and XGBoost (extreme gradient boosting) [17]. Only tree-based models were explored in this study as they provide the feature importance property after training the models. As the data is highly imbalanced, class weighting was implemented to manage this imbalance. The models were trained on each of the feature spaces, CBC/DIFF/CPD and CBC/DIFF. For each model and feature space, one of three feature selection options was applied: none (all features in the space included), recursive feature elimination (RFE) until 5 features remained, and the boruta feature selection method [18]. The boruta method was evaluated due to the effectiveness of the approach in previous studies in the medical domain [19][20][21][22][23]. When training RF and XGBoost models, boruta used the RF and XGBoost models respectively as its underlying estimator. However, when training the DT models, RF was used with boruta to perform feature selection before training. This approach of using boruta with DT models has been previously implemented [24]. Stratified 10-fold cross validation of the training set was used to determine which models would be selected for further validation. The purpose of this study was to produce baseline ML models for BC outcome prediction. Given this objective, and because the process is computationally expensive, hyperparameter optimisation was not performed.
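To make the class imbalance and the stratified splitting concrete, the following standard-library sketch computes one common "balanced" class-weight heuristic, n_samples / (n_classes × class_count), and assigns samples to 10 folds so that each fold keeps roughly the same positive rate. Both functions are illustrative assumptions: the text does not state how the study's class weights (for example, 1 and 1.5) were defined, and the authors' fold assignment may differ.

```python
import random
from collections import Counter, defaultdict


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {cls: n_samples / (len(counts) * c) for cls, c in counts.items()}


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds while preserving the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Class counts of the retrospective training dataset reported in the text.
labels = [0] * 10134 + [1] * 831
weights = balanced_class_weights(labels)
folds = stratified_folds(labels, k=10)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

Under this heuristic the positive class receives a weight of roughly 6.6 and the negative class roughly 0.54, and each fold contains 83 or 84 of the 831 positive samples.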

Machine learning model evaluation
Models were evaluated using several metrics including area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and the J-statistic. These metrics were calculated for stratified 10-fold cross validation during model training, and for validation on the internal and external datasets. Metrics are reported at a classification threshold of 0.5 unless otherwise stated.
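For reference, these metrics can be computed from true labels and predicted probabilities as in the sketch below. It assumes a sample is predicted positive when its probability is greater than or equal to the threshold, takes the J-statistic to be Youden's J (sensitivity + specificity − 1), and computes AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as half. The labels and probabilities are toy values, not study data.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) for predictions at the given threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_j(sensitivity, specificity):
    """Youden's J statistic."""
    return sensitivity + specificity - 1


def auc(y_true, y_prob):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.2, 0.35, 0.6, 0.45, 0.7, 0.3, 0.9]

sens, spec = sensitivity_specificity(y_true, y_prob, threshold=0.5)
j_stat = youden_j(sens, spec)
area = auc(y_true, y_prob)
sens_low, spec_low = sensitivity_specificity(y_true, y_prob, threshold=0.4)
```

Lowering the threshold from 0.5 to 0.4 in this toy example raises sensitivity from 2/3 to 1.0 while specificity stays at 0.8, which mirrors the kind of threshold trade-off reported in the validation sections.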

Model training and cross validation
Results for the ML models after stratified 10-fold cross validation were sorted based on mean AUC, followed by the mean J statistic value, mean recall value, and mean diagnostic odds ratio at a classification threshold of 0.5. All of the ML models, feature selection methods, and class weight combinations performed similarly on stratified 10-fold cross validation. The lowest and highest AUC scores obtained were 0.70 ± 0.05 and 0.76 ± 0.04 respectively. Two models were subsequently selected for further evaluation; feature importances for both are shown in Tables 5 and 6.
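The multi-key ordering described above (mean AUC first, then mean J statistic, mean recall, and mean diagnostic odds ratio as successive tie-breakers) can be sketched with a tuple sort key. The model names and metric values below are placeholders invented for illustration; they are not results from this study.

```python
# Placeholder cross validation summaries (hypothetical values).
results = [
    {"model": "model_a", "auc": 0.76, "j": 0.40, "recall": 0.70, "dor": 5.0},
    {"model": "model_b", "auc": 0.76, "j": 0.42, "recall": 0.68, "dor": 5.5},
    {"model": "model_c", "auc": 0.70, "j": 0.45, "recall": 0.75, "dor": 6.0},
]


def rank_models(rows):
    """Sort descending by AUC, then J statistic, recall, and diagnostic odds ratio."""
    return sorted(
        rows,
        key=lambda r: (r["auc"], r["j"], r["recall"], r["dor"]),
        reverse=True,
    )


ranked = [row["model"] for row in rank_models(results)]
```

Here `model_b` ranks above `model_a` because the two tie on AUC and the J statistic breaks the tie, while `model_c` ranks last despite its higher J statistic because AUC is compared first.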

Model validation: internal dataset
The XG/CBC/DIFF/CPD/1.5/boruta and RF/CBC/DIFF/1/boruta models were evaluated on the internal validation set. The models achieved AUC scores of 0.76 and 0.82 respectively. ROC curves for these models are shown in Fig. 2. […] At the classification threshold of 0.3, the models achieved sensitivity scores of 0.96 and 1.0, and specificity scores of 0.31 and 0.15 respectively (Additional file 1, Figs. 5 and 6 for confusion matrices).

Model validation: external dataset
The RF/CBC/DIFF/1/boruta model was evaluated on the external validation dataset as CPD parameters were unavailable. The model achieved an AUC score of 0.76. The ROC curve is shown in Fig. 3. At the classification threshold of 0.5, the model achieved sensitivity and specificity scores of 0.62 and 0.70 respectively (Additional file 1, Fig. 7 for confusion matrix). At the classification threshold of 0.4, the model achieved sensitivity and specificity scores of 0.87 and 0.54 respectively (Additional file 1, Fig. 8 for confusion matrix). At the classification threshold of 0.3, the model achieved sensitivity and specificity scores of 0.99 and 0.24 respectively (Additional file 1, Fig. 9 for confusion matrix).
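To give a sense of what these thresholds mean in sample counts, the sketch below back-calculates approximate confusion matrix entries from the reported sensitivity and specificity and the external dataset's class counts (107 positive, 1138 negative). Because the published scores are rounded to two decimals, the counts are approximations and may differ slightly from the confusion matrices in Additional file 1; the function name is an assumption.

```python
def approx_confusion(n_pos, n_neg, sensitivity, specificity):
    """Approximate (tp, fn, tn, fp) from rounded sensitivity and specificity."""
    tp = round(n_pos * sensitivity)
    fn = n_pos - tp
    tn = round(n_neg * specificity)
    fp = n_neg - tn
    return tp, fn, tn, fp


N_POS, N_NEG = 107, 1138  # external validation dataset class counts

# (threshold, sensitivity, specificity) as reported for the external dataset.
reported = [(0.5, 0.62, 0.70), (0.4, 0.87, 0.54), (0.3, 0.99, 0.24)]

summary = {}
for threshold, sens, spec in reported:
    tp, fn, tn, fp = approx_confusion(N_POS, N_NEG, sens, spec)
    summary[threshold] = {"tp": tp, "fn": fn, "tn": tn, "fp": fp}
    print(f"threshold={threshold}: TP~{tp} FN~{fn} TN~{tn} FP~{fp}")
```

Read this way, the true negatives approximate the cultures the model would flag as likely negative, and the false negatives approximate the positive cultures it would miss: about 797 and 41 at a threshold of 0.5, versus about 273 and 1 at a threshold of 0.3.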

Discussion
The ML pipeline established in this study performed consistently on stratified 10-fold cross validation, internal, and external validation datasets utilising CBC, DIFF, and CPD features produced by the Sysmex XN-2000 analysers. The pipeline is positioned to be validated in prospective studies for BC outcome prediction on patients who have BC and CBC samples drawn at the same time. This work adds to the existing body of literature, and presents, at the time of writing, the first use of CBC, DIFF, and CPD with ML for BC outcome prediction for the purpose of reducing the number of unnecessary BC tests. These results highlight the use of this approach for improvements in diagnostic stewardship by reducing the number of unnecessary BCs that are processed after BC tests have been requested by clinicians. All trained models demonstrated similar performance across all of the datasets. The XG/CBC/DIFF/CPD/1.5/boruta model achieved an AUC score of 0.76 ± 0.04 on stratified 10-fold cross validation, and an […] BC test should be performed [39]. In the proposed pipeline, only the results of routine blood tests are considered. A benefit of using only hematological data is that it simplifies the clinical integration process as the ML models do not rely on the production of data from multiple sources. Using a single source of data provides a simplified workflow for analysis and makes the approach easier to integrate within clinical laboratory workflows. Therefore, other features such as physiological and biochemical features have been purposefully excluded from this study. A proposed clinical integration workflow is shown in Fig. 4, positioned between the physician and the laboratory, after blood tests have been performed.
Restricting the pipeline from using other, non-routinely collected data means that the proposed ML workflow, spanning training, testing, and deployment, can be introduced more broadly, as demonstrated by the performance of the pipeline on externally collected data. This study has limitations. Firstly, we utilised data produced from the Sysmex XN-2000 modules and did not take into consideration other information regarding the patient. We also focused on the entire hospital population. ML models may perform better when trained exclusively for certain patient subpopulations. We have limited this study to focusing on data processing, model development, and model evaluation. Therefore we have not included discussion on methods of interpretability and explainability, and leave this open for future research. Deployment and integration strategies were not investigated and should be the focus of future work, along with evaluation of the ML pipeline in prospective studies. Furthermore, alternative feature selection methods, hyperparameter optimisation, and additional ML methods should be explored. Lastly, future work should aim to address the limitations surrounding the identification of clinically significant microorganisms and use a different method than the literature based approach we have chosen in this study.
Fig. 4 A potential clinical integration workflow for the proposed BC outcome prediction ML model
The two models selected for further evaluation after cross validation were chosen as follows. The first, which used the CBC/DIFF/CPD feature space, was the XGBoost model with 1.5 class weights and boruta feature selection (XG/CBC/DIFF/CPD/1.5/boruta). This was selected as it was the best performing model when sorted by the cross validation metrics. It represented a model which was balanced, with the possibility of adjusting thresholds for prediction. For external validation where CPD parameters were unavailable, the RF model with CBC and DIFF parameters was selected with balanced class weights and the boruta feature selection method (RF/CBC/DIFF/1/boruta). Table 4 shows the performance of these two models for stratified 10-fold cross validation during model training. Additional file 2 contains results for all models evaluated during the model training and cross validation stage. The features used in the XG/CBC/DIFF/CPD/1.5/boruta and RF/CBC/DIFF/1/boruta models are shown in Fig. 1. All feature importances for both models are shown in Tables 5 and 6.

Fig. 2 ROC curves for the XG/CBC/DIFF/CPD/1.5/boruta and RF/CBC/DIFF/1/boruta models when tested on the internal validation dataset. A positive prediction represents a positive blood culture outcome

Table 1
Description and properties for each dataset

Table 2
Complete blood count (CBC) and differential (DIFF) features

Table 3
Cell population data (CPD) features

Table 4
Performance of ML models for stratified 10-fold cross validation. Showing area under the receiver operating characteristic curve (AUC), J-statistic (J stat), sensitivity, and specificity at a classification threshold of 0.5