A machine learning-based system for detecting leishmaniasis in microscopic images

Background Leishmaniasis, a disease caused by a protozoan, causes numerous deaths in humans each year. After malaria, leishmaniasis is known to be the deadliest parasitic disease globally. Direct visual detection of the leishmania parasite by microscopy is a common method for diagnosing this disease; however, it is time-consuming and error-prone. This study aimed to develop an artificial intelligence-based algorithm for the automatic diagnosis of leishmaniasis. Methods We used the Viola-Jones algorithm to develop a leishmania parasite detection system. The algorithm includes three procedures: feature extraction, integral image creation, and classification. Haar-like features are used as the image features. An integral image representation was used to speed up the algorithm significantly. The AdaBoost technique was used to select the discriminative features and to train the classifier. Results A recall of 65% and a precision of 50% were obtained in the detection of macrophages infected with the leishmania parasite. For amastigotes outside of macrophages, the corresponding values were 52% and 71%, respectively. Conclusion The developed system is accurate, fast, easy to use, and cost-effective. Therefore, artificial intelligence might be used as an alternative to the current methods of leishmaniasis diagnosis.


Background
Leishmaniasis, a disease caused by more than 20 species of leishmania parasites, is recognized in tropical and subtropical regions as an acute disease with a high mortality rate. The disease manifests itself in both cutaneous and visceral forms and is transmitted by the bites of infected sand flies [1,2]. Cutaneous leishmaniasis (CL) is endemic in more than 88 countries, and around two-thirds of the cases occur in Afghanistan, Algeria, Brazil, Pakistan, Peru, Saudi Arabia, Iran, and Syria [3,4]. CL is estimated to cause 1 million new cases annually [5], with a limited response to treatment and management [6][7][8][9][10].
The clinical symptoms of CL vary depending on the parasite species, but the disease generally begins with a papule or nodule that reaches its final size in about a week. Its center forms a crust that may break away to reveal an ulcer, which heals slowly over months or years [11]. However, an estimated 10% of CL cases become chronic and progress to severe symptoms [12]. Given the wide clinical spectrum of CL, other diseases may have similar clinical manifestations (e.g., dermatitis, squamous cell carcinoma, tuberculosis, skin mycosis). Therefore, additional diagnostic measures are needed to differentiate CL from its clinical and histologic look-alikes [12].
Even today, parasitological diagnosis of CL remains the gold standard because of its high specificity [13]. It includes microscopic examination of Giemsa-stained biopsy smears or aspirates, histological examination of fixed lesion biopsies, and culture of biopsy triturates or aspirates [14]. Currently, microscopic examination is probably the most common diagnostic method because it is inexpensive and available at the primary, secondary, and tertiary levels of healthcare. Other reported diagnostic tests include protein A (ProtA), immunoglobulin (Ig)G2, the lymphocyte proliferation assay, the indirect fluorescent antibody test (IFAT), quantitative real-time polymerase chain reaction of bone marrow (qPCR-BM), qPCR-Blood, and IgG [15]. Culture is another diagnostic method that provides useful information for species identification and characterization, but it is time-consuming, costly, and requires technical expertise. Moreover, its sensitivity is quite low [16].
Molecular diagnosis of CL has been extensively developed and reviewed over the past decade [17]. It is mainly performed with PCR-based methods and is particularly useful in cases with a low parasite load (e.g., mucosal leishmaniasis). The method can also be used to monitor the treatment and follow-up of CL patients. Its specificity is 100%, but its sensitivity is around 20% to 30% in CL and 55% to 70% in mucosal leishmaniasis, which is low compared to conventional parasite detection methods. Several efforts have been made to improve the molecular diagnosis of CL, such as the detection of parasite DNA in blood or tissue stains and the development of rapid PCR oligochromatography. However, its applications are still limited because the method is expensive and requires considerable laboratory infrastructure and technical expertise [17].
In this paper, we developed an artificial intelligence (AI)-based system to assist in detecting leishmania parasites and diagnosing leishmaniasis. We present the details of the method and its evaluation on real microscopic images.

Methods
We used 300 images taken from 50 laboratory slides acquired from lesions suspected of leishmaniasis in patients referred to Valfajr Clinic in Shiraz, Fars, Iran. These images include 150 photos from 25 leishmania-positive slides and 150 photos from 25 leishmania-negative (control) slides. The slides were prepared and labeled by taking samples from the inflamed edges of the wounds with a sterile scalpel, smearing them on slides, fixing them with 100% ethanol, and staining them with Giemsa.
Based on the morphological data acquired by assessing the slides, we used the Viola-Jones algorithm to design an intelligent system capable of detecting infection in the collected smears. Briefly, the detector is given images of both parasitic and non-parasitic samples, so that it gradually learns their distinctive features and becomes able to spot infected regions in unseen images. The Viola-Jones algorithm proceeds in three steps: feature extraction, integral image creation, and classification [18].
For feature extraction, the sum of the pixels within the white rectangles is subtracted from the sum of the pixels in the grey rectangles (Fig. 1). The resulting values are used as features to represent subsections of an image. These rectangle features are inspired by Haar wavelets, which are square-shaped functions at various scales and translations.
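A minimal sketch of a single two-rectangle Haar-like feature, computed directly from pixel sums. It assumes a horizontal layout with the grey rectangle on the left and the white rectangle on the right; the function name and the layout are illustrative and are not taken from the study's implementation.

```python
import numpy as np

def two_rect_feature(img: np.ndarray, r: int, c: int, h: int, w: int) -> int:
    """Horizontal two-rectangle Haar-like feature with top-left corner (r, c).

    Hypothetical layout: grey rectangle = left h x w block, white rectangle =
    the h x w block to its right. Value = sum(grey) - sum(white).
    """
    grey = img[r:r + h, c:c + w].sum()
    white = img[r:r + h, c + w:c + 2 * w].sum()
    return int(grey - white)

# Toy 4x4 patch with a bright left half and a dark right half (a vertical edge).
patch = np.array([[9, 9, 1, 1],
                  [9, 9, 1, 1],
                  [9, 9, 1, 1],
                  [9, 9, 1, 1]])
print(two_rect_feature(patch, 0, 0, 4, 2))  # 72 - 8 = 64
```

An edge aligned with the rectangle boundary gives a large response, whereas a uniform patch gives zero, which is what makes such differences useful as features.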
Integral image creation was used to speed up processing so that a large number of features could be evaluated. The images contain irrelevant regions, and an abundant number of Haar-like features must be computed: 162,336 features for a single 24 × 24 pixel image window. To resolve this issue, we used the integral image, in which the value at pixel (x, y) is the sum of all the pixels above and to the left of (x, y), inclusive.
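The figure of 162,336 can be reproduced by counting every position and every integer scaling of the five rectangle feature types of the original Viola-Jones detector within a 24 × 24 window (assuming those five types are the ones meant here). The sketch also builds the integral image itself as two cumulative sums; both function names are hypothetical.

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    # ii[r, c] = sum of img[0..r, 0..c], inclusive: all pixels above and to the
    # left of (r, c), obtained in one pass as two cumulative sums.
    return np.cumsum(np.cumsum(img, axis=0), axis=1)

def count_haar_features(width: int = 24, height: int = 24) -> int:
    # Base shapes (w, h) of the five Viola-Jones feature types: two-rectangle
    # (horizontal, vertical), three-rectangle (horizontal, vertical), four-rectangle.
    shapes = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]
    total = 0
    for bw, bh in shapes:
        for w in range(bw, width + 1, bw):        # every integer scaling of the width
            for h in range(bh, height + 1, bh):   # and of the height
                total += (width - w + 1) * (height - h + 1)  # every position
    return total

print(count_haar_features())  # 162336
```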
Haar-like features are then evaluated over image subsections using the integral image; the aim of this step is to eliminate unwanted sections of the image and shorten processing time. The sum of the pixel values in each subsection is obtained from a few array references: a single rectangle needs four references, while two, three, and four adjacent rectangles need six, eight, and nine references, respectively. For an image I of size R × C, the integral image ii is computed in a single pass, with ii(r, c) equal to the sum of the pixel values above and to the left of (r, c). Once ii has been computed, the sum of the original pixel values within any rectangle can be obtained with a few lookups in this table. As shown in Fig. 2, the sum of the pixel values in subsection S1 requires only the reference (r1, c1):

$$\sigma(S_1) = ii(r_1, c_1)$$

whereas the sum over subsection S4 requires the references (r1, c1), (r2, c2), (r3, c3), and (r4, c4):

$$\sigma(S_4) = ii(r_4, c_4) + ii(r_1, c_1) - ii(r_2, c_2) - ii(r_3, c_3)$$

Fig. 1 Rectangular window detection of Haar-like feature [19]
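The four-reference rule for a single rectangle can be sketched as follows. This version pads the integral image with a leading row and column of zeros, a common convention assumed here (it is not stated in the text) so that rectangles touching the image border use the same formula.

```python
import numpy as np

def padded_integral_image(img: np.ndarray) -> np.ndarray:
    # One extra row and column of zeros: ii[r, c] = sum of img[:r, :c],
    # so rectangles on the image border need no special case.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii: np.ndarray, top: int, left: int, bottom: int, right: int) -> int:
    # Sum of img[top:bottom, left:right] from four array references:
    # bottom-right + top-left - top-right - bottom-left.
    return int(ii[bottom, right] + ii[top, left] - ii[top, right] - ii[bottom, left])

img = np.arange(1, 17).reshape(4, 4)
ii = padded_integral_image(img)
print(rect_sum(ii, 1, 1, 3, 3))  # 6 + 7 + 10 + 11 = 34
```

However large the rectangle, the cost stays at four lookups, which is what makes evaluating a very large number of features per window affordable.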
In addition to the integral image, a learning algorithm is needed to select the best features and to train the classifiers. AdaBoost, one of the most widely used boosting methods, adds weak learners to a boosted classifier one at a time; each new weak learner is trained with sample weights that emphasize the examples misclassified by the previous ones. The resulting classifiers are combined in a cascade scheme (Fig. 3).
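The cascade scheme mentioned here, and described in more detail in the next paragraph, can be sketched as an early-exit loop over stages. Stages are modeled as hypothetical boolean functions; in a real detector each would be a boosted classifier with its own threshold. The second function illustrates why per-stage rates multiply: ten stages that each keep 99% of true objects keep about 90% overall, while ten stages that each pass 30% of background windows pass only about six in a million.

```python
import math
from typing import Callable, Sequence

# Hypothetical stage type: returns True if the window "may be a match".
Stage = Callable[[object], bool]

def cascade_accepts(window: object, stages: Sequence[Stage]) -> bool:
    # Stages are ordered from cheapest to most complex. The first rejection
    # discards the window, so most background windows exit after a few checks.
    for stage in stages:
        if not stage(window):
            return False
    return True

def cascade_rate(stage_rates: Sequence[float]) -> float:
    # Overall detection (or false-positive) rate = product of the stage rates.
    return math.prod(stage_rates)

print(round(cascade_rate([0.99] * 10), 3))   # 0.904
print(f"{cascade_rate([0.3] * 10):.1e}")     # 5.9e-06
```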
Cascading is a stage-by-stage process in which each stage consists of a classifier built from particular features. The purpose of each stage is to decide whether a given sub-window is definitely not a match or may be a match, a match being the recognition of the previously defined morphological pattern. A sub-window rejected at any stage is discarded immediately. Therefore, the first stage or stages usually use a classifier with only a few simple, general features to remove unwanted sub-windows rapidly, leaving more computation time for the later stages that require deeper analysis. Later stages use progressively more complex classifiers, which achieve better detection rates at a higher run-time cost; placing such classifiers early would therefore be inefficient. The detection threshold of each stage can be adjusted so that no stage has a detection rate below a specified value, and the overall sensitivity of the cascade is the product of the stage sensitivities. In this way, the cascade gradually narrows down the location of the desired object in the image and yields a robust classifier, and the Viola-Jones algorithm substantially improves both accuracy and execution time.

Given a dataset $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where $y_i = 0$ for negative and $y_i = 1$ for positive examples, the boosting procedure is as follows:

I. Initialize the weights $w_{1,i} = \frac{1}{2m}$ for $y_i = 0$ and $w_{1,i} = \frac{1}{2l}$ for $y_i = 1$, where $m$ is the number of negative and $l$ the number of positive examples.
II. For $t = 1, \ldots, T$:
a) Normalize the weights: $w_{t,i} \leftarrow w_{t,i} / \sum_{j=1}^{n} w_{t,j}$.
b) For each feature $j$, train a weak classifier $h_j$ restricted to that feature and compute its weighted error $\epsilon_j = \sum_i w_{t,i} \, |h_j(x_i) - y_i|$.
c) Choose the classifier $h_t$ with the lowest error $\epsilon_t$.
d) Update the weights: $w_{t+1,i} = w_{t,i} \, \beta_t^{1 - e_i}$, where $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$, and $e_i = 0$ or $1$ for correct or incorrect classification of $x_i$, respectively.
III. The strong classifier ensembled from the weak classifiers is:

$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}, \qquad \alpha_t = \log \frac{1}{\beta_t}$$

Fig. 2 The complete expression of the integral image: the sum of the pixels inside sub-window S1 is obtained as σ(S1) = ii(p1) = ii(r1, c1), and sub-windows S2 and S3 are expressed as σ(S2) = ii(r2, c2) − ii(r1, c1) and σ(S3) = ii(r3, c3) − ii(r1, c1). The pixels in S4 are calculated as σ(S4) = ii(r4, c4) + ii(r1, c1) − ii(r2, c2) − ii(r3, c3).
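The boosting steps above can be sketched with NumPy, assuming the Haar-like feature values have already been computed into a matrix X (one row per training window, one column per feature) and that each weak classifier is a single-feature threshold (a decision stump) of the form h(x) = 1 if p·f(x) < p·θ, else 0. The exhaustive threshold search and all function names are illustrative choices, not the study's implementation.

```python
import numpy as np

def best_stump(X, y, w):
    """Weighted-best single-feature stump h(x) = 1 if p * x[j] < p * theta else 0.

    Exhaustive search over every feature, threshold and polarity: fine for a
    sketch, far too slow for the ~160,000 features of a real 24 x 24 window.
    Returns (weighted_error, feature_index, threshold, polarity).
    """
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        f = X[:, j]
        for theta in np.append(np.unique(f), f.max() + 1):
            for p in (1, -1):
                pred = (p * f < p * theta).astype(int)
                err = np.sum(w * (pred != y))  # epsilon_j = sum_i w_i |h_j(x_i) - y_i|
                if err < best[0]:
                    best = (err, j, theta, p)
    return best

def adaboost(X, y, T):
    m, l = np.sum(y == 0), np.sum(y == 1)
    # Step I: initial weights 1/(2m) for negatives and 1/(2l) for positives.
    w = np.where(y == 0, 1.0 / (2 * m), 1.0 / (2 * l))
    learners = []
    for _ in range(T):
        w = w / w.sum()                          # normalize the weights
        err, j, theta, p = best_stump(X, y, w)   # lowest-error weak classifier h_t
        err = max(err, 1e-10)                    # avoid beta = 0 on separable data
        beta = err / (1.0 - err)
        pred = (p * X[:, j] < p * theta).astype(int)
        e = (pred != y).astype(int)              # e_i = 0 if correct, 1 if incorrect
        w = w * beta ** (1 - e)                  # correct samples scaled by beta (< 1 if err < 0.5)
        learners.append((np.log(1.0 / beta), j, theta, p))  # (alpha_t, stump)
    return learners

def strong_classify(learners, x):
    # Step III: 1 if sum_t alpha_t h_t(x) >= 0.5 * sum_t alpha_t, else 0.
    score = sum(a * int(p * x[j] < p * theta) for a, j, theta, p in learners)
    return int(score >= 0.5 * sum(a for a, _, _, _ in learners))

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # toy data: one feature value per window
y = np.array([0, 0, 1, 1])
model = adaboost(X, y, T=3)
print([strong_classify(model, x) for x in X])  # [0, 0, 1, 1]
```

On this separable toy data the first stump already has zero weighted error, so the error is clipped to a small constant to keep β_t and log(1/β_t) finite.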

Results
The performance of the developed system was quantitatively assessed using two evaluation metrics, sensitivity and specificity. Sensitivity, the probability of a positive test outcome in an infected patient, is calculated as

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

where TP and FN are the numbers of true positives and false negatives, respectively. Therefore, the fewer the false negatives, the higher the sensitivity. In this study, the sensitivity was 50% and 71% for infected macrophages and amastigotes outside of macrophages, respectively (Fig. 4).
Specificity, the probability of a negative test in a healthy patient, is calculated similarly:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

where TN and FP are the numbers of true negatives and false positives, respectively; that is, the fewer the false positives, the higher the specificity. Specificity was 65% for the detection of Leishmania-infected macrophages and 52% for individual parasites (Fig. 4).
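The sensitivity and specificity definitions can be written as small functions. Precision, which the abstract reports alongside recall (another name for sensitivity), is included for comparison: like specificity it penalizes false positives, but it divides by the number of positive calls rather than by the number of healthy samples. The counts below are hypothetical and serve only to illustrate the arithmetic.

```python
def sensitivity(tp: int, fn: int) -> float:
    # Probability of a positive result in an infected sample: TP / (TP + FN).
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    # Probability of a negative result in a healthy sample: TN / (TN + FP).
    return tn / (tn + fp)

def precision(tp: int, fp: int) -> float:
    # Fraction of positive calls that are correct: TP / (TP + FP).
    return tp / (tp + fp)

# Hypothetical counts, for illustration only.
tp, fn, tn, fp = 8, 2, 30, 10
print(sensitivity(tp, fn), specificity(tn, fp))  # 0.8 0.75
print(round(precision(tp, fp), 3))               # 0.444
```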
Overall, when the outputs of the infected-macrophage-based system and the individual-parasite-based system were combined using an OR combiner, the system achieved a sensitivity of 83% and a specificity of 35% in parasite detection.
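A minimal sketch of an OR combiner, assuming both detectors produce a binary call for the same samples. Because the OR rule can only add positive calls, combining two detectors scored on the same samples cannot lower sensitivity or raise specificity relative to either detector alone. The per-image calls below are hypothetical.

```python
def or_combine(pred_a, pred_b):
    # OR combiner: a sample is called positive if either detector calls it positive.
    return [int(a or b) for a, b in zip(pred_a, pred_b)]

def sensitivity_specificity(pred, truth):
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical per-image calls (1 = parasite detected), for illustration only.
truth      = [1, 1, 1, 1, 0, 0, 0, 0]
macrophage = [1, 1, 0, 0, 0, 1, 0, 0]  # infected-macrophage detector
amastigote = [0, 1, 1, 0, 1, 0, 0, 0]  # free-amastigote detector
combined = or_combine(macrophage, amastigote)

print(sensitivity_specificity(macrophage, truth))  # (0.5, 0.75)
print(sensitivity_specificity(amastigote, truth))  # (0.5, 0.75)
print(sensitivity_specificity(combined, truth))    # (0.75, 0.5)
```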

Discussion
In recent years, many methods have been suggested for detecting the leishmania parasite [21]. Each has its strengths, but each also has disadvantages. Direct visual recognition using a microscope is a simple and cost-efficient method for parasite detection; however, it depends on the skill of the examiner, and its sensitivity is relatively low [22]. Culture, another method for parasite detection, requires its own tools and expenses, and contamination with other microorganisms during the process may negatively affect the results [23]. Serological tests such as IFA and ELISA have a different limitation: they cannot differentiate past from present infections. In addition, because antibody titers against the parasite are low, serological tests offer limited diagnostic value [24]. Early diagnosis of deadly diseases such as leishmaniasis allows earlier treatment and control, which can significantly reduce mortality. At present, PCR is the method with the highest sensitivity and specificity. Aviles et al. reported 92% sensitivity and 100% specificity in cutaneous leishmaniasis detection [24], and similar results were obtained in many other studies [25][26][27][28]. However, in chronic cases, PCR sensitivity drops substantially (to 45.5%) [25]. In addition, PCR is a complex, expensive, and time-consuming procedure that requires specific equipment. In this work, we examined the efficiency of artificial intelligence in detecting leishmaniasis, and the results were promising: the proposed system provided a sensitivity of 83% and a specificity of 35% in detecting CL. Many machine learning methods have been developed over the years that can make diagnostic systems more efficient [29].

Fig. 4 The accuracy, sensitivity, and specificity of the leishmania detection system, both in and outside of macrophages
AdaBoost, decision trees, KNN, linear regression, Naïve Bayes, Random Forest, and Extra Trees are some of these methods. Saiprasath G et al. compared these seven methods for automated microscopic malaria detection, and Random Forest and AdaBoost performed best in terms of accuracy, sensitivity, specificity, and F1-score [30].
A high recall means a small number of false negatives. In deadly diseases such as leishmaniasis, this matters because infected patients should not go unrecognized and be incorrectly assumed to be healthy; detecting them ensures that they receive the care and treatment they need, lowering morbidity and mortality. On the other hand, a high precision indicates a small number of false positives. Where resources are limited, health services may be unable to give every patient the care they need, and a method with many false positives leads to unnecessary use of resources and equipment and to higher total costs.
Despite these strengths of intelligent diagnostic systems, certain situations call for caution to avoid a loss of efficiency. For example, if the images acquired for the system have low resolution or contain numerous dark regions (increased pixel count), classification takes longer and may produce more false positives, so overall efficacy drops. Moreover, such programs may need periodic updates [31]. Thung et al. used Speeded-Up Robust Features (SURF) to develop an efficient method for automated parasite detection. This procedure works on the images alone, without any learning or boosting algorithm; however, the results were unsatisfactory [31]. Several other approaches are based on image segmentation [32], for example K-means clustering [33] and the U-Net architecture. Górriz M et al. achieved promising results using a U-Net architecture for leishmania parasite detection [33]. However, this method is quite time-consuming (15 h on an NVIDIA GTX Titan X GPU) [33]. By contrast, detection can be performed considerably faster using integral images and boosting methods such as AdaBoost [20].

Conclusion
In this study, we proposed an AI-based system for detecting cutaneous leishmaniasis, using the Viola-Jones object detection algorithm enhanced by AdaBoost. The system provided a fairly high sensitivity (83%) and moderate specificity. In addition, the algorithm is fast and easy to use. Overall, the results are promising and show that AI techniques can assist in the diagnosis and treatment of leishmaniasis.