One of the most important steps in designing a nucleic acid amplification test (NAAT) such as PCR is selecting a target that is completely specific and sufficiently long. A specific target prevents false-positive results, and a long sequence helps the researcher design efficient and sensitive primers. In this study, a genome comparison method with a few modifications was used to obtain a completely specific and long sequence [1]. Using the bioinformatic tools provided by the NCBI database, the genomic sequences of 7 members of the MTBC were compared with the complete genomes available in the database (up to 2018.08.20), and a 5000 bp sequence with high specificity was identified.
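The search for a long, specific region can be pictured as an interval problem: given the stretches of a reference genome that align to non-target genomes, find the remaining stretches that are long enough to serve as a target. The following is a minimal sketch under that assumption, not the workflow used in this study. The function name, the toy coordinates, and the 0-based half-open coordinate convention are all hypothetical.

```python
def uncovered_windows(genome_length, hits, min_length=5000):
    """Return regions of the reference that no non-target hit overlaps.

    hits: list of (start, end) pairs in 0-based half-open coordinates,
          each marking a stretch that aligned to a non-target genome.
    Only uncovered regions of at least min_length bases are returned.
    """
    windows = []
    cursor = 0  # end of the covered region seen so far
    for start, end in sorted(hits):
        if start - cursor >= min_length:
            windows.append((cursor, start))
        cursor = max(cursor, end)
    # Tail of the genome after the last hit
    if genome_length - cursor >= min_length:
        windows.append((cursor, genome_length))
    return windows


# Toy coordinates, invented for illustration only.
toy_hits = [(0, 3000), (2500, 6000), (12000, 15000)]
print(uncovered_windows(20000, toy_hits))
```

In practice the hit coordinates would come from an alignment tool's output, and candidate windows would still need to be checked for conservation across the target strains.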
Based on in silico analysis, the 5KST can specifically detect most of the NCBI-registered complete genomes of MTBC members, including 237 strains of M. tuberculosis, 12 strains of M. bovis BCG, 8 strains of M. bovis, 3 strains of M. africanum, 2 strains of M. canettii, 1 strain of M. caprae and 1 strain of M. microti.
Furthermore, an investigation of the presence of 5KST in 304 strains of M. tuberculosis from different parts of the world (TB-ARC project) showed that this sequence is conserved among all of these strains (Additional file 1: Table S2). It should be noted that a few of these 304 strains had a query cover of less than 100% (about 81–85%). After evaluating the sequences of these strains, we found that the reduced query cover was not due to lower similarity to our sequence, but rather to gaps introduced during sequencing, which were recorded as runs of N instead of G, C, A or T bases.
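A quick way to check whether reduced query cover comes from uncalled bases rather than real divergence is to locate the runs of N in the matching genome region. The sketch below assumes the region has already been extracted as a plain string; the function names and the toy sequence are hypothetical and are not taken from the study data.

```python
import re


def n_runs(sequence):
    """Return (start, end) half-open positions of every run of N."""
    return [(m.start(), m.end()) for m in re.finditer(r"N+", sequence.upper())]


def called_fraction(sequence):
    """Fraction of positions that carry a called base (A, C, G or T)."""
    if not sequence:
        return 0.0
    called = sum(1 for base in sequence.upper() if base in "ACGT")
    return called / len(sequence)


# Toy 20-base region, invented for illustration only.
toy_region = "ACGTACGTNNNNACGTACGT"
print(n_runs(toy_region))           # positions of the uncalled stretch
print(called_fraction(toy_region))  # share of the region with called bases
```

If the called fraction of a region is close to the reported query cover, the shortfall is plausibly explained by sequencing gaps alone.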
Tuberculosis is a disease that has caused millions of deaths. Despite scientific advances, it still puts millions of people at risk of death every year. Extensive studies are therefore being conducted in many parts of the world to control it through prevention, diagnosis and treatment [2]. The aim of the present study was to improve the diagnosis of TB. If diagnosis is timely, TB can be controlled and treated with minimal side effects before the disease becomes advanced [25].
The three tests routinely used in most TB laboratories are acid-fast bacilli microscopy, culture and PCR [26]. PCR is fast, sensitive and highly specific. In most studies on clinical specimens, it showed higher sensitivity than acid-fast bacilli microscopy, but sensitivity lower than or equal to that of culture. PCR is more specific than the other two tests [26,27,28].
The most important factors that reduce the sensitivity of the PCR test on clinical specimens are (1) the presence of inhibitors in clinical specimens, (2) partial loss of DNA during purification, (3) residual chemical inhibitors from the purification process, and (4) poorly designed, inefficient primers [16, 28, 29].
In this study, to limit the negative effects of inhibitors in clinical specimens, only 1 μL of the specimen was used in a 25 μL reaction. The results of this study, our team's previous experience in the molecular laboratory, and the studies of other researchers all show that the sensitivity of the PCR test is lower with clinical specimens (either natural or artificial) than with pure DNA in the reaction. We do not know exactly which inhibitors are present, but in general the substances in clinical specimens, especially sputum, reduce the sensitivity of the test. JE Clarridge et al. (1993), in a large study evaluating the PCR test on clinical specimens, showed that the substances in clinical specimens, especially sputum, could cause up to 20% false-negative results and reduce the sensitivity [14]. In another study on sputum specimens, FS Nolte et al. reported that sputum could produce 10 to 17% false-negative results [12]. In a study of 76 respiratory specimens in Turkey, an inhibition rate of up to 26% was observed in the real-time PCR test, causing false-negative results [13]. In our study as well, using ≥2 μL of clinical specimen in the 25 μL 5KST-PCR reaction worsened the detection limit from 5 fg to 10 fg.
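To relate the femtogram figures above to genome copies, a DNA mass can be converted to an approximate copy number. The sketch below rests on two assumptions that are not stated in this paper: an M. tuberculosis genome size of about 4.41 Mb and an average mass of 650 g/mol per double-stranded base pair. The function name is hypothetical.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
GENOME_SIZE_BP = 4.41e6         # assumed M. tuberculosis genome size (approximate)
MEAN_BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one double-stranded base pair


def femtograms_to_copies(mass_fg, genome_size_bp=GENOME_SIZE_BP):
    """Convert a mass of genomic DNA in femtograms to approximate genome copies."""
    mass_g = mass_fg * 1e-15
    genome_molar_mass = genome_size_bp * MEAN_BP_MASS_G_PER_MOL  # g per mole of genomes
    return mass_g * AVOGADRO / genome_molar_mass


for mass_fg in (5, 10):
    print(mass_fg, "fg is about", round(femtograms_to_copies(mass_fg), 2), "genome copies")
```

Under these assumptions, 5 fg corresponds to roughly one genome copy and 10 fg to roughly two.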
In addition, to prevent DNA loss during purification, the purification step was omitted and only autoclave extraction with a few modifications was performed. Furthermore, because the 5KST sequence is long enough, we were able to design efficient primers with the lowest possible likelihood of forming hairpin and dimer structures.
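The kind of screening implied here can be illustrated with a very crude check for 3' end complementarity between primers and for short inverted repeats within a primer. This is a simplified sketch, not the primer design procedure used in this study: it ignores thermodynamics entirely, the function names and thresholds are assumptions, and the sequences are invented rather than the 5KST primers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_3prime_dimer(primer_a, primer_b, n=4):
    """True if the last n bases of primer_a can pair with some stretch of primer_b."""
    tail = primer_a.upper()[-n:]
    return reverse_complement(tail) in primer_b.upper()


def has_hairpin(primer, stem=4, min_loop=3):
    """True if a stem-length stretch can pair with a downstream stretch,
    leaving at least min_loop unpaired bases between the two arms."""
    seq = primer.upper()
    for i in range(len(seq) - stem + 1):
        arm = seq[i:i + stem]
        if reverse_complement(arm) in seq[i + stem + min_loop:]:
            return True
    return False


# Invented sequences, for illustration only.
print(has_3prime_dimer("ACGTACGGATCC", "TTTGGATCCAAA"))
print(has_hairpin("GGGGAAAACCCC"))
```

A longer target gives more candidate positions, so primers that fail checks like these can simply be discarded in favour of others.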
Our blastn analyses showed that all of the important targets used so far are either short or contain long nonspecific regions. Unlike these targets, the 5KST target is long enough (5000 bp) and shows no statistically significant similarity to NTM bacteria. The targets that have been used most often or that have shown the best detection limits include rpoB, IS6110, devR, mpb64 and sdaA.
IS6110 is one of the most widely used diagnostic targets for TB [8]. This 1361 bp sequence is repeated multiple times in the genome of M. tuberculosis and is therefore regarded as a sensitive diagnostic target. To date, various PCR tests have been designed based on this sequence, and the reported detection limit is usually 2 copies of M. tuberculosis genomic DNA per μL [18, 30,31,32]. Various studies showed that PCR tests based on this target had a clinical sensitivity of 63–98% and a clinical specificity of 82.1–100% [18, 33,34,35,36]. Despite its high sensitivity, this sequence has limitations. For example, some strains of M. tuberculosis that lack the IS6110 sequence have been found [17]. Furthermore, our in silico analysis with blastn showed that large fragments of this sequence are similar to sequences in NTM bacteria (M. rutilum, M. smegmatis, M. chimaera) and in some Nocardia species such as N. brasiliensis. The 5KST sequence is present as only a single copy in the MTBC genome; nevertheless, it is completely specific. In addition, unlike IS6110, the 5KST is quite long, which allows the design of highly effective primers with minimal hairpin and dimer structures. Although the effect of poorly designed primers may not be apparent when sensitivity is tested on pure genomic DNA, it can subtly reduce sensitivity in clinical specimens [15].
Another diagnostic target sequence on which many studies are based is mpb64 [28, 37, 38]. This sequence is 687 bp long [39]. Our in silico analysis with blastn showed that it also contains large regions with high similarity to NTM species such as M. kansasii, M. ulcerans and M. haemophilum. These nonspecific regions, together with the short length of the sequence, make it very difficult to design efficient primers. In most cases, the sensitivity reported for mpb64 primers is lower than that for IS6110 primers [35]. The analytical sensitivity ultimately determined for 5KST-PCR was 1 copy per μL, which according to previous studies is better than that of mpb64-PCR (20 copies per μL) [18]. In addition, mpb64-PCR had 88–91% specificity and 48–91% sensitivity on clinical specimens [18, 34, 35, 37].
The rpoB diagnostic target has also been used to detect rifampicin-resistant TB in several studies [40]. The sensitivity and specificity of PCR assays based on this target gene were reported in two clinical trials as 93.3–95.8% and 100%, respectively [41, 42]. However, our in silico analysis showed that this sequence has nonspecific similarities with some other mycobacteria. Consistent with this, an analytical specificity evaluation in another study showed that the common primers designed for this sequence can also nonspecifically detect M. chelonae, M. kansasii, M. scrofulaceum, M. smegmatis and M. szulgai [18].
The devR sequence is another diagnostic target. Various studies have reported a detection limit of 200–500 copies per μL. This sequence has similarities with M. kansasii and may therefore result in false positives. In a clinical study of intraocular TB, the same specificity but lower sensitivity compared with mpb64 were reported [18, 43, 44].
In a comparative study of common diagnostic target sequences for TB, the sensitivity of mpb64 was the highest (84% in confirmed cases and 77.5% in clinically suspected cases). The targets ranked by clinical sensitivity as follows: mpb64 > IS6110 > hsp65 > 38 kDa (pstS1) > 30 kDa (fbpB) > esat6 > cfp10 > devR [5].
sdaA is another target sequence, although it has been examined in only a few studies. This 1383 bp sequence encodes a protein called serine dehydratase. Although a detection limit equivalent to 1 copy per μL has been reported for this target [18], our blastn results showed the presence of many nonspecific regions. This sequence has 70–80% similarity to sequences in some nontuberculous mycobacteria (M. ulcerans, M. marinum, M. smegmatis and M. fortuitum) and in many non-Mycobacterium bacteria such as Rhodococcus and Nocardia.