A comparison of two informative SNP-based strategies for typing Pseudomonas aeruginosa isolates from patients with cystic fibrosis

Background: Molecular typing is integral to identifying Pseudomonas aeruginosa strains that may be shared between patients with cystic fibrosis (CF). We conducted a side-by-side comparison of two P. aeruginosa genotyping methods based on informative single nucleotide polymorphisms (SNPs): one targeting 10 P. aeruginosa SNPs and using real-time polymerase chain reaction technology (HRM10SNP), and the other targeting 20 SNPs and based on the Sequenom MassARRAY platform (iPLEX20SNP).
Methods: An in-silico analysis of the 20 SNPs used for the iPLEX20SNP method was initially conducted using sequence type (ST) data on the P. aeruginosa PubMLST website. A total of 506 clinical isolates collected from patients attending 11 CF centres throughout Australia were then tested by both the HRM10SNP and iPLEX20SNP assays. Type-ability and discriminatory power of the methods, as well as their ability to identify commonly shared P. aeruginosa strains, were compared.
Results: The in-silico analyses showed that the 1401 STs available on the PubMLST website could be divided into 927 different 20-SNP profiles (D-value = 0.999), and that most STs of national or international importance in CF could be distinguished either individually or as belonging to closely related single- or double-locus variant groups. When applied to the 506 clinical isolates, the iPLEX20SNP provided better discrimination than the HRM10SNP method, with 147 different 20-SNP and 92 different 10-SNP profiles observed, respectively. For detecting the three most commonly shared Australian P. aeruginosa strains AUST-01, AUST-02 and AUST-06, the two methods were in agreement for 80/81 (98.8%), 48/49 (97.8%) and 11/12 (91.7%) isolates, respectively.
Conclusions: The iPLEX20SNP is a superior new method for broader SNP-based MLST-style investigations of P. aeruginosa. However, because of convenience and availability, the HRM10SNP method remains better suited for clinical microbiology laboratories that only utilise real-time PCR technology and where the main interest is detection of the most highly prevalent P. aeruginosa CF strains within Australian clinics.


Background
Cystic fibrosis (CF) is the most common, lethal autosomal recessive disease in Caucasian populations [1]. Most CF patients die in their third or fourth decade from complications of chronic pulmonary infection. Pseudomonas aeruginosa is the predominant pathogen and once it is established within the lungs of CF patients it is rarely eradicated, resulting in increased treatment requirements and an accelerated decline in lung function, quality of life and survival [2]. While many CF patients acquire P. aeruginosa from their natural environment, there is also evidence of person-to-person transmission occurring [3]. Delaying or even preventing P. aeruginosa infection is an important management goal. Consequently, determining P. aeruginosa acquisition pathways and conducting longitudinal surveillance using molecular-based typing techniques are critical steps for developing novel interventions and evidence-based infection control policies to interrupt the spread of transmissible strains within the CF community [4][5][6].
Recently, multi-locus sequence typing (MLST) has emerged as an important epidemiological tool for investigating temporally and geographically diverse bacteria [7]. It offers a standardised, reproducible and portable typing approach that allows reliable data comparisons by way of a publicly accessible web-based database [7,8]. However, when applied to large-scale investigations involving many hundreds or thousands of isolates, it is limited by cost and complexity [9]. To circumvent these problems, some researchers have utilised defined sets of informative single nucleotide polymorphisms (SNPs) derived from MLST data to infer genetic relationships between isolates. In essence, this is a narrowed MLST approach, and it has been applied to various organisms, including pathogens relevant to CF, such as methicillin-resistant Staphylococcus aureus and P. aeruginosa [10][11][12][13]. Selection of appropriate SNPs, including SNP location and total numbers, is an integral facet of an informative SNP strategy to ensure a discriminatory, yet cost-effective, typing scheme. However, once an informative SNP approach tailored to a particular purpose is implemented, it will theoretically have limitations in terms of discriminatory power if used beyond its original objectives.
Previously, we have shown that SYBR Green-based real-time polymerase chain reaction (PCR) assays and high-resolution melting (HRM) curve analysis targeting 10 key SNPs in five housekeeping genes (HRM10SNP) can detect the major P. aeruginosa strains shared by CF patients in Queensland, Australia [10]. Furthermore, we demonstrated recently that this form of typing can be adapted to the iPLEX MassARRAY platform to allow high-throughput genotyping [14]. However, based on the high levels of genetic diversity observed amongst shared P. aeruginosa strains in the national Australian CF study [15] and also internationally amongst patients attending CF clinics [16], we sought to reassess the HRM10SNP and investigate alternative SNP-based typing strategies for identifying a broader range of P. aeruginosa strains.

Clinical isolates
To ensure representative and geographical diversity, 506 clinical isolates were sourced randomly from a biobank of CF isolates collected as part of an ongoing national study of shared P. aeruginosa strains involving patients attending 11 CF clinics in Australia's five largest cities [15] (Additional file 1: Table S1). Isolates were incubated on horse blood agar plates for 24 hours at 37°C. Once purity was confirmed, heat-denatured suspensions of each isolate were prepared as described previously [10].

HRM10SNP assay
The HRM10SNP assay was performed for each isolate as described previously [10]. Briefly, each heat-denatured isolate was tested using 10 individual PCR reactions using the qPCR SuperMix-UDG (Invitrogen Australia, Mulgrave, NSW, Australia) on the Rotorgene-6000 (Qiagen, Doncaster, Victoria, Australia). Results from each reaction were compiled to provide a 10-SNP profile for each isolate. As reported previously, isolates with 10-SNP profiles of CTCCTCGGCA, TCTTTCGGTA and CCTCCTGATG were determined to be AUST-01, AUST-02 and AUST-06, respectively [10].
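The strain assignment step described above amounts to an exact-match lookup of each compiled 10-SNP profile against the three reference profiles. The following is a minimal sketch of that idea, not the authors' software. The function names, the treatment of any non-ACGT character as a failed SNP call, and the nearest-reference helper are assumptions made for illustration.

```python
# Reference 10-SNP profiles for the three shared strains, as listed in the text.
REFERENCE_PROFILES = {
    "CTCCTCGGCA": "AUST-01",
    "TCTTTCGGTA": "AUST-02",
    "CCTCCTGATG": "AUST-06",
}


def is_complete(profile):
    """Assumption: a complete profile has 10 calls, each one of A, C, G or T."""
    return len(profile) == 10 and set(profile) <= set("ACGT")


def assign_strain(profile):
    """Return the strain name for an exact profile match, otherwise None."""
    if not is_complete(profile):
        return None
    return REFERENCE_PROFILES.get(profile)


def nearest_reference(profile):
    """Return (strain, number of differing SNPs) for the closest reference profile."""
    if not is_complete(profile):
        raise ValueError("incomplete 10-SNP profile")
    distances = {
        strain: sum(a != b for a, b in zip(profile, ref))
        for ref, strain in REFERENCE_PROFILES.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]


print(assign_strain("CTCCTCGGCA"))      # AUST-01
print(assign_strain("CTCCTCGGCG"))      # None (no exact match)
print(nearest_reference("CTCCTCGGCG"))  # ('AUST-01', 1)
```

Reporting the nearest reference alongside the exact-match result is a design choice of this sketch: it flags isolates that differ from a shared-strain profile by a single SNP, which may be candidates for repeat testing.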

20-SNP iPLEX MassARRAY (iPLEX20SNP)
The iPLEX20SNP assay was based on the Sequenom MassARRAY platform (Sequenom, Brisbane, Queensland, Australia) and was a modification of a method described previously [14]. Here, SNPs were derived by analysing sequence data on the P. aeruginosa PubMLST website [17]. Briefly, 1070 concatenated sequences of P. aeruginosa housekeeping genes (acsA, aroE, guaA, mutL, nuoD, ppsA, and trpE) were downloaded (12 January, 2012) and investigated for informative SNPs with the aid of the Minimum SNPs software version 2043 [18] and by manual sorting (using BioEdit version 7.0.9.0). Overall, 20 SNPs were identified, and SNP positions based on the 2882 bp concatenated P. aeruginosa MLST sequence are listed in Tables 1 and 2. Of these 20 SNPs, four were identical to SNPs used in the HRM10SNP assay; SNPs at sites 7, 322, 1152 and 2551 of the iPLEX20SNP assay (Tables 1 and 2) overlapped with SNPs 1, 2, 5 and 10 from the HRM10SNP assay.
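Reading a SNP profile from a concatenated MLST sequence can be illustrated with a short sketch. It is not the Minimum SNPs software and does not select SNPs; it only extracts bases at known sites. Only the five sites named in the text are used, because the full list of 20 is given in Tables 1 and 2. Treating the site numbers as 1-based positions is an assumption, and the sequences are synthetic.

```python
MLST_CONCAT_LENGTH = 2882  # concatenated MLST sequence length given in the text

# Only the five sites named in the text; the remaining 15 are in Tables 1 and 2.
EXAMPLE_SNP_SITES = (7, 322, 416, 1152, 2551)


def snp_profile(concat_seq, sites=EXAMPLE_SNP_SITES):
    """Return the bases found at the given 1-based sites as one profile string."""
    if len(concat_seq) != MLST_CONCAT_LENGTH:
        raise ValueError("expected a %d bp concatenated sequence" % MLST_CONCAT_LENGTH)
    return "".join(concat_seq[pos - 1].upper() for pos in sites)


def count_distinct_profiles(st_to_seq, sites=EXAMPLE_SNP_SITES):
    """Count how many different profiles a collection of STs collapses into."""
    return len({snp_profile(seq, sites) for seq in st_to_seq.values()})


# Synthetic sequences for illustration only (not real PubMLST data).
base = ["A"] * MLST_CONCAT_LENGTH
variant = list(base)
variant[7 - 1] = "G"    # change the base at site 7
silent = list(base)
silent[100 - 1] = "T"   # change a base that is not an interrogated site

toy_sts = {
    "toy-ST-a": "".join(base),
    "toy-ST-b": "".join(variant),
    "toy-ST-c": "".join(silent),
}
print(snp_profile(toy_sts["toy-ST-b"]))  # GAAAA
print(count_distinct_profiles(toy_sts))  # 2
```

The third toy sequence differs from the first only at a site that is not interrogated, so the two share a profile. This is the sense in which a SNP set can group distinct STs.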
Primers and extension primers for each of the 20 SNPs in the iPLEX20SNP were designed as reported previously [14]. All 20 target SNPs were designed for use in a single multiplex well using Assay Designer 4.0 software (Sequenom, Herston, Queensland, Australia). The 24 amplification primers and 21 extension primers used for SNP detection are listed in Tables 1 and 2. Two extension primers with overlapping mass were used for SNP site 416 to accommodate a known proximal SNP variation (Table 2). SNP detection by MassARRAY was performed as described previously [14], with the following modifications: (1) following the initial PCR, residual PCR Taq polymerase was removed by protease digestion; 1 μl of protease solution (1.07 AU, Qiagen, Doncaster, Victoria, Australia) was added to each PCR reaction and the mixture incubated at 55°C for 30 min, followed by inactivation at 95°C for 5 min; and (2) the single base extension step was performed using the iPLEX Pro Extension Reaction Kit (Sequenom, Herston, Queensland, Australia) following the manufacturer's instructions. SNPs were coded from 1 to 20 to generate a 20-SNP code. The 20-SNP profiles were then interpreted using the data compiled from in-silico analysis of the P. aeruginosa MLST database, as described below, to provide predicted sequence types (STs). Characterised isolates representative of each SNP were used as reference controls for each test run.
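Interpreting a 20-SNP profile as one or more predicted STs can be thought of as a reverse lookup from profile to the STs that share it. The sketch below illustrates that idea under stated assumptions: the ST numbers and the shortened four-base profiles are hypothetical, and the function names are not taken from the study.

```python
from collections import defaultdict


def build_profile_index(st_profiles):
    """Group STs by SNP profile: {profile: sorted list of STs sharing it}."""
    index = defaultdict(list)
    for st, profile in st_profiles.items():
        index[profile].append(st)
    return {profile: sorted(sts) for profile, sts in index.items()}


def predict_sts(profile, index):
    """Return every ST matching the profile, or an empty list if none is known."""
    return index.get(profile, [])


# Hypothetical STs with shortened profiles, for illustration only.
toy_st_profiles = {101: "ACGT", 102: "ACGT", 103: "ACGA"}
toy_index = build_profile_index(toy_st_profiles)

print(predict_sts("ACGT", toy_index))  # [101, 102]
print(predict_sts("ACGA", toy_index))  # [103]
print(predict_sts("TTTT", toy_index))  # []
```

A profile shared by several STs yields a list of candidates rather than a single type, and an unseen profile yields no prediction.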

Statistical analysis
Discriminatory power and the quantitative measure of congruence between the HRM10SNP and iPLEX20SNP methods, with corresponding 95% confidence intervals (CI), were determined by calculating Simpson's Index of Diversity and the adjusted Wallace coefficients, respectively, using the online analysis tool at http://darwin.phyloviz.net/ComparingPartitions/index.php?link=Tool. The 20-SNP profile in-silico data were used to predict STs for the 506 clinical isolates utilising the experimental results from the iPLEX20SNP assay.
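For readers unfamiliar with these statistics, the following sketch shows how the point estimates can be computed from two lists of type labels, one entry per isolate. It is an illustration, not the online tool used in the study: it omits the confidence intervals, and it assumes the commonly used definitions in which the Wallace coefficient from A to B is the fraction of isolate pairs sharing a type in A that also share a type in B, and the adjusted coefficient corrects this for the agreement expected by chance (one minus Simpson's index of B). The toy labels are hypothetical.

```python
from collections import Counter


def simpsons_index(labels):
    """Simpson's Index of Diversity: 1 - sum(n_j(n_j-1)) / (N(N-1))."""
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(labels).values()
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


def _same_type_pairs(labels):
    """Number of isolate pairs that share a label."""
    return sum(c * (c - 1) // 2 for c in Counter(labels).values())


def wallace(a, b):
    """W(A->B): share of pairs grouped together by A that B also groups together."""
    pairs_a = _same_type_pairs(a)
    if pairs_a == 0:
        raise ValueError("method A places no two isolates in the same type")
    pairs_both = _same_type_pairs(list(zip(a, b)))
    return pairs_both / pairs_a


def adjusted_wallace(a, b):
    """Adjusted Wallace A->B; undefined if B assigns every isolate one type."""
    expected = 1.0 - simpsons_index(b)  # Wallace expected under independence
    return (wallace(a, b) - expected) / (1.0 - expected)


# Hypothetical labels for six isolates; method "fine" subdivides method "coarse".
fine = ["x", "x", "y", "y", "z", "z"]
coarse = ["p", "p", "p", "p", "q", "q"]

print(simpsons_index(fine))            # 0.8
print(adjusted_wallace(fine, coarse))  # 1.0
print(adjusted_wallace(coarse, fine))  # about 0.286 (2/7)
```

The asymmetry in the toy output reflects the directional nature of the coefficient: a finer partition predicts a coarser one perfectly, but not the reverse.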

HRM10SNP and iPLEX20SNP typing of the 506 clinical isolates
Application of the HRM10SNP assay provided complete 10-SNP profiles for 494/506 isolates (type-ability = 97.6%), among which 92 different 10-SNP profiles were observed; 12 isolates could not be typed by the HRM10SNP method because one or more SNPs failed to be called by the HRM analysis (Additional file 3: Table S3). The iPLEX20SNP assay provided complete 20-SNP profiles for 471/506 isolates (type-ability = 93.1%), among which there were 147 distinct 20-SNP profiles; 35 isolates failed to provide complete 20-SNP profiles because the iPLEX20SNP assay failed to characterise one or more SNPs (Additional file 3: Table S3). When the 147 complete 20-SNP profiles (471 isolates) from the iPLEX20SNP assay were used to predict an MLST type (based on the data provided in Additional file 2: Table S2), 124 of 147 (84.4%) profiles matched profiles obtained from the MLST website and could therefore provide a predicted MLST type or types. Twenty-three 20-SNP profiles from 28 isolates did not match any of the 20-SNP profiles listed in Additional file 2: Table S2, and therefore an MLST type could not be predicted.
Overall, 470 isolates provided complete SNP profiles by both the HRM10SNP and iPLEX20SNP assays. Simpson's Index of Diversity and adjusted Wallace coefficients between the HRM10SNP and iPLEX20SNP methods were calculated using these 470 isolates (Table 4). Simpson's Index of Diversity of the iPLEX20SNP (0.947) was similar to that of the HRM10SNP method (0.944). However, when concordance between the assays was assessed using the adjusted Wallace coefficient, the iPLEX20SNP method (94.9%) was a better predictor of the HRM10SNP method than vice versa. To investigate the latter further, we identified all 10-SNP profiles that were further discriminated by the 20-SNP profiles (Additional file 4: Table S4); 34 HRM10SNP profiles were further distinguished into 101 20-SNP profiles using the iPLEX20SNP method. Of note, these involved 30 STs associated with CF strains of local or international importance (Additional file 4: Table S4). In contrast, there were only 11 iPLEX20SNP profiles that were further discriminated by the HRM10SNP assay (Additional file 3: Table S3).
Given the high prevalence of AUST-01, AUST-02 and AUST-06 in Australia, and that the HRM10SNP assay was primarily designed to target these strains, we compared the ability of both assays to distinguish these strains. For isolates identified as AUST-01, AUST-02 or AUST-06 by either method (Additional file 3: Table S3), the results of the two methods were in agreement for 80/81 (98.8%), 48/49 (97.8%) and 11/12 (91.7%) isolates, respectively. The two isolates giving discrepant results for AUST-01 and AUST-06 were identified as AUST-01 or AUST-06 by the iPLEX20SNP method, but not by the HRM10SNP assay. For both of these isolates, the 10-SNP profile obtained by the HRM10SNP differed by only one SNP from the expected profile of AUST-01 or AUST-06. Upon repeat testing in the HRM10SNP assay, the two isolates typed as AUST-01 and AUST-06, respectively, suggesting an error in the original HRM10SNP testing. The discordant result for AUST-02 was associated with a different ST; one isolate was identified as AUST-02 by the HRM10SNP method, but differed by two SNPs from the expected 20-SNP profile for AUST-02 in the iPLEX20SNP assay (predicted MLST type of 778).
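The type-ability figures and the notion of one method "further discriminating" the profiles of another can be made concrete with a small sketch. The type-ability inputs are the counts reported above. The paired profile labels are hypothetical placeholders, since the per-isolate data are in the additional files, and the function names are illustrative.

```python
from collections import defaultdict


def typeability(n_complete, n_total):
    """Percentage of isolates yielding a complete SNP profile."""
    return 100.0 * n_complete / n_total


def further_discriminated(pairs):
    """Map each first-method profile to the second-method profiles it splits into.

    Only first-method profiles associated with more than one second-method
    profile are returned.
    """
    groups = defaultdict(set)
    for first, second in pairs:
        groups[first].add(second)
    return {p: sorted(s) for p, s in groups.items() if len(s) > 1}


print(round(typeability(494, 506), 1))  # 97.6 (HRM10SNP)
print(round(typeability(471, 506), 1))  # 93.1 (iPLEX20SNP)

# Hypothetical (10-SNP profile, 20-SNP profile) labels, one pair per isolate.
toy_pairs = [
    ("P10-a", "P20-1"),
    ("P10-a", "P20-2"),
    ("P10-b", "P20-3"),
    ("P10-b", "P20-3"),
]
split_by_20 = further_discriminated(toy_pairs)
split_by_10 = further_discriminated([(p20, p10) for p10, p20 in toy_pairs])
print(split_by_20)  # {'P10-a': ['P20-1', 'P20-2']}
print(split_by_10)  # {}
```

Running the same function with the pair order swapped gives the reverse comparison, that is, 20-SNP profiles that are subdivided by the 10-SNP assay.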

Discussion
The in-silico analyses of sequence data from the P. aeruginosa PubMLST website showed that more than half of recognised STs could be distinguished individually by the 20-SNP profile of the iPLEX20SNP assay. Furthermore, the recognised STs that were unable to be distinguished by this assay were typically single- or double-locus variants. Hence, theoretically the iPLEX20SNP method has considerable potential for broader-based MLST-focused studies of P. aeruginosa, here and elsewhere. As the iPLEX20SNP is also based on the Sequenom MassARRAY platform, it is particularly suitable for high-throughput investigations [14]. Using this technology, up to 384 isolates can be tested within one working day for less than $AUD 10 per isolate [14], which compares favourably with other technologies. For example, for our 506 test isolates we estimate that classical DNA sequencing-based MLST would have cost approximately $AUD 60,720 ($AUD 120 per isolate), whereas costs for the HRM10SNP and iPLEX20SNP methods were approximately $AUD 10,120 ($AUD 20 per isolate) [13] and $AUD 5,060, respectively.
Compared to the HRM10SNP, the iPLEX20SNP method clearly provided better discrimination when applied to the P. aeruginosa test isolates used in this study. Notably, the HRM10SNP assay grouped numerous unrelated isolates, including STs of shared strains in the CF patient population, while the iPLEX20SNP method was able to distinguish between these isolates (Additional file 4: Table S4). This was likely due to the higher number of SNPs and to SNP selection for iPLEX20SNP being based on a large international MLST database. These observations provide experimental data to support the above in-silico analyses. Indeed, in the clinical context, attaining optimal discriminatory power is particularly important when trying to identify new or emerging shared P. aeruginosa strains in CF patients. Consequently, iPLEX20SNP is ideally suited for broader, investigatory studies of P. aeruginosa-infected patients. While the HRM10SNP lacked overall discriminatory power, it nevertheless proved to be well suited for detecting AUST-01, AUST-02 and AUST-06 amongst P. aeruginosa isolates from a broad range of Australian CF clinics. AUST-01 and AUST-02 are the shared P. aeruginosa strains of greatest concern in Australia [15], and therefore simple methods for detecting these strains remain of local clinical and research interest. The one key benefit of the HRM10SNP method is that it is based on real-time PCR technology, which is now commonplace in most clinical microbiology laboratories. Hence, the HRM10SNP method may still be a useful diagnostic tool locally for laboratories with no access to specialised equipment such as the Sequenom MassARRAY platform.
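The cost comparison above is a straightforward per-isolate multiplication, reproduced here as a small check. The per-isolate figures of 120 and 20 Australian dollars are those quoted in the text; the figure of 10 for the iPLEX20SNP is inferred from the quoted total of 5,060 for 506 isolates and should be read as an assumption.

```python
N_ISOLATES = 506

# Approximate cost per isolate in Australian dollars.
COST_PER_ISOLATE = {
    "MLST sequencing": 120,  # quoted in the text
    "HRM10SNP": 20,          # quoted in the text
    "iPLEX20SNP": 10,        # inferred from the quoted total (assumption)
}

totals = {method: cost * N_ISOLATES for method, cost in COST_PER_ISOLATE.items()}

for method, total in totals.items():
    print(f"{method}: {total:,}")
# MLST sequencing: 60,720
# HRM10SNP: 10,120
# iPLEX20SNP: 5,060
```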
Limitations in terms of typeability (i.e., the number of isolates providing complete SNP profiles) were observed, however, with 2.4% and 6.9% of isolates failing to give complete profiles in the HRM10SNP and the iPLEX20SNP assays respectively. Typically these problems are caused by poor isolate preparation (i.e., insufficient DNA) or otherwise sequence variation in primer targets [10,14]. Given the sheer diversity amongst the P. aeruginosa MLST housekeeping genes, it is highly likely that sequence variation would account for a large proportion of the problems observed here. In any event, we do not see this as an important limitation affecting the broader utility of the assays given that other methods, such as DNA sequencing, could be applied if necessary to the small numbers of untypeable isolates.

Conclusions
In summary, molecular typing is an integral part of investigating the development and spread of shared P. aeruginosa strain genotypes in patients with CF. The iPLEX20SNP is a superior new method providing sufficient throughput and discriminatory power for broader SNP-based MLST-style investigations of P. aeruginosa, whereas the HRM10SNP method remains a convenient technique for screening CF clinical isolates for the currently most commonly shared Australian P. aeruginosa strains and should be within the reach of most clinical microbiology laboratories.