Streptococcus pyogenes strains in Sao Paulo, Brazil: molecular characterization as a basis for StreptInCor coverage capacity analysis

Background: Streptococcus pyogenes causes several human diseases, ranging from common infections to autoimmunity. Characterization of the most prevalent strains worldwide is a useful tool for evaluating the coverage capacity of vaccines under development. In this study, a collection of S. pyogenes strains from Sao Paulo, Brazil, was analyzed to describe the diversity of strains and assess the vaccine coverage capacity of StreptInCor.
Methods: Molecular epidemiology of S. pyogenes strains was performed by emm-genotyping 229 isolates from different clinical sites, and PCR was used for superantigen profile analysis. The emm-pattern and tissue tropism for these M types were also predicted and compared based on the emm-cluster classification.
Results: The strains fit into 12 different emm-clusters, revealing a diverse phylogenetic origin and, consequently, different mechanisms of infection and escape from the host immune system. Forty-eight emm-types were distinguished in 229 samples, and the 10 most frequently observed types accounted for 69 % of all isolates, indicating a diverse profile of circulating strains comparable to that of other developing countries. Similar proportions of E and A-C emm-patterns were observed, whereas pattern D was less frequent, indicating that the strains of this collection primarily had a tissue tropism for the throat. In silico analysis of the coverage capacity of StreptInCor, a vaccine candidate developed by our group and based on the conserved region of the M protein, showed identities ranging from 94.5 % to 59.7 % (mean of 71.0 %) between the vaccine antigen and the predicted amino acid sequences of the emm-types included here.
Conclusions: This is the first report of S. pyogenes strain characterization in Sao Paulo, one of the largest cities in the world; thus, the strain panel described here is a representative sample for vaccine coverage capacity analysis. Our results enabled evaluation of StreptInCor candidate vaccine coverage capacity against diverse M-types, indicating that the vaccine candidate would likely induce protection against the diverse strains found worldwide.


Background
Streptococcus pyogenes, or Group A Streptococcus (GAS), is an exclusively human pathogen responsible for a broad variety of clinical manifestations ranging from pharyngitis and impetigo to invasive diseases, such as necrotizing fasciitis and toxic shock syndrome. Some strains can also trigger autoimmune diseases, such as acute rheumatic fever, rheumatic heart disease and glomerulonephritis [1]. GAS infections are a major cause of morbidity and mortality worldwide. Severe GAS diseases have a prevalence of at least 18.1 million cases and cause approximately 517,000 deaths per year [2]. M protein is a surface component of GAS and one of the main virulence factors due to its anti-phagocytic properties [3]. This protein contains a hypervariable amino-terminal end that serves as the substrate for emm-typing, the gold standard for strain identification. More than 220 different emm-types have been described [4]. Systematic epidemiological reviews clearly highlight significant differences in emm-type distribution across different regions of the world. Relatively limited numbers of emm-types are recovered from high-income settings, while a much higher diversity of strains circulates in low-income settings [5,6]. A complementary typing system, emm-pattern typing, is based on the presence and arrangement of emm and emm-like genes located in the mga locus within the S. pyogenes genome. This classification is correlated with tissue tropism as follows: A-C emm-pattern isolates are usually recovered from throat infections, D emm-pattern strains are usually isolated from the skin (impetigo), and E emm-pattern strains are recovered from both biological sites [7,8].
Sanderson-Smith et al. recently proposed a functional classification of the emm-types into clusters according to the phylogenetic origin and microbiological characteristics of the strain. The cluster classification enables comparison between strains and serves as a tool for vaccine development [9].
GAS contains numerous genes encoding virulence factors, such as streptococcal pyrogenic exotoxins (Spe proteins). These proteins constitute a family of bacterial toxins with powerful mitogenic effects on T cells expressing a particular Vβ domain of the T cell receptor molecule, inducing non-specific polyclonal activation of the immune system by binding directly to class II MHC molecules [10]. Several studies have reported that Spe exotoxin content is correlated with emm-types and associated with clinical manifestations [11][12][13]. Spe exotoxins most likely contribute to the severity of GAS infections. However, the exact molecular mechanism involved in specific pathologies is still not understood [14].
To date, no vaccine against group A Streptococcus is available; however, several candidates based on both the N- and C-terminal portions of the M protein are in different stages of development [15]. Briefly, the 30-valent vaccine is based on the highly variable amino-terminal region of the M protein [16], and the J8 candidate vaccine is a construct containing a minimal B-cell epitope from the C-repeat region [17].
The StreptInCor candidate vaccine is based on amino acid sequences of the conserved region of the M5 protein. This candidate vaccine, in contrast to the others, contains both B and T cell epitopes to provide a strong protective immune response [18].
Although GAS infections are common in several regions of Brazil, only a few studies on the prevalence, emm-type profiles and virulence factors of the strains are available [19][20][21]. Here, we describe the emm-type and superantigen profiles of the most prevalent strains in Sao Paulo and assess the theoretical vaccine coverage of StreptInCor.
Institutional Review Board (IRB) approval was obtained from the Heart Institute Ethics Committee (CAPPesq; approval number 0646/07) at the University of Sao Paulo. Patient informed consent was waived because this study is a retrospective analysis of strains from a microbiology collection.

S. pyogenes strain collection
The GAS diagnostic criteria were based on beta hemolysis on blood agar and sensitivity to bacitracin. Then, the specimens were cultured on sheep blood agar (Vetec, Brazil), followed by growth in Todd-Hewitt broth (Himedia, India) until an OD600 of 0.4 was reached, and stored at −80°C.

DNA isolation, emm-typing, patterning and emm-cluster distribution
The genomic DNA extraction, emm-gene PCR amplification and sequencing and emm-type identification were performed according to the protocol described by the CDC (http://www.cdc.gov/ncidod/biotech/strep/strepblast.html) using the primers MF2 and MR1 for amplification and sequencing, respectively, as previously described [19]. The emm-pattern for each emm-type was deduced using the table of correspondence provided by a recent multi-center study [4]. The emm-cluster classification of the strains identified in this study was based on the new functional classification recently proposed by Sanderson-Smith et al. [9].

Superantigen profile
To identify the superantigen genes carried by each strain, singleplex PCR was performed using specific primers as previously described for speA, speC, speG, speH, speI, speJ, ssa [13] and smeZ [12]. speB (cysteine protease) was used as a positive control in our PCR reactions.

Statistical analysis
Diversity was assessed with the Simpson Reciprocal Index (1/D). A value of 1 corresponds to a theoretical situation in which only one emm-type/cluster is recovered, representing the lowest diversity possible. The maximum Simpson Reciprocal Index corresponds to the total number of emm-types/clusters recovered in one area. Higher values indicate greater diversity. The Simpson Index (D) was calculated using the following formula: $D = \sum (n/N)^2$, summed over all emm-types or clusters, where "n" is the total number of isolates of a given emm-type or belonging to a given cluster and "N" is the total number of isolates of all the emm-types/clusters recovered in an area [22,23]. Confidence intervals were calculated as previously described [24].
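The index defined above can be sketched in a few lines of Python. This is a minimal illustration of the formula D = ∑ (n/N)², not the software used in the study; the function name `simpson_reciprocal_index` and the toy isolate lists are hypothetical, and the confidence-interval method of reference [24] is not reproduced here.

```python
from collections import Counter


def simpson_reciprocal_index(labels):
    """Return 1/D, where D = sum((n/N)**2) over emm-types (or emm-clusters).

    `labels` holds one entry per isolate, e.g. its emm-type.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one isolate is required")
    d = sum((n / total) ** 2 for n in counts.values())
    return 1.0 / d


# Hypothetical toy data, not the isolates of this study.
single_type = ["emm1"] * 10                           # only one emm-type recovered
even_types = ["emm1", "emm12", "emm22", "emm87"] * 5  # four equally frequent types

print(simpson_reciprocal_index(single_type))  # 1.0 (lowest possible diversity)
print(simpson_reciprocal_index(even_types))   # 4.0 (equals the number of types)
```

The two toy cases reproduce the limits described in the text: a single recovered emm-type gives 1, and perfectly even frequencies give the total number of emm-types.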

M protein sequence analyses
Complete M protein sequences and C-repeat annotations for each emm-type included in this study were derived from a previous study [4]. Multiple protein sequence alignments were obtained using the MUSCLE software as implemented in Geneious® version R8.

emm-pattern and emm-cluster distribution
We inferred the emm-pattern for 213 of 214 emm-types; the exception was emm127 (previously named st223). Pattern E and A-C emm-types were present in similar proportions (43 % and 38 %, respectively), whereas pattern D strains were less frequent (18 %).

Vaccine coverage
The theoretical vaccine coverage capacity of the StreptInCor candidate vaccine was assessed by aligning its amino acid sequence with the M protein C-terminal region of 46 of the emm-types identified here (the complete M protein sequence was missing for both emm127 and emm99). The identities ranged from 94.5 % to 59.7 % (mean of 71 %). Some emm-types presented an insertion of 7 amino acid residues in their sequences, as previously described (Fig. 2).
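As an illustration of how a percent identity can be derived from two already aligned sequences, a minimal Python sketch follows. It is not the Geneious implementation, and the source does not state which denominator was used; here it is assumed that columns with a gap in both sequences are skipped and that a gap opposite a residue counts as a mismatch. The function name `percent_identity` and the two short sequences are hypothetical.

```python
def percent_identity(aligned_ref, aligned_query, gap="-"):
    """Percent of compared alignment columns carrying the same residue.

    Assumptions (not taken from the study): columns where both sequences
    have a gap are skipped; a gap opposite a residue is a mismatch.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_ref, aligned_query):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not real vaccine or M protein sequences.
ref = "KGLRRDLDAS"
query = "KGLRR-LDAT"
print(percent_identity(ref, query))  # 80.0 (8 identical columns out of 10)
```

Other conventions, such as dividing by the length of the shorter sequence or ignoring all gapped columns, yield different values for the same alignment, so identities are only comparable when computed under the same rule.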

Discussion
Streptococcus pyogenes is an important human pathogen responsible for several invasive and non-invasive diseases in Brazil and worldwide. In this study, we characterized 229 invasive and non-invasive Streptococcus pyogenes samples from patients treated at the Clinical Hospital in Sao Paulo, Brazil. Great diversity of emm-types was observed. Forty-eight emm-types were observed in the 229 samples, with the 10 most frequent emm-types accounting for 69 % of all isolates. In terms of GAS strain diversity, a Simpson Reciprocal Index of 1 corresponds to a theoretical situation where only one emm-type/cluster has been recovered, representing the lowest diversity possible. The maximum value of the Simpson Reciprocal Index corresponds to the total number of emm-types/clusters recovered in one area. The higher the value is, the greater the diversity. The reciprocal Simpson index of diversity found in this study was relatively low (12.7) when compared to the index of 26.72 for Brasilia (in the central region of Brazil) [19]. On the other hand, our results were similar to those reported for high-income suburbs of Salvador, in northeastern Brazil [20].
The distribution of the strains identified in this study is comparable to those found in other countries, particularly in high-income countries in Asia, the Middle East and Latin America, in which emm1 and emm12 were the most common types, as reviewed by Steer [6]. Interestingly, emm1, emm12 and emm89 have also been found in various studies conducted recently in several countries in Europe and China; these types were frequently correlated with invasive and/or non-invasive isolates [25]. emm77 had a high frequency in the invasive isolates found here. In addition, this strain has been associated with non-invasive diseases in Germany [26] and was found in both invasive and non-invasive isolates in Spain [12]. Among the 229 isolates, E and A-C emm-patterns were found in similar proportions, whereas pattern D was less frequent. Interestingly, studies from Brasilia, in the central region of Brazil [19], revealed a higher proportion of E and D patterns (51 % and 36 %, respectively), whereas the A-C pattern was rarely observed (9.5 %). The data demonstrate the variability of streptococcal strains in Brazil, which may be related to socio-economic differences and may extend to other countries in which there are also social disparities.
Associations between emm-types and superantigens may also play a role in the clinical manifestation of S. pyogenes infection.
The other chromosomal gene, speJ, was present in only 35 % of isolates and was absent in diverse emm-types, similar to other studies [12,29,30].
Currently, no anti-streptococcal vaccine is available, despite extensive efforts in animal models of streptococcal disease. Several vaccine candidates are in different stages of development. Among them, the 30-valent vaccine contains short peptides from the highly variable amino-terminal region of the M protein [16], and the J8 vaccine candidate comprises a 12 amino acid minimal B-cell epitope from the C-repeat region flanked by 16 amino acids of a yeast DNA-binding protein conjugated to the diphtheria toxoid [17]. The vaccine candidate developed by our group, called StreptInCor, is based on the M5 protein C-terminal region [18], specifically the C2 and C3 regions, which are conserved among serotypes. Through in silico analysis with predicted amino acid sequence alignment, the StreptInCor candidate vaccine had high sequence identity with 46 of the 48 emm-types described here (identity ranged from 94.5 % to 59.7 %, mean of 71 %), which is important for the likelihood of protection. In previous work, we described the structural, chemical, and biological properties of the StreptInCor peptide and demonstrated that the molecule is stable, which is an important property for a vaccine candidate. The possibility of the StreptInCor vaccine candidate epitope being processed by antigen-presenting cells (APCs), generating diverse peptides, has also been previously demonstrated. That work indicated that the vaccine epitope could be recognized by any individual, thus enabling a broad coverage capacity to trigger specific immunity [33].
The emm-types obtained fit into 12 different emm-clusters: A-C3 (21 %), E4 (20 %), E3 (13 %), D4 (12 %), single protein cluster clade Y (9 %), A-C4 and E6 (7 % each), A-C5, E1 and E2 (3 % each), and D2 and D5 (1 % each).
Fig. 1 Frequency of emm-types. A total of 48 emm-types were represented in the collection. Abbreviation: GAS, group A streptococcus
The efficacy of this vaccine in animal models was evaluated in inbred and outbred mice, and a strong humoral response with high IgG production was observed [18]. Immunized Swiss mice challenged with the emm1 strain had a survival rate of 87 % at 21 days compared with lower survival in controls (53 %) [34].
Similar results have been observed in HLA class II transgenic mice, which also presented a specific and long-lasting immune response without developing deleterious reactions after one year. These results indicated that StreptInCor is a safe candidate vaccine [35]. In addition, the four most common emm-types included here (emm1, emm12, emm22 and emm87) were opsonized by StreptInCor-induced antibodies [36]. The strains identified here fit into 12 of the 19 different emm-clusters and exhibited diverse phylogenetic origins and, consequently, different mechanisms of infection and escape from the host immune system, supporting the hypothesis that StreptInCor vaccination would likely protect against infection caused by strains from different emm-clusters.

Conclusions
This is the first study investigating the epidemiology of streptococcal strains in Sao Paulo, one of the largest cities in the world. These data enabled evaluation of the StreptInCor candidate vaccine coverage capacity against diverse M-types, indicating that the vaccine candidate would likely induce protection against the diverse strains observed worldwide.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
SF and KMA contributed equally to the study design, coordination, analysis and interpretation of data, and drafting of the manuscript; SF, KMA, RA, and AT carried out the lab work for strain characterization and maintenance; RA and EP contributed to analysis and interpretation of data and drafting of the manuscript; PRS carried out the sequence alignments and the theoretical vaccine coverage capacity statistical analysis; FR and JAJ carried out the sample collection and microbiological assays for S. pyogenes diagnostics; LG contributed to study design, data analysis and interpretation, and drafting and revising the manuscript; JK contributed to study design and drafting and revising the manuscript. All authors have read and approved the final manuscript.