High frequency of hybrid Escherichia coli strains with combined Intestinal Pathogenic Escherichia coli (IPEC) and Extraintestinal Pathogenic Escherichia coli (ExPEC) virulence factors isolated from human faecal samples

Background: Classification of pathogenic Escherichia coli (E. coli) has traditionally relied on detecting specific virulence associated genes (VAGs) or combinations thereof. For E. coli isolated from faecal samples, the presence of specific genes associated with different intestinal pathogenic pathovars will determine their classification and further course of action. However, the E. coli genome is not a static entity, and hybrid strains are emerging that cross the pathovar definitions. Hybrid strains may show gene contents previously associated with several distinct pathovars making the correct diagnostic classification difficult. We extended the analysis of routinely submitted faecal isolates to include known virulence associated genes that are usually not examined in faecal isolates to detect the frequency of possible hybrid strains.
Methods: From September 2012 to February 2013, 168 faecal isolates of E. coli routinely submitted to the Norwegian Institute of Public Health (NIPH) from clinical microbiological laboratories throughout Norway were analysed for 33 VAGs using multiplex-PCR, including factors associated with extraintestinal pathogenic E. coli (ExPEC) strains. The strains were further typed by Multiple Locus Variable-Number Tandem-Repeat Analysis (MLVA), and the phylogenetic grouping was determined. One isolate from the study was selected for whole genome sequencing (WGS) with a combination of Oxford Nanopore’s MinION and Illumina’s MiSeq.
Results: The analysis showed a surprisingly high number of strains carrying ExPEC associated VAGs and strains carrying a combination of both intestinal pathogenic E. coli (IPEC) and ExPEC VAGs. In particular, 93.5% (101/108) of isolates classified as belonging to an IPEC pathovar additionally carried ExPEC VAGs. WGS analysis of a selected hybrid strain revealed that it could, with present classification criteria, be classified as belonging to all of the Enteropathogenic Escherichia coli (EPEC), Uropathogenic Escherichia coli (UPEC), Neonatal meningitis Escherichia coli (NMEC) and Avian pathogenic Escherichia coli (APEC) pathovars.
Conclusion: Hybrid ExPEC/IPEC E. coli strains were found at a very high frequency in faecal samples and were in fact the predominant species present. A sequenced hybrid isolate was confirmed to be a cross-pathovar strain possessing recognised hallmarks of several pathovars, and a genome heavily influenced by horizontal gene transfer.
Electronic supplementary material: The online version of this article (10.1186/s12879-018-3449-2) contains supplementary material, which is available to authorized users.


Background
Escherichia coli (E. coli) is a highly diverse and predominant species among facultative anaerobic bacteria of the human gastrointestinal tract [1]. E. coli comprises non-pathogenic commensals as well as strains causing a range of diseases. E. coli strains capable of causing extraintestinal infections are designated as extraintestinal pathogenic E. coli (ExPEC) to distinguish them from strains causing intestinal disease, commonly designated as intestinal pathogenic E. coli (IPEC).
A wide range of VAGs have been associated with ExPEC, and common virulence attributes among ExPEC strains are those enabling their extraintestinal lifestyle, e.g. genes coding for the production of adhesins, toxins, protectins, siderophores, iron transport systems, and invasins [2, 6-9]. It is believed that ExPEC are facultative pathogens, which reside in the normal gut flora as commensals in some groups of the healthy population [8]. However, there are no universally accepted concrete genetic criteria for defining an E. coli strain as ExPEC, nor for definite pathovar classification within the ExPEC group. Thus, for the majority of ExPEC strains, pathovar classification can only be based on the isolation source.
There is limited information regarding the frequency of ExPEC strains in the human intestine; however, a recent meta-study of more than 500 published papers estimated the prevalence of ExPEC strains among faecal isolates at about 10% in healthy individuals [10]. Reference laboratories or diagnostic microbiological laboratories routinely search for only the established IPEC virulence factors in faecal samples from symptomatic patients. There are few data on the frequency of ExPEC-related virulence factors among these strains.
The aim of this study was to investigate the frequency and combination of virulence markers including VAGs used for IPEC pathovar classification and a selection of VAGs related to ExPEC pathovars among E. coli strains submitted from individuals showing signs of gastrointestinal infections. We assessed the frequency of ExPEC and IPEC strains, phylogenetic grouping and the MLVA-genotype.
In light of the large German O104:H4 outbreak in 2011 [11], which was caused by a hybrid Enteroaggregative E. coli (EAEC)/Shiga toxin producing E. coli (STEC) strain [12], the monitoring of isolates to detect new or altered combinations of VAGs is important as it may give a pre-warning of emerging strains harbouring novel VAG combinations, which should be studied in closer detail to assess whether they also have altered virulence capabilities.

Bacterial isolates
All 168 E. coli strains were obtained from the culture collection at the National Reference Laboratory for Enteropathogenic Bacteria at the Norwegian Institute of Public Health (NIPH).
All primers had a final concentration of 5 μM. The PCR was run on a GeneAmp 9700 thermocycler (Applied-Biosystems, Foster City, CA, USA) with the following conditions. Multiplexes 1, 2 and 4: 95°C for 15 min, then 25 cycles of 94°C for 30 s, 58°C for 90 s and 72°C for 90 s, followed by a final hold at 72°C for 10 min after temperature cycling had ended. Multiplex 3: 95°C for 15 min, then 25 cycles of 94°C for 30 s, 60°C for 90 s and 72°C for 90 s, followed by a final hold at 72°C for 10 min after temperature cycling had ended. The multiplexes were diluted 1:25 and run in separate capillaries on an ABI 3130 Genetic Analyzer (Applied-Biosystems, Foster City, CA, USA) with GS 600LIZ as internal size standard.
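As a rough check on the cycling programs described above, the programmed hold times can simply be summed. The following is a minimal sketch that ignores ramping between temperatures; the function name `program_minutes` is hypothetical and not part of any instrument software.

```python
def program_minutes(initial_min, cycles, step_seconds, final_min):
    """Sum the programmed hold times in minutes (temperature ramping is ignored)."""
    cycling_min = cycles * sum(step_seconds) / 60
    return initial_min + cycling_min + final_min

# Multiplexes 1, 2 and 4 (58 °C annealing) and multiplex 3 (60 °C annealing)
# use the same step durations, so the programmed time is the same for both.
total = program_minutes(initial_min=15, cycles=25, step_seconds=(30, 90, 90), final_min=10)
print(f"Programmed hold time: {total:.1f} min")
```

The real run time on the thermocycler will be somewhat longer because of the ramp times between steps.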

Phylogenetic group PCR
The improved phylogenetic PCR-assay [16] of the original assay described by Clermont [17] was used to assign the E. coli isolates to major phylogenetic groups and subgroups.

MLVA
Multi-locus variable-number tandem repeats analysis was performed using a modified version of the previously published 10-loci generic E. coli MLVA scheme [18]. The PCR-amplicon of the published CCR001 locus contains two variable repeated elements, and the modified scheme allows typing of both these variable elements, increasing the number of loci in the generic E. coli MLVA to 11. The modification consists of a change of dyes and an additional new reverse primer at the CCR001 locus as follows: the 6FAM dye was removed from the published CCR001 forward primer [18], and the published unlabelled CCR001 reverse primer was labelled with 6FAM and renamed CCR001aR. A new second, VIC-labelled reverse primer was added: "CCR001bR: 5'-VIC-CGCATTTTATCTGTCTGTACGGC-3'". The combination of both reverse primers made it possible to simultaneously separate both repeat-containing regions at the CCR001 locus.
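For readers who want to inspect the CCR001bR primer sequence given above, the sketch below computes its length, GC content and reverse complement (the sequence on the opposite strand that the primer is expected to pair with). It is only an illustration: the helper names are hypothetical, the VIC label is ignored, and no melting-temperature model is attempted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

# CCR001bR primer sequence as given in the text (5' to 3', dye label omitted)
ccr001br = "CGCATTTTATCTGTCTGTACGGC"

print(len(ccr001br))                           # primer length in nucleotides
print(round(100 * gc_fraction(ccr001br), 1))   # GC content in percent
print(reverse_complement(ccr001br))            # complementary-strand sequence, 5' to 3'
```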

Stx subtyping
Subtyping of stx1 and stx2 was performed as described in Brandal et al. 2015 [15].

Oxford Nanopore MinION sequencing
The hybrid ExPEC/IPEC strain FHI_NMBU_03, identified by PCR, was chosen for sequencing by the MinION MK1 device. DNA was quantified using the Qubit fluorometer (Life Technologies, Paisley, UK) and 200 ng of DNA was used for library preparation. The strain was sequenced using the R9.4 SpotON flow cell and the SQK-RAD002 rapid sequencing kit. All runs were prepared according to the standard protocol of Oxford Nanopore Technologies (Oxford, UK). The flow cells were primed with a priming solution that consisted of a mixture of nuclease-free water and Fuel Mix. The library was then loaded into the MinION SpotON port and the 48-h sequencing protocol was selected in the MinKNOW software. The basecalling was done through the Metrichor Desktop Agent using 1D Basecalling for the SQK-RAD002 protocol.

Illumina MiSeq sequencing
Illumina sequencing was performed on an Illumina MiSeq platform (Illumina Inc., San Diego, CA, USA). The library was prepared using the Nextera XT kit (Illumina Inc.) according to the manufacturer's instructions and was sequenced using a 300 bp paired-end sequencing kit (Illumina Inc.).

Sequence analysis
Raw Illumina reads were paired and quality filtered using Trimmomatic [19], and bases with low quality (< Q20) were discarded. MinION reads were extracted using poRe [20], and both read types were assembled using SPAdes [21] version 3.5.0 with the option "--nanopore".
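To illustrate what a Q20 threshold means in practice, the sketch below decodes a FASTQ quality string and drops low-quality bases from the 3' end of a read. This is not Trimmomatic's algorithm and the exact Trimmomatic settings are not stated in the text; the Phred+33 encoding, the function names and the toy read are assumptions made for the example.

```python
PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding

def phred_scores(quality_string):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_string]

def trim_trailing_low_quality(sequence, quality_string, threshold=20):
    """Remove 3' bases whose Phred score is below the threshold."""
    scores = phred_scores(quality_string)
    keep = len(scores)
    while keep > 0 and scores[keep - 1] < threshold:
        keep -= 1
    return sequence[:keep], quality_string[:keep]

# Toy read: six bases at Q40 ('I') followed by two bases at Q2 ('#') and Q0 ('!')
seq, qual = trim_trailing_low_quality("ACGTACGT", "IIIIII#!")
print(seq, qual)
```

A Phred score of 20 corresponds to an estimated base-call error probability of 1 in 100, which is why it is a common cut-off.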
Using combined MiSeq and MinION data, the sequences were assembled into a large contig constituting the genome and a contig containing a large virulence plasmid.
The sequence data were annotated using four different services: the NCBI Prokaryotic Genome Annotation Pipeline [22], the BASys Bacterial Annotation System [23], the RAST Annotation Server [24] and Prokka [25]. The sequences were further analysed using a variety of free and publicly available software. Integrated prophages and genomic islands (GIs) were searched using PHASTER [26] and Island Viewer 4 [27], respectively, and the final location of prophages and GIs was determined using a combination of the resulting data. Multilocus sequence typing (MLST)-type, Fim-type, antibiotic resistance genes, and virulence genes were searched using online services from the Center for Genomic Epidemiology (CGE) at the Danish Technical University (DTU), Lyngby, Denmark (http://www.genomicepidemiology.org/). Assembly and annotation of the isolate FHI_NMBU_03 and its plasmid are publicly available at NCBI (accession numbers CP019455 and CP019456, respectively).
Among the 21 ETEC isolates, ehaG was detected in 12 strains (57%), but ehaA was not detected in any of the ETEC isolates.
When we looked at pair-clustering of the VAGs we found that the most common pairs (in more than 20% of isolates) of VAGs included: ehaA and ehaG in 60/168
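The pair-clustering of VAGs described above can be thought of as counting how often two genes occur together in the same isolate and keeping the pairs above a frequency cut-off (the reported 60/168 for ehaA and ehaG is about 36%). The following is a minimal sketch of such a count, not the analysis actually used in the study; the function name and the toy gene profiles are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(isolates, min_fraction=0.20):
    """Return gene pairs co-occurring in more than min_fraction of the isolates."""
    counts = Counter()
    for genes in isolates:
        # Sort so that each pair is always counted under the same key
        counts.update(combinations(sorted(set(genes)), 2))
    n = len(isolates)
    return {pair: c for pair, c in counts.items() if c / n > min_fraction}

# Toy data: five hypothetical isolates with their detected VAGs
toy_isolates = [
    {"ehaA", "ehaG", "eae"},
    {"ehaA", "ehaG"},
    {"eae"},
    {"ehaG", "fyuA"},
    {"ehaA", "ehaG", "fyuA"},
]
pairs = frequent_pairs(toy_isolates)
print(pairs)
```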

Sequencing
One strain from this study, designated FHI_NMBU_03 and belonging to MLVA-cluster 6, was selected for whole genome sequencing using a combination of long- and short-read technologies, Oxford Nanopore MinION (91,865 reads) and Illumina MiSeq (361,031 reads), respectively. We were able to assemble a complete closed circular genome (4,685,056 bp, acc. nr. CP019455) and a complete circular virulence plasmid, pFHI_NMBU_03-1 (159,821 bp, acc. nr. CP019456), from the combined runs. The genome sequence (coverage 21.6x) contained 4954 genes (gene density 1.057 genes/Kbp) and 200 pseudogenes, with a GC content of 51%. The chromosome contains five integrated prophages according to PHASTER analysis [26], and 19 genomic islands (phages excluded) according to the Island Viewer 4 software [27]. FHI_NMBU_03 showed a surprising collection of both IPEC and ExPEC related VAGs, as indicated by the PCR analysis. It contained the locus of enterocyte effacement (LEE)-region of EPEC/EHEC as well as recognized markers for ExPEC subtypes of UPEC/APEC and NMEC. The LEE region of FHI_NMBU_03 contains 36 recognized genes, four open reading frames (ORFs) of unknown function as well as two pseudogenes, and is inserted in the selC tRNA gene. The eae-intimin subtype of FHI_NMBU_03 is β2. The LEE-encoded Tir protein of FHI_NMBU_03 is, by BLAST search, identical to three Tir proteins from EPEC strains and one protein from a human strain designated as UPEC (upec-202, SAMN02802023), as well as eight animal strains. Additionally, the genome encodes the intimin-like proteins FdeC and a SinH-variant. FHI_NMBU_03 was also positive for a cluster of the non-LEE-encoded effectors nleB, nleC, nleG, nleH and a frameshifted nleA pseudogene, located within a phage-region identified by PHASTER. Using CGE, the MLST type was predicted to be ST28 and the fimH subtype was predicted to be fimH90. A selection of chromosomal genes found by sequencing associated with virulence can be seen in Table 4.
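The gene density quoted above follows directly from the reported gene count and chromosome length. A short sketch of the arithmetic, using only the figures given in the text (the variable names are illustrative):

```python
chromosome_bp = 4_685_056   # reported chromosome length
gene_count = 4954           # reported number of genes

# Gene density expressed as genes per kilobase pair
genes_per_kbp = gene_count / (chromosome_bp / 1000)
print(round(genes_per_kbp, 3))
```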
On the large virulence plasmid, ExPEC pathogenicity associated genes include: bor (an iss homologue), traT (serum resistance associated), the pyelonephritis-associated pilus pap operon; papABCDEFHJK, a putative pixG adhesin related gene encoding a protein 99% identical to a protein (EQZ28352.1) from the E. coli human UTI strain UMEA-3585-1 (PRJNA186355), a putative autotransporter gene encoding an uncharacterized protein identical to protein EQZ28355.1 from UMEA-3585-1, iroN (catecholate siderophore receptor), an AppA (HlyII) hemolysin protein and the leukotoxin genes lktBCD.
The alkB gene coding for the alkylated DNA repair protein AlkB has an internal frameshift, and is probably inactive in FHI_NMBU_03. Several loci pertaining to fimbrial structures were found; noteworthy are genes related to K88-fimbria, 987P-fimbria and colonization factor antigen I fimbriae (CFA/I), which are all associated with ETEC strains. FHI_NMBU_03 is also positive for the YghJ protein gene, also known as SslE (Secreted and surface associated lipoprotein), which is a cell surface associated and secreted lipoprotein harbouring an M60 metalloprotease domain [28]. A previously reported insertion of unknown origin, with a base composition suggestive of horizontal gene transfer, in a genetic region between mutS and rpoS, associated with phylogroup B2 and uropathogens [29], is additionally present. This region has later been named the o454-nlpD region [30].

Discussion
Clinical microbiological laboratories and reference laboratories rely increasingly on genetic testing of faeces to identify possible pathogenic microbes. For enteric bacteria, a widely used practice is to perform PCR or real-time PCR assays, or other amplification methodology, to detect specific genes used for pathogen identification. For E. coli, PCR on faecal isolates [13] is used to detect the well-recognized IPEC pathovars EPEC, STEC, ETEC, EAEC and EIEC [31]. These pathovars all have genetic targets used for identification and classification. The most common genetic targets are the eae and bfp genes for EPEC, stx1 and stx2 genes for STEC, genes encoding the thermostable (ST) and thermolabile (LT) toxins for ETEC, the aggR gene for EAEC, and the ipaH gene for EIEC. These targets are also candidate targets for automatic pathogen identification systems, especially in culture-independent diagnostic test (CIDT) workflows. The results from these assays will be a classification of the E. coli isolates into one of the recognized pathovars or, in case of no target amplification, a classification as a non-enteropathogenic or commensal strain.
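The marker-based classification described above can be expressed as a simple lookup. The sketch below is an illustration only, not a diagnostic algorithm: it assumes that any single marker is sufficient for a pathovar call, and it uses the placeholder labels "ST" and "LT" for the ETEC toxin genes, which the text does not name individually.

```python
# Marker genes per IPEC pathovar, as listed in the text.
IPEC_MARKERS = {
    "EPEC": {"eae", "bfp"},
    "STEC": {"stx1", "stx2"},
    "ETEC": {"ST", "LT"},   # placeholder labels for the toxin-encoding genes
    "EAEC": {"aggR"},
    "EIEC": {"ipaH"},
}

def classify_ipec(detected_genes):
    """Return every IPEC pathovar whose markers overlap the detected genes."""
    detected = set(detected_genes)
    hits = [name for name, markers in IPEC_MARKERS.items() if detected & markers]
    return hits or ["non-enteropathogenic or commensal"]

print(classify_ipec({"eae", "ibeA"}))     # ExPEC genes are ignored by this scheme
print(classify_ipec({"aggR", "stx2"}))    # more than one pathovar matches
print(classify_ipec({"fyuA", "chuA"}))    # no IPEC target detected
```

Because the lookup only considers the IPEC targets, an isolate carrying additional ExPEC VAGs is classified exactly like one without them, which is the limitation the present study set out to examine.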
In the present study, we looked at a wider range of virulence factors in faecal E. coli isolates submitted to the Reference Laboratory for Enteropathogenic Bacteria at the Norwegian Institute of Public Health (NIPH). We especially searched for known ExPEC VAGs, as a heightened interest in the frequency of ExPEC strains in the human gut has emerged in recent years. However, few studies have examined the selection of VAGs used in the present study.
One surprising finding in our study was the high frequency of E. coli strains (64.3%) with a combination of recognized IPEC and ExPEC VAGs. There are limited data on how common these IPEC/ExPEC hybrid strains are. In a study of 265 E. coli isolates from hospital inpatients and outpatients with UTIs, 10.6% of isolates harboured at least one IPEC virulence factor [32]. In previous studies of human faecal isolates, the E. coli strains are separately designated as IPEC or as commensal strains harbouring ExPEC VAGs, so it is unclear what percentage may be IPEC/ExPEC combination strains. The frequency of the IPEC/ExPEC combination was especially high among the aEPEC strains (91.8%).
One notable finding was that 13 out of 14 (92.9%) ibeA-positive isolates were EPEC strains of phylogenetic group B2. Thus, ibeA carriage in faeces seems to be associated with a distinct group of IPEC strains in our material. The ibeA gene is a known virulence factor of E. coli strains responsible for neonatal meningitis in humans (NMEC), contributing to the invasion of brain microvascular endothelial cells (BMEC) [33]. It has also been described that ibeA plays an important role in the invasion of intestinal epithelial cells, as the absence of ibeA accounted for a reduction in invasion of ca. 67% compared to wild type in experiments with the adherent-invasive E. coli (AIEC) strain NRG857c and an ibeA deletion mutant strain (NRG857cΔibeA) [34]. Furthermore, ibeA was present in the genome of 26% of pathogenic isolates from chicken (APEC), but absent from the genome of non-pathogenic isolates of avian origin [35]. The ibeA gene was positively linked to the pathogenicity of the APEC strains, and it was additionally shown that ibeA was involved in the invasion of human BMEC by the APEC strain BEN 2908 [35].
An interesting observation was the high number of strains harbouring genes coding for the trimeric autotransporter proteins (TAAs) EhaA and EhaG. Of particular note, the ehaG gene was found in 48% of the strains with one or more ExPEC VAGs and no IPEC VAGs; EhaG mediates specific adhesion to colorectal epithelial cells [36]. This indicates that 48% of our isolates carrying solely ExPEC VAGs may have the capacity to adhere to colorectal epithelial cells in humans. Both ehaA and ehaG are most prevalent in the phylogenetic groups B1 and D, while a difference between ehaA and ehaG was observed in phylogenetic group A, where ehaA was not detected but ehaG was present in 34% of the isolates. The distribution pattern of ehaA and ehaG was in the same range as results from a study by Zude et al. 2014 [37], with the exception of phylogenetic group B2, where Zude et al. 2014 report that 21.9% of the strains carry the ehaG gene, while in the present study 7.1% of the B2 strains were positive for ehaG. EhaG is localized at the bacterial cell surface and, in addition to colorectal epithelial cell adhesion, promotes cell aggregation, biofilm formation, and adherence to a range of extracellular matrix (ECM) proteins [36]. TAAs are regarded as important virulence factors of many Gram-negative bacterial pathogens.
We are aware that our PCR-based phylogrouping results may show minor differences from the 2013 Clermont method [38]. Non-IPEC strains are not stored at NIPH, so re-typing all strains in this study with the 2013 Clermont method is not possible. However, the findings and conclusions remain valid, and in future our phylogrouping will be sequence-based, e.g. by using online tools [39].
The fully sequenced FHI_NMBU_03 phylogroup B2 strain (with plasmid) from this study shows hallmarks of the ExPEC pathovars UPEC, APEC and NMEC and of the IPEC pathovar aEPEC, with some VAGs related to ETEC (K88-, 987P- and CFA/I-fimbrial genes); thus it constitutes a true pathovar-hybrid strain (Additional file 3). The eae gene alone will classify it as an aEPEC by most molecular diagnostic tests.
It was previously reported that YghJ caused extensive haemorrhage in mouse ileum in a dose-dependent manner, and it was suggested that YghJ could be a virulence factor of enteric pathogens associated with haemorrhagic diarrhoea [28]. A recent study additionally showed that the YghJ protein from a neonatal septicaemic E. coli altered the cellular morphology of various cell lines and triggered the induction of several proinflammatory cytokines, which are regarded as key mediators in the pathogenesis of sepsis [40].
Several factors classify this strain as UPEC (e.g. usp, fyuA, sfaS, the pap fimbrial operon, chuA and yfcV). It has previously been reported that any two of yfcV, vat, or chuA along with fyuA could be used to differentiate UPEC from diarrheagenic E. coli (DEC), human commensal, or animal commensal isolates. However, to differentiate UPEC from APEC, vat, fyuA, and yfcV together are necessary, where the presence of the putative fimbrial subunit gene yfcV is highly predictive of UPEC, increasing the odds of a strain being UPEC by 99.5-fold [41].
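The two gene-combination rules cited above [41] can be written as simple predicates. This is a minimal sketch under the wording of the text, with hypothetical function names; the gene set used in the example is the list of UPEC-associated factors named for FHI_NMBU_03 in this paragraph.

```python
def upec_vs_dec_or_commensal(genes):
    """fyuA plus any two of yfcV, vat and chuA (UPEC versus DEC or commensal isolates)."""
    genes = set(genes)
    return "fyuA" in genes and len(genes & {"yfcV", "vat", "chuA"}) >= 2

def upec_vs_apec(genes):
    """vat, fyuA and yfcV together (UPEC versus APEC)."""
    return {"vat", "fyuA", "yfcV"} <= set(genes)

# UPEC-associated factors listed in the text for FHI_NMBU_03 (pap operon omitted)
fhi_nmbu_03 = {"usp", "fyuA", "sfaS", "chuA", "yfcV"}
print(upec_vs_dec_or_commensal(fhi_nmbu_03))
print(upec_vs_apec(fhi_nmbu_03))
```

With only the factors listed here, the first rule is met while the second depends on whether vat is counted as present, which is relevant because the PCR signal for vat was later attributed to related autotransporter genes.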
The fimH90 subtype was also an interesting finding as it appears to be rare among E. coli strains and was not found among 243 draft genomes of E. coli isolates in a study using the CGE FimTyper Web tool [42]. However, BLAST searches found an identical fimH gene in a sequence scaffold from a human aEPEC strain (702898_aEPEC) isolated in Pakistan (GenBank: CYBW01000017.1). The CGE FimTyper confirmed this fimH gene to also be of subtype fimH90.
The comparison of sequence data with PCR typing revealed PCR positive results for tsh and vat while sequencing showed the presence of the highly related hbp gene on the chromosome and a putative related autotransporter on the virulence plasmid (locus tag: BXO92_24355). The PCR results can be explained by the similarity of the intended target genes, and the considerable confusion in GenBank submitted sequences on the correct nomenclature. The Tsh and Hbp proteins differ by only two amino acid residues. In addition, Vat and Tsh/Hbp are 77.5% identical in amino acids.
The plasmid-located putative autotransporter protein (protein id: PRJNA362852:BXO92_24355) shows 43.7% AA identity and 56.6% AA similarity to Tsh. RAST annotates this protein as EspC, while BASys annotates it as Hbp.
The number of GIs and integrated prophages indicates that FHI_NMBU_03 has obtained a high number of virulence factors by horizontal gene transfer, and this may have been facilitated by a defect in the DNA-repair system with a frameshifted alkB gene. It is known that AlkB-relevant lesions appear to represent strong blocks to replication, but these blocks can be bypassed by error-prone translesion DNA polymerases as a part of the SOS-system, leading to mutagenesis [43].
The o454-nlpD region was shown to consist of several genetic patterns, where pattern III (the FHI_NMBU_03 sequence contains pattern III) had significant associations with phylogenetic group B2 strains, representing the most virulent members of the ExPEC group. This o454-nlpD region pattern was proposed as a tool to identify highly extraintestinal virulent strains among a mixed population of E. coli [30].
Strains closely related to FHI_NMBU_03 may have caused disease in Norway for an extended period of time, as nine aEPEC intimin eae-β2 carrying B2 strains of sequence type ST28 were previously detected among 56 aEPEC isolates from faecal specimens from children < 5 years old in Norway (five strains were from community-acquired diarrhoea samples) [44]. All nine strains were shown by microarray analysis to contain the ibeA, malX and usp genes, as does FHI_NMBU_03.
The high frequency of strains with combined IPEC/ExPEC VAGs found in this study is worrisome, as they might be capable of causing both intestinal and extraintestinal disease. One scenario could be a general weakening of the immune system caused by ongoing intestinal disease, thereby creating an opportunity for spread of bacteria with ExPEC VAGs to other anatomical sites where the ExPEC VAGs may contribute to severe extraintestinal disease.

Conclusion
We report that a high frequency (> 93%) of routinely submitted faecal E. coli strains from Norwegian hospitals, previously characterized as IPEC, also harbour ExPEC virulence factors. Traditionally, IPEC is regarded as a diarrhoeagenic pathogen with a set of virulence genes that is absent in ExPEC strains, e.g. UPEC. This very high frequency of combined IPEC/ExPEC strains was an unexpected finding warranting further studies, as such strains may provide a rich source of opportunistic extraintestinal infections. WGS of one selected strain confirmed the pathovar-hybrid nature and revealed a genome heavily influenced by horizontal gene transfer (HGT). Sequence complex ST28 has previously been assigned to a hybrid group that was named "phylogroup ABD" [45], which supports our finding of the hybrid nature of strain FHI_NMBU_03.