Molecular characterization of human Echinococcus isolates and the first report of E. canadensis (G6/G7) and E. multilocularis from the Punjab Province of Pakistan using sequence analysis

Background Echinococcosis is a zoonotic parasitic disease causing serious health problems in both humans and animals in different endemic regions across the world. There are two different forms of human echinococcosis: Cystic Echinococcosis (CE) and Alveolar Echinococcosis (AE). CE is caused by the larval stage of Echinococcus granulosus sensu lato and AE by the larval stage of Echinococcus multilocularis. Geographically, CE is universally distributed, while AE is prevalent in the northern hemisphere. Although the disease is endemic in neighboring countries (China, Iran and India) of Pakistan, there are limited reports from that country. Besides, there are no comprehensive data on the genotyping of Echinococcus species in humans based on sequence analysis. This study aimed to detect the presence of human CE and to identify Echinococcus spp. in human isolates through genetic characterization of hydatid cysts in the Punjab Province of Pakistan. Methods Genetic analysis was performed on 38 human hydatid cyst samples collected from patients with echinococcosis using mitochondrial cytochrome c oxidase subunit 1 (cox1), cytochrome b (cytb) and NADH subunit 1 (nad1). Patient data including age, epidemiological history, sex, and location were obtained from hospital records. Results According to the sequence analysis we detected E. granulosus sensu stricto (n = 35), E. canadensis (G6/G7) (n = 2), and E. multilocularis (n = 1). Thus, the majority of the patients (92.1%, 35/38) were infected with E. granulosus s.s. This is the first molecular confirmation of E. canadensis (G6/G7) and E. multilocularis in human subjects from Pakistan. Conclusions These findings suggested that E. granulosus s.s. is the dominant species in humans in Pakistan. In addition, E. canadensis (G6/G7) and E. multilocularis are circulating in the country. Further studies are required to explore the genetic diversity in both humans and livestock.


Background
Echinococcosis is a zoonotic disease caused by tapeworm parasites belonging to the Echinococcus genus. There are two main types of echinococcosis: Cystic Echinococcosis (CE) caused by Echinococcus granulosus sensu lato and Alveolar Echinococcosis (AE) caused by E. multilocularis. Additionally, polycystic echinococcosis (caused by E. vogeli and E. oligarthra) also occurs predominantly in South America [1]. CE has a worldwide geographical distribution [2]. Echinococcosis disrupts the economies of many countries, affecting approximately 2-3 million people [3]. The estimated human burden of CE was below 1 million disability adjusted life years (DALYs), but may increase above this figure. Annual losses caused by CE might reach 20 million US dollars [4]. The disease has a prevalence of about 1/100,000 in developed countries and can reach 200/100,000 in rural populations having close contact with domestic dogs [3].
Most species of Echinococcus inhabit domestic and wild mammals. The definitive hosts include both domesticated dogs and wild carnivore species (foxes, wolves and coyotes). Humans and livestock act as intermediate hosts. Livestock animals are the intermediate hosts for E. granulosus s.s., while small wild mammals serve as intermediate hosts for E. multilocularis. Humans acquire CE by accidental ingestion of parasite eggs in contaminated food and water, or by direct contact with the definitive hosts [5]. The eggs hatch in the small intestine, and the developing larvae can then spread to any organ; however, they most often settle in the liver, where the parasite forms hydatid cysts [6]. Molecular genotyping has shown that members of E. granulosus s.l. include E. granulosus s.s. (G1-G3), E. equinus (G4), E. ortleppi (G5), E. canadensis (G6/7, and G8-10) and E. felidis [2].
There are limited data about CE in Pakistan, whereas the incidence of the disease is high in neighboring countries such as Iran, India, and China, for which published data are available for both prevalence and genotyping. Limited research has been conducted on echinococcosis in the past decade in Pakistan [7]. Previous investigations of Pakistani isolates showed the presence of E. granulosus s.s. (G1-G3) in cattle, buffalo and sheep, while in humans, E. granulosus s.s. (G1-G3) and E. canadensis (G6/7) were detected using cox1 gene sequences [8][9][10]. Interestingly, E. multilocularis was reported in cattle from Pakistan [10,11]. In the past, E. granulosus s.s. was reported in livestock (e.g., cattle), while E. granulosus s.s. and the former G6 genotype were reported in humans in the Khyber Pakhtunkhwa (KPK) province of Pakistan using polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) analysis, without confirmation by gene sequencing [11].
Therefore, in the current study, hydatid cyst samples collected from human patients with echinococcosis and stored as formalin-fixed paraffin-embedded (FFPE) tissues were subjected to sequence analysis by analyzing mitochondrial cytochrome c oxidase subunit 1 (cox1), cytochrome b (cytb), and NADH subunit 1 (nad1) genes, to investigate the possible genetic diversity in the hydatid cyst samples in Pakistan.

Geography of the study area
Punjab is one of the largest provinces by population, with fertile agricultural land and deserts in the southern part near the border with Rajasthan and near the Sulaiman Range. Punjab comprises parts of the Cholistan and Thal deserts. It has extreme weather, with foggy and wet winters. The average temperature increases from mid-February, with springtime weather continuing until mid-April, when the summer heat sets in. June and July are very hot months.

Collection of samples
The formalin-fixed, paraffin-embedded hydatid cyst samples were collected from the Pathology Departments of the contributing hospitals from 2012 to 2017. All patients were confirmed as having echinococcosis (CE, n = 37; AE, n = 1) by histopathological investigation after surgery (detection of Periodic Acid-Schiff (PAS)-positive laminated layers, and/or protoscoleces and/or hooklets). Patient data, including age, sex, and epidemiological history, were recorded.

Molecular analysis

Genomic DNA extraction
Genomic DNA (gDNA) was isolated from individual formalin-fixed paraffin-embedded (FFPE) tissues. Sections of 10-15 μm thickness were cut from each cyst using a standard microtome (Leica SM2000 R Sliding Microtome, Wetzlar, Germany) with disposable DNA/RNA-free blades. Equipment was autoclaved or sanitized before use. Paraffin was removed by incubation in 1 ml of xylene for 10 min at 37°C. The supernatant was discarded after centrifugation at 12000×g for 5 min. Samples were rehydrated in descending ethanol concentrations; excess ethanol was evaporated at room temperature. A genomic DNA isolation kit (TRANS Easy Pure FFPE Tissue Genomic DNA Kit, Code: EE191-01; Transgen biotech, Beijing, China) was used to extract total gDNA according to the manufacturer's protocol, with a few modifications. Briefly, the tissue samples were digested at 56°C overnight in lysis buffer (400 μl), and then the gDNA was extracted. Sterile distilled water (100 μl) was used to resuspend the pellet. The gDNA samples were stored at −20°C until further use [12].

PCR amplification and sequencing
The mitochondrial genes (cox1, nad1, and cytb) were amplified from the isolated gDNA as previously described [12,13]. Amplification of the cox1 (446 bp) and cytb (580 bp) genes was carried out in a thermocycler under the following PCR conditions: denaturation at 94°C for 30 s, annealing at 54°C for 30 s, and extension at 72°C for 60 s, for 35 cycles. Amplification of the nad1 gene (900 bp) was performed under the following PCR conditions: denaturation at 95°C for 60 s, annealing at 50°C for 50 s, and extension at 72°C for 70 s, for 30 cycles [14]. All PCR amplifications included a negative control containing sterile distilled water instead of the DNA template. The PCR products were separated on a 1.5% agarose gel and visualized using a gel documentation system. The PCR products of all positive samples were subjected to sequence analysis.
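As a rough illustration, the two cycling programs described above can be written down as data and used to estimate the programmed hold time per run. This is a minimal sketch: the class and field names are hypothetical, and the estimate covers only the three stated steps, since ramp times and any initial denaturation or final extension are not given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    """One three-step PCR program as described in the text (times in seconds)."""
    target: str
    denature_c: int
    denature_s: int
    anneal_c: int
    anneal_s: int
    extend_c: int
    extend_s: int
    cycles: int

    def hold_time_s(self) -> int:
        # Programmed hold times only; ramping between temperatures and any
        # initial denaturation or final extension are not included.
        return (self.denature_s + self.anneal_s + self.extend_s) * self.cycles


# Temperatures, times and cycle numbers copied from the protocol text.
COX1_CYTB = CyclingProgram("cox1/cytb", 94, 30, 54, 30, 72, 60, cycles=35)
NAD1 = CyclingProgram("nad1", 95, 60, 50, 50, 72, 70, cycles=30)

for program in (COX1_CYTB, NAD1):
    minutes = program.hold_time_s() / 60
    print(f"{program.target}: {minutes:.0f} min of programmed holds")
```

Under these assumptions the cox1/cytb program amounts to 70 min and the nad1 program to 90 min of hold time; actual run times on a thermocycler would be longer.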

Phylogenetic analysis
Multiple sequence alignment, unidirectional DNA sequence analysis, and construction of the phylogenetic tree were performed using MEGA X [15]. The initial trees for the heuristic search were obtained by applying the neighbor-joining and BioNJ algorithms to a matrix of pairwise distances estimated using the maximum composite likelihood (MCL) approach. The topology with the superior log-likelihood value was then selected [70]. The reference sequences that were used as outgroups in the phylogeny and in tree construction are shown in Table 1.
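For readers unfamiliar with neighbor-joining, the sketch below shows only the first joining step of the algorithm on a small distance matrix. It is not MEGA X's implementation and does not reproduce the MCL distance estimation or the likelihood search; the function name `nj_first_join` and the toy distances are assumptions made purely for illustration.

```python
from itertools import combinations


def nj_first_join(names, dist):
    """Return the pair joined first by neighbor-joining and its branch lengths.

    names: list of taxon labels; dist: symmetric distance matrix (list of lists).
    """
    n = len(names)
    row_sums = [sum(row) for row in dist]

    def q_value(pair):
        i, j = pair
        # Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
        return (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]

    # The pair with the smallest Q value is joined first.
    i, j = min(combinations(range(n), 2), key=q_value)
    # Branch lengths from i and j to the new internal node.
    len_i = dist[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return (names[i], names[j]), (len_i, len_j)


# Toy distances between five hypothetical taxa (not data from this study).
taxa = ["a", "b", "c", "d", "e"]
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, lengths = nj_first_join(taxa, toy)
```

A full neighbor-joining run would replace the joined pair with the new node, recompute the distances, and repeat until the tree is resolved.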

Statistical analysis
Data were analyzed statistically using Fisher's exact test.
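A two-sided Fisher's exact test on a 2×2 table can be sketched with the standard library alone, as below. The function name and the example counts are hypothetical; the text does not give the contingency tables that were tested, so the table here is a toy example rather than study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Toy table, not data from this study.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

This uses the common "sum of probabilities not exceeding the observed probability" definition of the two-sided p-value; statistical packages may offer other definitions.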

Results
In the present study, 38 human hydatid cyst samples were collected from surgically confirmed patients with echinococcosis from different areas of Punjab, Pakistan. The average age of the patients with CE was 32.73 years (range, 5 to 75 years). The demographic characteristics of infected cases are summarized in Table 2.
Among the 38 human echinococcosis samples analyzed, 22 (57.8%) were from males and 16 (42.2%) were from females; the difference was not statistically significant (χ² = 1.89, df = 1, P > 0.05). A larger proportion of echinococcosis cases was reported from rural areas (76.3%), where people have closer contact or association with dogs, than from urban areas (23.7%). The liver (50%) was the most affected organ, followed by the lungs (22.5%) and others (Table 2).

Genetic characterization of Echinococcus isolates
PCR amplification of the cox1 gene yielded a product of 446 bp, while cytb yielded a 580 bp fragment, and nad1 yielded a 900 bp product in all samples. The nucleotide sequences of all Pakistani samples (n = 38) were BLAST searched against reference sequences retrieved from GenBank. According to the BLAST analysis of the sequences of the cox1, cytb, and nad1 genes, E. granulosus s.s. (n = 35), E. canadensis (G6/G7) (n = 2), and E. multilocularis (n = 1) were detected. The findings showed that the majority of the patients (35/38) were infected with E. granulosus s.s. All sequences have been deposited in GenBank (accession nos. MK229294-MK229342).
E. granulosus s.s. and E. canadensis (G6/G7) were characterized using the sequences of cox1 (446 bp), cytb (580 bp), and nad1 (900 bp); each sample was characterized using the sequence of at least one of these genes, whereas E. multilocularis was identified using only cytb (580 bp).
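The species proportions follow directly from the counts given above. The short calculation below is only a worked check of that arithmetic; the variable names are illustrative.

```python
# Species counts as reported in the text (n = 38 isolates).
counts = {
    "E. granulosus s.s.": 35,
    "E. canadensis (G6/G7)": 2,
    "E. multilocularis": 1,
}

total = sum(counts.values())
# Percentage of all isolates, rounded to one decimal place.
percentages = {species: round(100 * n / total, 1) for species, n in counts.items()}

for species, pct in percentages.items():
    print(f"{species}: {counts[species]}/{total} ({pct}%)")
```

The value for E. granulosus s.s. corresponds to the 92.1% (35/38) stated in the abstract.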

Alignment results of the sequences
In the sequence comparison, the cox1 gene sequences showed a 100% match with E. granulosus s.s., except for isolates PUN-23 and PUN-91, which were identified as E. canadensis (G6/G7) (Fig. 1a).
The cytb gene sequences matched the selected reference gene sequences of E. granulosus s.s. However, only PUN-91 was identical to the E. canadensis (G6/G7) reference gene sequence (Fig. 1b).
For the nad1 gene, the PUN-91 sequence matched E. canadensis (G6/G7), whereas all the other sequences were identified as E. granulosus s.s. after BLAST analysis (Fig. 1c).
For the nad1 gene, PUN-131-Pakistan was conserved when compared with the selected genotypes, whereas point mutations and substitutions were found in some of the other compared sequences (Fig. 3). The PUN-116-Pakistan sample had point mutations reported from France (GenBank: KY766893) and Turkey, while other sequences of E. granulosus (s.s.) were conserved (Fig. 4).
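Comparisons of this kind reduce to scanning two aligned sequences column by column. The sketch below lists substitutions and computes percent identity for a pair of aligned sequences; the function names and the 12-base fragments are hypothetical and are not sequences from this study.

```python
def substitutions(reference, query):
    """List (position, ref_base, query_base) for mismatches in two aligned sequences.

    Positions are 1-based alignment columns. Columns with a gap ('-') in
    either sequence are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, r, q)
        for pos, (r, q) in enumerate(zip(reference.upper(), query.upper()), start=1)
        if r != q and "-" not in (r, q)
    ]


def percent_identity(reference, query):
    """Percentage of gap-free aligned columns in which the two bases agree."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (r, q)
        for r, q in zip(reference.upper(), query.upper())
        if "-" not in (r, q)
    ]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return 100 * sum(r == q for r, q in pairs) / len(pairs)


# Hypothetical aligned fragments for illustration only.
ref = "ATGGTTTTGACC"
qry = "ATGGTCTTGATC"
diffs = substitutions(ref, qry)
identity = percent_identity(ref, qry)
```

Here the two fragments differ at alignment columns 6 and 11, so 10 of 12 columns agree.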

Discussion
The two notable cestode-borne zoonoses are CE and AE. AE is widely distributed in the northern hemisphere, while CE occurs worldwide, and the disease burden in humans is highly variable among endemic areas. AE and CE are still considered neglected zoonoses in many areas of the world, although their prevalence is quite high in such areas, because of a lack of awareness and disease management. The occurrence of CE is quite high around the world; however, the pathogenicity and fatality caused by AE are more prevalent in Asia [33]. CE is an endemic disease in Pakistan and causes serious economic losses in terms of human healthcare and livestock agriculture costs. In addition, there is a lack of knowledge about CE in Pakistan, which affects its transmission dynamics [34][35][36]. Agriculture is the backbone of Pakistan, and a large number of families are affiliated with this sector, including animal rearing and dairy farming for milk products. On small and domestic farms, standard principles are often not strictly followed; therefore, these populations are at high risk of acquiring Echinococcus spp. infection [7]. CE is considered a socially constructed disease because of various traditional practices found among different ethnic groups around the globe, such as keeping many dogs and a large amount of livestock, and the culture of rescuing stray dogs [37].
In the current investigation, a total of 35 hydatid cyst samples were characterized as resulting from E. granulosus s.s. A high rate of E. granulosus s.s. was detected, which is in line with the data reported previously in humans (88.5%) [38] and livestock [39]. Similarly, in China, the majority (60%) of CE-positive cases in humans are caused by E. granulosus s.s. (formerly the G1 strain) [40], which also caused 40.62% of the infections reported in India [41]. However, there is little information on the genetic characterization of Echinococcus spp. in humans in Pakistan. Echinococcus granulosus s.s. has been reported in buffaloes in Sindh Province of Pakistan [8]. This species was detected in small and large ruminants, while the sheep strain (G1) was found in human samples (n = 2) using cox1 gene sequencing [9]. Echinococcus granulosus s.s. in cattle has been reported in Pakistan [10]. The high rate of E. granulosus s.s. reported in the current study might be because E. granulosus s.s. is the predominant species in Pakistan (so far) and in neighbouring countries [9,39,40]. Even globally, E. granulosus s.s. is the predominant causative agent of CE [38]. It has a wide host range, which makes it more dominant in endemic localities, even where it occurs in sympatry with other E. granulosus s.l. species. In addition, it might reflect the fact that most of the patients with CE lived in rural areas, where people have a close association with dogs [41].
In the present study, two samples were characterized as being infected with E. canadensis (G6/G7). E. canadensis (G6/7) was previously thought to be less infective to humans [42]. It is now known to be the second most important causative agent of CE after E. granulosus s.s. [38]. Globally, E. canadensis (G6/7) has been reported in Kenya [42,43]; Argentina [44]; China [40]; in different parts of Africa, Asia, and South America [12,27,45]; and in many countries in eastern and south-eastern Europe [27,46,47]. Meanwhile, the G6-G10 cluster was reported in the Northern Palearctic, Northern Africa, and the Middle East [27]. In Pakistan, because of the camel and pig populations, G6 transmission to human hosts is possible, especially resulting from camel slaughtering and cross-boundary migration of animals from Afghanistan. The characterization of E. canadensis (G6/G7) in humans in Pakistan suggests an interaction between the camel-dog and pig-dog cycles. In Pakistan, the pig population is abundant and poses a serious health threat to the human population. Often, wild pigs live near human settlements in Pakistan. Although the camel population is quite low in the Punjab Province of Pakistan, sharing a border with Afghanistan and Iran means that species transmission is possible because of illegal animal
In the present study, one sample was characterized as being infected with E. multilocularis. In North America, human cases of AE caused by E. multilocularis have been reported [48,49]. AE is also prevalent in the northern hemisphere and even in the neighbouring country of Afghanistan [33]. In Pakistan, echinococcosis is still neglected [50,51].
The current study is the first report of genotyping of E. multilocularis from humans in Punjab Province, Pakistan, using sequence analysis. Previously, E. multilocularis was investigated in cattle from the KPK province of Pakistan using PCR-RFLP [11]. The present findings suggest that cystic echinococcosis is an important emerging health issue and that AE is circulating in rural areas of Pakistan.

Conclusions
In conclusion, the current findings indicate the presence of E. granulosus s.s., E. canadensis (G6/G7), and E. multilocularis in the Punjab province of Pakistan. Additionally, E. canadensis (G6/G7) in human isolates is reported for the first time in Pakistan. To aid the eradication of the disease, comprehensive surveillance should be initiated. Control measures developed based on surveillance results could help to slow down the spread of the disease. The probable occurrence of other E. granulosus s.l. species indicates that further epidemiological studies using more Echinococcus isolates from all intermediate hosts (e.g. human and others), as well as definitive hosts, should be performed in different climatic regions of Pakistan.

Fig. 3 Multiple sequence alignments of partial cytb gene sequences (E. multilocularis). Genotypes (G1, G3, G5 and G6) represented with PUN suffix were from this study, while reference sequences from GenBank of genotypes G1, G3, G5 and G6 are presented with different codes. The accession numbers range from MK229294 to MK229342.

Fig. 4 Multiple sequence alignments of partial nad1 gene sequences. Genotypes (G1, G3, G5 and G6), represented with PUN suffixes, were from this study, while reference sequences from GenBank of genotypes G1, G3, G5 and G6 are presented with different codes. The accession numbers range from MK229294 to MK229342.