### Data collection

Hubei Province, located in the center of China and north of the Dongting Lake, has a population of more than 58 million. Hubei is known as the “province of thousands of lakes”. Except the main stream of the Yangtze River and Han River, there are 4228 rivers and 755 lakes in the province. The large water areas provide an easy spread of *Shigella spp*. Therefore, it is of great value to study shigellosis transmissibility in Hubei Province, China.

In this study, we used the dataset of reported shigellosis cases that was built from January 2005 to December 2017 in the province. The sex, age, illness onset date, and diagnosis date of each case was included in the data. Incidences of 12 urban areas (including 7 central districts in Wuhan City and 5 districts in Yichang City) and 14 rural areas (including 5 outer suburbs districts in Wuhan City, and 8 counties or county-level cities in Yichang City) from 2005 to 2017 were collected to compare the difference between urban and rural areas. All the cases were collected from the China Information System for Disease Control and Prevention. All the cases were identified following the case definitions from “Diagnostic criteria for bacterial and amoebic dysentery (WS287-2008)” announced by the National Health Commission of the People’s Republic of China:

- A)
Suspected cases: Diarrhea, with purulent or mucous stool or watery stool or sparse stool, accompanied by severe symptoms after internal emergencies, has not yet been identified form other causes of diarrhea.

- B)
Clinically diagnosed cases: A patient who has the following 4 items: 1) unclean diet and/or contact history with dysentery patients; 2) sudden onset, chills and high fever, followed by abdominal pain, diarrhea and internal urgency, severe defecation 10 to 20 times a day, but not much, purulent stool, and moderate systemic poisoning symptoms; or severe symptoms like convulsions, headache, systemic muscle ache, dehydration and electrolyte disorders, may have left lower abdominal tenderness with bowel ringing hyperactivity; 3) routine stool examination showed that white blood cells or pus cells > = 15/HPF (400 folds), erythrocyte and phagocyte could be seen; 4) excluding diarrhea caused by other causes.

- C)
Confirmed cases: *Shigella spp* was tested positive by fecal culture from clinical diagnosis of cases.

In this study, we only included clinically diagnosed cases and confirmed cases for the analysis.

### The transmission models

In this study, two models were developed according to the transmission routes of the disease. According to our previous research [14, 19], we firstly built a susceptible-exposed-symptomatic-asymptomatic-recovered-water/food (SEIARW) model (denoted as Model 1) in which two routes (person–to–person and reservoir–to–person) were considered. For the route of person–to–person, two routes (symptomatic–to–susceptible and asymptomatic–to–susceptible) were both considered. In the model, people were divided into susceptible (*S*), exposed (*E*), symptomatic (*I*), asymptomatic (*A*), and recovered (*R*) individuals. The reservoir, including water or food, was denoted as *W*. In order to normalize the dimensions of people and *Shigella spp* population, we set *s* = *S*/*N*, *e* = *E*/*N*, *i* = *I*/*N*, *a* = *I*/*A*, *r* = *R*/*N*, *w* = *εW*/*μN*, *b* = *βN*, and *b*_{w} =* μβ*_{W}*N*/*ε*.

In the model, *N* is assumed to denote the total population. The parameter *β* is the transmission relative rate from person to person, *β*_{W} is the transmission relative rate from reservoir to person, *k* is the relative transmissibility of asymptomatic to symptomatic individuals, *ω* is the incubation relative rate, *p* is the proportion of asymptomatic individuals, *γ* is the infectious period relative rate of symptomatic individuals, *γ’* is the infectious period relative rate of asymptomatic individuals, *ε* is the relative rate of the pathogen’s lifetime, *c* is the shedding rate of the asymptomatic comparing to the infectious, and *μ* is the pathogen shedding coefficient of infectious individuals. The equations of the model are as follows:

$$ \frac{ds}{dt}=- bs\left(i+ ka\right)-{b}_w sw $$

$$ \frac{de}{dt}= bs\left(i+ ka\right)+{b}_w sw-\omega e $$

$$ \frac{di}{dt}=\left(1-p\right) we-\gamma i $$

$$ \frac{da}{dt}= pwe-{\gamma}^{\prime }a $$

$$ \frac{dr}{dt}=\gamma i+{\gamma}^{\prime }a $$

$$ \frac{dw}{dt}=\varepsilon \left(i+ ca-w\right) $$

We also developed a susceptible-exposed-symptomatic-asymptomatic-recovered (SEIAR) model (denoted as Model 2) which only includes the transmission route of person–to–person [21,22,23]. The equations of the model are as follows:

$$ \frac{ds}{dt}=- bs\left(i+ ka\right) $$

$$ \frac{de}{dt}= bs\left(i+\mathrm{k}a\right)-\omega e $$

$$ \frac{di}{dt}=\left(1-p\right) we-\gamma i $$

$$ \frac{da}{dt}= pwe-{\gamma}^{\prime }a $$

$$ \frac{dr}{dt}=\gamma i+{\gamma}^{\prime }a $$

In the model, the effective reproduction number (*R*_{eff}) was calculated by the equation as follows:

$$ {R}_{eff}= bs\left(\frac{1-p}{\gamma }+\frac{kp}{\gamma^{\prime }}\right) $$

### Parameter estimation

There are eight parameters (*b*, *b*_{w}, *k*, *ω*, *p*, *γ*, *γ’*, *c*, and *ε*) in the above models. According to our previous research [19], *k*, *ω*, *p*, *γ*, *γ’*, *c*, and *ε* are disease-specific parameters which could be estimated from literatures. The incubation period of shigellosis is 1–4 days [24, 25], and most commonly 1 day, therefore, *ω* = 1.0. The proportion of asymptomatic infection ranges from 0.0037 to 0.27 [26,27,28], and it can be set *p* = 0.1. The infectious period of symptomatic infection is 13.5 days [19], therefore, *γ* = 0.0741. According to our previous research [19], the infectious period of asymptomatic infection could be simulated 5 weeks in our model, thus *γ’* = 0.0286. Because of that the die-off rate of *Shigella spp* is about 24.5 h [29], *ε* = 0.6931 was consequently simulated [19]. In our previous research [19], a typical case has diarrhea about 3.2 times (range 3–12 times) per day but an asymptomatic individual only sheds stool once per day, thus *c* = 0.3125. Due to reduction of shedding frequency, the relative transmissibility of asymptomatic individual (*k*) was modeled to be a reduced quantity as 0.3125 [19].

However, *b* and *b*_{w} are scenario- or area-specific parameters which depend on different outbreaks and periods. Therefore, these two parameters are confirmed by curve fitting by Models 1 and 2 to the collected data. To simulate the contribution of *b* and *b*_{w} during the transmission, we performed a “knock-out” simulation which was based on the following four scenarios: A) *b* = 0 and *b*_{w} = 0; B) *b* = 0; C) *b*_{w} = 0; D) control (no intervention).

### Simulation method and statistical analysis

Berkeley Madonna 8.3.18 (developed by Robert Macey and George Oster of the University of California at Berkeley. Copyright©1993–2001 Robert I. Macey & George F. Oster) was employed for model simulation and least root mean square was adopted to assess the goodness of fit. The simulation methods were the same as the previously publications [12, 14, 19, 30, 31]. The chi-square and trend chi-square tests and *t* test were performed by SPSS 13.0 (IBM Corp., Armonk, NY, USA).

Eleven equations (Linear, Logarithmic, Inverse, Quadratic, Cubic, Compound, Power, S, Growth, Exponential, Logistic) were employed to fit the data of *R*_{eff} which was calculated by the reported data. The equations of the 11 models were shown as follows:

$$ \mathrm{Linear}:f(x)={b}_0+{b}_1x $$

$$ \mathrm{Logarithmic}:f(x)={b}_0+{b}_1\ln (x) $$

$$ \mathrm{Inverse}:f(x)={b}_0+\frac{b_1}{x} $$

$$ \mathrm{Quadratic}:f(x)={b}_0+{b}_1x+{b}_2{x}^2 $$

$$ \mathrm{Cubic}:f(x)={b}_0+{b}_1x+{b}_2{x}^2+{b}_3{x}^3 $$

$$ \mathrm{Compound}:f(x)={b}_0+{b}_1^x $$

$$ \mathrm{Power}:f(x)={b}_0+{x}^{b_1} $$

$$ \mathrm{S}:f(x)={e}^{\left({b}_0+\frac{b_1}{x}\right)} $$

$$ \mathrm{Growth}:f(x)={e}^{\left({b}_0+{b}_1x\right)} $$

$$ \mathrm{Exponential}:f(x)={b}_0{e}^{b_1x} $$

$$ \mathrm{Logistic}:f(x)=\frac{1}{\frac{1}{u}+{b}_0+{b}_1^x} $$

In the equations, *x* and *f*(*x*) refer to time (year) and dependent variables (*R*_{eff}), respectively; *b*_{0}, *b*_{1}, *b*_{2}, *b*_{3}, and *u* refer to the coefficients of the models which were estimated by curve fitting with the data. Determination coefficient (*R*^{2}) was employed to evaluate the curve fitting.