Data sources
Because almost no dengue fever cases occur during the winter in Taiwan, we chose May 1 as the beginning of the dengue epidemic season when estimating the cumulative epidemic curve. The data come from the dengue notification dataset, which contains suspect cases in Taiwan with a date of onset from May 1, 2005 to April 30, 2007. All serum samples from suspect cases were sent to the two reference laboratories at the Taiwan CDC to determine whether the cases were positive (dengue-infected) or negative. We retrieved data by date of onset rather than by report date to avoid the influence of reporting delays on the observed course of the disease. All imported cases were removed. The variables we used were the date of onset, the date of laboratory confirmation (diagnosis date), and the final confirmed status of each suspect case (a binary variable that is either positive or negative). In this article, we use the terms confirmed dengue cases and positive cases interchangeably. The dataset contained no personally identifiable information.
There were 515 positive cases out of 841 local suspect cases during the 2005-2006 season and 1,092 positive cases out of 2,360 local suspect cases during the 2006-2007 season. For the 2005-2006 season, the median onset-to-confirmation time (OC-time), with the interquartile range (IQR) in parentheses, was 9.5 (11) days for positive cases and 20 (14) days for negative cases. For the 2006-2007 season, the corresponding values were 7 (9) days and 18 (7) days. The OC-time was thus generally shorter for positive cases than for negative cases, and the standard deviations of the OC-time also differed between the two groups. Based on the final status of each case, Figure 1 shows that the epidemic started in late June 2006, peaked around October to November, and had nearly vanished by February 2007.
The proposed method
Confirmation status of the suspect cases
Our proposed method estimates, in real time, the daily number of new dengue cases and the daily cumulative number of dengue cases; these estimates are updated every day. Let c be the "current" date on which the number of dengue cases is to be estimated. In this study, c runs from May 1, 2006 to April 30, 2007. For the $i$th reported suspect case, counted from the first day of the epidemic season (May 1 in this study), we denote the onset date by $O_i$ and the laboratory confirmation date by $D_i$. If $D_i > c$, case $i$ has no confirmation result as of date c; if $D_i \le c$, case $i$ has been either confirmed as a positive dengue case or found negative as of date c. Let $Y_i$ denote the final confirmation status of the $i$th suspect case, where $Y_i = 1$ for a positive dengue case and $Y_i = 0$ for a negative case. To each suspect case that remains unconfirmed as of date c ($D_i > c$), we assign a probability of being a dengue case, $P_i(c)$. Then, for each suspect case $i$, the expected final confirmation status on date c, $E_i(c)$, can be written as

$$E_i(c) = \begin{cases} Y_i, & D_i \le c, \\ P_i(c), & D_i > c. \end{cases} \qquad (1)$$
The values of $P_i(c)$ and $E_i(c)$ are updated for each case $i$ every day. Without the proposed method, one could observe only the status of cases in the upper branch of $E_i(c)$ in equation (1), that is, cases already confirmed by date c. Once $E_i(c)$ has been calculated for each suspect case, the number of daily new cases is estimated by summing $E_i(c)$ over all new suspect cases on date c, and the cumulative number of cases is obtained by summing $E_i(c)$ over all cases from $i = 1$ to the newest suspect case on date c.
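The bookkeeping behind $E_i(c)$ and the two daily sums can be sketched in Python. This is a minimal illustration, not the authors' SAS macros: the names `SuspectCase`, `expected_status`, and `nowcast` are hypothetical, the probability for unconfirmed cases is passed in as a function `prob`, and we assume that a suspect case counts toward date c once its onset date lies between the season start and c, with "new" cases being those whose onset date is c.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class SuspectCase:  # hypothetical record type
    onset: date       # O_i: date of onset
    confirmed: date   # D_i: date of laboratory confirmation
    positive: bool    # Y_i: final status (True = positive, False = negative)


ProbFn = Callable[[SuspectCase, date], float]  # returns P_i(c) for an unconfirmed case


def expected_status(case: SuspectCase, c: date, prob: ProbFn) -> float:
    """E_i(c): the observed Y_i if confirmed by date c, otherwise P_i(c)."""
    if case.confirmed <= c:
        return 1.0 if case.positive else 0.0
    return prob(case, c)


def nowcast(cases: Iterable[SuspectCase], c: date, season_start: date,
            prob: ProbFn) -> Tuple[float, float]:
    """Return (estimated new cases with onset on c, estimated cumulative cases).

    Assumption: a case counts toward date c once its onset lies in
    [season_start, c]; cases with a later onset are ignored.
    """
    new_cases = cumulative = 0.0
    for case in cases:
        if not season_start <= case.onset <= c:
            continue
        e_i = expected_status(case, c, prob)
        cumulative += e_i
        if case.onset == c:
            new_cases += e_i
    return new_cases, cumulative


if __name__ == "__main__":
    cases = [
        SuspectCase(date(2006, 9, 1), date(2006, 9, 8), True),    # positive, confirmed by Sept 10
        SuspectCase(date(2006, 9, 3), date(2006, 9, 9), False),   # negative, confirmed by Sept 10
        SuspectCase(date(2006, 9, 10), date(2006, 9, 25), True),  # still pending on Sept 10
    ]
    # A constant 0.5 is a placeholder for P_i(c), not an estimate.
    print(nowcast(cases, date(2006, 9, 10), date(2006, 5, 1), lambda case, c: 0.5))  # (0.5, 1.5)
```

Once every case in the example has been confirmed (for instance on September 30), the placeholder probability no longer enters the sums and the cumulative count equals the number of positive cases.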
Estimation of the probability of being a dengue case, $P_i(c)$, among unconfirmed suspect cases
$P_i(c)$ is estimated for unconfirmed cases using information from the cases confirmed within one year before date c. Let $T_i$ be the onset-to-confirmation time (OC-time), the time interval between the onset date and the laboratory confirmation date. The OC-time for the $i$th suspect case as of date c, $t_i(c)$, is calculated as follows:

$$t_i(c) = \begin{cases} D_i - O_i, & D_i \le c, \\ c - O_i, & D_i > c. \end{cases}$$

Thus $t_i(c)$ is the OC-time for confirmed cases and the censored OC-time for unconfirmed cases on date c.
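The censoring rule for $t_i(c)$ can be written as a small Python helper. The name `oc_time` is hypothetical, and OC-times are counted in whole days, which we assume matches the date resolution of the notification data.

```python
from datetime import date
from typing import Tuple


def oc_time(onset: date, confirmed: date, c: date) -> Tuple[int, bool]:
    """Return (t_i(c) in days, observed flag).

    If the case is confirmed by date c, t_i(c) = D_i - O_i is a fully observed
    OC-time; otherwise t_i(c) = c - O_i is right-censored (the true OC-time is
    known only to exceed it).
    """
    if confirmed <= c:
        return (confirmed - onset).days, True
    return (c - onset).days, False


# One case (onset Oct 1, confirmed Oct 8) viewed on two different dates c.
print(oc_time(date(2006, 10, 1), date(2006, 10, 8), date(2006, 10, 5)))   # (4, False)
print(oc_time(date(2006, 10, 1), date(2006, 10, 8), date(2006, 10, 20)))  # (7, True)
```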
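By applying Bayes' rule in several steps, the probability $P_i(c)$ is given by

$$P_i(c) = P\big(Y_i = 1 \mid T_i > t_i(c)\big) = \frac{P\big(T_i > t_i(c) \mid Y_i = 1\big)\,P(Y_i = 1)}{P\big(T_i > t_i(c) \mid Y_i = 1\big)\,P(Y_i = 1) + P\big(T_i > t_i(c) \mid Y_i = 0\big)\,P(Y_i = 0)}.$$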
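A minimal Python sketch of this Bayes' rule step, assuming the standard form in which the conditional survival probabilities $P(T_i > t_i(c) \mid Y_i = y)$ reweight the prior $P(Y_i = 1)$. The function name `posterior_positive` is hypothetical, and the exponential survival curves in the example are placeholders chosen only to echo the 2006-2007 median OC-times; they are not the gamma or nonparametric fits used in this study.

```python
from typing import Callable

SurvivalFn = Callable[[float], float]  # t -> P(T > t | Y = y)


def posterior_positive(t: float, prior: float,
                       surv_pos: SurvivalFn, surv_neg: SurvivalFn) -> float:
    """P(Y = 1 | T > t) for a case still unconfirmed t days after onset."""
    num = surv_pos(t) * prior
    den = num + surv_neg(t) * (1.0 - prior)
    if den <= 0.0:
        raise ValueError("both conditional survival probabilities are zero at t")
    return num / den


# Placeholder survival curves: positives are confirmed faster
# (median 7 days) than negatives (median 18 days).
def surv_pos(t: float) -> float:
    return 0.5 ** (t / 7.0)


def surv_neg(t: float) -> float:
    return 0.5 ** (t / 18.0)


for t in (0, 7, 14, 28):
    print(t, round(posterior_positive(t, 0.5, surv_pos, surv_neg), 3))
```

Because positive cases tend to be confirmed sooner, a case that is still pending after a long time is less likely to be positive, so the posterior falls as $t_i(c)$ grows.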
To estimate $P_i(c)$ using the information available as of date c, we applied the following steps. We first estimated $P(Y_i = 1)$ as the proportion of confirmed positive dengue cases among the suspect cases with onset dates within 1 year before date c. In the parametric approach, we assumed that the OC-time conditional on case status, with density $f(t \mid Y_i)$, follows a gamma distribution. Gamma distributions are frequently used to fit time-delay or time-to-event distributions in disease surveillance analysis [20, 21]. The probability density function of the gamma distribution is

$$f(t; \alpha, \beta) = \frac{t^{\alpha - 1} e^{-t/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \quad t > 0, \qquad \text{where } \Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx.$$

The gamma distribution, denoted by $\mathrm{Gamma}(\alpha, \beta)$, has two parameters, the shape parameter $\alpha$ and the scale parameter $\beta$; its mean and variance are $\alpha\beta$ and $\alpha\beta^2$, respectively. We estimated $\alpha$ and $\beta$ by the method of moments, setting the sample mean $\bar{t}$ and the sample variance $s^2$ of the OC-time equal to $\alpha\beta$ and $\alpha\beta^2$, respectively, which gives $\hat{\beta} = s^2/\bar{t}$ and $\hat{\alpha} = \bar{t}^2/s^2$. Because the mean and standard deviation of the OC-time differed between positive and negative cases, as noted in the previous section, we estimated separate sets of $\alpha$ and $\beta$ for the positive dengue cases ($Y = 1$) and the negative cases ($Y = 0$). We also applied a nonparametric approach, in which the probability $P(T > t_i(c) \mid Y_i)$ was simply replaced with the corresponding empirical proportion computed from the confirmed cases with the given final status, that is, one minus the cumulative proportion of those cases whose OC-time was at most $t_i(c)$. Both the parametric and nonparametric models were based on the data within a 1-year "moving window" before date c. $P_i(c)$ and $E_i(c)$ were also updated every day.
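The two estimators of $P(T > t \mid Y = y)$ can be sketched in Python with only the standard library. This is an illustration under our own assumptions, not the authors' SAS implementation: `fit_gamma_moments`, `gamma_sf`, and `empirical_sf` are hypothetical names; the gamma tail probability is computed from the power series of the regularized lower incomplete gamma function, which is adequate for the moderate values of $t/\beta$ typical of OC-times but is not a production-grade routine; and the sample variance uses the $n - 1$ denominator. Each function would be applied separately to the OC-times of the positive and of the negative confirmed cases in the 1-year window.

```python
import math
from bisect import bisect_right
from statistics import mean, variance
from typing import Callable, Sequence, Tuple


def fit_gamma_moments(oc_times: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments fit: solve mean = alpha*beta, variance = alpha*beta**2.

    Needs at least two OC-times that are not all equal.
    """
    m = mean(oc_times)
    v = variance(oc_times)      # sample variance, n - 1 denominator
    beta = v / m                # scale parameter
    alpha = m / beta            # shape parameter (= m**2 / v)
    return alpha, beta


def gamma_sf(t: float, alpha: float, beta: float) -> float:
    """P(T > t) for T ~ Gamma(shape=alpha, scale=beta).

    Series: P(alpha, x) = x**alpha * exp(-x) / Gamma(alpha)
            * sum_n x**n / (alpha * (alpha + 1) * ... * (alpha + n)).
    """
    if t <= 0:
        return 1.0
    x = t / beta
    term = 1.0 / alpha
    total = term
    n = 1
    while term > 1e-15 * total and n < 10_000:
        term *= x / (alpha + n)
        total += term
        n += 1
    lower = math.exp(alpha * math.log(x) - x - math.lgamma(alpha)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def empirical_sf(oc_times: Sequence[float]) -> Callable[[float], float]:
    """Nonparametric P(T > t): one minus the cumulative proportion of OC-times <= t."""
    data = sorted(oc_times)
    n = len(data)

    def sf(t: float) -> float:
        return 1.0 - bisect_right(data, t) / n

    return sf


# Hypothetical OC-times (days) of confirmed positive cases in the moving window.
positives = [3, 5, 6, 7, 7, 9, 12, 15]
alpha, beta = fit_gamma_moments(positives)
print(f"alpha={alpha:.2f}, beta={beta:.2f}")
print("gamma P(T > 10):", round(gamma_sf(10, alpha, beta), 3))
print("empirical P(T > 10):", empirical_sf(positives)(10))
```

The gamma fit smooths over sparse data in the tail, whereas the empirical curve is a step function that drops to zero beyond the longest observed OC-time; this is one practical difference between the two models.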
Evaluation of the proposed methods
To evaluate the performance of the proposed method, we estimated the daily new cases and daily cumulative cases for each calendar date c from May 1, 2006 to April 30, 2007. Four epidemic curves are presented. They are:
(1) The final status curve, which is the number of dengue cases based on their final confirmation status ("real data", "gold standard").
(2) The daily confirmed curve, which is the number of dengue cases based on the cases confirmed as of date c.
(3) The gamma-model curve, which is the number of dengue cases estimated using the gamma distribution.
(4) The nonparametric-model curve, which is the number of dengue cases estimated using the nonparametric approach.
To summarize the magnitude of the bias, we defined the absolute relative bias (ARB) at date c as

$$\mathrm{ARB}_c = \frac{\lvert \hat{N}_c - N_c \rvert}{N_c},$$

where $\hat{N}_c$ is the cumulative number of cases on date c, estimated either by one of the proposed methods or from the cases confirmed as of date c without using the proposed methods, and $N_c$ is the cumulative number of confirmed cases based on the final status ("real data", "gold standard"). An ARB closer to zero indicates a more accurate estimate.
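A short Python sketch of the ARB calculation, assuming the ARB is computed as $\lvert \hat{N}_c - N_c \rvert / N_c$, where $\hat{N}_c$ is an estimated or confirmed-to-date cumulative count and $N_c$ is the final-status cumulative count on the same date. The function names `absolute_relative_bias` and `arb_series` are hypothetical.

```python
from typing import List, Sequence


def absolute_relative_bias(estimated: float, final: float) -> float:
    """ARB_c = |N_hat_c - N_c| / N_c, with N_c the final-status cumulative count."""
    if final <= 0:
        raise ValueError("ARB is undefined when the final-status count N_c is zero")
    return abs(estimated - final) / final


def arb_series(estimates: Sequence[float], finals: Sequence[float]) -> List[float]:
    """ARB for each date c, given aligned sequences of N_hat_c and N_c."""
    if len(estimates) != len(finals):
        raise ValueError("estimates and finals must cover the same dates")
    return [absolute_relative_bias(e, f) for e, f in zip(estimates, finals)]
```

Computing `arb_series` once for the daily confirmed curve and once for each model curve gives directly comparable bias trajectories over the evaluation period. Dates early in the season, before any case has a positive final status, must be skipped because $N_c = 0$ there.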
All analyses were performed using SAS 9.1.3 (SAS Institute, Inc., Cary, NC). We developed custom SAS macros to estimate the cumulative cases and daily new cases based on the proposed model.