Details of the US Flu VE Network design, sites, and enrollment procedures have been described previously. Briefly, during the 2011–2012 influenza season, patients aged ≥6 months seeking outpatient medical care for an ARI with fever or cough were recruited at outpatient clinics in Marshfield, Wisconsin; southeastern Michigan (Ann Arbor and Detroit); Temple-Belton, Texas; Seattle, Washington; and Pittsburgh, Pennsylvania. Patients meeting the symptom criteria were eligible if duration of illness was ≤7 days and they had not received antiviral medication prior to enrollment. Recruitment and sample collection were performed by study personnel at each site and were not influenced by the diagnosis of the treating physician. Consenting patients or their parents/guardians completed an enrollment interview to ascertain patient demographic characteristics, symptoms (fever, cough, fatigue, sore throat, nasal congestion, shortness of breath, wheezing), onset date, subjective assessments of general health and current health status, and self-reported influenza vaccination status.
Nasal and throat swabs (nasal only for children age <2 years) were collected and combined for influenza testing at network laboratories. This technique was selected because the specimens are easier to collect, the procedure is less uncomfortable for the patient, and it has been found to be as effective as nasopharyngeal swabs. Presence of influenza was tested using real-time reverse transcription polymerase chain reaction (PCR) as previously described. The parent study used a test-negative case-control design [18–20].
Selection of study sample
Individuals enrolled in 2011–2012 from all 5 sites during periods of influenza circulation at each site were included in the analyses. The period of influenza circulation at each site was defined as the time between the date on which the first influenza-positive case was enrolled and the date on which the last influenza-positive case was enrolled. Participants who reported onset of symptoms before or after this period were excluded from analysis. The total sample for all sites was 5,147. Some individuals were enrolled multiple times; all of their visits except the first enrollment were excluded (N = 71), reducing the sample to 5,076. Because symptoms of influenza vary between young children and older individuals, the primary analysis sample was restricted to enrollees ≥5 years of age, resulting in a final sample size of 4,173. Secondary analyses included children <5 years of age.
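The selection steps above can be sketched in code. This is only an illustration under stated assumptions: the record keys (`patient_id`, `site`, `enroll_date`, `flu_positive`, `age_years`) are hypothetical, the window is applied to the enrollment date, and the order in which the exclusions are applied is assumed rather than taken from the study.

```python
from datetime import date

def select_sample(records, min_age_years=5):
    """Apply the three selection steps to a list of enrollment records.

    Each record is a dict with hypothetical keys: 'patient_id', 'site',
    'enroll_date', 'flu_positive', 'age_years'.
    """
    # Step 1: per-site circulation window, from the first to the last
    # influenza-positive enrollment at that site.
    windows = {}
    for r in records:
        if r["flu_positive"]:
            d = r["enroll_date"]
            lo, hi = windows.get(r["site"], (d, d))
            windows[r["site"]] = (min(lo, d), max(hi, d))
    in_window = [
        r for r in records
        if r["site"] in windows
        and windows[r["site"]][0] <= r["enroll_date"] <= windows[r["site"]][1]
    ]

    # Step 2: keep only the first enrollment of each individual.
    first_visit = {}
    for r in sorted(in_window, key=lambda rec: rec["enroll_date"]):
        first_visit.setdefault(r["patient_id"], r)

    # Step 3: restrict the primary sample by age.
    return [r for r in first_visit.values() if r["age_years"] >= min_age_years]
```

Applied to the study data, steps 2 and 3 would correspond to the reductions from 5,147 to 5,076 and then to 4,173 reported above.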
Classification and Regression Trees (CART) software was used to develop models that classify subjects into risk categories. Recursive partitioning, a non-parametric statistical method for multivariable data, uses a series of dichotomous splits, e.g., presence or absence of symptoms and other demographic variables, to create a decision tree, with the goal of correctly classifying members of the population, in this case, laboratory-confirmed influenza cases. Each independent variable is examined and a split is made to maximize the sensitivity and specificity of the classification, resulting in a decision tree. The tree is then pruned; the objective of pruning is to develop a tree with the best size and lowest misclassification rate.
The CART method is able to determine the complex interactions among variables in the final tree, in contrast to multivariable logistic regression, in which the interactions must be identified and defined by the analyst.
To begin the CART analysis, simple random sampling without replacement was used to split the sample into equal-sized (50 %–50 %) developmental and validation samples. CART was applied first to the developmental sample and then to the validation sample to assess the model’s generalizability and to evaluate overfitting of the model to the developmental sample.
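A minimal sketch of a 50 %–50 % split by simple random sampling without replacement is shown below. The function name and the fixed seed are assumptions for illustration; the study's actual randomization procedure is not described beyond the text above.

```python
import random

def split_half(ids, seed=0):
    """Randomly split subject IDs into developmental and validation halves."""
    rng = random.Random(seed)                    # seeded for reproducibility
    ids = list(ids)
    shuffled = rng.sample(ids, k=len(ids))       # sampling without replacement
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]        # (developmental, validation)
```

With an odd sample size such as 4,173, one half necessarily contains one more subject than the other.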
Several sets of candidate predictors were used to build the classification trees. Over several iterations, CART models were fit to arrive at a clinically logical fit, based on sensitivity and specificity; the variables included those that were potentially related to risk of influenza, such as symptoms, self-reported vaccination status, personal and demographic variables, and presence of chronic disease. The primary developmental and validation models were constructed for all participants ≥5 years, using self-reported vaccination status, household smoking status, and symptoms reported at enrollment: cough, fever, fatigue, wheezing, sore throat, nasal congestion, shortness of breath. The variables smoking status, age and presence of other high risk conditions were not included in this model.
The Gini Index method was used to split off the largest category into a separate group, with the default split size set to enable growing the tree. After the full tree was built, it was pruned by deleting the variables that did not further classify subjects into an influenza group or a no-influenza group, based on the variable importance score and the sensitivity. Once a clinically meaningful tree structure emerged, pruning was discontinued. The Hosmer-Lemeshow goodness-of-fit test confirmed the suitability of the trees.
Secondary analyses were conducted for two groups: 1) children 6–59 months of age presenting within 2 days of onset of symptoms, with models that included PCR-confirmed influenza status, self-reported vaccination status, household smoking status, and symptoms reported at enrollment: cough, fever, fatigue, wheezing, sore throat, nasal congestion, shortness of breath; and 2) adults ≥65 years old and individuals 5–64 years old with a high risk condition presenting within 2 days of onset of symptoms, with models that included PCR-confirmed influenza status, self-reported vaccination status, household smoking status, symptoms reported at enrollment: cough, fever, fatigue, wheezing, sore throat, nasal congestion, shortness of breath, and asthma diagnosis.
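To make the Gini criterion concrete, the sketch below computes Gini impurity for a binary outcome and picks the dichotomous predictor whose split reduces impurity the most. It is the textbook formulation, not the internals of the CART software used in the study; the function names and the dict-per-subject data layout are assumptions.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = influenza positive)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)          # equals 1 - p**2 - (1 - p)**2

def best_split(rows, labels, features):
    """Return (feature, impurity reduction) for the best yes/no split.

    rows: list of dicts mapping feature name -> 0/1 (symptom absent/present).
    """
    parent = gini(labels)
    best_feature, best_gain = None, 0.0
    for f in features:
        present = [y for r, y in zip(rows, labels) if r[f]]
        absent = [y for r, y in zip(rows, labels) if not r[f]]
        if not present or not absent:
            continue                 # feature does not separate the node
        weighted = (len(present) * gini(present)
                    + len(absent) * gini(absent)) / len(labels)
        gain = parent - weighted
        if gain > best_gain:
            best_feature, best_gain = f, gain
    return best_feature, best_gain
```

Growing a tree amounts to applying `best_split` recursively to each resulting child node until a stopping rule is met; pruning then removes splits that add little.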
Receiver operating characteristic (ROC) curves, the area under the curve (AUC), sensitivity, specificity, and positive and negative predictive values, all estimated using CART software, were used to assess the performance of the CART model for the developmental and validation samples. The sensitivity from the CART model was determined using the final influenza positive terminal node and specificity was determined using the previous influenza negative terminal nodes.
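For reference, the performance measures named above can be computed from predicted and true classifications as in the following sketch. The function names are assumptions, the AUC uses the rank-based (Mann-Whitney) formulation rather than whatever the CART software does internally, and the code assumes each class is represented at least once.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV from 0/1 labels (1 = influenza)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def auc(y_true, scores):
    """Probability that a random positive scores above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```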
In addition to the CART analyses, descriptive statistics were calculated as percentages for discrete variables and as means and standard deviations for continuous variables. Chi-square statistics were used to compare the distribution of symptoms and other discrete measures and Student’s t-tests were used to compare the continuous measures (i.e., age) between those with and without laboratory-confirmed influenza.
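The two bivariate tests can be illustrated with a short standard-library sketch. The analyses in the study were run in SAS; the code below is only an assumed equivalent for a 2×2 table (Pearson chi-square without continuity correction) and for the pooled-variance Student's t statistic, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table
    [[a, b], [c, d]], e.g. symptom present/absent by influenza status."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square, 1 df
    return chi2, p_value

def student_t(x, y):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, nx + ny - 2
```

The t statistic would still need to be referred to a t distribution to obtain a p value, which is omitted here to keep the sketch dependency-free.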
To support the CART findings, sensitivity analyses were conducted using multivariable regression analyses with a full model method, using the same set of variables used in the CART analysis for both developmental and validation samples, and for the full sample with all individuals ≥5 years of age. Positive and negative predictive values were calculated using sensitivity and specificity values from the CART model across a hypothetical range of influenza prevalence values (1–40 %) to reflect influenza seasons of varying severity (Table 3). Sensitivity and specificity were also calculated by comparing the predicted probability from the multivariable logistic regression with the true classification of influenza for the developmental, validation, and full samples, and are presented.
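The dependence of predictive values on prevalence follows from Bayes' theorem and can be sketched as below. The sensitivity, specificity, and prevalence values in the usage note are placeholders chosen for illustration, not results from this study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for given test characteristics and prevalence,
    all expressed as proportions between 0 and 1."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

def predictive_value_table(sensitivity, specificity, prevalences):
    """One (prevalence, PPV, NPV) row per hypothetical prevalence value."""
    return [(p, *predictive_values(sensitivity, specificity, p))
            for p in prevalences]
```

For example, `predictive_value_table(0.8, 0.9, [0.01, 0.10, 0.20, 0.40])` would tabulate how PPV rises and NPV falls as prevalence increases across the 1–40 % range.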
Statistical significance was defined as a two-sided p value <0.05. Data were analyzed using SAS v9.2 (SAS Institute, Inc., Cary, NC) and CART for the decision trees (Predictive Modeler) Software version 184.108.40.2060 (Salford Systems, San Diego, CA).