High heterogeneity in community-acquired pneumonia inclusion criteria: does this impact on the validity of the results of randomized controlled trials?

Background There is no consensus on the most accurate combination of diagnostic criteria to define community acquired pneumonia (CAP). We describe inclusion criteria in randomized controlled trials (RCT) of CAP and assess their performance for the diagnosis of formally identified CAP. Methods RCTs related to CAP recorded on ClinicalTrials.gov were analysed. Due to high heterogeneity, we divided close CAP inclusion criteria into patterns (i.e. combinations of inclusion criteria). To assess their diagnostic performances, these CAP definition patterns were applied to a reference population of 319 suspected CAP patients, in whom the CAP diagnosis had been confirmed (n = 163) or excluded (n = 156) by an adjudication committee after a systematic thoracic CT-scan and a 28-day follow-up period. Results In the 47 RCTs included in the analysis, 42 different CAP inclusion criteria combinations were identified and 8 patterns created. This heterogeneity was not explained either by the trials’ methodology or by their objectives. When applied to the reference population, the performance ranges of the 8 definition patterns were 9.8–56.4% for sensitivities, 56.4 97.4% for specificities, 63.6 83.6% for positive predictive values and 50.8–66.7% for negative predictive values. None of the CAP definitions had both sensitivity and specificity superior to 65%. Depending on the CAP definition, the rate of included patients without CAP (“false positives”) ranged from 1 to 21%. Conclusions CAP diagnostic criteria within RCTs are heterogeneous, which may have far-reaching consequences on validity of RCT results.


Background
Community-acquired pneumonia (CAP) is a leading cause of morbidity and the first cause of mortality by infectious disease in the European region. Approximately 3 million cases of CAP are reported annually in Europe, of which one-third are hospitalized [1]. Infectious pneumonia corresponds to the multiplication of microorganisms in alveoli responsible for a combination of non-specific pulmonary and general symptoms, with numerous alternative diagnoses [2]. Due to the deep localization of the infection, microbiological data, which may help to establish a diagnosis, is reported in less than 50% of patients and the diagnosis and aetiology frequently remains uncertain [3]. Thoracic-CT scan, which has proved its efficacy in confirming or invalidating the diagnosis of CAP, is probably to date the best non-invasive tool to confirm the diagnosis of CAP [4][5][6]. However, its cost and limited availability compared to chest X-ray prevent its widespread use as a diagnostic criterion.
Due to heterogeneity of clinical presentation, there are to date neither universal diagnostic criteria to define CAP, nor a validated diagnostic classification. Consequently, CAP definitions differ according to country, medical specialty and practice guidelines [7][8][9]. However, the lack of universal CAP diagnostic criteria might have consequences for clinical practice, epidemiological analyses, and also validity of randomized controlled trials (RCTs). In this context, a true and powerful evaluation of an intervention obviously requires the inclusion of a sufficient number of individuals truly presenting the targeted disease and representative of the entire infected population. This is even more true in non-inferiority trials, a frequently-used methodology in CAP RCTs [10], in which the inclusion of patients without the targeted disease might result in inaccurate estimation of the difference between arms of studies and an incorrect conclusion of non-inferiority of the evaluated intervention. Therefore, an assessment of existing diagnostic criteria for CAP is essential for clinical research.
In the present study, our principal objective was to assess the heterogeneity of CAP diagnostic criteria used in RCTs and to identify different patterns of CAP inclusion criteria. Furthermore, from the results of a previouslyconducted study [4], establishing the diagnosis of CAP based on all currently available data and also on early systematic thoracic CT-scan, we assessed performance (i.e. sensitivity, specificity, positive predictive value and negative predictive value, likelihood ratios) of different CAP inclusion criteria patterns when applied to our reference population [4].

Methods
Selection and data extraction from randomized controlled trials We searched ClinicalTrials.gov using the key words « community-acquired pneumonia » (last connection on October 1st 2018). The eligible trials were RCTs including adults with CAP independent of the trials' primary objective. Trials including a paediatric population, severe CAP and those withdrawn before enrolment were excluded.
The following data were extracted from the Clinical-Trials.gov website: Clinical Trial Number (NCT); declaration, start and completion dates; type of sponsor (industrial or academic); study design; primary objective of the study; extensive list of inclusion criteria (including the CAP diagnostic criteria); presence of a CAP severity score (Pneumonia Severity Index or CURB 65) and the list of exclusion criteria. The study design included the number of centres, national or international recruitment, the location of inclusion (community or hospital), the blinding method (simple/double-blind or open-label trial), the superiority or non-inferiority design, and the field of the study (evaluation of diagnostic criteria, evaluation of biomarkers, choice or duration of antibiotic treatment). Data collection was performed independently by two investigators (JLB and CF). Disagreements were resolved by consensus.
We combined different methods to collect CAP diagnostic criteria. First, we collected data available on the ClinicalTrials.gov declaration. We also systematically contacted the investigators by email to confirm and complete their CAP diagnostic criteria. A second e-mail was sent 1 month later to the non-responders. Finally, when CAP diagnostic criteria were not available, we searched PubMed for publications and looked for CAP diagnostic criteria in the full article, when available.
For each trial, we specified which criteria were mandatory or optional to establish the CAP diagnosis, and their combination. As there were numerous combinations of mandatory and optional criteria to define CAP (and therefore to include CAP patients), we gathered CAP definitions with quite similar criteria. In this report, we will use the term "CAP definition patterns" to refer to these different patterns of CAP inclusion criteria.
Performance of CAP definition patterns to properly identify patients with CAP in a reference population Reference population To assess the performances of each CAP definition pattern, the different CAP definition patterns were applied to the "ESCAPED" database. ESCAPED is a prospective, multicenter, interventional study which assessed the impact of a systematic thoracic CT-scan on the diagnosis of CAP in patients visiting the emergency department for a suspected CAP [4]. Patients were included in the database if 1) they presented at least one symptom of systemic infection (temperature > 38°C or < 36°C, heart rate > 90/min, respiratory rate > 20/min) AND 2) one recent respiratory symptom (cough, lateral chest pain, sputum, dyspnea, localized crackles) AND 3) had had a chest radiography AND 4) the clinician suspected CAP. The inclusion criteria did not include chest X-ray infiltrate, which made possible the inclusion of true CAP without infiltrate on chest X-ray, for which pulmonary involvement was secondarily established by thoracic CT-scan.
For each of the 319 included patients, an adjudication committee composed of a senior specialist in radiology, pneumonology and infectious diseases established the diagnostic probabilities of CAP according to a 4-level Likert scale (definite; probable; possible; excluded) from all available data at day 28 post-diagnosis. Based on the evaluation of the adjudication committee, there were 150 definite CAP cases, 13 probable CAP cases, 34 possible CAP cases, and 122 excluded CAP cases [4]. For the present study, we grouped definite and probable CAP in a single category and considered 163 patients as having CAP, compared to 156 who were considered as not having CAP.

Performances of CAP definition patterns
The comprehensiveness of clinical, biological and radiological data collected in the ESCAPED database, including all of the inclusion criteria used in the RCTs identified in the literature, allowed us to apply each CAP definition pattern to this database. For each CAP definition pattern, we aimed to determine if it accurately identified patients with CAP from the reference population of the ESCAPED database. We determined, among the 319 patients of the ESCAPED database, the number of patients who would have been included or excluded in the RCT, according to each CAP definition pattern. For each patient, we already knew if he was considered as having "definite or probable CAP", and "excluded or possible CAP" by the adjudication committee.
This resulted, for each CAP definition pattern evaluated, in a number of "true positives" patients (i.e. considered as having CAP by the evaluated CAP definition pattern and by the adjudication committee), a number of "false positive" patients (i.e. considered as having CAP by the evaluated CAP definition pattern but not by the adjudication committee), a number of "true negative" patients (i.e. considered as not having CAP by the evaluated CAP definition pattern and by the adjudication committee) and a number of "false negative" patients (i.e. considered as not having CAP by the evaluated CAP definition pattern but as having CAP by the adjudication committee).

Statistical analysis
For each CAP definition pattern, after determination of the number of "true positive", "true negative", "false positive" and "false negative" patients, we calculated sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and positive and negative likelihood ratios for the diagnostic of CAP. All tests were two-sided, and p-values below 0.05 were considered to

Randomized controlled trials selection and characteristics
The search yielded 263 trials, 77 of which met the inclusion criteria for the present study. Reasons for exclusion are shown in Fig. 1. The full diagnostic criteria for CAP were obtained in 47 out of the 77 (61%) trials; these 47 trials were included in the analysis: in 32/47 trials (68%), diagnostic criteria were correctly described in Clinical-Trials.gov database; in 8 (17%) they were obtained after contacting the investigator, and in 7 trials (15%), they were obtained after analysis of the published articles.
The 47 trials had been registered on ClinicalTrials.gov between 2002 and 2018. Seventeen trials (36%) had published results. The sponsor was industry in 64% of cases and academic in 36%. Most trials were double-blinded (74%) and the type of trial was specified in 58% of cases (non-inferiority 49%, superiority 9%). The objective of the trial was the evaluation of antibiotic choice in 32 (68%), of different antibiotic durations in 5 (11%), of other therapeutic strategies in 9 (19%), and of biomarkers in 3 (6%); some studies combined two of these objectives (Table 1).

Diagnostic criteria for CAP
The search identified 42 different combinations of CAP diagnostic criteria in the 47 trials. In 44 out of the 47 trials (94%), a pulmonary infiltrate compatible with pneumonia on chest X-ray was mandatory for CAP diagnosis, while it had to be new in 38 trials. In one trial, it was an optional diagnostic criterion. In the two remaining trials, the chest X-ray result was not a diagnostic criterion.
The list and the combinations of the criteria were highly heterogeneous. Table 2 presents the frequency of each criterion (optional or mandatory) in the 47 trials. Fever was mentioned as an inclusion criterion in 98% of trials but was mandatory in 19%. At least one biological criterion was mentioned in 64% of trials and mandatory in 6%: mostly leukocytosis or C-reactive protein. According to the list of criteria and their 42 different combinations, eight CAP definition patterns were generated ( Table 3). The CAP definition patterns frequently used were pattern 5 (infiltrate on chest X-ray and ≥ 1 or ≥ 2 criteria among fever, respiratory symptom and biological inflammatory syndrome, N = 19) and pattern 3 (infiltrate on chest X-ray and ≥ 1 respiratory symptom, N = 11). Definition patterns 6 and 7 had exclusively clinical diagnostic criteria. There was no correlation between the characteristics of the trials (investigator, location, number of centers, design and objectives) and one specific CAP definition pattern. Among the 21 non-inferiority trials, the combinations of diagnostic criteria for CAP belonged to CAP definition patterns 2 (N = 3), 3 (N = 9), 4 (N = 5), 5 (N = 4) and 8 (N = 1).

CAP definition patterns' diagnostic performance
The performances of the 8 CAP definition patterns applied to the ESCAPED database are presented in Fig. 2 and Table 4. For example, considering CAP definition pattern 1, its application to the 319 ESCAPED patients would have led to the inclusion of 61 of the 319 patients. Among these 61 patients, 51 had definite CAP according to the adjudication committee and were therefore "true positives", while the 10 other patients included by CAP a PSI Pneumonia severity index definition pattern 1 were actually "false positives". CAP definition pattern 1 failed to include 112 patients with CAP ("false-negatives"). Finally, the 146 other ESCAPED patients were correctly identified by CAP definition pattern 1 as patients without CAP ("true negatives"). This allowed the calculation of a sensitivity of 31.3%, a specificity of 93.6%, a PPV of 83.6%, a NPV of 55.6%, a positive likelihood ratio of 4.88 and a negative likelihood ratio of 0.73 for the diagnosis of CAP by CAP definition pattern 1 (Fig. 2). The CAP definition patterns 6 and 7, exclusively based on clinical criteria, were very specific (97.4 and 92.3%) but had low sensitivities (9.8 and 23.3%, respectively). Among the 5 other CAP definition patterns including chest X-ray in diagnostic criteria, patterns 3, 4, and 5 maximized sensitivity (from 67.5 to 73%) whereas patterns 1 and 2 maximized specificity (from 83.3 to 93.6%). The most sensitive CAP definition patterns (patterns 3, 4 and 5) were the most used in RCTs (Table 3).
Overall, none of the CAP definition patterns obtained both sensitivity and specificity superior to 65%. Among the six CAP definition patterns including chest X-ray, the number of "false-positive" among the included population varied by a factor of 6 (from 10 for definition pattern 1 to 68 for definition pattern 3).

Discussion
In this study, we report the considerable heterogeneity of CAP inclusion criteria in RCTs and, by applying these criteria to a reference population, explored through systematic CT-scan, we underline the potential risk of inclusion of patients without CAP, in RCTs focused on CAP. We show that the heterogeneity of CAP inclusion criteria applies to the number of criteria, their type (pulmonary or general symptoms, biological criteria, radiological abnormalities), their optional or mandatory nature, their multiple combinations, and was not explained by the characteristics of the trials, their objectives, their methodology or their sponsor.
When applied to a unique reference population of patients suspected to have CAP visiting the emergency department, this variety of CAP definition patterns resulted in variations of the number of patients identified as having CAP and their diagnostic authenticity. Furthermore it affected the proportion of patients with or without definite CAP who would be included in RCTs using these inclusion criteria (variation by a factor of 6 of the number of false positive patients). This variability in the included population has implications in terms of the trials' internal and external validities. Indeed, the evaluation of therapy in individuals who do not have the disease negatively impacts the internal validity of both non-inferiority and superiority trials. Considering noninferiority trials (the majority of those analysed), factors that result in smaller differences between study groups will lead to the false conclusion that the new treatment is not inferior to the gold standard [11][12][13]. This is particularly true for CAP definition patterns with a high rate of false positive CAP. Considering superiority trials, including patients without CAP would result in a loss of statistical power to demonstrate the superiority of a potentially effective therapy.
This heterogeneity in CAP definitions also calls into question the external validity (application of the results) and furthermore prevents future comparisons of these different trial results which, even if they share the same objective and the same declared recruited individuals (CAP patients), will draw conclusions from populations with non-comparable characteristics. This may very well also explain why conclusions of different CAP RCTs differ so much in the literature [13], and limited the conclusions based on the metaanalysis of such studies.
Our study has some limitations. First, we did not test the performance of each of the 42 combinations of diagnostic criteria in the database. We grouped them into eight CAP definition patterns for analysis, which might have resulted in an under-estimation of the heterogeneity of the diagnostic performance of the diverse CAP definitions. Second, the choice of our reference population may have biased the analysis. However, the choice of a reference population to evaluate the performance of diagnostic criteria inevitably results in a selection bias. We believe that the ESCAPED population is the more acceptable reference population, as data collected in the database included all the diagnostic criteria of the eight CAP definition patterns, and the adjudication committee based its judgment on all available data including a thoracic CT-scan and a 28-day follow-up. In this population, 27% of the CAP diagnoses established based on the existence of an infiltrate on chest X-ray were finally excluded by thoracic CT-scan evaluation [4]. Third, we considered in our analysis that "confirmed" and the "probable" CAP patients based on the adjudication committee classification had definite CAP. However, considering only confirmed CAP patients as definite CAP gave similar results (data not shown). Finally, we based our definition of true CAP considering CT-scan as the adequate gold standard for the diagnosis of CAP. It remains to date the most useful non-invasive technique to establish and/or exclude pneumonia diagnosis, [6]. The question arises whether our results allow the identification of the most accurate combination of inclusion criteria for RCTs. The CAP definition patterns 1, 6 and 7, provided excellent specificity but their low sensitivity prevents their common use in clinical research. The three most sensitive CAP definition patternsdisplaying quite similar performances -are already the most used in RCTs, attesting to the pragmatism of investigators. The issue is to improve their specificity. We suggest that the perprotocol analysis of any CAP trial should be performed on the sub-population of patients whose CAP diagnosis is established a posteriori by an adjudication committee, taking into account all available data, including microbiological results and follow-up, and if possible a thoracic CT-scan.
This "two-step" strategy would have the merit of being appropriate in every situation: when high sensitivity of the diagnosis is the priority in the context of urgent CAP care, as delay in antibiotic treatment is associated with poorer prognosis, but also in therapeutic trials, where more stringent inclusion criteria are needed. This strategy would increase the validity of both epidemiological studies and randomized clinical trial results, and make possible the comparison of the results.

Conclusions
The inclusion criteria for CAP in RCTs are highly heterogeneous. Their diagnostic performances vary considerably and some definitions might lead to the inclusion of a proportion of patients without CAP or to missing patients with a CAP diagnosis. This heterogeneity may impact the interpretation of the results of randomized trials, which remain the basis of evidence-based guidelines. To "plagiarize" Carl Nathan [14], it makes no sense to use twenty-first century technology to develop drugs targeted at specific infections whose evaluation is based on RCTs which use nineteenth-century CAP diagnostic criteria. We suggest a "two-step strategy", associating a sensitive combination of inclusion criteria identified in our study, with a systematic per-protocol analysis on the sub-population of patients whose CAP diagnosis is established a posteriori by an adjudication committee, including if possible a thoracic CT-scan.