Definitions and inclusion criteria
Vaccinated persons were those who were ≥ 14 days past completion of the recommended primary series of an FDA-authorized COVID-19 vaccine. At the time of the study, only the Pfizer-BioNTech and Moderna COVID-19 vaccines were FDA-authorized, and both had a two-dose primary series. Unvaccinated persons were those who did not meet the criteria above; this group therefore included persons who were not vaccinated, persons who were partially vaccinated with the primary series, and persons who had received the primary series but were < 14 days past series completion at the time of analysis.
A COVID-19 case in a vaccinated person was defined as the detection of SARS-CoV-2 RNA or antigen in a respiratory specimen during the study time period. The analysis included cases in persons who completed their primary vaccine series by February 28, 2021, and tested positive for SARS-CoV-2 by March 31, 2021.
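The case definition above can be sketched as a simple date check. This is a minimal illustration, not the study's actual code: the function name and arguments are hypothetical, and it assumes that the 14-day threshold is inclusive and that the positive test date is the specimen date.

```python
from datetime import date, timedelta
from typing import Optional

SERIES_CUTOFF = date(2021, 2, 28)  # latest series completion date stated in the text
TEST_CUTOFF = date(2021, 3, 31)    # latest positive test date stated in the text
LAG = timedelta(days=14)           # assumption: ">= 14 days" counted inclusively


def is_case_in_vaccinated_person(series_completed: Optional[date],
                                 positive_test: date) -> bool:
    """Return True if a positive test meets the vaccinated-case definition."""
    if series_completed is None:
        # No completed primary series: counted as unvaccinated.
        return False
    if series_completed > SERIES_CUTOFF or positive_test > TEST_CUTOFF:
        return False
    # The test must fall at least 14 days after series completion.
    return positive_test >= series_completed + LAG
```

For example, a person who completed the series on February 1, 2021 would qualify with a positive test on February 15 but not on February 14.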
COVID-19 case identification among the vaccinated population and data collection
Five states (Alaska, Colorado, Indiana, Kentucky, and Tennessee) were selected because they shared a similar approach to case identification: active linkage of their COVID-19 case surveillance systems with their immunization information systems, an approach that was relatively rare among states at the time of this analysis. State health department personnel actively identified COVID-19 cases among vaccinated individuals by matching all positive SARS-CoV-2 laboratory test results with state-based immunization information systems using patient name and date of birth. De-identified cases among vaccinated people were reported to CDC’s COVID-19 vaccine breakthrough surveillance database; data on cases used in this analysis were subjected to additional scrutiny for completeness and illness outcome by participating health departments and sent to CDC by June 2021 [16, 17]. Variables included patient demographics, residence location and type, SARS-CoV-2 laboratory test method, COVID-19 vaccination type and dates, hospitalization, and outcome. State-based hospitalization and death databases were reviewed to ascertain severe outcomes. For fatal cases, hospital records and death certificates were reviewed to categorize deaths as COVID-19 related or non-related. Available respiratory specimens that tested positive for SARS-CoV-2 RNA were characterized by viral genomic analysis [18]. Weekly vaccine administration data for vaccinated individuals were obtained from the immunization information system of each state.
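The linkage step described above can be illustrated with a minimal deterministic match on name and date of birth. This is only a sketch under stated assumptions: the record layout (`name`, `dob` keys), the normalization rule, and the function names are hypothetical, and the states' actual matching procedures are not described in the text and may have been more elaborate.

```python
from datetime import date


def link_key(name: str, dob: date):
    """Build a match key: case-insensitive name with collapsed whitespace, plus DOB."""
    return (" ".join(name.casefold().split()), dob)


def link_cases(lab_results, iis_records):
    """Pair each positive lab result with the immunization record sharing its key.

    Both inputs are lists of dicts with hypothetical 'name' and 'dob' fields.
    Lab results with no matching immunization record are left out.
    """
    iis_by_key = {link_key(r["name"], r["dob"]): r for r in iis_records}
    linked = []
    for lab in lab_results:
        match = iis_by_key.get(link_key(lab["name"], lab["dob"]))
        if match is not None:
            linked.append((lab, match))
    return linked
```

Exact matching of this kind misses records with spelling variants or data entry errors, which is one reason linked data usually need additional review.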
This activity was reviewed at CDC, was determined to be a non-research public health investigation not requiring further institutional review board review, and was conducted in accordance with applicable federal law and CDC policy.
Data analysis
Data were aggregated across the five participating states and analyzed at CDC using SAS version 9.4 (SAS Institute, Cary, NC). Patient age was categorized into five groups for descriptive analyses: 16–49, 50–64, 65–74, 75–84, and ≥ 85 years. Race and ethnicity data were combined. Where residential status was known, nursing homes and assisted living facilities were categorized as long-term care facilities.
Person-weeks at risk for COVID-19 among the vaccinated population were calculated by multiplying the count of vaccinated persons in each stratum (defined by week of vaccine completion, sex, age group, and vaccine type) by the total number of weeks at risk, beginning 14 days after receipt of the second vaccine dose and continuing through the end of March 2021.
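The person-weeks calculation can be sketched as follows. The function names and the input layout are hypothetical, and the sketch assumes that each stratum is represented by a single second-dose date and that weeks at risk are counted as fractional weeks up to March 31, 2021; the analysis itself was done in SAS and may have handled week boundaries differently.

```python
from datetime import date, timedelta

STUDY_END = date(2021, 3, 31)  # end of follow-up stated in the text
LAG_DAYS = 14                  # time at risk starts 14 days after the second dose


def weeks_at_risk(second_dose: date, study_end: date = STUDY_END) -> float:
    """Weeks from 14 days after the second dose to the end of the study."""
    start = second_dose + timedelta(days=LAG_DAYS)
    days = (study_end - start).days
    # Strata whose at-risk period starts after the study end contribute nothing.
    return max(days, 0) / 7.0


def person_weeks(strata) -> float:
    """Sum person-weeks over strata given as (second_dose_date, n_vaccinated) pairs."""
    return sum(n * weeks_at_risk(d) for d, n in strata)
```

Under these assumptions, 100 persons who received a second dose on March 3, 2021 would contribute 100 × 2 = 200 person-weeks.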
We compared the rate of COVID-19 cases among the vaccinated population in the five states during January–March 2021 with the rate of reported COVID-19 cases among the unvaccinated population ≥ 16 years of age. The number of COVID-19 cases among the unvaccinated population was not directly available and was therefore approximated by subtraction: the number of COVID-19 cases among vaccinated persons was subtracted from the five states' total COVID-19 case counts during January–March 2021, and the vaccinated population count was subtracted from the total estimated 2019 state population. Rates of COVID-19 among vaccinated and unvaccinated populations were directly standardized to the 2010 U.S. population in four age categories: 16–49, 50–64, 65–74, and 75 years and older [19, 20].
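The subtraction and the direct standardization can be written out in a few lines. This is a generic sketch, not the study's code: the function names are hypothetical, the denominators may be in any consistent unit (persons or person-time), and no real case counts or census figures are used here.

```python
def subtract_vaccinated(total_cases, vaccinated_cases,
                        total_population, vaccinated_population):
    """Approximate unvaccinated cases and population by subtraction."""
    return (total_cases - vaccinated_cases,
            total_population - vaccinated_population)


def directly_standardized_rate(cases, denominators, standard_population):
    """Weight each age-specific rate by that age group's share of the standard.

    All three arguments are dicts keyed by the same age-group labels.
    """
    total_standard = sum(standard_population.values())
    rate = 0.0
    for age_group, standard_count in standard_population.items():
        stratum_rate = cases[age_group] / denominators[age_group]
        rate += stratum_rate * (standard_count / total_standard)
    return rate
```

With two illustrative age groups having rates of 0.01 and 0.02 and standard weights of 75% and 25%, the standardized rate is 0.01 × 0.75 + 0.02 × 0.25 = 0.0125.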
To evaluate whether COVID-19 among vaccinated persons was more likely to occur in certain populations, we compared the sex, age, and vaccine product received for COVID-19 cases with those of the vaccinated population within each state. For this analysis, we further collapsed age groups into three categories: 16–64, 65–84, and ≥ 85 years. We calculated incidence rates of infection among vaccinated persons according to person-weeks at risk in each stratum and used Poisson models to calculate incidence rate ratios (IRR) and 95% confidence intervals (CI). We used a multivariable Poisson model to generate IRRs adjusted for each variable of interest as well as for potential confounding by state of residence and month of vaccine series completion, to account for differences in SARS-CoV-2 transmission across states and over time. Multivariable analyses did not include race, ethnicity, residential housing status, or occupation, because incomplete or unavailable data in these categories precluded reliable IRR analysis. Sensitivity analyses that removed epidemiologically linked cases in the same long-term care facility, and cases reported as asymptomatic, were performed to evaluate their respective impact on overall findings. Persons with infections after vaccination were not removed from the denominator of vaccinated persons eligible for vaccine breakthrough in the weeks following their infection, given their small number compared with the total vaccinated population. Because of the sample size, only ratios of infection rates according to vaccination status were compared, not rates of severe outcomes.
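For a single binary comparison, a crude IRR and its Wald 95% CI can be computed directly from case counts and person-weeks; the point estimate is the one a Poisson model with a single binary covariate would give. The sketch below covers only this unadjusted case, with a hypothetical function name, and does not reproduce the multivariable model, which was fitted in SAS.

```python
import math


def crude_irr(cases_a, person_weeks_a, cases_b, person_weeks_b, z=1.96):
    """Crude incidence rate ratio (group A vs. group B) with a Wald CI.

    The standard error of log(IRR) is sqrt(1/cases_a + 1/cases_b),
    so both case counts must be greater than zero.
    """
    if cases_a <= 0 or cases_b <= 0:
        raise ValueError("both groups need at least one case")
    irr = (cases_a / person_weeks_a) / (cases_b / person_weeks_b)
    se_log = math.sqrt(1.0 / cases_a + 1.0 / cases_b)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper
```

With illustrative inputs of 20 cases in 1,000 person-weeks versus 10 cases in 1,000 person-weeks, the IRR is 2.0 with a 95% CI of roughly 0.94 to 4.27.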
Sequence data analysis
SARS-CoV-2 sequence data from the representative sample of specimens routinely submitted to CDC’s national genomic surveillance program from the five states were used to characterize viral strains circulating in those areas during the study period [21, 22]. We used the chi-square test to compare the proportion of infections among fully vaccinated persons that were due to a SARS-CoV-2 variant of concern with the proportion of variants of concern in the genomic surveillance data [14]. A sensitivity analysis was performed that matched each SARS-CoV-2 case with sequence data to one randomly selected case from the same state and month with sequence data reported to the national genomic surveillance program. A second sensitivity analysis was performed that excluded viral sequence data from epidemiologically linked cases.
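The comparison of proportions amounts to a chi-square test on a 2 × 2 table (variant of concern yes/no, by data source). The sketch below uses only the standard library; the function name is hypothetical, and it assumes a Pearson chi-square without continuity correction, a detail the text does not specify.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for a 2x2 table.

    a, b: variant-of-concern and other sequences among vaccinated cases
    c, d: variant-of-concern and other sequences in surveillance data
    Returns (chi-square statistic, p-value).
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With small expected cell counts, an exact test would be a more cautious choice than this approximation.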