Data source
Data were accessed from the Combined Intelligence for Population Health Action (CIPHA; www.cipha.nhs.uk) resource. CIPHA is a population health management data resource set up to support responses to COVID-19. It consists of linked electronic health records from routinely collected administrative data. Here, we used the population spine for CIPHA (all people registered with a GP and their primary care records), linked to NHS vaccination records and all registered SARS-CoV-2 tests.
Study population
CIPHA contains linked pseudonymised electronic health care records for 2,864,997 people. We included people (n = 2,767,027) with a complete address who were resident during the study period in the integrated care region that the CIPHA resource was set up to serve (Cheshire and Merseyside, England). Participants with missing data (n = 101) were excluded from analyses, except for missing ethnicity data, which we adjusted for. For each period of analysis, we only included people who were alive up to the end of the period to minimise issues with immortal time bias.
Study design
We selected three time periods to analyse:
1. Delta: 3rd June to 1st September 2021. We defined the start of the period as when Public Health England (now UK Health Security Agency) stated that the Delta variant accounted for 99% of all infections [19].
2. Delta: 1st September to 27th November 2021. We selected this period to cover the wave of infections associated with the new school year (starting 1st September 2021) up to when the first case of Omicron was detected in England. The end date was selected to focus our analyses on cases relating primarily to the Delta variant of SARS-CoV-2, avoiding any differences in risk of further infection or vaccine escape that the Omicron variant may have.
3. Omicron: 13th December 2021 to 1st March 2022. We defined the start of this period as when sequencing data suggested that most positive tests were for Omicron. The period runs to the end of the data available at the time of analysis.
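As an illustration only, the three periods can be written down as date ranges and used to label a test date. This is a minimal sketch, not the code used in the study: the labels `delta_1`, `delta_2` and `omicron` and the function `periods_for_test` are hypothetical names, and treating both boundaries as inclusive is an assumption, since the text does not say how the shared boundary day (1st September 2021) was handled.

```python
from datetime import date

# Period boundaries as stated in the text; the labels are illustrative names.
STUDY_PERIODS = {
    "delta_1": (date(2021, 6, 3), date(2021, 9, 1)),
    "delta_2": (date(2021, 9, 1), date(2021, 11, 27)),
    "omicron": (date(2021, 12, 13), date(2022, 3, 1)),
}


def periods_for_test(test_date):
    """Return the labels of every study period that contains test_date.

    Assumption: both boundaries are inclusive, so 1st September 2021
    falls in both Delta periods.
    """
    return [
        label
        for label, (start, end) in STUDY_PERIODS.items()
        if start <= test_date <= end
    ]
```

A test taken between 28th November and 12th December 2021 returns an empty list, reflecting the gap left between the second Delta period and the Omicron period.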
Outcome variable
The primary outcome variable was time to SARS-CoV-2 infection (registered positive test) during each period. Time was defined as when the test was taken rather than when it was processed. Positive cases are compiled from data feeds supplied by the UK Health Security Agency, who share all Pillar 1 (tests in care settings) and Pillar 2 (tests in the community) positive tests which are linked within CIPHA. Positive cases are identified using both lateral flow and polymerase chain reaction (PCR) tests.
Explanatory variables
We focused on three key explanatory variables: COVID-19 vaccination status, previous SARS-CoV-2 infection and neighbourhood socioeconomic deprivation.
Vaccination status was defined as the number of doses (0–3) an individual had received, of any vaccine type or combination, e.g. BNT162b2 (Pfizer-BioNTech) and ChAdOx1 nCoV-19 (Oxford-AstraZeneca). A first dose was counted if it was received at least two weeks before the start of each period, and a second or third dose if it was received at least one week before; we define these lags as the time to receive immune protection (following other research [3, 8]). The measure was time-varying: it was updated to account for people who received an additional vaccination during each study period. This was achieved using established methods, by updating the time intervals based on vaccination status while holding other covariates constant [20].
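One common way to represent a time-varying exposure of this kind is the counting-process (start, stop] layout, in which a person's follow-up is split each time their dose count changes. The sketch below is an illustration under stated assumptions rather than the study's implementation: the function name `vaccination_intervals`, the row layout, the choice to measure time in days from the period start, and the application of the same lags to doses received during the period are all assumptions.

```python
from datetime import date, timedelta

# Lags before a dose is counted, as described in the text:
# 14 days for a first dose, 7 days for a second or third dose.
LAG_DAYS = {1: 14, 2: 7, 3: 7}


def vaccination_intervals(period_start, period_end, dose_dates, positive_test=None):
    """Split one person's follow-up into (start_day, stop_day, doses, event) rows.

    Days are counted from period_start. Follow-up stops at a positive test
    taken within the period, otherwise at period_end (assumed censoring rule).
    """
    has_event = positive_test is not None and period_start < positive_test <= period_end
    end = positive_test if has_event else period_end

    # Date from which each dose is treated as protective (dose date + lag).
    effective = sorted(
        dose + timedelta(days=LAG_DAYS[n])
        for n, dose in enumerate(sorted(dose_dates)[:3], start=1)
    )

    # Dose count at baseline, then one new row per change during follow-up.
    doses = sum(1 for e in effective if e <= period_start)
    rows, start = [], period_start
    for cut in (e for e in effective if period_start < e < end):
        if cut > start:  # skip zero-length intervals
            rows.append(((start - period_start).days, (cut - period_start).days, doses, 0))
            start = cut
        doses += 1
    rows.append(((start - period_start).days, (end - period_start).days, doses, int(has_event)))
    return rows
```

For example, a person with one dose before the first Delta period, a second dose on 20th June 2021 and a positive test on 2nd August 2021 contributes two rows: one with a single dose and no event, then one with two doses that ends in the event.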
Previous SARS-CoV-2 infection (binary) was defined as whether an individual had a registered positive test from the start of the pandemic up to two weeks before the start of each period [21]. We defined this two-week period as the time to develop immune protection. The measure was held constant and was not time-varying. Infections were identified as the first positive test, plus any subsequent positive tests occurring more than 90 days apart (which we defined as further infections). This definition follows established research elsewhere [8, 21]. The definition could introduce immortal time bias, since some individuals could not test positive for parts of the study periods if they had tested positive close to the start of the period. As a sensitivity analysis, we evaluated whether this affected our results by only including individuals who had a previous positive test at least 90 days before the start of the period.
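The two rules above (the 90-day gap between infections and the two-week lag before each period) can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, the 90 days are assumed to be measured from the first test of the preceding infection, and the two-week cutoff is assumed to be inclusive, neither of which the text specifies.

```python
from datetime import date, timedelta


def infection_episodes(positive_tests, gap_days=90):
    """Collapse positive test dates into infection start dates.

    Assumption: a positive test starts a new infection only if it falls more
    than gap_days after the first test of the previous infection.
    """
    episodes = []
    for test_date in sorted(positive_tests):
        if not episodes or (test_date - episodes[-1]).days > gap_days:
            episodes.append(test_date)
    return episodes


def previously_infected(positive_tests, period_start, lag_days=14):
    """True if any positive test falls on or before period_start minus lag_days."""
    cutoff = period_start - timedelta(days=lag_days)
    return any(test_date <= cutoff for test_date in positive_tests)
```

Calling `previously_infected` with `lag_days=90` gives the stricter definition used in the sensitivity analysis.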
Neighbourhood socioeconomic deprivation was measured by matching each individual’s residence to the 2019 Index of Multiple Deprivation (IMD) [22]. The IMD is a multi-dimensional index of neighbourhood deprivation, based on seven weighted domains including income, employment, education and health. The IMD score is measured for Lower Super Output Areas (LSOAs), which are small zones representing neighbourhoods (approximately 1,500 people). Larger scores represent higher levels of socioeconomic deprivation. We also reported analyses by IMD decile to aid interpretation.
Control variables
We accounted for the demographic factors sex (male or female) and age. Age was included as a categorical variable to account for non-linear dynamics, which produced a better fitting model than a continuous measure. Age is an important factor in the risk of exposure to SARS-CoV-2, and its inclusion also reflects that the vaccination programme was rolled out by age group. Ethnicity was included to account for inequalities in both exposure to SARS-CoV-2 and vaccination uptake. Broad ethnic groups were used: White, Asian, Black, Mixed and Other. We also included ‘prefer not to say / missing’ as a category in our models, since it accounted for a large proportion of records and this accounts for the possibility that this group differs in relevant behaviours. Health status was included to account for differences in behaviours, where people with long-term health conditions may ‘shield’ or minimise social contacts. We defined health status (comorbidity) as whether an individual had any registered Expanded Diagnosis Clusters code (yes or no). These codes represent diseases, symptoms or conditions that are treated in ambulatory and inpatient hospital settings. Finally, we also adjusted for differences in testing dynamics by accounting for whether an individual had registered a negative test in the previous month.
Statistical analyses
We found evidence of inequalities in registered testing behaviours (Additional file 1: Table A). To minimise this potential bias in our regression analyses, we focused our analyses on two cohorts. First, we selected only individuals who reported a negative test in the month prior to each time period as a proxy for being engaged in testing. This is similar to a ‘test-negative’ study design, which has been used for studying vaccine effectiveness [23]. Second, we analysed individuals who had received an influenza vaccine within a year of each time period as a proxy for being engaged in healthcare (i.e., likely to register a test even if unvaccinated, and not disengaged from health care) [24]. For the Omicron period, we extended this time frame to 1st September 2020 to fully capture the previous year’s influenza vaccination campaign. While our main models use all individuals, in a sensitivity analysis we restricted this population to people aged 65 years and over, as they are the focus of the UK influenza vaccination programme. Matching methods were also investigated for balancing populations across our exposure variables, but did not significantly alter the models and are not discussed here. We also reported analyses for all residents of Cheshire and Merseyside as a sensitivity analysis.
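The two cohort definitions can be expressed as simple membership checks. The sketch below is illustrative only: the function names are hypothetical, "the month prior" is assumed to mean the 30 days before the period start, and "within a year" is assumed to mean the 365 days before the period start unless an earlier date is supplied.

```python
from datetime import date, timedelta


def in_testing_cohort(negative_tests, period_start, window_days=30):
    """True if a negative test was registered in the window before period_start.

    Assumption: "the month prior" is taken as the preceding 30 days.
    """
    window_start = period_start - timedelta(days=window_days)
    return any(window_start <= t < period_start for t in negative_tests)


def in_flu_vaccine_cohort(flu_vaccine_dates, period_start, earliest=None):
    """True if an influenza vaccine was received between earliest and period_start.

    Assumption: by default the window is the 365 days before period_start;
    pass earliest to widen it, as described for the Omicron period.
    """
    if earliest is None:
        earliest = period_start - timedelta(days=365)
    return any(earliest <= d < period_start for d in flu_vaccine_dates)
```

For the Omicron period, passing `earliest=date(2020, 9, 1)` reproduces the extended window described above, so a vaccine given in October 2020 still counts.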
Descriptive statistics and visualisations were produced to summarise our data and identify key trends. Cox regression models were then used to estimate the associations between our explanatory and control variables and our outcome variable (time to registered positive test). Hazard ratios and 95% confidence intervals were estimated from these models to summarise associations. Interaction effects for vaccine status and previous infection were tested, but not included in the results since they did not improve the model fit. We also stratified analyses by 10-year age group. This was to capture dynamics between children/adolescents and adults, who each have different modes of transmission, risks and vaccination access [25]. We tested the proportional hazards assumption of the models by estimating Schoenfeld residuals. Most associations met the proportional hazards assumption. Where the assumption was violated, estimates were not extreme and/or the resulting plots did not display obvious violations, suggesting that the detected violations were potentially exaggerated by our large sample sizes. Alternative model specifications did not produce significantly different findings.
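For readers less familiar with Cox model output, a hazard ratio is the exponential of a fitted log-hazard coefficient, and a 95% confidence interval is often obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The sketch below shows that conversion; the function name is hypothetical, and the use of a Wald interval is an assumption, as the text does not state how the intervals were computed.

```python
import math


def hazard_ratio_ci(beta, se, z=1.96):
    """Return (hazard ratio, lower, upper) from a Cox coefficient and its SE.

    Assumption: a Wald interval on the log-hazard scale, exponentiated.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

A coefficient of zero corresponds to a hazard ratio of 1 (no association), and the interval is symmetric on the log scale rather than on the hazard ratio scale.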
Patient and public involvement
No patients or members of the public were involved in this research.