Study population and field work
Baseline SARS-CoV-2 antibody and questionnaire study
We described the baseline study in detail in [4]. In short, a random sample of the Munich population living in private households was drawn using the random walk method. All household members older than 13 years were invited to provide a serum sample and to answer an online questionnaire. Serum samples were analysed for SARS-CoV-2 antibodies using the Elecsys® Anti-SARS-CoV-2 (Roche) test [5]. Field work for the baseline study took place between April 5th and June 12th, 2020.
Questionnaire follow-up
An online questionnaire covering risk behaviour, health-related items, and psychosocial aspects (hereafter “behaviour questionnaire”) was offered from June 4th to October 31st, 2020 to all 5240 participants who did not withdraw from the study. In parallel, an online questionnaire on leisure time behaviour was available (hereafter “leisure time questionnaire”). We split the questionnaire into two because long questionnaires are less likely to be completed [6]. Participants recruited in April 2020 received an e-mail invitation on June 4th, and those recruited in May to June 2020 on June 25th, each with subsequent reminders and telephone follow-ups. In total, 3400 participants completed the behaviour questionnaire and 1390 participants the leisure time questionnaire.
1st SARS-CoV-2 antibody follow-up
On November 2nd, 2020, we started the 1st antibody follow-up by sending out boxes with a self-sampling kit for taking a capillary blood sample (dried blood spot; DBS) to the 5292 participants (2978 households) of the baseline study. Between baseline and follow-up, 77 participants withdrew from the study and were thus not contacted for the follow-up. Instructions for self-sampling were provided, including a video tutorial (https://www.youtube.com/watch?v=vpZUzuQV10E&feature=emb_title). Samples were collected using a barcode-labelled neonatal screening filter card (Euroimmun ZV 9701-0101) with circles indicating where the blood should be collected. Afterwards, participants were asked to dry the filter card for at least 12 h at room temperature, pack it in the sealable plastic pouch, place the pouch into the prepaid envelope, and mail the envelope to the laboratory. In case of handling difficulties, our telephone and e-mail hotlines were available for any questions.
From November 2nd, 2020 to January 31st, 2021, we received 4444 DBS samples from 2571 households (individual response 84%, household response 86%). Roughly half of the DBS samples (2372 of 4433; 54%) arrived at our laboratory within 1 week of mailing (November 2nd to November 8th). By week 2 (November 9th to November 15th), more than three quarters had been received (3369 of 4433; 76%). Most of the remaining samples were returned in week 3 (N = 372; November 16th to November 22nd) and week 4 (N = 343; November 23rd to November 29th). Few samples were received in December 2020 and January 2021 (N = 326; 7%). Participants who were unable to collect a DBS on their own (N = 29) and those with intermediate results (N = 34, see Laboratory method and cross-validation with blood samples) were offered a full-blood test at our centre. For the latter group, this served to clarify the DBS result. However, 11 of the 34 participants with intermediate DBS results did not show up at our centre and thus had to be excluded from the analyses, leaving 4433 subjects with baseline questionnaire, baseline serology, and follow-up DBS data for the main analyses (Fig. 1).
Questionnaire data
The following items were considered for the analyses presented in this paper:
Baseline individual questionnaire:
- Socio-demographics: age, sex (male, female), schooling (< 12 years, ≥ 12 years, in school), current job (employed, self-employed, not working (unemployed, retired, parental leave, sabbatical, students), others (voluntary social year, military service, part-time jobber, reduced working hours))
- Country of birth: Germany, others
- Smoking: current, ex, never smokers
- Chronic conditions: diabetes, cardiovascular diseases, autoimmune diseases, respiratory diseases (yes vs no)
- General health: “In general, how would you rate your health” assessed on a five-point Likert scale from poor to excellent. As very few participants reported “poor”, the poor and fair categories were combined.
Baseline household questionnaire:
- Household size: 1, 2, 3–4, ≥ 5 inhabitants
- Household income: ≤ 2500 €, 2500– ≤ 4000 €, 4000– ≤ 6000 €, > 6000 €
- Living area per inhabitant: ≤ 30 sqm, 30– ≤ 40 sqm, 40– ≤ 55 sqm, > 55 sqm
- Household type: single, couple, family, others (shared apartments by e.g., students, subleasing, and assisted accommodation)
- Housing type: building with 1–2 apartments, 3–4 apartments, ≥ 5 apartments
Follow-up questionnaire:
- Self-estimated health-related risk-taking behaviour (10-level Likert scale from “not at all risk tolerant” to “very risk tolerant”): Dichotomised into not high (≤ 5, Quartile 3) and high self-estimated health-related risk-taking behaviour (> 5)
- Personal contacts: Five questions on places of personal contacts outside the own household during the two weeks before answering the questionnaire (meeting people, grocery shopping, shopping, use of public transport, work outside home), each assessed on a 5-level Likert scale: not at all (= 1); once per week (= 2); 2–4 times per week (= 3); 5 times per week (= 4); more often (= 5). Places of personal contacts were multiplied by frequency of contacts (0 contacts (= 0), 1 contact (= 1), 2–4 contacts (= 2) and 5+ contacts (= 3)) and summed up, resulting in a score ranging from 0 to 25. The score was dichotomised into lower number of personal contacts (≤ 8, Median) and higher number of personal contacts (> 8).
- Number and intensity of leisure time activities before the pandemic (in February 2020): For that time, 16 activities were assessed on a 5-level Likert scale from “never” (= 0) to “very often” (= 4): visit family member; visit friends; going out with friends; attend a party, festival, bar, pub or disco; go to the cinema; attend a theatre, opera or ballet performance; work out in a gym; visit a swimming pool; visit a sauna; skiing; train for a team sport or take part in sporting competitions; watch a sports game or event live outdoors; watch a sports game or event live indoors; worship attendance; play an instrument in an orchestra; sing in a choir. Activities were multiplied by the Likert scores and summed up, resulting in a score from 0 to 64. The score was dichotomised into non-high leisure time activities (≤ 11, Quartile 3) and high leisure time activities (> 11).
- Number and intensity of leisure time activities two weeks prior to the follow-up questionnaire: The score for leisure time activities at follow-up was built in the same way as the score for leisure time activities before the pandemic. However, only seven leisure time activities were included at that time, as many activities were not possible due to the restrictions related to the pandemic: visit family member; visit friends; going out with friends; visit a swimming pool; worship attendance; play an instrument in an orchestra; sing in a choir. Therefore, the resulting score only ranged from 0 to 28. The score was dichotomised into non-high leisure time activities at follow-up (≤ 5, Quartile 3) and high leisure time activities (> 5).
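The construction of the leisure time scores can be illustrated with a minimal sketch. It assumes that each activity contributes its Likert rating (0 to 4) to the sum, which reproduces the stated ranges of 0 to 64 for 16 activities and 0 to 28 for seven activities. The function names are hypothetical and are not taken from the study code.

```python
from typing import Sequence


def activity_score(ratings: Sequence[int], max_rating: int = 4) -> int:
    """Sum the per-activity Likert ratings (0 = never ... 4 = very often).

    Assumption: one rating per activity, each contributing its raw value.
    """
    for r in ratings:
        if not 0 <= r <= max_rating:
            raise ValueError(f"rating {r} outside 0..{max_rating}")
    return sum(ratings)


def is_high(score: int, cutoff: int) -> bool:
    """Dichotomise a score: 'high' means strictly above the cutoff."""
    return score > cutoff


# Follow-up questionnaire: seven activities, cutoff 5 (Quartile 3 in the text).
follow_up_ratings = [2, 1, 0, 0, 1, 0, 0]  # illustrative values only
score = activity_score(follow_up_ratings)
print(score, is_high(score, cutoff=5))
```

The same two functions would cover the pre-pandemic score by passing 16 ratings and a cutoff of 11.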
Laboratory method and cross-validation with blood samples
Filter paper cards were further processed if at least two of the five circles on the card were completely soaked with blood. Valid samples were stored at 4 °C until analysis. Before analysis, filter paper cards were equilibrated to room temperature, and three blood-soaked smaller circles (diameter 3.2 mm) from each filter paper card were automatically punched into a 96-well plate (Panthera-Puncher™ 9, PerkinElmer). After elution, samples were transferred to a Cobas e801 module (Roche) compatible sample micro cup (Roche, 05085713001) for analysis using the Elecsys® Anti-SARS-CoV-2 assay (Roche). Based on our validation study, DBS samples were considered positive if SARS-CoV-2 antibody levels were ≥ 0.12. Samples with SARS-CoV-2 antibody levels in the range between 0.09 and 0.12 were considered intermediate and were subsequently confirmed by plasma samples (see Study population and field work). All other samples were considered negative. Compared to full blood samples, the sensitivity of the DBS method was 99.2% and the specificity 98.7%. Details of the laboratory methods are described in [7].
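The three-way classification of DBS results can be sketched as follows. The cut-offs 0.12 and 0.09 are those stated above. Treating the lower bound 0.09 as inclusive is an assumption, since the text does not specify the boundary, and the function name is hypothetical.

```python
def classify_dbs(level: float,
                 positive_cutoff: float = 0.12,
                 intermediate_cutoff: float = 0.09) -> str:
    """Classify a DBS antibody level as positive, intermediate, or negative.

    Assumption: the intermediate range is [0.09, 0.12), i.e. 0.09 itself
    counts as intermediate.
    """
    if level >= positive_cutoff:
        return "positive"
    if level >= intermediate_cutoff:
        return "intermediate"
    return "negative"


for value in (0.05, 0.10, 0.12, 1.3):  # illustrative values only
    print(value, classify_dbs(value))
```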
Statistical analyses
All statistical analyses were performed using the statistical software R (version 4.0.3, R Development Core Team, 2020).
The SARS-CoV-2 sero-prevalence was estimated primarily from the DBS test results of the study participants, applying the classification described above (Laboratory method and cross-validation with blood samples). If the DBS test yielded an intermediate result, we considered the result of the full blood sampling. As described in [5], an optimised cut-off of 0.4218 for the full blood sampling was used to predict SARS-CoV-2 sero-positivity with an estimated specificity and sensitivity of 99.7% and 88.6%, respectively (with PCR test results considered as ground truth). We used these estimates to adjust the prevalence for the imperfect test performance [8]. Because the specificity and sensitivity of DBS with regard to full blood samples were very high, no additional adjustment was made (Additional file 1: Appendix Text and Table S1).
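As an illustration of adjusting an apparent prevalence for imperfect test performance, the sketch below uses the standard Rogan-Gladen correction. Whether the procedure in [8] has exactly this form is an assumption. The sensitivity and specificity defaults are the values quoted above, and the function name is hypothetical.

```python
def adjust_prevalence(apparent: float,
                      sensitivity: float = 0.886,
                      specificity: float = 0.997) -> float:
    """Rogan-Gladen correction of an apparent (test-positive) prevalence.

    adjusted = (apparent + specificity - 1) / (sensitivity + specificity - 1),
    truncated to the interval [0, 1].
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("test must perform better than chance")
    adjusted = (apparent + specificity - 1.0) / denominator
    return min(1.0, max(0.0, adjusted))


# Illustrative apparent prevalence only, not a study result.
print(adjust_prevalence(0.05))
```

With a sensitivity below 100%, the adjusted value exceeds the apparent prevalence whenever the apparent prevalence is well above the false-positive rate of 0.3%.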
The prevalence (adjusted or unadjusted for the specificity and sensitivity of the test) was calculated in two ways: with the information from the sampling design of the cohort [3] included via a weighting scheme, or without it. To account for the sampling design, the sampling weights computed at baseline (the inverse of each individual's probability of being included in the sample) were used for the follow-up analysis. These sampling weights were corrected for the attrition observed at follow-up by modelling the underlying non-response mechanism and estimating a probability of response for each unit. Ten response homogeneity groups (within which we assumed the non-response to be completely at random [9]) were created using the deciles of the estimated probabilities of response. The non-response-adjusted weights were then calibrated [10] on updated information on the Munich population (as of 31.12.2020) in order to mirror its structure with respect to age, sex, country of birth, presence of children in the household, and single-member households. Moreover, to correct the sample for the loss of positive cases at follow-up, the sampling weights were calibrated on the estimated number of positive cases at baseline. Weighted prevalence estimates were calculated using these calibrated weights, and the associated 95% confidence intervals were computed using variance estimators based on linearisation [10] and residual [10, 11] techniques. These variance estimates account for every step in the selection process of the units, i.e., V = V1 + V2, with V1 the variance due to the sampling design and V2 the variance due to non-response [12]. For unweighted prevalence estimates, confidence intervals were determined using a nonparametric cluster bootstrap procedure that accounts for household clustering [13]. To that end, 5000 bootstrap datasets were generated, each by sampling n_h households with replacement from the original sample of n_h households. The sero-prevalence was estimated in each bootstrap sample, and the 2.5th and 97.5th percentiles of the resulting 5000 estimates defined the 95% confidence interval.
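The household-level cluster bootstrap for the unweighted prevalence can be sketched as below. This is a minimal illustration under stated assumptions, not the study code: the data layout (a dictionary mapping a household identifier to the 0/1 results of its members), the function name, and the nearest-rank percentile convention are all hypothetical choices.

```python
import random
from typing import Dict, List, Tuple


def cluster_bootstrap_ci(households: Dict[str, List[int]],
                         n_boot: int = 5000,
                         seed: int = 0) -> Tuple[float, float]:
    """Percentile 95% CI for prevalence, resampling whole households.

    Each household must contain at least one tested member.
    """
    rng = random.Random(seed)
    ids = list(households)
    estimates = []
    for _ in range(n_boot):
        # Draw as many households as in the original sample, with replacement.
        drawn = rng.choices(ids, k=len(ids))
        positives = sum(sum(households[h]) for h in drawn)
        tested = sum(len(households[h]) for h in drawn)
        estimates.append(positives / tested)
    estimates.sort()
    # Nearest-rank 2.5th and 97.5th percentiles (one of several conventions).
    lower = estimates[max(0, round(0.025 * n_boot) - 1)]
    upper = estimates[min(n_boot - 1, round(0.975 * n_boot) - 1)]
    return lower, upper


# Toy data for illustration only.
toy = {"hh1": [0, 0], "hh2": [1], "hh3": [0, 1, 1], "hh4": [0]}
print(cluster_bootstrap_ci(toy, n_boot=1000))
```

Resampling households rather than individuals keeps members of the same household together in every bootstrap dataset, which is what allows the interval to reflect within-household correlation.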
To analyse spatial clustering, we considered the mean within-cluster variance of the binary test results, with cluster variables being households, buildings, and geospatial clusters of different sizes. We performed a non-parametric approximate permutation test with 10,000 random permutations of cluster assignments. To account for household clustering, only full households were permuted when considering buildings and geospatial clusters [14]. In addition, we performed borough-level sero-prevalence mapping using conditional autoregressive models, which account for the spatial autocorrelation among neighbouring boroughs by using random effects. This allowed us to investigate whether sero-prevalence was associated with population density and to obtain borough/district-level estimates within the city of Munich (Additional file 1: Appendix 2) [15,16,17,18].
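The permutation test on the mean within-cluster variance can be illustrated with the following sketch for the simplest case, in which individual cluster labels are permuted. Several details are assumptions rather than the published procedure: the unweighted mean across clusters, the one-sided p-value (clustering implies a smaller within-cluster variance than expected by chance), and the function names. Permuting full households for buildings and geospatial clusters is not implemented here.

```python
import random
from collections import defaultdict
from statistics import pvariance
from typing import Hashable, List, Sequence


def mean_within_cluster_variance(results: Sequence[int],
                                 clusters: Sequence[Hashable]) -> float:
    """Average, over clusters, of the variance of the 0/1 results inside each."""
    groups = defaultdict(list)
    for result, cluster in zip(results, clusters):
        groups[cluster].append(result)
    return sum(pvariance(g) for g in groups.values()) / len(groups)


def permutation_p_value(results: Sequence[int],
                        clusters: Sequence[Hashable],
                        n_perm: int = 10_000,
                        seed: int = 0) -> float:
    """One-sided approximate permutation p-value for clustering."""
    rng = random.Random(seed)
    observed = mean_within_cluster_variance(results, clusters)
    labels: List[Hashable] = list(clusters)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # random reassignment of cluster labels
        if mean_within_cluster_variance(results, labels) <= observed + 1e-12:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data for illustration only: four households of two members each.
toy_results = [1, 1, 0, 0, 1, 1, 0, 0]
toy_clusters = ["a", "a", "b", "b", "c", "c", "d", "d"]
print(permutation_p_value(toy_results, toy_clusters, n_perm=2000))
```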
We used generalised linear mixed models (GLMMs with the logit link function) to analyse the association between potential risk factors and SARS-CoV-2 sero-positivity at 1st follow-up, with a random effect for households to account for within-household clustering of the data. Odds ratio estimates and the corresponding confidence intervals were obtained within a Bayesian framework with uniform priors on the regression estimates using the brms (Bayesian Regression Models using 'Stan') package in R [19, 20]. To account for missing data in covariates, we used the Joint Analysis and Imputation of Incomplete Data Framework (JointAI) in R for sensitivity analyses [21, 22]. In these sensitivity analyses, broad normal priors with mean zero and standard deviation 100 were used. The regression estimates were adjusted for the SARS-CoV-2 serology results at baseline, the time elapsed since the baseline visit, age, and sex of the individual. Essentially, this adjustment for baseline positivity allowed us to identify risk factors associated with newly incident cases within our cohort, over and above the baseline positives.
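For orientation, a random-intercept logistic model of the kind described can be written as follows. The notation is introduced here for illustration and is an assumption: Y_{ij} is the sero-status at 1st follow-up of individual i in household j, x_{ij} collects the risk factor and the adjustment covariates, and u_j is the household random effect, taken to be normally distributed as is conventional for such models.

```latex
\begin{align}
  \operatorname{logit} \Pr\left(Y_{ij} = 1 \mid u_j\right)
    &= \beta_0 + \mathbf{x}_{ij}^{\top} \boldsymbol{\beta} + u_j,
    \qquad u_j \sim \mathcal{N}\left(0, \sigma_u^2\right), \\
  \mathrm{OR}_k &= \exp\left(\beta_k\right).
\end{align}
```

Under this formulation, the odds ratio for the k-th covariate is conditional on the household effect, i.e. it compares individuals within the same household or in households with the same random effect.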
To explore the importance of behavioural factors and leisure time activities for the incidence of infection between baseline and follow-up, we used data from the 1st questionnaire follow-up combined with the DBS results. For these analyses, we included data from 2768 participants who responded to the behaviour questionnaire and had serology results; for the leisure time activities, we had questionnaire data for 1263 persons with serological results. Due to the large proportion of missing questionnaire data, we restricted these analyses to complete data and aggregated at the levels of the outcome variables. We analysed the incidence of new SARS-CoV-2 infections (as a binomial outcome for proportions) between baseline and follow-up, stratified by risk behaviour, leisure time activities, sex, and age, using the count of new positives among those observed. Similar models were also applied to evaluate the association between population density at the constituency level and the trend in sero-prevalence estimates using aggregated data.