Study population
In the present study, we used data from The Maastricht Study, an observational prospective population-based cohort study. Its rationale and methodology have been described by Schram et al. (2014) [41]. All individuals living in the southern part of the Netherlands and aged between 40 and 75 years were eligible for participation. Recruitment strategies have been described previously [41]. We included cross-sectional data from the first 3451 participants (baseline survey between November 2010 and September 2013) [41]. Participants follow a protocol comprising four half-day visits to The Maastricht Study research center [41]. The present study includes data from assessments and questionnaires administered during the first visit to the study site. Of the 3451 participants, 3004 individuals provided data on social network and infections. The participants without social network and infection data (n = 447, 12.9%) did not differ from those with these data with respect to sex, educational level, smoking status, alcohol use, diabetes status or body mass index (BMI). However, participants who did not provide social network and infection data were slightly younger than those who did (mean age 59 versus 60 years; p < 0.001).
Measurements
Social network
Previous studies have applied a variety of methods and techniques to collect empirical data on social networks and contact patterns, as reviewed by Read et al. [17].
In the present study, social networks were identified by a name generator, one of the best known and most widely used instruments for examining ego-centered network data [42]. The name generator/interpreter maps the ego-centered social network and collects information about its alters, resulting in a detailed description of a participant’s social network. An ego-centered network is defined as a network centered on a specific individual, called the ego [43]. Each person who has a relationship with the ego is defined as an “alter” [44]. The social network measured within this study mainly focused on close-proximity interactions [14].
A detailed description of the name generator questionnaire can be found elsewhere [45]. In brief, the name generator first requires a respondent to identify actual persons (alters) in response to seven questions on different types of contacts (e.g. persons who advised them on problems or persons they visited for social purposes or that they could go out with sometimes). For all seven types of contacts, they were asked to indicate their frequency of contact with this person over the last six months (daily or weekly, monthly, quarterly, and half-yearly). In total, participants could name a maximum number of 40 alters. Next, several additional questions about all alters named were asked (sex, age, type of relationship, geographical proximity).
Moreover, participants were asked to rate the statements “most of my friends know each other” and “my best friends know my family” on a five-point Likert scale ranging from strongly agree to strongly disagree. Finally, participants had to indicate whether they were a member of a club (yes/no), and, if so, to identify the club(s) concerned (sports club, volunteer organization, religious group, self-support group, discussion group, Internet club, or another organization) and how often they frequented this club (daily/weekly, monthly, occasionally).
Self-reported infections
In a structured questionnaire, participants were asked whether they had suffered from sudden symptoms such as a cough, runny nose, sore throat, fever, vomitus with fever, or pain when urinating, in the previous two months. They were also asked whether they had suffered from sudden onset of influenza, pneumonia, urinary tract infection, middle ear infection, diarrhea, or skin infection in the previous two months. All of these questions were yes/no questions.
General measurements
Self-administered questionnaires were used to assess educational level (low (no education, primary education, and lower vocational education)/medium (intermediate vocational education, higher secondary education, and vocational education)/high (higher professional education, university)), employment status (employed/retired or not employed/not known), smoking status (never/former/current) and alcohol consumption (non-consumers/low consumers (≤ 7 glasses per week for women, ≤ 14 glasses per week for men)/high consumers (> 7 glasses per week for women, > 14 glasses per week for men)). To determine type 2 diabetes status, all participants (except those who used insulin) underwent a standardized 7-point oral glucose tolerance test (OGTT) with a 75 g glucose load after an overnight fast. Height, weight and BMI were assessed as described previously [41], and BMI (in kg/m²) was categorized according to the WHO classification (normal (BMI < 25), overweight (BMI 25 to < 30), and obese (BMI ≥ 30)).
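The BMI and alcohol categories above can be written out as a minimal sketch. The function names, the string labels and the coding of sex are illustrative assumptions, as is treating zero glasses per week as "non-consumer"; only the cut-offs are taken from the text.

```python
def bmi_category(weight_kg: float, height_m: float) -> str:
    """Classify BMI (kg/m^2) with the WHO cut-offs quoted in the text."""
    bmi = weight_kg / height_m ** 2
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def alcohol_category(glasses_per_week: float, sex: str) -> str:
    """Classify alcohol use; `sex` is 'female' or 'male' (hypothetical coding)."""
    if glasses_per_week == 0:  # assumption: non-consumers report zero glasses
        return "non-consumer"
    limit = 7 if sex == "female" else 14
    return "low" if glasses_per_week <= limit else "high"
```

For instance, eight glasses per week would be classified as high consumption for a woman and low consumption for a man.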
Exposure variables: Social network parameters
First, in the literature we identified several social network parameters that had previously been examined in relation to infections. Next, the social network parameters listed below were computed and used in the current study. The majority of social network parameters used in the current study focused on close-proximity interactions as previous studies had shown their importance in infectious disease transmission [14, 18].
Network size (degree)
Previous studies identified social network size as a determinant of several health outcomes, and it is also widely used in mathematical disease transmission models [9, 10, 13, 16, 19, 25, 28, 33, 42, 46]. Therefore, the degree of the social network was defined as the total number of alters mentioned in the questionnaire and was computed as the size of the ego network (network size).
Contact frequency
In line with several studies on mathematical modelling of the spread of infectious disease, we also investigated contact frequency [19, 21, 22, 25, 33]. First, we used the highest contact frequency reported for each alter (e.g. daily contact) as an indicator of the actual contact frequency. For example, if participants reported alter 1 as a person they visited for social purposes, with a frequency of “daily or weekly”, and also named the same alter as a person who provided practical help if they were sick, with a frequency of “quarterly”, we considered “daily or weekly” as the actual frequency of contact between the ego and the alter. Second, we recoded the answer categories of the questionnaire into an estimated number of contacts per half year: “half-yearly” was assumed to comprise one contact, “quarterly” two contacts, “monthly” six contacts and “daily or weekly” 48 contacts [21]. Third, we computed the sum of all contacts per half year as the total contact frequency. In addition, we computed the percentages of alters with whom the ego had daily/weekly, monthly, quarterly and half-yearly contact, for example as the number of daily/weekly contacts divided by network size.
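The three steps above can be illustrated with a short sketch. The data layout (a list of alter/frequency pairs) and all names are assumptions made for illustration; the recoding values are those stated in the text.

```python
# Estimated contacts per half year for each answer category (values from the text).
CONTACTS_PER_HALF_YEAR = {
    "daily or weekly": 48,
    "monthly": 6,
    "quarterly": 2,
    "half-yearly": 1,
}


def contact_summary(reports):
    """Summarise contact frequency for one ego.

    `reports` holds (alter_id, frequency_label) pairs, one pair for each
    name-generator question in which the alter was named (hypothetical layout).
    Returns (network size, total contacts per half year, % of alters per category).
    """
    # Steps 1 and 2: recode each answer and keep the highest value per alter.
    highest = {}
    for alter_id, label in reports:
        contacts = CONTACTS_PER_HALF_YEAR[label]
        highest[alter_id] = max(highest.get(alter_id, 0), contacts)

    network_size = len(highest)
    if network_size == 0:
        return 0, 0, {label: 0.0 for label in CONTACTS_PER_HALF_YEAR}

    # Step 3: total contact frequency is the sum over all alters.
    total = sum(highest.values())
    percentages = {
        label: 100 * sum(1 for c in highest.values() if c == value) / network_size
        for label, value in CONTACTS_PER_HALF_YEAR.items()
    }
    return network_size, total, percentages


reports = [
    ("alter 1", "daily or weekly"),  # visited for social purposes
    ("alter 1", "quarterly"),        # practical help when sick
    ("alter 2", "monthly"),
    ("alter 3", "half-yearly"),
    ("alter 4", "daily or weekly"),
]
size, total, pct = contact_summary(reports)
```

In this invented example, alter 1 counts once with the "daily or weekly" value, giving a network of four alters and 48 + 6 + 1 + 48 = 103 contacts per half year.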
Geographical proximity
Previous studies included measures on home contacts and distance from home [21, 22, 28, 33]. In the current study, we calculated geographical proximity as the percentage of all alters that were household members, lived within walking distance, lived less than half an hour away by car, lived more than half an hour away by car, and lived further away (e.g. in another country). For example, we calculated the percentage of household members as the number of alters living in the same household divided by network size.
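As a minimal sketch of this calculation, the percentages can be derived from one proximity label per alter. The category labels and the function name are illustrative assumptions, not the wording of the questionnaire.

```python
from collections import Counter

# Illustrative labels for the five proximity categories named in the text.
PROXIMITY_CATEGORIES = [
    "household",
    "walking distance",
    "less than half an hour by car",
    "more than half an hour by car",
    "further away",
]


def proximity_percentages(alter_proximity):
    """Return the percentage of alters in each proximity category.

    `alter_proximity` is a list with one category label per alter.
    """
    network_size = len(alter_proximity)
    if network_size == 0:
        return {category: 0.0 for category in PROXIMITY_CATEGORIES}
    counts = Counter(alter_proximity)
    return {
        category: 100 * counts[category] / network_size
        for category in PROXIMITY_CATEGORIES
    }


example = (
    ["household"] * 2
    + ["walking distance"] * 3
    + ["less than half an hour by car"] * 4
    + ["further away"]
)
result = proximity_percentages(example)
```

With these invented data (network size 10), two household members correspond to 20% of the network.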
Network heterogeneity
In accordance with another study of social networks in the Netherlands, we also computed heterogeneity of age and sex [42]. To assess sex heterogeneity within the ego’s network, we computed the Index of Qualitative Variation (IQV) by Mueller and Schuessler (1961) [47]. This index reflects the probability that two randomly chosen network alters belong to different categories. The statistical formula for the derivation of the IQV can be found in Additional file 1. In brief, the IQV is defined as the ratio of observed differences to maximum possible differences, where “0” represents a fully homogeneous and “1” a fully heterogeneous network [47]. Observed differences were calculated by multiplying the total number of men in the ego’s network by the total number of women in the ego’s network. We calculated maximum possible differences as (network size/2)² [47]. We defined age heterogeneity of network alters as the standard deviation of the ages of all alters of the ego [42].
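The two heterogeneity measures can be sketched as follows. The function names are illustrative, and the use of the sample standard deviation (rather than the population form) is an assumption, since the text does not specify which was used.

```python
from statistics import stdev


def sex_iqv(n_men: int, n_women: int):
    """IQV for sex: observed differences / maximum possible differences."""
    network_size = n_men + n_women
    if network_size == 0:
        return None  # undefined for an empty network
    observed = n_men * n_women
    maximum = (network_size / 2) ** 2
    return observed / maximum


def age_heterogeneity(alter_ages):
    """Standard deviation of alter ages (sample SD: an assumption)."""
    if len(alter_ages) < 2:
        return None  # SD is undefined for fewer than two alters
    return stdev(alter_ages)
```

A network of five men and five women yields an IQV of 1 (fully heterogeneous), a network of ten men yields 0 (fully homogeneous), and two men with eight women give 16/25 = 0.64.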
Mixing
In line with studies on mathematical infectious disease modelling, we calculated mixing parameters for age mixing patterns (whether the ego had contact with younger, same-age or older alters) and sex mixing patterns (whether the ego had contact with alters of the same sex or the opposite sex) [22, 28, 33]. To identify age mixing, we calculated the difference between the ego’s age and the alter’s age for every alter named. Next, we computed the percentages of younger (> 15 years and 5 to 15 years younger), same-age (±5 years) and older (5 to 15 years and > 15 years older) alters for each participant. To indicate sex mixing, we calculated the percentage of same-sex alters. For example, for a female participant the number of her female alters was divided by her network size to obtain the percentage of same-sex alters.
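A minimal sketch of the mixing calculation is given below. How an age difference of exactly 5 or exactly 15 years was handled is not stated in the text; the boundaries used here, like the names and labels, are assumptions.

```python
AGE_CATEGORIES = [
    "more than 15 years younger",
    "5 to 15 years younger",
    "same age",
    "5 to 15 years older",
    "more than 15 years older",
]


def age_mixing_category(ego_age, alter_age):
    """Assign an alter to an age-mixing category relative to the ego."""
    diff = alter_age - ego_age  # negative: the alter is younger than the ego
    if abs(diff) <= 5:  # assumption: a difference of exactly 5 counts as same age
        return "same age"
    direction = "younger" if diff < 0 else "older"
    band = "5 to 15 years" if abs(diff) <= 15 else "more than 15 years"
    return f"{band} {direction}"


def mixing_percentages(ego_age, ego_sex, alters):
    """Return (% of alters per age category, % of same-sex alters).

    `alters` is a non-empty list of (age, sex) tuples (hypothetical layout).
    """
    network_size = len(alters)
    counts = {category: 0 for category in AGE_CATEGORIES}
    for age, _sex in alters:
        counts[age_mixing_category(ego_age, age)] += 1
    age_pct = {c: 100 * n / network_size for c, n in counts.items()}
    same_sex = sum(1 for _age, sex in alters if sex == ego_sex)
    return age_pct, 100 * same_sex / network_size


alters = [(62, "female"), (30, "male"), (50, "female"), (80, "male")]
age_pct, same_sex_pct = mixing_percentages(60, "female", alters)
```

For this invented 60-year-old female ego, one alter falls into each of four age categories (25% each) and half of the alters are of the same sex.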
Type of relationship
The questionnaire also assessed the type of relationship between the ego and the alter. To the best of our knowledge, this is the first study that examines network composition in terms of the type of relationship. To that end, we computed the percentage of alters that were family members, friends, colleagues and acquaintances. For example, we calculated the percentage of family members within the network as the number of family members divided by network size. Whether the ego had a partner was derived from the social network questionnaire and coded as having/not having a partner. Having a partner was defined as having an intimate relationship with another person.
Density
We assessed network density using two statements [42, 48], categorizing density scores separately for density of the ego’s friends and density of the ego’s friends and family. Density was defined as the extent to which alters in the network know each other. Density between friends was computed from the statement “most of my friends know each other” (five-point Likert scale ranging from strongly agree to strongly disagree) and density between friends and family was computed from the statement “my best friends know my family”. We used tertiles to form three groups of low, medium and high density.
Superficial contacts
We included a proxy for more superficial contacts than close-proximity interactions as transmission of infections may also occur via contact with contaminated surfaces or exposure that does not involve conversation or touch [49]. We therefore constructed a variable representing the total number of club memberships (and the number of clubs the ego frequented on a daily or weekly, monthly or occasional basis) as a proxy for superficial contacts.
Close proximity interactions
While all of the types of interactions in the name generator suggest close and direct contact, the questions do not explicitly include information on whether an interaction is physical (e.g. kiss or handshake), face-to-face or by phone/internet. Some interactions such as help with jobs around the house or persons they visited for social purposes require close proximity interactions, whereas other interactions such as advice on problems or provision of emotional support may have occurred by telephone/internet.
To assess the proportion of close proximity interactions, we additionally computed the network size and total contact frequency from those types of interactions with alters that are by definition in close proximity: persons who could offer them practical help if they were sick, persons who helped them with small and larger jobs around the house, persons who were also important for them because of mutual activities, and/or household contacts. The additionally computed social network size of close proximity interactions represented 86% of the total social network size (8.5 alters in a network of 10 alters), and the total contact frequency of close proximity interactions represented 87% of the total contact frequency (202 contacts per half year out of a total of 231 contacts per half year).
In all analyses, the total network size and total contact frequency per half year were used.
Outcome variables: Self-reported symptomatic infections over the past two months
The symptoms “runny nose” and “sore throat” were pooled as indicators of upper respiratory tract infection (URI). Influenza, pneumonia and fever were pooled as indicators of lower respiratory tract infection (LRI). Pain when urinating and urinary tract infection were pooled as urinary tract infection (UTI). Vomitus with fever and diarrhea were pooled as gastrointestinal infection (GI). We excluded cough from the analysis because it is strongly related to smoking and asthma [50] and is therefore not a specific indicator of infection.
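The pooling can be sketched as a simple mapping from questionnaire items to the four outcomes. The text does not spell out the pooling rule; this sketch assumes that an outcome is present when at least one of its items was answered "yes", and that unanswered items count as "no". Item labels and names are illustrative.

```python
# Questionnaire items pooled into each outcome (grouping taken from the text).
OUTCOME_ITEMS = {
    "URI": ["runny nose", "sore throat"],
    "LRI": ["influenza", "pneumonia", "fever"],
    "UTI": ["pain when urinating", "urinary tract infection"],
    "GI": ["vomitus with fever", "diarrhea"],
}


def pooled_outcomes(answers):
    """Map yes/no answers (dict: item -> bool) to the four pooled outcomes.

    Assumptions: an outcome is positive if any of its items is True, and
    items missing from `answers` are treated as False.
    """
    return {
        outcome: any(answers.get(item, False) for item in items)
        for outcome, items in OUTCOME_ITEMS.items()
    }


# Cough is reported here but does not contribute to any outcome.
example = pooled_outcomes({"sore throat": True, "cough": True, "diarrhea": False})
```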
The observed prevalences for each month of the year (Fig. 1) display the expected seasonality of the diseases.
Statistical analyses
We performed descriptive analyses to examine the characteristics of the participants in terms of baseline characteristics, prevalence of self-reported infections, and network parameters.
First, we conducted bivariate correlation analyses to check for multicollinearity between the network variables. As all correlation coefficients were below the cut-off value of 0.7, none of the variables were considered collinear.
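A screening step of this kind can be sketched as follows. The text does not state which correlation coefficient was used; Pearson's r is assumed here, and the variable names and data are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def collinear_pairs(variables, cutoff=0.7):
    """List variable pairs whose absolute correlation reaches the cut-off."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        r = pearson(a, b)
        if abs(r) >= cutoff:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged


# Invented data: the second variable is exactly twice the first.
variables = {
    "network_size": [4, 8, 10, 12, 16],
    "total_contacts": [8, 16, 20, 24, 32],
    "pct_family": [50, 20, 60, 30, 40],
}
flagged = collinear_pairs(variables)
```

In this example only the pair of network size and total contacts is flagged; an empty list would correspond to the situation reported in the text.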
Second, we conducted univariable logistic regression analyses to assess the association between the exposure variables, i.e. network parameters, and the outcomes of URI, LRI, GI, and UTI. All network parameters were continuous variables, except for density. For every network parameter, odds ratios (ORs) and 95% confidence intervals (95% CI) were calculated.
The unadjusted association between network size and each of the four infections was presented graphically using a cubic polynomial.
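To make the quantities concrete, the sketch below fits a logistic model with a single predictor by Newton-Raphson and converts the coefficient into an OR with a Wald 95% CI. It is an illustration only: the analyses in this study were run in SPSS, the sketch omits the confounder adjustment, and all names and data are invented.

```python
from math import exp


def fit_univariable_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x; return (b0, b1, standard error of b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0  # score vector and information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score vector.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = (h00 / det) ** 0.5  # from the inverse information matrix
    return b0, b1, se_b1


def odds_ratio_with_ci(x, y, z=1.96):
    """Return (OR, lower, upper) with a Wald confidence interval."""
    _, b1, se = fit_univariable_logistic(x, y)
    return exp(b1), exp(b1 - z * se), exp(b1 + z * se)


# Invented data: 10 of 40 unexposed and 20 of 40 exposed report the outcome.
x = [0] * 40 + [1] * 40
y = [1] * 10 + [0] * 30 + [1] * 20 + [0] * 20
odds_ratio, lower, upper = odds_ratio_with_ci(x, y)
```

With a binary predictor the fitted OR matches the cross-product ratio of the 2 × 2 table, here (20/20)/(10/30) = 3, which offers a simple check of the fitting routine.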
Third, we built two multivariable models to determine the most important detrimental and beneficial network parameters. We forced network size and total contact frequency into the models because these variables are considered essential for the assessment of detrimental and beneficial social network effects, and have been shown to be related both to the transmission of infections and to decreased susceptibility to infections [9, 10, 13, 19, 21, 22, 28, 29, 33, 42]. In the detrimental exposure model, we further included all variables that were positively associated with URI (odds ratio > 1), regardless of their statistical significance. In the beneficial exposure model, we further included all variables that were negatively associated with URI (odds ratio < 1), again regardless of their statistical significance. In addition to social network size and total contact frequency, we expressed several social network parameters as percentages, so that the effect of the composition of the social network could be assessed independently of social network size and total contact frequency. For those social network parameters that were computed as percentages within the network, the associations were presented in steps of 10%. Based on an average network size of 10 network members, a change of one network member corresponds to 10%. For the detrimental and beneficial models, we used the stepwise backward method (p < 0.1) to obtain the final model, which included possible confounders, network size, and total contact frequency. These analyses were repeated for LRI, GI and UTI. We used the variance inflation factor (VIF) to measure collinearity in all regression models. With cut-off values of VIF < 10 and tolerance (1/VIF) > 0.1, the values for VIF and tolerance did not indicate multicollinearity problems.
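Two of the quantities in this paragraph follow from standard formulas and can be sketched briefly. An OR per 10% step is obtained by multiplying the log-odds coefficient per percentage point by 10 before exponentiating, and the VIF of a predictor equals 1/(1 − R²), where R² comes from regressing that predictor on all other predictors. The function names and example values are illustrative assumptions.

```python
from math import exp, log


def odds_ratio_per_step(beta_per_point, step=10):
    """OR for a `step` percentage-point increase, given the coefficient per point."""
    return exp(beta_per_point * step)


def vif_and_tolerance(r_squared):
    """VIF and tolerance (1/VIF) from the R^2 of one predictor on the others."""
    tolerance = 1.0 - r_squared
    return 1.0 / tolerance, tolerance


def passes_collinearity_check(r_squared, max_vif=10.0, min_tolerance=0.1):
    """Apply the cut-offs named in the text: VIF < 10 and tolerance > 0.1."""
    vif, tolerance = vif_and_tolerance(r_squared)
    return vif < max_vif and tolerance > min_tolerance


# Invented example: an OR of 1.01 per percentage point, rescaled to a 10% step.
or_per_10 = odds_ratio_per_step(log(1.01))
```

In this invented case the OR per 10% step equals 1.01 raised to the tenth power, roughly 1.10, and a predictor with R² = 0.5 against the other predictors has a VIF of 2 and a tolerance of 0.5.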
We adjusted all associations for possible confounders, i.e. diabetes status (type 2 diabetes oversampled by design), age, sex, BMI, smoking status, alcohol consumption, educational level, and employment status. We also adjusted all associations for the season in which the measurement took place to account for the likelihood of encountering an infected source. In the multivariable models, associations with p < 0.05 were considered statistically significant.
In addition, we tested statistical interaction (effect modification) of the network parameters with sex and age to check whether the associations between network parameters and outcome differed by sex and age. However, none of the interactions of the network parameters with sex and age were statistically significant (p > 0.1).
We performed sensitivity analyses to verify the model building process; we replicated multivariable logistic regression analyses by using the complete model instead of backward elimination, and used degrees instead of percentages. The findings were in line with the results presented.
All analyses were conducted using IBM SPSS Statistics version 21.0 (IBM Corp. Armonk, NY, USA).