Tracking social contact networks with online respondent-driven detection: who recruits whom?

Background: Transmission of respiratory pathogens in a population depends on the contact network patterns of individuals. To accurately understand and explain epidemic behaviour, information on contact networks is required, but only limited empirical data are available. Online respondent-driven detection can provide relevant epidemiological data on the numbers of contact persons and on the dynamics of contacts between pairs of individuals. We aimed to analyse contact networks with respect to sociodemographic and geographical characteristics, vaccine-induced immunity and self-reported symptoms.

Methods: In 2014, volunteers from two large participatory surveillance panels in the Netherlands and Belgium were invited to take part in a survey. Participants were asked to record their numbers of contacts at different locations and any self-reported influenza-like illness (ILI) symptoms, and to invite 4 individuals they had met face to face in the preceding 2 weeks. We calculated correlations between linked individuals to investigate mixing patterns.

Results: In total, 1560 individuals completed the survey, reporting a total of 30,591 contact persons; 488 recruiter-recruit pairs were analysed. Recruitment was assortative by age, education, household size, influenza vaccination status and sentiments, indicating that participants tended to recruit contact persons similar to themselves. We also found assortative recruitment by symptoms, which supports our objective of sampling contact persons whom a participant may infect, or by whom a participant may be infected, in case of an outbreak. Recruitment was random by sex and by number of contact persons. Relationships between pairs were influenced by the spatial distribution of peer recruitment.

Conclusions: Although complex mechanisms influence online peer recruitment, the observed statistical relationships reflected contact network patterns in the general population that are relevant for the transmission of respiratory pathogens.
This provides useful and innovative input for predictive epidemic models relying on network information.

Electronic supplementary material: The online version of this article (doi:10.1186/s12879-015-1250-z) contains supplementary material, which is available to authorized users.


Table of Contents
This file contains supporting information for the results presented in the manuscript "Tracking social contact networks with online respondent-driven detection: who recruits whom?". The supporting information is presented in the order in which it is discussed in the main manuscript.
In chapter 1 we explain in detail how the numbers of contacts and the effects of covariates were analysed. In chapter 2 we investigate the influence of participants' characteristics on the size of a recruitment tree. In chapter 3 we display the mixing matrices by age for our sample and for the Dutch POLYMOD study separately. Here we also provide the absolute numbers of self-reported symptoms and a visualisation of the mixing patterns by degree. In chapter 4 we analyse the distance between recruiters and their recruits, and quantify the extent to which a recruiter who lives in a certain region in the Netherlands invited contact persons who live in the same municipality in which the recruiter works and/or studies.

Numbers of contact persons and the effect of covariates
In the questionnaire, participants were asked for the number of contact persons they had during one full day ('yesterday'); this number was defined as a participant's 'degree'. First, we looked at the distribution of degree, stratified by day of the week and by the locations predefined in the questionnaire (Figure A1). For contacts at home and at other places, the degree distributions were fairly similar. During weekdays, participants reported more contact persons at work or university than during the weekend. There were no large differences in the total degree distributions ('at all locations') between weekdays and weekends. We investigated which covariates influence degree using a regression model. First, we investigated which theoretical distribution best fitted the empirical distribution, using the R package 'GAMLSS'.
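The stratification behind Figure A1 can be sketched as follows. This is a minimal illustration, not the code used for the figure: the record layout (one row per participant and predefined location) and the three example participants are hypothetical. Degree 'at all locations' is taken here as the sum over locations per participant, which is an assumption about how the totals were formed.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records, one per participant and predefined location:
# (participant_id, day reported as 'yesterday', location, number of contact persons)
RECORDS = [
    (1, "Mon", "home", 3), (1, "Mon", "work/university", 12), (1, "Mon", "other", 2),
    (2, "Sat", "home", 4), (2, "Sat", "work/university", 0), (2, "Sat", "other", 6),
    (3, "Tue", "home", 2), (3, "Tue", "work/university", 8), (3, "Tue", "other", 1),
]
WEEKEND = {"Sat", "Sun"}


def stratify_degree(records):
    """Collect degrees per (weekday/weekend, location), plus totals 'at all locations'."""
    strata = defaultdict(list)
    totals = defaultdict(int)   # participant_id -> degree summed over locations
    day_type = {}               # participant_id -> 'weekday' or 'weekend'
    for pid, day, location, n_contacts in records:
        kind = "weekend" if day in WEEKEND else "weekday"
        strata[(kind, location)].append(n_contacts)
        totals[pid] += n_contacts
        day_type[pid] = kind
    for pid, degree in totals.items():
        strata[(day_type[pid], "all locations")].append(degree)
    return dict(strata)


if __name__ == "__main__":
    for key, degrees in sorted(stratify_degree(RECORDS).items()):
        print(key, "median degree:", median(degrees), "n =", len(degrees))
```

Each stratum's list of degrees can then be passed to any histogram or density routine to reproduce panels like those in Figure A1.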
The degree distribution showed strong over-dispersion, with a mean degree of 19.6 per participant (median: 11.0; SD: 35.3). Table A1 displays the parameter estimates and AICs of the various fitted distributions. Note that the power law was fitted with the GAMLSS function "PARETO2"; cumulative distributions with a power-law form are sometimes said to follow a Pareto distribution (or Zipf's law) [1]. Figure A2 displays the various distributions in a reverse cumulative probability distribution plot (log10 transformed). Based on the AICs, the continuous log-normal and power-law distributions best fitted the empirical degree distribution [2]. However, these are continuous distributions fitted to a discrete distribution.
Therefore, we chose the best-fitting discrete distribution: the Poisson-inverse Gaussian (PIG).
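Three quantities in this paragraph can be made concrete with a short sketch: the over-dispersion implied by the reported mean and SD (a variance-to-mean ratio of 35.3² / 19.6 ≈ 63.6, against 1 for a Poisson distribution), the empirical reverse cumulative distribution P(X ≥ k) plotted in Figure A2, and AIC-based model comparison (AIC = 2k − 2 log L). GAMLSS itself is an R package and is not reproduced here; as a stand-in, the sketch compares only two simple one-parameter discrete distributions (Poisson and geometric) fitted by maximum likelihood, and the example degrees are hypothetical.

```python
import math


def dispersion_index(mean, sd):
    """Variance-to-mean ratio: about 1 for Poisson data, much larger if over-dispersed."""
    return sd ** 2 / mean


def reverse_cumulative(degrees):
    """Empirical P(X >= k) for each observed degree k (the curve of a reverse cumulative plot)."""
    n = len(degrees)
    return [(k, sum(d >= k for d in degrees) / n) for k in sorted(set(degrees))]


def aic(log_lik, n_params):
    return 2 * n_params - 2 * log_lik


def poisson_aic(degrees):
    lam = sum(degrees) / len(degrees)  # maximum-likelihood estimate of the Poisson mean
    ll = sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in degrees)
    return aic(ll, 1)


def geometric_aic(degrees):
    p = 1 / (1 + sum(degrees) / len(degrees))  # MLE for the geometric on {0, 1, 2, ...}
    ll = sum(math.log(p) + y * math.log(1 - p) for y in degrees)
    return aic(ll, 1)


if __name__ == "__main__":
    print(round(dispersion_index(19.6, 35.3), 1))  # 63.6, using the values reported above
    degrees = [0, 1, 2, 3, 5, 8, 11, 15, 30, 120]  # hypothetical, heavily right-skewed
    print("Poisson AIC:", poisson_aic(degrees), "geometric AIC:", geometric_aic(degrees))
```

The lower AIC indicates the better trade-off between fit and number of parameters; with heavy-tailed counts like these the Poisson fits far worse. For a log10 plot, a degree of 0 has no log value and must be dropped or shifted.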
The PIG distribution, an alternative to the negative binomial, can model highly dispersed count data owing to the flexibility of the inverse Gaussian distribution [3,4]. We applied the PIG distribution in the regression analysis.
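The source of that flexibility can be illustrated by simulation. A PIG count is a Poisson count whose mean is multiplied by an inverse Gaussian random effect with mean 1; the negative binomial arises in the same way with a gamma random effect. The sketch below assumes a parameterisation with mean μ and variance μ + σμ², and the values μ = 19.6 and σ = 2 are illustrative only, not estimates from this study.

```python
import numpy as np


def sample_pig(mu, sigma, size, seed=0):
    """Draw Poisson-inverse Gaussian counts with mean mu and variance mu + sigma * mu**2.

    Each count is Poisson with rate mu * v, where v is inverse Gaussian with mean 1 and
    variance sigma. numpy's wald(mean, scale) has variance mean**3 / scale, so scale = 1 / sigma.
    """
    rng = np.random.default_rng(seed)
    frailty = rng.wald(1.0, 1.0 / sigma, size)  # inverse Gaussian random effect
    return rng.poisson(mu * frailty)


if __name__ == "__main__":
    mu, sigma = 19.6, 2.0  # illustrative values, not estimates from the study
    y = sample_pig(mu, sigma, 200_000)
    print("sample mean", y.mean(), "sample variance", y.var(), "theory", mu + sigma * mu ** 2)
```

Because the inverse Gaussian has a heavier right tail than a gamma with the same variance, the PIG can produce more extreme degrees for the same mean and variance.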
We used a PIG regression model to investigate the effect of the following covariates on degree: age, sex, household size, ILI and day of the week. The reference categories were the 0–39 age group, females, one-person households, no self-reported ILI, and Sunday. Table A2 shows the output of the regression model. IRR stands for incidence rate ratio, which is reported as standard in the output of a PIG regression analysis.
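Assuming a log link for the mean (the usual choice for count regression), each coefficient β is on the log scale, and the IRR is exp(β): the multiplicative change in expected degree relative to the reference category (for example, a weekday versus Sunday). A minimal sketch of that conversion, including a Wald 95% confidence interval; the coefficient and standard error below are hypothetical and are not taken from Table A2.

```python
import math


def irr_with_ci(beta, se, z=1.96):
    """Convert a log-scale regression coefficient into an IRR with a Wald 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Hypothetical coefficient for one covariate level versus its reference category.
    irr, lo, hi = irr_with_ci(beta=0.25, se=0.05)
    print(f"IRR {irr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An IRR above 1 means more contact persons than the reference category; an interval that excludes 1 corresponds to a coefficient whose Wald test is significant at the 5% level.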

Descriptive analysis of recruitment trees
We conducted a descriptive analysis to investigate which characteristics of individuals in a recruitment tree influence the total size of a tree. First, we plotted the number of nodes per tree (i.e., participants who completed the questionnaire), stratified by characteristics of the seeds (Figure A3).
The characteristics of seeds did not appear to influence the number of nodes in a recruitment tree. Figure A3-B did, however, suggest that larger trees more often had a female seed; e.g., of all trees with 4 nodes, more than 75% had a female seed. This pattern was not seen for trees with 5 or more nodes, probably because there were fewer trees of those sizes.
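Tree size, as used above, is the number of completed participants reachable from a seed through recruiter-recruit links. The sketch below assumes each participant has exactly one recruiter (so the links form trees); the edge list, seed identifiers and sexes are hypothetical. It computes tree sizes by depth-first traversal and then the share of trees with a female seed for each tree size, the quantity discussed for Figure A3-B.

```python
from collections import defaultdict


def tree_sizes(edges, seeds):
    """Number of nodes (seed included) in each seed's recruitment tree."""
    children = defaultdict(list)
    for recruiter, recruit in edges:
        children[recruiter].append(recruit)
    sizes = {}
    for seed in seeds:
        stack, count = [seed], 0
        while stack:  # depth-first traversal from the seed
            node = stack.pop()
            count += 1
            stack.extend(children[node])
        sizes[seed] = count
    return sizes


def female_seed_share(sizes, seed_sex):
    """For each tree size, the proportion of trees whose seed is female ('F')."""
    by_size = defaultdict(list)
    for seed, size in sizes.items():
        by_size[size].append(seed_sex[seed] == "F")
    return {size: sum(flags) / len(flags) for size, flags in sorted(by_size.items())}


if __name__ == "__main__":
    edges = [("s1", "a"), ("s1", "b"), ("a", "c"), ("s2", "d")]  # hypothetical links
    sizes = tree_sizes(edges, ["s1", "s2", "s3"])
    print(sizes, female_seed_share(sizes, {"s1": "F", "s2": "M", "s3": "F"}))
```

Seeds who recruited nobody form trees of size 1, so they are counted rather than silently dropped.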
In Figure A4 we investigated the relationship between tree size and the composition of the entire recruitment tree. Overall, the larger the proportion of women or individuals with a bachelor's degree or higher in a recruitment tree, the larger the tree size was on average (see Figures

Mixing by age and comparison with POLYMOD
We compared the recruiter-recruit matrix stratified by age with the participant-contact matrix by age collected during the Dutch POLYMOD study (see Figure A5) [5,6]. We used the Dutch POLYMOD data corrected for digit preference by participants when reporting the age of contact persons; details on this correction can be found in Van de Kassteele et al. [6].
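A recruiter-recruit matrix by age is a cross-tabulation of recruiter age group (rows) against recruit age group (columns); assortative mixing shows up as mass on the diagonal. A minimal sketch, assuming hypothetical age bands (the bands in Figure A5 may differ) and without the digit-preference correction applied to the POLYMOD data:

```python
# Hypothetical age bands; the bands used in Figure A5 may differ.
AGE_BANDS = [(0, 19), (20, 39), (40, 59), (60, 120)]


def age_band(age):
    """Index of the age band containing an integer age."""
    for i, (low, high) in enumerate(AGE_BANDS):
        if low <= age <= high:
            return i
    raise ValueError(f"age {age} outside defined bands")


def mixing_matrix(pairs):
    """Count recruiter-recruit pairs: rows = recruiter age band, columns = recruit age band."""
    k = len(AGE_BANDS)
    counts = [[0] * k for _ in range(k)]
    for recruiter_age, recruit_age in pairs:
        counts[age_band(recruiter_age)][age_band(recruit_age)] += 1
    return counts


def row_proportions(counts):
    """Normalise each row so it sums to 1 (rows with no pairs stay at 0)."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


if __name__ == "__main__":
    m = mixing_matrix([(25, 30), (25, 45), (70, 65)])  # hypothetical (recruiter, recruit) ages
    print(m)
    print(row_proportions(m))
```

Row-normalising makes matrices from samples of different sizes comparable, which matters when setting our recruiter-recruit pairs alongside the much larger POLYMOD participant-contact data.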
Strong assortative mixing by age was observed in both matrices. However, the younger age groups (below 20 years of age) were not represented in our recruiter-recruit matrix. In the Dutch POLYMOD study these younger age groups were purposely oversampled to enable analysis of their contact patterns, because children are hypothesised to play a central role in the transmission dynamics of influenza pandemics [6]. Children have frequent contact within their own age group as well as a wide range of other contacts, thereby connecting all age groups [7].

Figure A5. Comparison with the Dutch POLYMOD study. We compared the recruiter-recruit matrix stratified by age with the participant-contact matrix by age collected during the Dutch POLYMOD study [6].

In Figure 2D (see main manuscript) we compared the number of contact persons reported at different locations by our participants with the contacts reported at different locations in the Dutch POLYMOD study [6]. For this comparison, the sample was weighted by the size of the age groups in the POLYMOD study. The applied weights can be found in Table A3.

Mixing by degree
We plotted the recruiter-recruit matrix by degree (see Figure A6). Recruitment by degree between recruiters and recruits was random, which corresponds to the correlation coefficients for degree displayed in Table 3 of the main manuscript. Because the dots clustered in the left corner, we looked more closely at the distribution up to a degree of 50. We also plotted the distribution on a log10 scale, which likewise illustrated random mixing.
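The three views described above (all pairs, pairs with both degrees up to 50, and a log10 scale) can each be summarised by a correlation coefficient between recruiter degree and recruit degree. The sketch assumes a Pearson coefficient, which may differ from the coefficient used in Table 3, and uses log10(degree + 1) so that a degree of 0 remains defined; both choices are assumptions, and the pairs are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def degree_correlations(pairs, max_degree=50):
    """Recruiter-recruit degree correlation: raw, log10(degree + 1), and zoomed to <= max_degree."""
    x = [a for a, _ in pairs]
    y = [b for _, b in pairs]
    zoom = [(a, b) for a, b in pairs if a <= max_degree and b <= max_degree]
    return {
        "raw": pearson(x, y),
        "log10": pearson([math.log10(a + 1) for a in x], [math.log10(b + 1) for b in y]),
        "zoomed": pearson([a for a, _ in zoom], [b for _, b in zoom]),
    }


if __name__ == "__main__":
    pairs = [(5, 12), (30, 8), (11, 40), (3, 3), (120, 15)]  # hypothetical (recruiter, recruit) degrees
    print(degree_correlations(pairs))
```

A coefficient near 0 in all three views is what random mixing by degree looks like; the zoomed and log views guard against a few very high degrees dominating the raw coefficient.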

Distance between recruiter-recruit pairs
We investigated the distance between recruiters and recruits based on the provided 4-digit postal codes. The distribution of the distance between recruiter-recruit pairs was right-skewed (Figure A7).
Based on the distribution in Figure A7, we categorised distance into three groups for Table 4 (see main manuscript): same postal code, 1–10 km and >10 km. Within the first group (same postal code), recruiters appeared to invite recruits of slightly more similar age than in the other two distance groups (see Figure A8). This is confirmed by the correlation coefficients for age in Table 4 of the main manuscript.
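A distance between two 4-digit postal codes needs a representative point per code; the sketch assumes a centroid (latitude, longitude) per postal code and uses the great-circle (haversine) distance, which may differ from the method actually used. The centroid table is hypothetical. Because the source does not say where pairs in different postal codes less than 1 km apart fall, the sketch assigns them to the 1–10 km group; that is an assumption.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_group(pc_recruiter, pc_recruit, centroids):
    """Categorise a pair as 'same postal code', '1-10 km' or '>10 km'."""
    if pc_recruiter == pc_recruit:
        return "same postal code"
    d = haversine_km(*centroids[pc_recruiter], *centroids[pc_recruit])
    return "1-10 km" if d <= 10 else ">10 km"  # assumption: < 1 km also counts as 1-10 km


if __name__ == "__main__":
    centroids = {"A": (52.0, 5.0), "B": (52.05, 5.0), "C": (53.0, 5.0)}  # hypothetical
    print(distance_group("A", "B", centroids), distance_group("A", "C", centroids))
```

Centroid distances understate spread within large postal areas, so the same-postal-code group is best read as "very close" rather than as a distance of zero.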

Figure A7. Histograms of distances between recruiter-recruit pairs in kilometres (km).
The mean distance between seeds (wave 0) and their recruits in wave 1 was higher than the mean distance between recruiter-recruit pairs in subsequent waves. Figure A9 displays the distances for different link steps, as seen from the seed; link step 1 is thus the link between seeds and their recruits in wave 1.

Figure A8. Age distributions of recruiters and recruits stratified by distance. Pink: recruiters; blue: recruits.

Figure A9. Distance between recruiters and recruits for different link steps. Link steps are counted as seen from the seed: link step 1 is the link between seeds and recruits in wave 1, link step 2 the link between recruiters in wave 1 and recruits in wave 2, and so on.
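With the seed as wave 0, the link step of a pair equals the recruiter's wave plus 1. A minimal sketch of the per-step mean distance summarised in Figure A9; the (recruiter wave, distance) pairs are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_distance_by_link_step(pairs):
    """pairs: (recruiter_wave, distance_km); the seed is wave 0, so link step = recruiter_wave + 1."""
    by_step = defaultdict(list)
    for recruiter_wave, distance_km in pairs:
        by_step[recruiter_wave + 1].append(distance_km)
    return {step: mean(d) for step, d in sorted(by_step.items())}


if __name__ == "__main__":
    pairs = [(0, 30.0), (0, 10.0), (1, 4.0), (1, 6.0), (2, 2.0)]  # hypothetical
    print(mean_distance_by_link_step(pairs))
```

Given the right-skewed distances, the median per link step is a useful companion to the mean; swapping `statistics.median` in for `mean` gives it.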

A logistic regression analysis
The Netherlands has 12 provinces, which form the administrative layer between the national government and the local municipalities (i.e., subdivisions of a province). We categorised the   Figure 4A) and the commuting network (Figure 4B). Therefore, for Dutch participants, we investigated the relationship between the geographical location where a recruiter works and/or studies and the location where their recruited contact person lives. We excluded participants living in Belgium. Home location was defined by the provided 4-digit postal code; work location was defined by the city or village provided in the questionnaire.
Our goal was to quantify the extent to which a recruiter who lives in a certain region (four regions; see above) invites contact persons who live in the same municipality in which the recruiter works/studies.
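The binary outcome and the covariate "recruiter lives and works in the same municipality" both come down to comparing municipalities once a home postal code has been mapped to its municipality. The sketch assumes such a lookup table exists (a hypothetical `postcode_to_municipality` dictionary with placeholder names) and that the work city or village has already been matched to a municipality, which the source does not describe. It also shows the recoding of the 1/2 codes into the 0/1 outcome.

```python
# Recoding of the original codes into the binary model outcome.
RECODE = {1: 0, 2: 1}  # 1: different municipality -> 0; 2: same municipality -> 1


def same_municipality_outcome(recruiter_work_municipality, recruit_postcode, postcode_to_municipality):
    """1 if the recruit lives in the municipality where the recruiter works/studies, else 0."""
    return int(postcode_to_municipality[recruit_postcode] == recruiter_work_municipality)


def lives_and_works_same(recruiter_postcode, recruiter_work_municipality, postcode_to_municipality):
    """Covariate: 1 if the recruiter's home and work/study municipality coincide, else 0."""
    return int(postcode_to_municipality[recruiter_postcode] == recruiter_work_municipality)


if __name__ == "__main__":
    lookup = {"PC1": "Municipality A", "PC2": "Municipality B"}  # hypothetical mapping
    print(same_municipality_outcome("Municipality A", "PC1", lookup),
          lives_and_works_same("PC2", "Municipality A", lookup))
```

Keeping both comparisons on the same municipality lookup ensures the outcome and the covariate use identical geographic units.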
We used a mixed-effects logistic regression model for the binary outcome:
-recruiter did not invite a recruit who lives in the same municipality in which the recruiter works or studies (0)
-recruiter invited a recruit who lives in the same municipality in which the recruiter works or studies (1)
This outcome variable was created by recoding:
-"municipality where recruiter works/studies" ≠ "municipality where his/her recruit lives" (code 1)
-"municipality where recruiter works/studies" = "municipality where his/her recruit lives" (code 2)
The log odds of the binary outcome were modelled as a linear combination of the variables "region of residence of recruiter" (four regions) and "recruiter lives and works in the same municipality" (binary: yes/no), with the region West-Netherlands and 'recruiter not working in the same municipality as he/she is living' as reference categories. A random intercept for recruiter ID was included to correct for differences between recruiters, e.g., in the numbers of contact persons invited per recruiter and the type of recruited contact persons.
The analysis was restricted to participants who:
-indicated that they lived in the Netherlands
-provided a work or study location in the questionnaire.
We used the fitted logistic regression model to estimate the probability of the outcome (recruit lives in the municipality where the recruiter works or studies) for the four regions in the Netherlands (i.e., predictions made without knowing which recruiter ID is being predicted). 95% confidence intervals were calculated in two ways: using fixed-effects uncertainty only, and using fixed-effects uncertainty plus the random-effect variance. Table A6 shows the output of the mixed-effects logistic regression. The variable 'working and living in the same municipality' significantly influenced the outcome. Table A7 displays
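The two interval types described above can be sketched as follows, assuming Wald intervals on the logit scale that are back-transformed with the inverse logit, and a prediction with the random intercept set to 0 (recruiter unknown). This mirrors a common approach for predictions from mixed-effects logistic models, but it is not necessarily how Table A7 was computed. The coefficients, covariance matrix (diagonal here for brevity, whereas real model output would include covariances) and random-intercept variance are all hypothetical.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear_predictor(x, beta):
    """Fixed-effects part x'beta; the random intercept is set to 0 (recruiter unknown)."""
    return sum(xi * bi for xi, bi in zip(x, beta))


def fixed_effect_variance(x, vcov):
    """Variance of x'beta from the fixed-effects covariance matrix: x' V x."""
    k = len(x)
    return sum(x[i] * vcov[i][j] * x[j] for i in range(k) for j in range(k))


def predict_with_intervals(x, beta, vcov, random_intercept_var, z=1.96):
    """Probability plus 95% intervals from fixed-effects uncertainty only, and with random-effect variance."""
    eta = linear_predictor(x, beta)
    var_fixed = fixed_effect_variance(x, vcov)

    def interval(var):
        half = z * math.sqrt(var)
        return inv_logit(eta - half), inv_logit(eta + half)

    return inv_logit(eta), interval(var_fixed), interval(var_fixed + random_intercept_var)


if __name__ == "__main__":
    # Hypothetical design: intercept, dummies for three non-reference regions,
    # and 'recruiter lives and works in the same municipality'.
    beta = [-1.2, 0.1, -0.3, 0.2, 0.9]
    vcov = [[0.04 if i == j else 0.0 for j in range(5)] for i in range(5)]
    x = [1, 1, 0, 0, 1]  # one non-reference region, lives and works in same municipality
    print(predict_with_intervals(x, beta, vcov, random_intercept_var=0.8))
```

The second interval is always wider, since it adds between-recruiter variation to the uncertainty in the fixed effects.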