The study was conducted in four sub-counties of Sheema North Sub-District (southwest Uganda), an area with a total of about 80,000 inhabitants. About half (49%) of the district’s population is < 15 years. The area is primarily rural.
Study design
Between January and March 2014 survey teams undertook interviews of a subset of individuals who were also included in a survey of Streptococcus pneumoniae carriage [15], asking about their social contacts in the 24 h preceding the survey, including the frequency, type and duration of encounters.
Our target sample size was 687, including all 327 individuals aged ≥15 years included in the nasopharyngeal carriage study within which this study was nested [15] and a subsample of 90 children in each of the following age groups: < 2 years, 2 – 4 years, 5 – 9 years and 10 – 14 years. Based on estimates from previous findings available at the time [12, 16], this sample size provided a precision of just over 1 contact on the mean number of contacts per day, and enabled detection of a 20% difference in the average number of daily contacts by age group.
Individuals were selected from 60 clusters randomly sampled from the 215 villages and two small towns in the sub-county, with an inclusion probability proportional to the size of the village or town. Within each cluster 11 or 12 households were selected at random from a list of households. A household was defined as a group of individuals living under the same roof and sharing the same kitchen on a daily basis. One individual from each household was randomly selected from a list of predefined age groups to sample from within each cluster. When nobody in the household was from that age group, either someone from another age group was selected, provided that the quota for that age group had not been reached in the cluster, or the closest neighbouring household was visited instead. In case of non-response, another attempt was made later in the day or the following Saturday. Individuals who could not be reached at the second attempt were not replaced.
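The cluster selection step can be illustrated with a short sketch. The text does not state which probability-proportional-to-size (PPS) procedure was used, so the code below assumes systematic PPS sampling along the cumulative size list. The village sizes are invented placeholders, and the function name is hypothetical.

```python
import random


def pps_systematic(sizes, n_clusters, rng):
    """Select n_clusters units with probability proportional to size.

    Assumption: systematic PPS sampling; the study text does not name
    the exact procedure. A unit larger than the sampling interval could
    be selected more than once.
    """
    total = sum(sizes)
    step = total / n_clusters
    start = rng.random() * step  # random start in [0, step)
    points = [start + k * step for k in range(n_clusters)]

    selected = []
    cumulative = 0.0
    idx = 0
    for point in points:
        # Advance until the sampling point falls inside the current unit.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= point:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


rng = random.Random(2014)
# Hypothetical sizes for 215 villages plus two towns (not study data).
sizes = [rng.randint(300, 900) for _ in range(217)]
clusters = pps_systematic(sizes, 60, rng)
print(clusters[:5])
```

Because every placeholder size is smaller than the sampling interval here, no unit is drawn twice; with real data a very large town could be.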
Data collection
Informed consent was sought from individuals aged > 17 years, and from a parent or carer for children < 18 years. In addition, assent was sought from children aged 7 – 17 years. Participants were asked to recall information on the frequency, type and duration of social encounters from the time they woke up the day before the survey until when they woke up on the survey day (~ 24 h).
We defined contacts as two-way conversational encounters lasting for ≥5 min. Participants were first asked to list all the places they had visited in the previous 24 h, the number of people they had contact with, their relationship with each individual mentioned, the age (or estimated age) of each listed contact and how long the encounter lasted. Contacts involving skin-to-skin touch or sharing utensils passed directly from mouth to mouth were defined as ‘physical’ contacts. The questionnaire can be found in Additional file 1.
We defined short contacts lasting less than 5 min as ‘casual contacts’. Participants were only asked to estimate the number of casual contacts they had, based on pre-defined categories (< 10, 10-19, 20-29, ≥30), but were not asked to provide detailed information about the nature of the encounter or the socio-demographic characteristics of the person met. Casual contacts are generally inaccurately reported in social contact surveys [7], particularly in a retrospective design, and most contacts important for the transmission of respiratory infections are believed to be close rather than casual [6].
The questionnaire was designed in English, translated to Ruyankole, the local language, and back-translated to English for consistency. For children < 5 years, parents were asked about their child’s encounters and whereabouts. Children aged 5 – 14 years were interviewed directly, using a questionnaire with a slightly adapted wording from that used for adults.
Geographical coordinates of each participant’s household and of the centre of each village were taken using handheld GPS devices. The spatial identification of each location in the area was done by the research team during the preparation phase of the study. Geo-referencing of each village, hamlet or town in the area was done using GIS imagery as well as by travelling to the different villages to collect that information using handheld GPS devices. Given that some villages had very similar names, interviewers carried with them a list of all village names (> 300), so as to avoid data entry problems.
Questionnaires completed in the field were double entered on a preformatted data entry tool (www2.voozanoo.net, Epiconcept, France) by two data managers working independently. Data entry conflicts were identified automatically and resolved as the data entry progressed.
Ethics
Approval was obtained from the Ethical review boards of Médecins Sans Frontières (MSF), the Faculty of Medicine Research & Ethics Committee of the Mbarara University of Science and Technology (MUST), the Institutional Ethical Review Board of the MUST, the Uganda National Council for Science and Technology (UNCST) and the London School of Hygiene and Tropical Medicine (LSHTM).
Analysis
Characteristics of social contacts by time, person and place
We analysed the frequency distribution of contacts for a set of covariates, including age, sex, occupation, day of the week, distance travelled, and type of contact. Encounters reported with the same individual in different settings counted as one contact only. Straight-line distances between the centre points of all villages and towns in the dataset were calculated, and these were then used to evaluate how far people travelled, based on the reported names of the villages and towns where each reported encounter took place, and their own village or town of residence.
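As an illustration of the distance calculation, the sketch below computes the distance between two village centre points from their GPS coordinates. The text does not say which formula was used; the haversine great-circle formula and the mean Earth radius of 6371 km are assumptions of this example, and the coordinates are placeholders rather than study locations.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical centre points of a residence village and a contact village.
village_a = (-0.55, 30.35)
village_b = (-0.60, 30.42)
distance = haversine_km(*village_a, *village_b)
print(round(distance, 2))
```

Over the short distances involved in a single sub-district, this is practically indistinguishable from a planar straight-line distance.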
We used negative binomial regression to estimate the ratio of the mean contacts as a function of the different covariates of interest. Negative binomial was preferred over Poisson regression given evidence of over-dispersion (variance > mean, and a significant likelihood ratio test (P < 0.05) for the over-dispersion parameter). We considered variables associated with contact frequency at P < 0.10 for multivariable analysis, and retained them in multivariable models if they reduced the Bayesian Information Criterion (BIC).
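The first over-dispersion criterion (variance > mean) can be checked directly from the contact counts. The sketch below is only a crude descriptive check on invented counts, not the likelihood ratio test used in the analysis. It also reports a method-of-moments estimate of the dispersion parameter, assuming the common parameterisation Var = μ + αμ², which is an assumption of this example.

```python
import statistics


def overdispersion_summary(counts):
    """Return the mean, sample variance and a moment estimate of alpha.

    Assumes the negative binomial parameterisation Var = mu + alpha * mu**2;
    alpha = 0 corresponds to the Poisson case.
    """
    mean = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance (n - 1 denominator)
    alpha = max(0.0, (var - mean) / mean ** 2)
    return mean, var, alpha


# Hypothetical daily contact counts for ten participants (not study data).
counts = [2, 3, 5, 7, 8, 10, 12, 15, 20, 28]
mean, var, alpha = overdispersion_summary(counts)
print(mean, var, round(alpha, 3))
```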
Next, we explored whether people reporting a high frequency of casual contacts (≥10 casual contacts) differed from those reporting fewer casual contacts with regard to their socio-demographic characteristics. We did so using log-binomial regression to compute crude and adjusted relative risks (RRs) of reporting a high frequency of casual contacts. In all analyses we accounted for possible within-cluster correlation by using linearization-based variance estimators [17]. Analyses were also weighted for the unequal probabilities of sampling selection by age group.
Age-specific social contact patterns
We analysed the age-specific contact patterns through matrices of the mean number of contacts between participants of age group j and individuals in age group i, adjusting for reciprocity, as in Melegaro et al. [6].
If \( {x}_{ij} \) denotes the total number of contacts in age group i reported by individuals in age group j, the mean number of reported contacts \( \left({m}_{ij}\right) \) is calculated as \( {x}_{ij}/{p}_j \), where \( {p}_j \) is the study population size of age group j. At the population level the frequency of contacts made between age groups should be equivalent such that \( {m}_{ij}{P}_j={m}_{ji}{P}_i \). The expected number of contacts between the two groups is therefore \( {C}_{ij}=\left({m}_{ij}{P}_j+{m}_{ji}{P}_i\right)/2 \). Hence, the mean number of contacts corrected for reciprocity \( \left({m}_{ij}^C\right) \) can be expressed as \( {C}_{ij}/{P}_j \).
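The reciprocity correction can be written out in a few lines. The sketch below uses two age groups with invented numbers. It takes P to be the population size of each age group, which the text does not define explicitly, and the function and variable names are hypothetical.

```python
def reciprocal_matrix(x, p, P):
    """Return (m, m_c) for contact totals x, sample sizes p, population sizes P.

    x[i][j]: total contacts with age group i reported by participants in group j.
    p[j]:    number of study participants in age group j.
    P[j]:    population size of age group j (assumed meaning of P).
    """
    n = len(p)
    m = [[x[i][j] / p[j] for j in range(n)] for i in range(n)]
    m_c = [[(m[i][j] * P[j] + m[j][i] * P[i]) / (2 * P[j]) for j in range(n)]
           for i in range(n)]
    return m, m_c


# Hypothetical two-group example (not study data).
x = [[200, 90], [120, 300]]
p = [100, 60]
P = [4000, 6000]
m, m_c = reciprocal_matrix(x, p, P)
print(m_c)
```

After correction, the total number of contacts between two groups is the same whichever group reports it, while within-group means are unchanged.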
We tested the null hypothesis of proportionate mixing by computing the ratio of observed mixing patterns to that of expected mixing patterns if social contact occurred at random. Under the assumption of random mixing, the probability of encounter between age groups thus depends on the population distribution in each age group, and the contact matrix under this random mixing hypothesis was calculated based on the percentage of population in each age group. The ratio of observed over expected contacts was then computed, and confidence intervals were obtained through bootstrapping, with replacement, for a total of 1000 iterations. This approach is similar to that taken by others [11].
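A minimal sketch of this comparison is given below. It assumes that, under proportionate mixing, the contacts of a participant in age group j are spread across age groups in proportion to the population share of each group, and that bootstrap samples are drawn within age groups. Both choices are assumptions of this example rather than details given in the text, and the participant records are simulated.

```python
import random


def mean_contacts(participants, n_groups):
    """m[i][j]: mean contacts with age group i per participant in group j."""
    totals = [[0.0] * n_groups for _ in range(n_groups)]
    counts = [0] * n_groups
    for group, contacts in participants:
        counts[group] += 1
        for i in range(n_groups):
            totals[i][group] += contacts[i]
    return [[totals[i][j] / counts[j] for j in range(n_groups)]
            for i in range(n_groups)]


def observed_over_expected(m, omega):
    """Ratio of observed mean contacts to those expected under random mixing."""
    n = len(omega)
    ratio = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total_j = sum(m[i][j] for i in range(n))
        for i in range(n):
            ratio[i][j] = m[i][j] / (total_j * omega[i])
    return ratio


def bootstrap_ci(participants, omega, n_iter, rng):
    """Percentile 95% intervals, resampling with replacement within age group."""
    n = len(omega)
    strata = [[rec for rec in participants if rec[0] == g] for g in range(n)]
    draws = []
    for _ in range(n_iter):
        sample = []
        for members in strata:
            sample.extend(rng.choices(members, k=len(members)))
        draws.append(observed_over_expected(mean_contacts(sample, n), omega))
    lo_idx, hi_idx = int(0.025 * n_iter), int(0.975 * n_iter) - 1
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            values = sorted(d[i][j] for d in draws)
            lower[i][j], upper[i][j] = values[lo_idx], values[hi_idx]
    return lower, upper


rng = random.Random(1)
omega = [0.5, 0.5]  # hypothetical population shares
participants = []   # (age group, contacts by age group); simulated records
for _ in range(50):
    participants.append((0, [rng.randint(4, 8), rng.randint(1, 3)]))
    participants.append((1, [rng.randint(1, 3), rng.randint(4, 8)]))
point = observed_over_expected(mean_contacts(participants, 2), omega)
lower, upper = bootstrap_ci(participants, omega, 1000, rng)
print(point[0][0], lower[0][0], upper[0][0])
```

A ratio above 1 indicates more contact between two age groups than expected under random mixing, and a ratio below 1 indicates less.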
Epidemic simulations
Finally, in order to explore the infection transmission dynamics resulting from our contact pattern data, we simulated the spread of an immunizing respiratory infection transmitted through close contact in a totally susceptible population, thus assuming a Susceptible-Infected-Recovered (SIR) model. The model contained nine mixing age groups, with a transmission rate \( {\beta}_{ij} \) at which individuals in age group j come into routine contact with individuals in age group i computed as \( {\beta}_{ij}=q{m}_{ij}^C/{\omega}_i \), where \( {\omega}_i \) is the proportion of individuals in age group i, and \( q{m}_{ij}^C \) is the next generation matrix, with q representing the probability of successful transmission per contact event [18]. We assumed q to be homogeneous and constant across all age groups and conducted a set of simulations for fixed values of q between 25% and 40%, in line with what has been reported with influenza pandemic strains [18, 19]. The basic reproduction number (R0) – which corresponds to the average number of people infected by one infectious individual in a totally susceptible population – was calculated as the dominant eigenvalue of the next generation matrix. We took uncertainty estimates in the contact matrices (and hence final size outputs) into account by iterating the model on bootstrapped matrices.
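The calculation of R0 as the dominant eigenvalue of the next generation matrix can be sketched with power iteration. The two-group contact matrix below is invented for illustration, and power iteration is simply one convenient way of obtaining the dominant eigenvalue; the text does not say which numerical routine was used.

```python
def dominant_eigenvalue(matrix, n_iter=1000, tol=1e-12):
    """Dominant eigenvalue of a matrix with positive entries, by power iteration."""
    n = len(matrix)
    v = [1.0 / n] * n  # start vector, normalised to sum to 1
    eig = 0.0
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_eig = sum(w)  # equals the eigenvalue at convergence, as sum(v) == 1
        v = [wi / new_eig for wi in w]
        if abs(new_eig - eig) < tol:
            return new_eig
        eig = new_eig
    return eig


def r0_from_contacts(m_c, q):
    """R0 for the next generation matrix q * m_c."""
    ngm = [[q * value for value in row] for row in m_c]
    return dominant_eigenvalue(ngm)


# Hypothetical reciprocity-corrected contact matrix (not study data).
m_c = [[4.0, 2.0], [1.0, 3.0]]
for q in (0.25, 0.30, 0.35, 0.40):
    print(q, round(r0_from_contacts(m_c, q), 3))
```

Since q is a scalar, R0 scales linearly with q for a given contact matrix.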
We did not simulate the dynamics of the epidemic over time in each setting. Rather, we computed the final epidemic size (i.e. the number of individuals who would have been infected during the epidemic) for each specific age group, based on a simple mass action model adapted to account for multiple age classes, as described in Kucharski et al. [20], with the following equation:
$$ {F}_i=1-\exp \left(-\sum \limits_{j=1}^9{\beta}_{ij}{\omega}_j{F}_j\right) $$
where \( {F}_i \) represents the final epidemic size in age group i (i.e. the proportion of individuals in that age group who are infected).
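The final size equation has no closed-form solution, but it can be solved numerically. The sketch below uses fixed-point iteration, which is an assumption of this example (the text does not describe the numerical method), with an invented two-group contact matrix and population shares chosen to satisfy reciprocity.

```python
import math


def final_size(m_c, omega, q, n_iter=10000, tol=1e-12):
    """Solve F_i = 1 - exp(-sum_j beta_ij * omega_j * F_j) by fixed-point iteration.

    beta_ij = q * m_c[i][j] / omega[i], as defined in the text.
    The iteration starts away from zero, since F = 0 is always a trivial solution.
    """
    n = len(omega)
    beta = [[q * m_c[i][j] / omega[i] for j in range(n)] for i in range(n)]
    F = [0.5] * n
    for _ in range(n_iter):
        new_F = [
            1.0 - math.exp(-sum(beta[i][j] * omega[j] * F[j] for j in range(n)))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_F, F)) < tol:
            return new_F
        F = new_F
    return F


# Hypothetical inputs (not study data): two age groups, q = 30%.
m_c = [[4.0, 2.0], [1.0, 3.0]]
omega = [2.0 / 3.0, 1.0 / 3.0]
F = final_size(m_c, omega, 0.30)
print([round(f, 3) for f in F])
```

With a single age group the equation reduces to the classical final size relation F = 1 − exp(−R0·F), which gives F ≈ 0.80 for R0 = 2.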
Estimates obtained using the contact data from Uganda were compared with those for Great Britain, using data from the POLYMOD study [4] for the latter and a similar approach to compute the mixing matrix. The model was parameterised with social contact data on physical contacts lasting ≥5 min only, rather than all contacts, given that physical contacts generally seem to better capture contact structures relevant for the transmission of respiratory infections [6]. Data for Great Britain were available for the same age groups and for physical contacts specifically, defined in the same way as in our study, which made the data on physical contacts more comparable between studies than those on overall contacts.
Analyses were performed in either Stata 13.1 IC or R version 3.2 [21].