Sensor-measured and self-reported contacts formed distinct sets with a large intersection, but neither was a subset of the other. Using the diary reports, we estimated reporting probabilities between 72 and 94 %, with lower probabilities for shorter contacts, consistent with previous studies [20–22]. When comparing the two contact measurement methods, we found substantial overlap between reporting and recording, but also important differences: in each case, some contacts were reported or recorded by one method but not the other. Relating diary reports to sensor recordings, we found reporting probabilities between 34 and 94 %, again with lower probabilities for shorter contacts; relating sensor recordings to diary reports, we estimated recording probabilities between 57 and 95 %. Both methods of contact measurement were acceptable to the participants in our specific setting: only 20 % of respondents with at least one reported contact stated that filling in the diary was too much work, and 25 % reported difficulties in remembering contacts. Almost all participants (93 %) were comfortable having their conference contacts measured by a wearable sensor.
Differences between diary and sensor data
Differences between diary and sensor contact data can be classified into three categories: (i) legitimate differences due to different underlying contact definitions, (ii) differences due to limited receiver coverage, and (iii) differences due to reporting and measurement errors.
Differences in contact definitions
The proxy for a potentially contagious contact in contact diaries is social interaction, i.e., conversation and/or physical contact. The definition underlying sensor measurements is broader in the sense that every short-distance, face-to-face collocation of two participants was recorded. It is likely that sensor-recorded face-to-face events without conversation or physical contact occurred in our study and were legitimately not reported in the diaries. Conversely, participants might have reported conversations that were not clearly face-to-face but angled or even side-to-side, or that took place over a distance of more than ~1.5 m, and which were therefore not recorded by the sensors.
Under- and overrecording with sensors
The SocioPatterns infrastructure used here required stationary receivers to store contact data (unlike, e.g., TelosB motes , which store contact data directly on the sensors). As a consequence, only contacts that took place in an area covered by at least one receiver were recorded. The positioning of receivers is often - and was in our study - constrained by health-and-safety requirements (e.g., no exposed wires someone could trip over), the availability of power sockets, the arrangement and orientation of walls, etc. Blind spots in the study area due to these constraints might have been present. Furthermore, we only covered the rooms, lecture theatres, and open spaces used for the conference, but we did not cover the canteen or outdoor spaces (see Additional file 13: Figure S1). It is highly likely that some contacts took place there, too.
Underrecording can also happen when participants do not adhere to the instructions and place their sensor under a jacket or in another inappropriate place that shields electromagnetic signals. We observed, e.g., one participant placing his sensor in a backpack. These reasons, as well as the reporting of angled or distant conversations, might explain why about 42 % of the reported short contacts (less than 5 min) were not recorded by the sensors.
Overrecording happened during the recruitment phase, when activated sensors had to be distributed to many participants within a very short time. We attempted to remove these artefactual recordings by filtering the raw data (see section “Processing and cleaning of the sensor data”).
Taken together, these observations show that sensor-based contact measurement is not free of measurement error.
Under- and overreporting with contact diaries
It is generally assumed that discordant reporting is due to underreporting and that no contacts are fabricated [20, 22, 26]. As expected, we found differences in reporting quality depending on duration of contact, with short contacts showing relatively more discordant reports than extended ones.
Our estimates of reporting probabilities were based on three assumptions, one of which was the independence of reporting. Unlike in previous studies, where the reporting was retrospective [20, 21] or well-integrated into the daily routine , participants in this study were branded by wearing the sensor pouch with their visible ID number and the study participation itself could have triggered conversation among participants. Hence, it is likely that the independence assumption was partly violated and concordant reports (in either way: both reporting or both not reporting) might have been overrepresented. As a consequence, true reporting probabilities might be lower than estimated here.
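The role of the independence assumption can be made concrete with a minimal sketch. It assumes a simple model that is not spelled out in this section: both parties of a contact report it independently and with the same probability p, so that P(both report) = p² and P(exactly one reports) = 2p(1 − p). Contacts reported by neither party are unobserved and cancel out of the ratio, which gives p = 2·n_both / (2·n_both + n_one). The function name and the counts are hypothetical and are not taken from the study data.

```python
def reporting_probability(n_both: int, n_one: int) -> float:
    """Estimate the per-person reporting probability p from pair counts.

    Illustrative model (an assumption, not necessarily the study's estimator):
    each party reports a contact independently with the same probability p.
      n_both: contact pairs reported by both parties
      n_one:  contact pairs reported by exactly one party
    Pairs reported by nobody are unobserved and do not enter the estimate.
    """
    if n_both < 0 or n_one < 0 or n_both + n_one == 0:
        raise ValueError("need non-negative counts and at least one reported pair")
    # From n_both / n_one = p**2 / (2 * p * (1 - p)), solved for p.
    return 2 * n_both / (2 * n_both + n_one)


# Hypothetical counts for illustration only.
p_hat = reporting_probability(n_both=90, n_one=40)
print(f"estimated reporting probability: {p_hat:.3f}")
```

Under this model, positively correlated reporting within a pair inflates n_both relative to n_one and therefore pushes the estimate upwards, which is the direction of bias described above.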
Of note, our results confirm two important facts: on the one hand, participants forget to report a proportion of contacts, especially short ones [20–22]; on the other hand, there seems to be a tendency to overestimate contact duration if contacts are reported (see Fig. 2, and Mastrandrea et al. ). Indeed, diary-based estimates of individual strength are larger than the corresponding sensor-based measures for about 60 % of participants, even if very strict definitions concerning the reported strength are applied (see method section). A possible reason for this discrepancy in duration estimates could be that contact durations as measured by sensors do not match participants’ notion of a contact, i.e., participants might count the whole duration of a meeting as a contact rather than only the brief periods of face-to-face contact, leading to overreporting of durations. Another reason could be underrecording by the sensors, as discussed above. We suppose that the latter factor does not explain a major part of the observed difference. Diary-based strength smaller than sensor-based strength was obtained only for a few participants; however, given that 5/6 of these participants did not report any contact, they had no real opportunity to bias their contact duration.
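A paired comparison of this kind (diary-based versus sensor-based strength per participant) is commonly summarised in Bland-Altman fashion, i.e., by the mean difference and the 95 % limits of agreement. The following is a minimal sketch under that assumption; the function name and the example values are hypothetical, and the conventional 1.96 multiplier presumes roughly normally distributed differences.

```python
from statistics import mean, stdev


def bland_altman_summary(diary, sensor):
    """Summarise paired measurements (e.g., strength in minutes per participant).

    Returns (bias, lower_limit, upper_limit, share_diary_larger), where
    bias is the mean of diary - sensor and the limits are bias +/- 1.96 SD.
    """
    if len(diary) != len(sensor) or len(diary) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    diffs = [d - s for d, s in zip(diary, sensor)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample standard deviation
    share_diary_larger = sum(diff > 0 for diff in diffs) / len(diffs)
    return bias, bias - half_width, bias + half_width, share_diary_larger


# Hypothetical strengths in minutes, not study data.
diary_strength = [30, 60, 10, 45]
sensor_strength = [20, 40, 15, 45]
print(bland_altman_summary(diary_strength, sensor_strength))
```

A positive bias together with a large share of participants whose diary value exceeds the sensor value would correspond to the overestimation pattern described above.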
The majority of participants, nearly 80 %, thought that their reporting concerning the number of contacts was correct. This self-evaluation confirms findings from a previous study: in McCaw et al., most participants of a three-day paper diary study thought that they got the number of contacts right. However, taking actual reporting errors into account, the comparison shows that even scientifically trained participants are subject to misperceptions (72 % reporting probability for short contacts).
A problem for contact studies is that measuring behaviour can influence it: about one quarter of our study participants reported more frequent and longer contact events that they attributed to the study, although in longer studies, e.g., over a week, this effect could wear off and it might be of less impact in studies without visible ID.
Comparison of contact reporting with previous studies
The crude overall concordance of contact reporting in our study, i.e., the share of contacts that were reported by both parties, was 68.8 % and thus considerably higher than in previous studies conducted at a university (30.2 %), a US high school (23.5 %), and a French high school (44.9 %) [4, 20, 21], and similar to that of a study done at a research institute over one week (65.0 %) . Interestingly, Conlan et al. reported a concordance of 61 % from a contact study investigating the social networks of primary school students aged 4 to 11 years ; however, in that study contact networks were built on general reports from children about their social network and were not specific to a given study period. Also, degree was a priori limited by questionnaire design. Both characteristics might contribute to high mutual reporting and make a direct comparison with our study infeasible.
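The crude concordance used in this comparison can be computed directly from the diary reports. The sketch below assumes that each report is available as a (reporter ID, reported ID) tuple; the data layout, the function name, and the example IDs are hypothetical.

```python
def crude_concordance(reports):
    """Share of reported contact pairs that were reported by both parties.

    `reports` is an iterable of (reporter_id, reported_id) tuples.
    Self-reports (a == b) and duplicate entries are ignored.
    """
    directed = {(a, b) for a, b in reports if a != b}
    # Each unordered pair is counted once, whoever reported it.
    pairs = {frozenset(edge) for edge in directed}
    if not pairs:
        raise ValueError("no reported contacts")
    mutual = 0
    for pair in pairs:
        a, b = tuple(pair)
        if (a, b) in directed and (b, a) in directed:
            mutual += 1
    return mutual / len(pairs)


# Hypothetical diary reports: pairs {1,2} and {2,4} are mutual,
# pairs {1,3} and {3,5} are reported by one party only.
example_reports = [(1, 2), (2, 1), (1, 3), (4, 2), (2, 4), (3, 5)]
print(crude_concordance(example_reports))
```

Note that this measure is computed over contacts reported by at least one party, so contacts forgotten by both parties never enter the denominator.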
Stratifying for duration, the reporting probabilities in our study are qualitatively in line with previous studies. However, in Smieszek et al. , participants reported short contacts with a probability of 49.0 % compared to 72.2 % in our study, despite having a high reporting probability in other duration categories. One reason might be that the 2012 study asked the participants to report contacts over several days, which increases the overall workload of participants, and that there were more contacts to be reported. In Smieszek et al. 2014 , the reporting probabilities of all duration categories were considerably lower - with the exception of contacts >60 min - which might be due to the retrospective study design, and the particular group of study participants (high school students). Finally, Mastrandrea et al.  found 40 % reporting probability for contacts <5 min, and 72 % for contacts >60 min.
Reasons for the particularly high reporting probabilities in our study compared to previous ones could be: the low density of study participants among conference attendees and the fact that a very structured conference day is associated with a low workload; the study duration was also comparatively short (4 to 8 hours); moreover, the study population consisted mostly of scientists, who are familiar with filling in questionnaires and who are probably intrinsically motivated to produce data of good quality.
Interestingly, in our study the reporting by men was not lower than that of women, in contrast to findings from a previous study  where the reporting by female study participants was about twice as high as that of males. However, in that previous study, gender specific results might be specific to the age of most of the participants (teenagers) and hence not necessarily generalizable.
Comparing the correlations between reported and recorded degree with the results from Smieszek et al. 2014 , we again found better correspondence between the data from both methods: Smieszek et al. 2014 reported τ = 0.01 (all contacts), τ = 0.14 (contacts ≥5 min), and τ = 0.21 (contacts ≥15 min), whereas the corresponding values in our study were 0.29, 0.50 and 0.37. Nonetheless, the correlations between reported and recorded degree are still weak for all three contact duration thresholds.
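For readers who want to reproduce this kind of rank correlation, the following is a minimal pure-Python sketch of Kendall's τ in its tie-corrected form (τ-b). Which τ variant was used in the studies compared here is not stated in this section, so the choice of τ-b is an assumption, as are the function name and the example degree sequences.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long numeric sequences.

    tau-b = (concordant - discordant) / sqrt(pairs untied in x * pairs untied in y)
    Runs in O(n^2), which is adequate for a few hundred participants.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    concordant = discordant = 0
    untied_x = untied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx != 0:
                untied_x += 1
            if dy != 0:
                untied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    if untied_x == 0 or untied_y == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / sqrt(untied_x * untied_y)


# Hypothetical reported and recorded degrees per participant.
reported_degree = [1, 2, 3, 4, 5]
recorded_degree = [3, 1, 2, 5, 4]
print(kendall_tau_b(reported_degree, recorded_degree))
```

Degrees are small integers with many ties, which is why a tie-corrected variant is a natural choice for this sketch.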
Implications of study findings for modelling
We found, in line with previous research, that the discrepancies within survey data as well as between sensor and survey data are highest for contacts of short duration. Several studies found that only mixing matrices of contacts ≥15 min could explain population-wide seroconversion of a multitude of infections better than, e.g., random mixing models [30, 37, 38]. If only contacts of a minimal duration of 15 min are included in infection transmission models, then the absolute number of discrepancies decreases substantially. If we believe short contacts not to be relevant, then the high proportion of reporting errors in contacts of short duration might not be as problematic as it appears. However, all published evidence is based on self-reported contact data; it is very likely that the reporting of short contacts in those studies was also poor, which implies that the findings might be biased.
Even though non-intense and short contacts most likely play a role in infection transmission, the transmission probability per contact for such contacts can be expected to be substantially lower than for extended contacts for most infectious diseases [6, 30, 32, 37–39], which is – besides perhaps poor data quality – most likely one of the reasons why no study has found short contacts to be explanatory for serological survey data.
We also found that the correction for pure degree heterogeneity results in higher R0 increases with respect to a homogeneous mixing R0 hypothesis for contacts ≥15 min than for all (short and long) contacts: in other words, contact data including short and long duration contacts were found to behave closer to random mixing [16, 36] than networks containing only contacts of longer duration, which are much sparser and more structured. Of note, our diary data are expected to produce lower R0 values than the sensor data (for identical transmissibility and infection duration) if all contacts are considered, but much higher R0 for contacts ≥15 min only. The underlying reason for this, i.e., the situation that shorter contacts are reported less and longer contacts are reported slightly more often than recorded, is also reflected in the Bland-Altman plots (Fig. 1) and may be due to the fact that the durations of contacts were estimated to be longer than the corresponding recording of duration by sensors, as discussed above.
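To illustrate what a correction for pure degree heterogeneity does, the sketch below uses a standard mean-field approximation in which R0 scales with ⟨k²⟩/⟨k⟩ instead of ⟨k⟩, so that the ratio of the heterogeneous to the homogeneous-mixing R0 is ⟨k²⟩/⟨k⟩² = 1 + CV², with CV the coefficient of variation of the degree. This particular formula is an assumption made for illustration and may differ from the correction applied in the study; the function name and the degree sequences are hypothetical.

```python
def heterogeneity_factor(degrees):
    """Ratio of R0 with degree heterogeneity to R0 under homogeneous mixing.

    Mean-field approximation (assumed here): factor = <k^2> / <k>^2 = 1 + CV^2.
    A factor of 1.0 means all participants have the same degree.
    """
    n = len(degrees)
    if n == 0:
        raise ValueError("empty degree sequence")
    mean_k = sum(degrees) / n
    if mean_k == 0:
        raise ValueError("mean degree is zero; factor is undefined")
    mean_k_squared = sum(k * k for k in degrees) / n
    return mean_k_squared / mean_k ** 2


# Hypothetical degree sequences, not study data.
all_contacts = [4, 5, 3, 6, 4, 5, 4, 5]   # fairly even degrees
long_contacts = [0, 1, 0, 4, 1, 0, 2, 0]  # sparse network with a few hubs
print(heterogeneity_factor(all_contacts))
print(heterogeneity_factor(long_contacts))
```

In such a sketch, a sparse network in which a few participants hold most of the long contacts yields a larger factor than a denser network with evenly spread degrees, which matches the direction of the effect described above.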
When interpreting participants’ attitudes and assessments, we have to bear in mind the specific setting in which the study took place, i.e., a semi-public space where aspects of privacy, social control and familiarity with other people might differ from both completely public and completely private spaces. Privacy regarding both reporting contacts in a diary and being recorded with a digital device worried only a minority of respondents in our setting. The same applies to being tagged with the visible ID. Consistent with this, only a very small minority of participants would like to be able to turn off the sensor. In all, respondents had a very relaxed attitude towards reporting and being reported in our specific setting. Apart from aspects of social control (which are already in place in a conference setting, so that contact measurements do not add further control), it might be the case that our participants, mostly German epidemiologists, know and trust the national data protection laws. Hence, issues of privacy might play a more prominent role in different study settings or populations.
In planning future studies, one should take into account that only ca. 50 % of respondents said they would participate in a year-long study with weekly recording for 1 day over the period of one year. This leads to the assumption that network studies can be conducted successfully only over a couple of weeks and/or in social contexts with a strong social cohesion (which might limit generalizability) or need to provide an effective way to motivate potential participants [21, 23, 40]. Unfortunately, social dynamics can also play a negative role and may decrease compliance . Thus, it might be difficult to recruit people into long-term contact studies; however, a majority of those who participated did not object to using diaries or sensors and expressed mostly neutral to positive sentiments. Furthermore, we might want to consider when planning future studies that about 20 % of the participants thought that filling in the diary was too much work; this was also the main concern of non-participants (data not shown). This sentiment leads to the hypothesis that contact studies using digital devices would have a higher response proportion than diary-based ones, as indeed observed in .
Our findings are similar to those in the literature on attitudes towards contact diaries. In the study by Beutels et al., about two-thirds of the participants of a two-day paper-based contact diary study associated “little effort” with the diary, and in McCaw et al. nearly two-thirds of the participants rated filling in a diary as easy [5, 8]; in our study, nearly 90 % evaluated filling in the diary as easy. In addition, one other issue has already been addressed by a previous study: in Beutels et al., about 60 % of study participants saw (nearly) no change in the number or kind of contacts from keeping the diary ; in our study the corresponding figure is 74 %, with the lower limit of the confidence interval (61 %) indicating no relevant departure from the previous finding.
Wearable sensors record contacts only among people that are equipped with them, meaning that only contacts within clearly defined groups can be measured. While diaries can be distributed to, e.g., a random sample of any population , any approach for assessing reporting errors from diary data also requires clearly defined participant groups. This necessary restriction limits the generalizability of studies’ results: Different groups most likely vary in cognitive capability or motivation. Furthermore, the difficulty of the reporting task differs between study settings. Our study had a low participant density (about 30 % of all conference participants and a median degree of 4) so that the corresponding low workload might have resulted in fewer errors. Also, some items of the acceptability survey might be positively influenced by the low workload, e.g., easiness of filling in the diary or remembering contacts.
Further, it is likely that self-selection biased responses on acceptability towards more positive answers: conference attendees who a priori deemed the study too demanding or unappealing were probably more likely to refuse participation than others with positive sentiments towards the study. Selection bias might have influenced study results by deterring not only conference attendees with a high expected number of contact events, but also shy attendees or people who are unfamiliar with the community. Whereas the former might lead to an underestimation of reporting errors, the latter might have the opposite effect. Finally, regarding statistical differences between or among groups, our study might not have had enough power to detect existing differences.