We analyzed data from Diagnostic II, the second of two retrospective cohort studies conducted in Kampala, Uganda. This second study expanded on the methods of Diagnostic I, described previously [5].
Ethical considerations
Written informed consent was obtained from all eligible participants. The study was approved by institutional review boards at the University of Georgia, Makerere University School of Public Health, and the Uganda National Council for Science and Technology. All methods were carried out in accordance with relevant human subjects guidelines and regulations.
Study design, setting, and population
We conducted a retrospective cohort study among TB patients from January to November 2017. Participants were recruited at two public TB clinics located in Lubaga Division, within 5–10 km of Kampala, Uganda’s capital city. The clinics are part of the government-funded public health system run by the Kampala Capital City Authority. Primary health care services, including diagnosis and treatment of TB and other health conditions, are provided free of charge. The estimated catchment population of the public clinics in Lubaga Division is 400,000 persons. Additional health facility census information for the study area in 2017 is available from the United States Agency for International Development [27]. Eligible patients were consenting adults, aged 18 years or older, who had been diagnosed with active pulmonary tuberculosis and who had initiated treatment within three months of the interview date. Participants were recruited at variable times after diagnosis and were interviewed to collect retrospective information on the timing of care seeking before diagnosis; this approach was previously deemed a suitable alternative to prospective cohort studies [28].
Data collection and management
Data were collected in face-to-face interviews by trained interviewers using a structured questionnaire (available via our GitHub repository). The questionnaire was developed by a team of epidemiologists and physicians with expertise in TB. The original questionnaire used in our first study, Diagnostic I, was tested in a pilot study for accuracy, comprehension, and consistency of responses, with satisfactory results [5]. For Diagnostic II, the questionnaire was expanded to include items on participants' knowledge of TB symptoms, experiences with and concerns about TB symptoms, prompts to seek care, and costs of reaching or obtaining health care. These variables were added to the original items on HIV status, time of TB diagnosis, time of onset of symptoms, and duration of symptoms, as well as the detailed information about contacts made while seeking care. The complete list of variables is included as supplemental material.
Data were collected using standardized teleforms and scanned into a database using optical scanning software (TeleForms®). We preprocessed the raw data and, where applicable, derived summary or composite variables relevant to the analysis. All numeric variables were standardized (centered and scaled). All code and additional details are available as supplementary materials.
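As an illustration of the standardization step, the sketch below centers a numeric variable on its mean and scales it by its standard deviation. It is written in Python for illustration only and is not the study's code (the analyses were run in R). The function name `standardize` and the choice of the sample standard deviation (n − 1 denominator) are assumptions; the text does not state which denominator was used.

```python
from statistics import mean, stdev


def standardize(values):
    """Center values on their mean and scale by the sample standard deviation.

    Assumption: the sample SD (n - 1 denominator) is used; the source text
    only says variables were "centered and scaled".
    """
    center = mean(values)
    spread = stdev(values)  # sample standard deviation
    if spread == 0:
        raise ValueError("cannot scale a constant variable")
    return [(v - center) / spread for v in values]


# Toy values, purely illustrative (not study data).
z_scores = standardize([2, 4, 6])
print(z_scores)
```

After standardization, regression coefficients for numeric predictors are expressed per one standard deviation of the predictor, which puts them on a comparable scale for penalized models.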
Descriptive analysis
We calculated community contact delay as the time from first seeking care to first contacting a qualified TB provider. Qualified TB providers included government hospitals, government health centers, private hospitals, or other locations with TB diagnostic services.
For the analysis of these community delays, contacts were divided into two categories: social contacts and non-TB providers. Social contacts included spouses, parents, children, siblings, other relatives, coworkers, friends, and neighbors. Non-TB providers included herbal healers, drug stores, private clinics, and village health workers. Each patient’s pathway was decomposed into the intervals between consecutive contacts, and each interval was attributed to the most recent contact. In this way, the total community contact delay could be divided into the time attributable to contacts in each category. We calculated additional measures, including the number and fraction of community network contacts in each category, the total number of contacts, and the total amount of time spent visiting contacts.
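The attribution rule described above can be sketched as follows. This is a minimal illustration in Python, not the study's code (the analyses were run in R). The function name `decompose_delay`, the `(day, contact_type)` input layout, the category labels, and the toy pathway are all assumptions made for this sketch.

```python
# Contact types follow the categories in the text; the exact string labels
# are hypothetical.
SOCIAL = {"spouse", "parent", "child", "sibling", "other relative",
          "coworker", "friend", "neighbor"}
NON_TB = {"herbal healer", "drug store", "private clinic",
          "village health worker"}
TB_PROVIDER = {"government hospital", "government health center",
               "private hospital", "other TB diagnostic site"}


def categorize(contact_type):
    """Map a contact type to its analysis category."""
    if contact_type in SOCIAL:
        return "social"
    if contact_type in NON_TB:
        return "non_tb_provider"
    if contact_type in TB_PROVIDER:
        return "tb_provider"
    raise ValueError(f"unknown contact type: {contact_type!r}")


def decompose_delay(pathway):
    """Split community contact delay by contact category.

    pathway: list of (day, contact_type) tuples sorted by day, where the
    first entry is the first time care was sought. Each interval between
    consecutive contacts is attributed to the earlier (most recent) contact.
    """
    if not pathway:
        raise ValueError("pathway is empty")
    categories = [categorize(ctype) for _, ctype in pathway]
    if "tb_provider" not in categories:
        raise ValueError("pathway never reaches a qualified TB provider")
    end = categories.index("tb_provider")  # first qualified TB provider

    days = {"social": 0, "non_tb_provider": 0}
    counts = {"social": 0, "non_tb_provider": 0}
    for i in range(end):
        cat = categories[i]
        counts[cat] += 1
        days[cat] += pathway[i + 1][0] - pathway[i][0]

    n_contacts = sum(counts.values())
    fractions = {k: (v / n_contacts if n_contacts else 0.0)
                 for k, v in counts.items()}
    return {
        "total_delay": pathway[end][0] - pathway[0][0],
        "days": days,
        "counts": counts,
        "fractions": fractions,
    }


# Toy pathway, purely illustrative (not study data).
example = [(0, "friend"), (3, "drug store"), (10, "herbal healer"),
           (14, "government hospital")]
print(decompose_delay(example))
```

In this toy pathway, the 3 days after talking to a friend count as social-contact time, and the 7 + 4 = 11 days after the drug store and herbal healer visits count as non-TB provider time, which together sum to the 14-day total delay.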
The outcome of interest was total community contact delay. As visits to non-TB providers were significant in the Diagnostic I study [5], a secondary analysis was included to explore factors associated with the number of community contact delay days spent in visits to non-TB providers.
Statistical analysis
We fit linear regression models with each predictor individually, to investigate bivariate associations with community contact delay. Similarly, we fit bivariate regression models for each predictor for our secondary analysis, investigating the delay spent contacting non-TB providers.
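The one-predictor-at-a-time screening can be sketched as a loop over predictors, each fit by ordinary least squares against the same outcome. The Python below is illustrative only (the study's models were fit in R). The helper names `simple_ols` and `bivariate_fits`, the predictor names `age_z` and `cost_z`, and all numeric values are hypothetical.

```python
from statistics import mean


def simple_ols(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def bivariate_fits(predictors, outcome):
    """Fit a separate one-predictor linear model for each predictor."""
    return {name: simple_ols(values, outcome)
            for name, values in predictors.items()}


# Toy data, purely illustrative (not study data).
predictors = {"age_z": [-1, 0, 1], "cost_z": [1, 0, -1]}
delay_days = [10, 20, 30]
for name, (intercept, slope) in bivariate_fits(predictors, delay_days).items():
    print(name, intercept, slope)
```

The same loop would be run a second time with the non-TB provider delay as the outcome to mirror the secondary analysis.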
Two final linear models were fit with Least Absolute Shrinkage and Selection Operator (LASSO) regularization and 10-fold cross-validation: one for community contact delay and one for the contribution of non-TB provider visits to community contact delay. The residuals of the full linear models with all predictors showed some skewness. Neither a log-transformation of the outcome nor the use of Poisson regression models reduced this minor skew (see supplementary material), so linear regression was retained for the final LASSO models. All analyses were conducted in R software (version 3.6.1) [29].
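For reference, the LASSO estimate for a linear model can be written as the minimizer of a penalized least-squares criterion. The form below uses one common scaling of the loss (the 1/(2n) factor is an assumption; the text does not state which parameterization the software used):

```latex
(\hat{\beta}_0, \hat{\beta})
  = \arg\min_{\beta_0,\, \beta}
    \left\{
      \frac{1}{2n} \sum_{i=1}^{n}
        \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \Bigr)^{2}
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
```

Here y_i is the delay outcome for participant i, x_ij is the value of the j-th standardized predictor, and λ ≥ 0 is the penalty weight, which in this study's setup would be the quantity chosen by 10-fold cross-validation. Larger values of λ shrink more coefficients exactly to zero, which is what lets the procedure act as variable selection.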