Monitoring HIV testing and pre-exposure prophylaxis information seeking by combining digital and traditional data
BMC Infectious Diseases volume 21, Article number: 215 (2021)
Public health is increasingly turning to non-traditional digital data to inform HIV prevention and control strategies. We demonstrate a parsimonious method using both traditional survey and internet search histories to provide new insights into HIV testing and pre-exposure prophylaxis (PrEP) information seeking that can be easily extended to other settings.
We modeled how US internet search volumes from 2019 for HIV testing and PrEP compared against expected search volumes for HIV testing and PrEP using state HIV prevalence and socioeconomic characteristics as predictors. States with search volumes outside the upper and lower bounds of the confidence interval were labeled as either over or under performing. State performance was evaluated by (a) Centers for Disease Control and Prevention designation as a hotspot for new HIV diagnoses and (b) expansion of Medicaid coverage.
Ten states over-performed in models assessing information seeking for HIV testing, while eleven states under-performed. Thirteen states over-performed in models assessing internet searches for PrEP information, while thirteen states under-performed. States that expanded Medicaid coverage were more likely to over perform in PrEP models than states that did not expand Medicaid coverage. States that were hotspots for new HIV diagnoses were more likely to over perform on HIV testing searches.
Our study derived a method of measuring HIV and PrEP information seeking that is comparable across states. Several states exhibited information seeking for PrEP and HIV testing that deviated from model assessments. Statewide search volume for PrEP information was affected by a state’s decision to expand Medicaid coverage. Our research provides health officials with an innovative way to monitor statewide interest in PrEP and HIV testing using a metric for information-seeking that is comparable across states.
As people increasingly turn to digital sources of news and information, online activity has the potential to become a window into the public’s consciousness. Measuring the public’s online information seeking has the potential to predict health behavior, as what people search the internet for can be predictive of what they intend to do in the future. It is possible that seeking information about HIV testing and pre-exposure prophylaxis (PrEP) online could be a new surveillance tool in the fight against HIV. Previous studies have shown that spikes in internet searches for HIV testing have corresponded with increases in HIV testing, suggesting that seeking HIV testing information online could be predictive of testing behavior. Utilizing internet searches could be a way to enhance the surveillance of the public’s interest in seeking information on HIV and of HIV health seeking behavior. Past efforts to enhance HIV surveillance relied mostly on upscaling traditional data (e.g., clinical records or surveys) that have intrinsic shortcomings, such as a limited ability to provide current information. For instance, the most recent data for HIV testing on AIDSvu.org is from 2016 and the most recent AIDSvu.org data on PrEP usage is from 2018. These limitations have driven public health to increasingly turn to digital data, such as news, social media, and internet searches, to learn how people seek HIV information [5,6,7,8]. For example, internet search trends can be used to investigate public interest, as evidenced by actor Charlie Sheen’s disclosure of his HIV-positive status coinciding with record levels of Google searches for HIV awareness, HIV testing, and condoms. This finding was later confirmed by traditional data after a 16-month delay. Internet search histories therefore have potential utility for assessing help-seeking behavior and public interest regarding both PrEP for HIV prevention and HIV testing.
For example, one study conducted in Hong Kong found a direct relation between HIV news trends and online search behavior for issues regarding HIV/AIDS and men who have sex with men (MSM). Other studies have found that areas with high levels of HIV prevalence have greater internet search volumes for HIV-related terms than areas of low HIV prevalence. These studies show that the use of internet search histories combined with traditional surveillance data has the potential to create synergies that can yield new insights into HIV-related health behavior.
Our study methods use both internet search histories and traditional survey data to provide new insights on information seeking for HIV testing and PrEP that can be easily replicated and extended to other settings and outcomes. Specifically, we predicted expected internet search volumes for HIV testing and PrEP based on statewide HIV prevalence and socioeconomic (SES) factors and compared them to observed search volumes in a model that allows us to identify whether US states over or under perform against expectations. Moreover, we evaluated how state performance varied by (a) states that are designated as hotspots for new HIV diagnoses and (b) states that received Medicaid expansion funding.
Our study used data from multiple sources. 1) We obtained the most current state-level prevalence of HIV, which is from 2018, from the Centers for Disease Control and Prevention HIV surveillance report. HIV prevalence was chosen over HIV incidence because we were interested in looking at the association between the total number of HIV cases in a state and internet searches for HIV testing and PrEP. 2) We obtained the following state-level socioeconomic attributes from the 2018 Centers for Disease Control and Prevention Behavioral Risk Factor Surveillance System (BRFSS): proportion of males, white non-Hispanics, people aged 45 years or older, and people with household income over $50,000. 3) We obtained 2019 state-level annual internet search volumes for HIV testing and PrEP using the Google Trends API. Information available through this API includes the volume of searches for each term, the number of searches per unit of time, and the geographic location of the searches (country, region, state, city, metropolitan area). Search volume was calculated as a query fraction: the proportion of searches for a specific search term relative to all searches, measured per 10 million searches. Search volumes were standardized in this way to account for differences in population size. We defined HIV testing searches as any query that included the terms “HIV” and “test,” “tests,” or “testing”, “AIDS test”, or “oraquick”. We defined PrEP searches as any query that included the terms “PrEP” and “HIV” or “pre-exposure prophylaxis HIV” or “Truvada” or “Descovy”.
Internet search volumes are withheld by Google for states where searches do not achieve a minimum threshold of searches. As a result, we could not obtain search data for HIV testing for five states (Alaska, Montana, South Dakota, Vermont, and Wyoming). PrEP search data could not be obtained for two states (Vermont and Wyoming). 4) We obtained data on states that expanded Medicaid coverage from the Kaiser Family Foundation.
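The search definitions and the query fraction described above can be illustrated with a short sketch. This is not the Google Trends API and not the code used in the study: the function names are hypothetical, and the Boolean grouping of the terms (for example, reading the HIV testing definition as "HIV" together with a testing word, or "AIDS test", or "oraquick") is our assumption about how the definitions are meant to be read.

```python
import re

HIV_TEST_WORDS = {"test", "tests", "testing"}


def _tokens(query):
    """Lowercase a query and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", query.lower()))


def is_hiv_testing_query(query):
    """Assumed reading: ("HIV" AND a testing word) OR "AIDS test" OR "oraquick"."""
    words = _tokens(query)
    text = query.lower()
    return (
        ("hiv" in words and bool(words & HIV_TEST_WORDS))
        or "aids test" in text
        or "oraquick" in words
    )


def is_prep_query(query):
    """Assumed reading: ("PrEP" AND "HIV") OR "pre-exposure prophylaxis" with
    "HIV" OR "Truvada" OR "Descovy"."""
    words = _tokens(query)
    text = query.lower()
    return (
        ("prep" in words and "hiv" in words)
        or ("pre-exposure prophylaxis" in text and "hiv" in words)
        or "truvada" in words
        or "descovy" in words
    )


def query_fraction_per_10m(matching_searches, total_searches):
    """Searches matching a definition, expressed per 10 million total searches."""
    if total_searches <= 0:
        raise ValueError("total_searches must be positive")
    return matching_searches * 10_000_000 / total_searches
```

Because the fraction is taken relative to all searches in a state, two states with very different populations can be compared directly; a query such as "meal prep ideas" is excluded because it lacks the accompanying "HIV" term.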
Our analysis followed a four-step process. First, we fit Poisson regression models with state-level HIV prevalence data and state-level socioeconomic attributes to predict the expected internet search volumes of HIV testing and PrEP for each state. Second, we fit a centered least squares regression line of expected search volumes from our models versus observed search volumes from Google Trends. Third, we compared the expected search volumes from our models in step one with the observed search volumes from Google Trends for each state in order to assess the level of information seeking for HIV testing and PrEP by calculating the percent difference between the observed and expected values of the centered least squares regression (i.e., (observed-fitted)/fitted * 100%).
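Steps two and three can be sketched as follows. This is a minimal illustration, not the study's code: the expected values would come from the Poisson models in step one but are hypothetical numbers here, and we assume that "centered least squares" means an ordinary least squares line computed from mean-centered values.

```python
def fit_line(x, y):
    """Least squares fit of y = intercept + slope * x using mean-centered sums."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def percent_difference(observed, fitted):
    """(observed - fitted) / fitted * 100, as defined in step three."""
    return (observed - fitted) / fitted * 100.0


# Hypothetical expected (model-predicted) and observed search volumes
# for four states; these are NOT values from the study.
expected = [100.0, 150.0, 200.0, 250.0]
observed = [110.0, 140.0, 230.0, 240.0]

intercept, slope = fit_line(expected, observed)
fitted = [intercept + slope * e for e in expected]
diffs = [percent_difference(o, f) for o, f in zip(observed, fitted)]
```

A positive entry in `diffs` corresponds to a state with more searches than the fitted line predicts, and a negative entry to a state with fewer.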
States with observed information seeking (measured by observed internet search volumes for HIV testing and PrEP) greater than their expected information seeking (predicted internet search volumes by the Poisson regression model) were considered to be over performing and exhibit greater information seeking for HIV testing and PrEP than expected given their prevalence of HIV. States that were over performing above the 95% confidence interval were highlighted in our results (see example of plotting observed vs. expected observations in Fig. 1). Similarly, states with observed information seeking less than their expected information seeking were considered to be underperforming and exhibit less information seeking for HIV testing and/or PrEP than expected given their prevalence of HIV. States that were under performing below the 95% confidence interval were highlighted in our results. To describe statistical uncertainty between expected and observed search volumes, we used bootstrap sampling to calculate the 95% confidence interval (CI) for the regression line and labeled observations outside the confidence band as states that over or under performed.
Fourth, to understand which states typically under or over performed, we contrasted the deviations in expected searches against (a) which states were designated as hotspots for new HIV diagnoses by the Centers for Disease Control and Prevention (CDC) and (b) which states received Medicaid expansion funding that covered HIV testing and PrEP.
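The bootstrap labeling can be sketched as below. The text does not specify the bootstrap variant, so this example assumes a percentile bootstrap that resamples states with replacement and refits the line each time; the function names, the number of resamples, and the seed are all illustrative choices rather than details from the study.

```python
import random


def fit_line(x, y):
    """Least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bootstrap_band(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence band for the fitted line at each observed x."""
    if len(set(x)) < 2:
        raise ValueError("need at least two distinct x values")
    rng = random.Random(seed)
    n = len(x)
    predictions = [[] for _ in range(n)]
    completed = 0
    while completed < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # slope is undefined for this resample; draw again
        ys = [y[i] for i in idx]
        intercept, slope = fit_line(xs, ys)
        for j in range(n):
            predictions[j].append(intercept + slope * x[j])
        completed += 1
    lower_rank = int(n_boot * alpha / 2)
    upper_rank = int(n_boot * (1 - alpha / 2)) - 1
    band = []
    for values in predictions:
        values.sort()
        band.append((values[lower_rank], values[upper_rank]))
    return band


def label_states(observed, band):
    """Label each observation relative to its confidence band."""
    labels = []
    for value, (lower, upper) in zip(observed, band):
        if value > upper:
            labels.append("over")
        elif value < lower:
            labels.append("under")
        else:
            labels.append("within")
    return labels
```

Calling `label_states(observed, bootstrap_band(expected, observed))` on a state's expected and observed volumes would mark it "over", "under", or "within" the band; fixing the seed keeps the labels reproducible across runs.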
We observed different levels of information seeking for HIV testing and PrEP across states (Fig. 1). Ten states over-performed for HIV testing searches. Georgia exhibited the greatest difference with 36.8% more searches than expected, followed closely by Rhode Island (35.2%), then Indiana (28.8%), Pennsylvania (22.9%), Nevada (18.6%), Florida (18.0%), Louisiana (17.3%), Washington (15.3%), Iowa (13.4%), and Virginia (10.0%) (Table 1). Conversely, eleven states under-performed for HIV testing. New Hampshire exhibited the greatest difference with 34.1% fewer searches than expected, followed by Maine (−32.2%), Idaho (−26.1%), Nebraska (−21.9%), Oregon (−20.5%), New Mexico (−18.5%), Mississippi (−16.8%), Arkansas (−15.7%), Alabama (−13.6%), Massachusetts (−12.4%), and Arizona (−10.3%).
Thirteen states over-performed for PrEP searches (Table 1). West Virginia exhibited the greatest difference with 30.7% more searches than expected, followed by Rhode Island (29.2%), Massachusetts (23.6%), Washington (19.9%), Nevada (18.0%), Pennsylvania (17.3%), Nebraska (16.9%), Arizona (14.1%), Illinois (12.6%), Oregon (12.4%), New York (12.1%), Colorado (10.7%), and Kentucky (10.2%). Conversely, thirteen states under-performed for PrEP searches. Idaho exhibited the greatest difference with 30.9% fewer searches than expected, followed by Montana (−28.2%), Wisconsin (−27.9%), New Hampshire (−23.6%), South Carolina (−23.1%), Iowa (−17.2%), New Jersey (−15.9%), Oklahoma (−14.1%), Kansas (−13.3%), Michigan (−9.4%), North Carolina (−8.4%), Delaware (−7.7%), and Missouri (−6.1%).
States that over or under performed on HIV testing searches did not necessarily do likewise for PrEP searches (r = 0.12). For instance, Nebraska ranked 7th for excess HIV testing searches, but then ranked 43rd for PrEP searches. Four states (Washington, Nevada, Pennsylvania, and Rhode Island) over-performed for both HIV testing and PrEP searches, while only two states (New Hampshire and Idaho) under-performed for both. States that expanded Medicaid coverage were more likely to over perform on PrEP searches than states that did not expand Medicaid coverage (z = 2.04, p < 0.041). States that were hotspots for new HIV diagnoses were more likely to over perform on HIV testing searches than states that were not hotspots for new HIV diagnoses (z = 2.08, p < 0.037).
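The text does not name the test behind the reported z statistics. As a rough check, and assuming each z was referred to a standard normal distribution with a two-sided alternative, the corresponding p value can be computed with the complementary error function. The function name below is hypothetical.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p value for a standard normal z statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# z statistics reported for the Medicaid expansion and hotspot comparisons
p_medicaid = two_sided_p_from_z(2.04)  # roughly 0.041
p_hotspot = two_sided_p_from_z(2.08)   # roughly 0.038
```

Under that assumption, both values fall just below the conventional 0.05 threshold, in line with the p values reported above.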
Our study derived a method of measuring HIV testing and PrEP information seeking that is comparable across states. Several states exhibited information seeking for PrEP and HIV testing that deviated from what was expected in our models. Performance for HIV testing information seeking was associated with a state’s designation as a hotspot for new HIV diagnoses, and performance for PrEP information seeking was associated with a state’s decision to expand Medicaid coverage. By integrating internet search histories and traditional survey data, our results provide baseline benchmarks for monitoring statewide interest in seeking information on HIV testing and PrEP.
Our research demonstrates a need for increased access to PrEP information, particularly among states that have not expanded their Medicaid coverage. Lower interest in seeking information on PrEP in states that did not expand Medicaid coverage could be detrimental to increasing PrEP utilization given that insurance coverage affects PrEP uptake [14, 15]. Approximately 12% of PrEP users receive PrEP through Medicaid, and the refusal to extend coverage could deny people the ability to access PrEP. Lower interest in PrEP information, coupled with the inability to utilize PrEP due to a lack of health insurance, is a potentially disastrous combination that could result in an increase in HIV prevalence in states that underperformed in our PrEP models.
Underperformance in PrEP models could be due to the unequal distribution of PrEP across different genders, ages, and states. Our models control for age, sex, race, and income at the state level using BRFSS data. However, our models do not adjust for disparities in the distribution of PrEP. It is possible that PrEP interventions that do not specifically target key populations with indications for PrEP use could result in these neglected populations not searching for PrEP information online, which would result in an underperformance in our PrEP models. For example, five states represented 50% of PrEP prescriptions and although women represent almost 20% of new HIV infections, they represented only 7% of PrEP prescriptions [16, 17]. These types of underlying disparities in PrEP distribution could be factors influencing how people look for information on PrEP.
Our research suggests the possibility that increased attention to HIV testing, promoted by a state being listed as a CDC hotspot for new HIV diagnoses, does in fact result in increased public interest in seeking HIV testing information. States that are listed as a hotspot for new HIV infections receive a rapid infusion of additional resources, expertise, and technology to develop and implement locally tailored HIV interventions. It is possible that the increased promotion of HIV interventions results in more public interest in seeking HIV testing information. This would explain why states that were listed as hotspots for new HIV diagnoses were more likely to over perform on HIV testing searches than states that were not hotspots. Our results support providing states with more resources to promote HIV testing, given that our models suggest increases in searches for HIV testing are correlated with more CDC support for HIV programs.
Our study benefits from several strengths. We use a nationally representative survey to control for several SES covariates, ensuring that the US population is accurately represented. Because our methods adjusted for baseline state level SES characteristics, leaders in each individual state can use these methods to evaluate their state-specific progress. Our internet search volume data is measured in real-time, and while we used annual estimates, it is possible to use the same method to estimate weekly or monthly search volumes. Most importantly, our research presents a new method for surveillance and performance monitoring in HIV prevention.
Our research is not without limitations. Internet search volume data is aggregated and is susceptible to ecological confounding. Additionally, it cannot be used to determine which racial/ethnic, gender, or age groups are or are not engaged with HIV testing or PrEP. While it is possible that adding more search terms could affect our results, the effects of adding search terms to our models diminish after the most common search terms are included, as these terms make up the vast majority of the search terms that the public uses. To ensure we were using the most common search terms for HIV testing and PrEP, we consulted with HIV experts on HIV testing and PrEP nomenclature. Search data may also be subject to selection bias, as not all people access the internet equally. Although some queries may reflect general curiosity rather than treatment seeking, it is well known that internet search trends mirror many health-related behaviors.
Our results are a call-to-action for underperforming states whose populations are not engaged in searching for information on HIV testing and PrEP. Our research provides health officials with an innovative way to monitor statewide interest in PrEP and HIV testing by highlighting the states that demonstrate the least online information seeking, which is critical for the promotion of HIV testing and PrEP as a way to help end the HIV epidemic. Further research should examine why certain states are deficient, and policy makers in deficient states should make efforts to expand HIV testing and PrEP promotion, perhaps by replicating the interventions and policies of better-performing states.
Availability of data and materials
All data is publicly available through Google Trends and through the Behavioral Risk Factor Surveillance System (BRFSS).
- AIDS: acquired immunodeficiency syndrome
- CDC: Centers for Disease Control and Prevention
- CI: confidence interval
- HIV: human immunodeficiency virus
- PrEP: pre-exposure prophylaxis
- SES: socioeconomic status
Asur S, Huberman BA. Predicting the future with social media. In: Proceedings - 2010 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2010; 2010. https://doi.org/10.1109/WI-IAT.2010.63.
Goel S, Hofman JM, Lahaie S, Pennock DM, Watts DJ. Predicting consumer behavior with web search. Proc Natl Acad Sci U S A. 2010. https://doi.org/10.1073/pnas.1005962107.
Ayers JW, Althouse BM, Dredze M, Leas EC, Noar SM. News and internet searches about human immunodeficiency virus after Charlie Sheen’s disclosure. JAMA Intern Med. 2016. https://doi.org/10.1001/jamainternmed.2016.0003.
AIDSVu (aidsvu.org). Emory University, Rollins School of Public Health. Accessed 10 Mar 2020.
Ayers JW, Althouse BM, Dredze M. Could behavioral medicine lead the web data revolution? JAMA. 2014. https://doi.org/10.1001/jama.2014.1505.
Nobles AL, Leas EC, Latkin CA, Dredze M, Strathdee SA, Ayers JW. #HIV: Alignment of HIV-Related Visual Content on Instagram with Public Health Priorities in the US. AIDS Behav. 2020. https://doi.org/10.1007/s10461-019-02765-5.
Young SD, Yu W, Wang W. Toward automating HIV identification: machine learning for rapid identification of HIV-related social media data. J Acquir Immune Defic Syndr. 2017. https://doi.org/10.1097/QAI.0000000000001240.
Allem JP, Leas EC, Caputi TL, et al. The Charlie Sheen effect on rapid in-home human immunodeficiency virus test sales. Prev Sci. 2017. https://doi.org/10.1007/s11121-017-0792-2.
Chiu APY, Lin Q, He D. News trends and web search query of HIV/AIDS in Hong Kong. PLoS One. 2017. https://doi.org/10.1371/journal.pone.0185004.
Mahroum N, Bragazzi NL, Brigo F, Waknin R, Sharif K, Mahagna H, Amital H, Watad A. Capturing public interest toward new tools for controlling human immunodeficiency virus (HIV) infection exploiting data from Google trends. Health Informatics Journal. 2019. https://doi.org/10.1177/1460458218766573.
Department of Health and Human Services, Centers for Disease Control and Prevention. HIV Surveillance Report 2020. https://www.cdc.gov/hiv/library/reports/hiv-surveillance.html.
Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System Survey Data. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2018.
Kaiser Family Foundation. “Status of State Medicaid Expansion Decisions: Interactive Map”. https://www.kff.org/medicaid/issue-brief/status-of-state-medicaid-expansion-decisions-interactive-map/. Accessed April 1st, 2020.
Patel RR, Mena L, Nunn A, et al. Impact of insurance coverage on utilization of pre-exposure prophylaxis for HIV prevention. PLoS One. 2017. https://doi.org/10.1371/journal.pone.0178737.
Doblecki-Lewis S, Liu A, Feaster D, et al. Healthcare access and PrEP continuation in San Francisco and Miami after the US PrEP demo project. J Acquir Immune Defic Syndr. 2017. https://doi.org/10.1097/QAI.0000000000001236.
Huang YLA, Zhu W, Smith DK, Harris N, Hoover KW. HIV preexposure prophylaxis, by race and ethnicity — United States, 2014–2016. Morb Mortal Wkly Rep. 2018. https://doi.org/10.15585/MMWR.MM6741A3.
AIDSVu (aidsvu.org). Mapping PrEP, First Ever Data on PrEP Users Across the U.S. Emory University, Rollins School of Public Health. https://aidsvu.org/prep/. Accessed 10 Mar 2020.
U.S. Department of Health and Human Services. HIV National Strategic Plan for the United States: a roadmap to end the epidemic 2021–2025. Washington, DC; 2021.
The content of this research is solely the responsibility of the authors and does not necessarily represent the official views of the California HIV/AIDS Research Program Office or National Institute of Drug Abuse.
Authors’ contributions (last name of each author is listed under what they contributed to)
Study Conception and Design
Johnson, Ayers, Nobles, Leas
Acquisition and Preparation of Data
Johnson, Caputi, Liu
Analysis and Interpretation of Data
Johnson, Nobles, Strathdee, Smith
Drafting of Manuscript
Johnson, Nobles, Caputi, Liu, Leas, Strathdee, Smith, Ayers
Johnson, Ayers, Nobles, Leas
All authors read and approved the final manuscript.
This research was supported by funds from the California HIV/AIDS Research Program Office of the University of California (OS17-SD-001) and the National Institute of Drug Abuse (T32 DA023356, R37 DA019829).
Ethics approval and consent to participate
Our research was exempted from an ethics review by the University of California at San Diego Human Research Protections Program.
None of the authors declares any conflicts of interest.
Johnson, D.C., Nobles, A.L., Caputi, T.L. et al. Monitoring HIV testing and pre-exposure prophylaxis information seeking by combining digital and traditional data. BMC Infect Dis 21, 215 (2021). https://doi.org/10.1186/s12879-021-05907-0
- Google trends
- HIV testing