Estimating Hidden Populations of HIV With an Advanced Capture-Recapture Model

The Targeted Minimum Loss-based Estimation proved the most accurate capture-recapture model in estimating hidden populations of patients with HIV.

A capture-recapture model named the Targeted Minimum Loss-based Estimation (TMLE) was found to be the most accurate and precise among similar models in estimating hidden populations of patients with HIV, according to a study published in the American Journal of Epidemiology. Modern statistical methods can be used in these models for more accurate estimations.

Estimating the true denominator of a population at risk makes it possible to calculate rates of disease burden and acquisition and to set appropriate public health priorities and policies. Capture-recapture models are used to estimate that denominator. Although these models have limitations, they remain popular in epidemiology. This study aimed to assess the accuracy of the TMLE using data from the HIV surveillance registry of the San Francisco Department of Public Health (SFDPH).

People in blank white T-shirts holding AIDS awareness red ribbons. | Image credit: LIGHTFIELD STUDIOS -

The model aimed to estimate the number of residents living with HIV in San Francisco through December 31, 2019. This included residents who moved to and received care in San Francisco by this date. Residents who moved away prior to the end of 2019 were excluded from the analysis.

Data were pulled from the San Francisco HIV Case Registry and the San Francisco HIV Lab Data Management. Data included whether a patient had been seen at any of the 3 sites included in the registry. Patients who appeared on at least 1 of the 3 lists (the Ward 86 HIV clinic at Zuckerberg San Francisco General Hospital, the SFDPH Lab, and the Tom Waddell Urban Health clinic) were included. Race, sex, age, age at HIV diagnosis, being a new diagnosis in 2019, transmission risk, and viral suppression status were also collected.

The TMLE works by reformulating the target parameter as the probability of a patient being observed on any list; dividing the number of observed patients by this probability gives the estimated population size. The model also draws from multiple algorithms. The TMLE was sensitive to the margin setting, a function that prevents looking at the extremes of the parameter space.
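The relationship between the capture probability and the population size can be illustrated with simple arithmetic. The sketch below is not the TMLE itself, which estimates the capture probability from the list data; it only shows the final division step, using the observed count (2584) and the TMLE estimate (13,523) reported in this article. The function names and the symbol psi are illustrative assumptions.

```python
def estimate_population_size(n_observed: int, capture_probability: float) -> float:
    """Return n / psi, the population size implied by a capture probability."""
    if not 0.0 < capture_probability <= 1.0:
        raise ValueError("capture_probability must be in (0, 1]")
    return n_observed / capture_probability


def implied_capture_probability(n_observed: int, population_estimate: float) -> float:
    """Back out psi = n / N from a reported population estimate."""
    if population_estimate < n_observed:
        raise ValueError("population estimate cannot be smaller than the observed count")
    return n_observed / population_estimate


# Figures reported in the article (not recomputed from the study data).
N_OBSERVED = 2584      # people appearing on at least 1 of the 3 lists
TMLE_ESTIMATE = 13523  # TMLE point estimate of the population size

# Hypothetical symbol: psi = probability of being observed on any list.
psi = implied_capture_probability(N_OBSERVED, TMLE_ESTIMATE)
print(f"Implied capture probability: {psi:.3f}")
print(f"Population size from n / psi: {estimate_population_size(N_OBSERVED, psi):.0f}")
```

Read this way, the reported estimate implies that roughly 1 in 5 people living with HIV in San Francisco appeared on at least 1 of the 3 lists.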

There were 2584 people living with HIV included from the 3 lists, with Ward 86 patients making up 70% of the analytic sample compared with 28% for SFDPH Lab and 14% for the Tom Waddell clinic; the percentages sum to more than 100% because patients could appear on more than 1 list. The complete SFDPH surveillance data contained 12,507 patients. Patients of Hispanic origin were more frequently on the SFDPH Lab list, and females made up the majority of patients on all 3 lists. Compared with the surveillance data, the analytic sample underrepresented men who have sex with men (MSM) and overrepresented people who inject drugs (PWID).

The margin was set to 0.4 after testing. With this margin and the data from the 2584 people on the 3 lists, the TMLE model estimated that there were 13,523 (95% CI, 12,222-14,824) patients living with HIV in San Francisco, which was the most accurate estimate among the models compared, as the true total was 12,507. The log-linear model (6536; 95% CI, 3179-18,010) and the BLC model (6736; 95% CI, 2647-17,957) underestimated the population size, although their intervals included the true number.
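The comparison among the 3 models can be checked with a few lines of arithmetic. The sketch below uses only the point estimates, intervals, and surveillance total reported above; the choice of relative error and interval width as summary measures is an assumption made for illustration, not necessarily the metrics used by the study authors.

```python
from typing import NamedTuple


class Estimate(NamedTuple):
    model: str
    point: int
    ci_low: int
    ci_high: int


TRUE_TOTAL = 12507  # patients in the complete SFDPH surveillance data

# Values as reported in the article.
ESTIMATES = [
    Estimate("TMLE", 13523, 12222, 14824),
    Estimate("Log-linear", 6536, 3179, 18010),
    Estimate("BLC", 6736, 2647, 17957),
]


def relative_error(point: int, truth: int) -> float:
    """Signed error as a fraction of the truth (negative = underestimate)."""
    return (point - truth) / truth


def covers(est: Estimate, truth: int) -> bool:
    """True if the reported 95% CI contains the reference total."""
    return est.ci_low <= truth <= est.ci_high


for est in ESTIMATES:
    err = relative_error(est.point, TRUE_TOTAL)
    width = est.ci_high - est.ci_low
    print(f"{est.model:<11} error {err:+.1%}  CI width {width:>6}  "
          f"covers truth: {covers(est, TRUE_TOTAL)}")
```

On these figures, the TMLE overestimates by about 8%, while the other 2 models fall short by nearly half and cover the reference total only because their intervals are several times wider.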

Compared with the true totals, the TMLE accurately estimated the number of Black patients (2139 vs 1602) and patients of unknown race (1465 vs 1435) with HIV, but it underestimated the number of White patients (3536 vs 6569) and overestimated the number of Hispanic patients (5193 vs 2901). The number of female patients with HIV was also overestimated (1262 vs 743), whereas the number of male patients was accurately estimated (11,981 vs 11,764). Estimates by age were accurate or showed only minimal bias.
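The subgroup figures can be put on a common scale by expressing each estimate as a percent difference from its surveillance count. This is a minimal sketch using only the pairs reported above; the article does not state how accuracy was judged for subgroups (it may depend on confidence intervals not reported here), so percent difference is an assumed yardstick, and the dictionary name is hypothetical.

```python
# (TMLE estimate, surveillance count) pairs as reported in the article.
SUBGROUPS = {
    "Black": (2139, 1602),
    "Unknown race": (1465, 1435),
    "White": (3536, 6569),
    "Hispanic": (5193, 2901),
    "Female": (1262, 743),
    "Male": (11981, 11764),
}


def percent_difference(estimate: int, reference: int) -> float:
    """Signed difference from the reference count, in percent."""
    return 100.0 * (estimate - reference) / reference


differences = {
    group: percent_difference(est, ref) for group, (est, ref) in SUBGROUPS.items()
}

# Print subgroups from the largest underestimate to the largest overestimate.
for group, diff in sorted(differences.items(), key=lambda item: item[1]):
    print(f"{group:<13} {diff:+6.1f}%")
```

The smallest gaps on this scale are for male patients and patients of unknown race (about 2% each), and the largest is for Hispanic patients (about 79%).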

There were some limitations to this study. The true size of the population is unknown, which could make it difficult to assess how accurate the estimate is. Sociodemographic data collected in a clinical setting could contain errors, which could affect the estimation.

The researchers concluded that the TMLE model could be a welcome tool in estimating the target population of patients living with HIV. With surveillance systems being incomplete and variable in quality, public health programming would benefit from accurate estimations of these populations.


Wesson P, Das M, Chen M, et al. Evaluating a targeted minimum loss-based estimator for capture-recapture analysis: an application to HIV surveillance in San Francisco, CA. Am J Epidemiol. Published online November 17, 2023. doi:10.1093/aje/kwad231
