Longitudinal population-level HIV epidemiologic and genomic surveillance highlights growing gender disparity of HIV transmission in Uganda

The Rakai Community Cohort Study

Longitudinal surveillance

Between September 2003 and May 2018, 9 consecutive survey rounds of RCCS, labelled as survey rounds 10 to 18, were conducted in 36 inland communities in south-central Uganda (Fig. 1, Supplementary Tables 1 and 2, and Supplementary Fig. 1). The results presented in this paper derive from data collected through these surveys, including the population census, the RCCS survey participants, the incidence cohort and the phylogenetic transmission cohort.

RCCS survey methods have been reported previously14,18. In brief, for each survey round, the RCCS did a household census, and subsequently invited all individuals who were aged 15-49 years and residents for at least 1 month to participate in the open, longitudinal RCCS survey; and so data collection was not randomized. Data collection was blind relative to previous interactions with individuals or any personal characteristics apart from age and residency status, and any research questions. Eligible individuals first attended group consent procedures, and individual consent was obtained privately by a trained RCCS interviewer. Following consent, participants reported in a private location, typically a tent at the survey hub, on demographics, behaviour, health and health service use. All participants were offered free voluntary counselling and HIV testing as part of the survey. Rapid tests at the time of the survey and confirmatory enzyme immunoassays were performed to determine HIV status. All participants were provided with pre-test and post-test counselling, and referrals of individuals who were HIV-positive for ART. Additionally, all consenting participants, irrespective of HIV status, were offered a venous blood sample for storage/future testing, including viral phylogenetic studies. Supplementary Table 1 summarizes the characteristics of the RCCS participants and HIV-positive participants by age and gender. For the purpose of our analyses, we combined data from three pairs of geographically close areas in peri-urban settings into three communities, and 28 of 36 communities were continuously surveyed over all rounds (Supplementary Table 2). All epidemiologic data collected through RCCS are stored in a database running Microsoft SQL server 2019 and Microsoft Access version 2016.

Population size estimates

To characterize changes in population demography, individual-level data on the census-eligible individuals that were obtained during each census were aggregated by gender, 1-year age band (between 15 and 49 years) and survey round (Extended Data Fig. 1a,b, bars). The age reported by household heads in the census surveys tended to reflect grouping patterns towards multiples of five, suggesting that household heads reported ages only approximately. For this reason, we smoothed population sizes across ages independently for every gender and survey round, using locally weighted running line smoother (LOESS) regression methods that fit multiple polynomial regressions in local neighbourhoods as implemented in the R package stats (version 3.6.2) with the span argument set to 0.5 (Extended Data Fig. 1a,b, line). Model fit was assessed visually without a formal test, suggesting that the data met the assumptions of the statistical model.
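The smoothing step can be illustrated with a minimal Python sketch (the analysis itself used the loess implementation in the R package stats). The sketch assumes local linear fits with tricube weights for brevity, whereas R's loess defaults to local quadratic fits; the function names and the age-heaped example counts are hypothetical.

```python
import math


def tricube(u):
    """Tricube kernel weight for a scaled distance u (zero at and beyond 1)."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_linear(x, y, span=0.5):
    """Local linear smoother: one weighted straight-line fit per target point."""
    n = len(x)
    k = max(2, math.ceil(span * n))  # number of neighbours used in each local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1.0  # bandwidth: distance to the k-th nearest point
        w = [tricube(abs(xi - x0) / h) for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


ages = list(range(15, 50))
# Hypothetical census counts with heaping at ages that are multiples of five.
counts = [100 + (30 if age % 5 == 0 else 0) for age in ages]
smoothed = loess_linear(ages, counts, span=0.5)
```

In this toy example the spikes at ages 20, 25, 30 and so on are spread over neighbouring ages, which is the intended effect of smoothing age-heaped census counts.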

Participation rates

To characterize participation rates, we calculated the proportion of RCCS participants in the census-eligible population by gender, 1-year age band and survey round (Extended Data Fig. 1c,d, bars). Following consent, participants reported either their birth date or current age themselves, and accompanying documentary evidence was requested. There were no obvious age grouping patterns of multiples of 5 among participants. Overall, participation rates were lower in men than women (63% versus 75%). Participation rates also increased with age for both men and women, and were very similar across survey rounds. Considering the grouping patterns by age in the population count data, we again smoothed the participation rates across ages independently for every gender and survey round using LOESS regression as specified above for population size estimation (Extended Data Fig. 1c,d, line). Model fit was assessed visually without a formal test, suggesting that the data met the assumptions of the statistical model.

HIV status and prevalence

All RCCS participants were offered free HIV testing. Prior to October 2011, HIV testing was performed through enzyme immunoassays (EIAs) with confirmation via western blot and DNA polymerase chain reaction (PCR). After October 2011, testing was performed through a combination of three rapid tests with confirmation of positives, weak positives and discordant results by at least two EIAs and western blot or DNA PCR61. Overall, 99.7% of participants took up the test offer across survey rounds, and Supplementary Table 1 documents the number of participants with HIV. From these survey data, we estimated HIV prevalence (that is, the probability that a participant has HIV) with a non-parametric Bayesian model over the age of participants, independently for each gender and survey round. Specifically, we used a binomial likelihood on the number of participants with HIV parameterized by the number of participants and HIV prevalence in each 1-year age band. The HIV prevalence parameter was modelled on the logit scale by the sum of a baseline term and a zero-mean Gaussian process on the age space. The prior on the baseline was set to a zero-mean normal distribution with a standard deviation of 10. The covariance matrix of the Gaussian process was defined with a squared exponential kernel, using a zero-mean half-normal prior with a standard deviation of 2 on the scale parameter of the squared exponential kernel and a zero-mean half-normal prior with a standard deviation of 11.3 (= (49 − 15)/3) on the lengthscale of the squared exponential kernel. The model was fitted with Rstan (release 2.21.0) using Stan’s adaptive Hamiltonian Monte Carlo (HMC) sampler62 with 10,000 iterations, including 500 iterations of warm-up. Convergence and mixing were good, with a highest R-hat value of 1.0029 and a lowest effective sample size of 830.
The model represented the data well, with 98.57% of data points inside 95% posterior predictive intervals, indicating that the data met the assumptions of the statistical model. For the mathematical modelling of transmission flows, we assumed that age- and gender-specific HIV prevalence were the same in non-participants in the RCCS communities as in the participants in these communities.
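For orientation, the prevalence model described in words above can be written compactly as follows. The notation is introduced here for illustration only and is an assumption, not the authors' own: for a given gender and survey round, X_a denotes the number of participants with HIV among N_a participants of age a, and π_a the HIV prevalence at age a.

$$\begin{aligned}
X_a &\sim \mathrm{Binomial}(N_a, \pi_a), \qquad a = 15, \ldots, 49,\\
\mathrm{logit}(\pi_a) &= \beta_0 + f(a),\\
f &\sim \mathrm{GP}(0, k), \qquad k(a, a') = \alpha^2 \exp\left(-\frac{(a - a')^2}{2\ell^2}\right),\\
\beta_0 &\sim \mathcal{N}(0, 10^2), \qquad \alpha \sim \mathcal{N}^{+}(0, 2^2), \qquad \ell \sim \mathcal{N}^{+}(0, 11.3^2).
\end{aligned}$$

Here the half-normal prior with standard deviation 2 is placed on α, which assumes that the scale parameter referred to in the text is the marginal standard deviation of the kernel rather than its square.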

ART use

The RCCS measures ART use through participant reports since survey round 11. Self-reported ART use reflected viral suppression with high specificity and a sensitivity around 70% in the study population (Supplementary Table 9). We took the following pre-processing steps. For survey round 10, we assumed self-reported ART use to have been on the same levels as in survey round 11. Next, the ART use field was adjusted to ‘yes’ for the participants with HIV who did not report ART use but who had a viral load measurement below 1,000 copies per millilitre of blood plasma. Further, we considered it likely that with increasingly comprehensive care and changing treatment guidelines14,63, ART use in individuals with HIV who did not participate increased substantively over time, and this prompted us to consider as proxy of ART use in non-participants the observed ART use in first-time participants with HIV. Overall, first-time participants represented 15.3–39.9% of all participants across survey rounds. Extended Data Fig. 8a,b exemplifies the self-reported ART use data in male participants and male first-time participants. The ART use rate estimates for participants and first-time participants were obtained using the same Bayesian non-parametric model as for HIV prevalence fitted independently on the reported ART use data of participants and first-time participants. Convergence and mixing were good, with highest R-hat value of 1.0025 and lowest effective sample size of 978 for the participants, and 1.0027, 521 respectively for the first-time participants. The model represented the data well, with 99.67% of data points inside the corresponding 95% posterior predictive intervals for the participants, and 99.24% for the first-time participants, indicating that the data met the assumptions of the statistical model. The resulting, estimated ART use rates in infected men and women are shown in Extended Data Fig. 8c.
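The viral-load-based adjustment of the ART use field amounts to a simple recoding rule, sketched below in Python. The function and variable names are hypothetical, and the handling of missing viral load measurements (left unchanged) is an assumption.

```python
SUPPRESSION_THRESHOLD = 1000  # copies per ml of plasma, as stated in the text


def adjusted_art_use(reported_art_use, viral_load=None):
    """Return the ART use field after the viral-load-based adjustment."""
    if reported_art_use:
        return True
    # Participants who did not report ART use but had a viral load below the
    # threshold are recoded to 'yes'; a missing measurement changes nothing.
    return viral_load is not None and viral_load < SUPPRESSION_THRESHOLD
```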

Viral suppression

Since survey round 15, HIV-1 viral load was measured on stored serum/plasma specimens from infected participants using the Abbott real-time m2000 assay (Abbott Laboratories), which is able to detect a minimum of 40 copies ml–1. Viral suppression was defined as a viral load measurement below 1,000 copies ml–1 plasma blood following recommendations of the WHO34. To estimate virus suppression levels in the infected non-participants, we considered again as proxy data on infected first-time participants. Overall, viral load measurements were obtained from 19.3% of participants with HIV in survey round 15 and nearly all (>97.71%) participants with HIV since survey round 1664,65,66. From these data we estimated the proportion of individuals in the study population with HIV who had suppressed virus in participants and first-time participants (used as proxy for non-participants), using the same Bayesian non-parametric model as for HIV prevalence and ART use. Convergence and mixing were good with the lowest R-hat value of 1.0016 and lowest effective sample size of 461 for the participants and 1.0052, 844 respectively for the first-time participants. The model represented the data well, with 98.19% of data points inside 95% posterior predictive intervals and 97.99% for the first-time participants, indicating that the data met the assumptions of the statistical model. For the purpose of mathematical modelling of transmission flows, we next considered the earlier survey rounds 10 to 14, for which viral load measurements were not available. On average, 93% of individuals reporting ART use also had suppressed virus (Supplementary Table 9), leading us to estimate the number of individuals with suppressed virus before 2011 from corresponding ART use data. 
Specifically, we estimated the proportion of the study population with HIV that was virally suppressed by adjusting the estimated ART use data with the sensitivity of being virally suppressed given self-reported ART use and the specificity of being virally suppressed given self-reported no ART use estimated from round 15 when available, and otherwise from round 16 (Supplementary Table 9). Specificity and sensitivity values by 1-year age bands were linearly interpolated between the midpoints of the age brackets in Supplementary Table 9. The resulting, estimated virus suppression levels in men and women with HIV are shown in Extended Data Fig. 8d, illustrating that the gap in virus suppression levels increased over time.
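The interpolation of sensitivity and specificity values to 1-year age bands can be sketched as follows. This is a minimal Python illustration: the age brackets and values are hypothetical placeholders rather than the entries of Supplementary Table 9, and holding values constant outside the first and last midpoints is an assumption.

```python
def interpolate_by_age(brackets, values, ages):
    """Linearly interpolate bracket-level values between bracket midpoints."""
    midpoints = [(lo + hi) / 2 for lo, hi in brackets]
    out = {}
    for age in ages:
        if age <= midpoints[0]:
            out[age] = values[0]  # assumption: constant below the first midpoint
        elif age >= midpoints[-1]:
            out[age] = values[-1]  # assumption: constant above the last midpoint
        else:
            for k in range(len(midpoints) - 1):
                left, right = midpoints[k], midpoints[k + 1]
                if left <= age <= right:
                    t = (age - left) / (right - left)
                    out[age] = values[k] + t * (values[k + 1] - values[k])
                    break
    return out


# Hypothetical age brackets and sensitivity values, for illustration only.
example_brackets = [(15, 24), (25, 34), (35, 49)]
example_values = [0.60, 0.70, 0.80]
sensitivity_by_age = interpolate_by_age(example_brackets, example_values, range(15, 50))
```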

Sexual behaviour

RCCS participants reported to interviewers in each round on aspects of sexual behaviour, including the number of sexual partners in the past 12 months within the same community, the number of partners outside the community, and in round 15 the demographic characteristics of up to four partners (Supplementary Table 8). To interpret HIV transmission flows in the context of typical sexual contact networks, we focused on the detailed behaviour data collected in round 15 and estimated sexual contact intensities between men and women by 1-year age band, defined as the expected number of sexual contacts of one individual of gender g and age a with the population of the opposite gender h and age b in the same community. Estimates were obtained with the Bayesian rate consistency model (version 1.0.0), using default prior specifications67. We noted along with previous work68,69,70,71 that women tended to report considerably fewer contacts than men (Supplementary Table 8), prompting us to include in the linear predictor of contact rates additional age-specific random effects to capture under-reporting behaviour in women. Further, community-specific baseline parameters were added to allow for variation in the average level of contact rates in each community, but the age-specific structure of contact rates was assumed to be identical across communities. The resulting model was fitted to all data pertaining to within-community sexual contacts in the last year, including reports of within-community contacts for which information on the partners remained unreported. Contacts reported with partners from outside the same community were excluded, because male-female contacts have to add up to female-male contacts only in the same population denominator, and hence under-reporting could only be adjusted for when within-community contacts are considered. 
The model was fitted with CmdstanR (version 0.5.1)72 using Stan’s adaptive HMC sampler62 with 4 chains, where each chain runs 2,800 iterations, including 300 warm-up iterations. Convergence and mixing were good, with highest R-hat value of 1.003, and lowest effective sample size of 1,745. The model represented the data well, with >99% of data points inside 95% posterior predictive intervals, indicating that the data met the assumptions of the statistical model. Supplementary Table 8 reports the estimated sexual contact intensities from men and women in survey round 15, and shows that the estimated, under-reporting adjusted sexual contact intensities in women were considerably higher than those directly reported. The table also shows that the estimated number of sexual contacts from men to women equal those from women to men, and the estimated age distribution of sexual contacts is shown in Fig. 2 and Extended Data Fig. 7.
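The requirement that male-to-female contacts add up to female-to-male contacts within the same population denominator can be stated as a reciprocity constraint. In notation introduced here for illustration only (an assumption, not the authors' notation), let m with superscript g→h and subscripts a, b denote the contact intensity of one individual of gender g and age a with the population of gender h and age b, and let N with superscript g and subscript a denote the corresponding population size in the community. Then

$$m^{\mathrm{M}\to\mathrm{F}}_{a,b}\, N^{\mathrm{M}}_{a} \;=\; m^{\mathrm{F}\to\mathrm{M}}_{b,a}\, N^{\mathrm{F}}_{b} \qquad \text{for all ages } a, b,$$

so that the total number of contacts between men of age a and women of age b is the same whether counted from the men's or from the women's side. This identity only holds when both sides refer to the same community, which is why contacts with partners from outside the community cannot be used to adjust for under-reporting.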

Longitudinal HIV incidence cohort

Data and outcomes from the incidence cohort

RCCS encompasses both a full census of the study communities and a population-based survey in each surveillance round, which enables identification and follow-up of unique individuals over time, and thus provides a comprehensive sampling frame to measure HIV incidence. The RCCS incidence cohort comprises all RCCS study participants who were HIV-negative at their first visit (baseline) and had at least one subsequent follow-up visit (Supplementary Fig. 1). Individuals in the incidence cohort were considered to be at risk of acquiring HIV after their first visit, and stopped accruing risk at the date of HIV acquisition or the date of last visit. Exposure times were estimated from data collected at survey visit times similarly as in ref. 14. Individuals in the incidence cohort who remained negative until the last survey round contributed their time between the first and last survey visit to their exposure period. Individuals in the incidence cohort who were found to have acquired HIV must have done so between the visit date of the last round in which they were negative and the visit date of the current round, and the infection date was imputed at random between the two dates. This included incident cases who had no missed visit between the last negative and current visit (type 1) or one missed visit (type 2) as in ref. 14, but also cases who had more than one missed visit (type 3). Unknown dates were imputed at random 50 times, and individual exposure periods and incident cases were then attributed to each survey round, summed over the cohort, and then averaged over imputations. Supplementary Table 3 and Extended Data Fig. 2 illustrate the age- and gender-specific exposure times and incidence events in each survey round. 
In sensitivity analyses, we considered only those individuals in the incidence cohort who resided in one of the 28 inland communities that were continuously surveyed across survey rounds 10 to 18, and found similar incidence dynamics with slightly faster declines in incidence rates in younger men, although this difference was not statistically significant. No statistical methods were used to pre-determine sample sizes but our sample sizes are similar to those reported in previous publications14.
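The imputation of infection dates and the attribution of exposure time and incident cases to survey rounds can be sketched for a single seroconverter as follows. This is a simplified Python illustration under stated assumptions: each round is represented by one hypothetical, contiguous date interval (in practice communities were surveyed in turn), infection dates are drawn uniformly by day, and all names and dates are invented for the example.

```python
import random
from datetime import date, timedelta


def impute_infection_date(last_negative, first_positive, rng):
    """Draw an infection date uniformly between two visit dates."""
    gap_days = (first_positive - last_negative).days
    return last_negative + timedelta(days=rng.randint(0, gap_days))


def exposure_by_round(start, end, rounds):
    """Split the at-risk period [start, end) into person-years per round."""
    out = {}
    for name, (r_start, r_end) in rounds.items():
        overlap = (min(end, r_end) - max(start, r_start)).days
        if overlap > 0:
            out[name] = overlap / 365.25
    return out


def average_over_imputations(first_visit, last_negative, first_positive,
                             rounds, n_imputations=50, seed=1):
    """Average exposure time and incident cases per round over imputations."""
    rng = random.Random(seed)
    exposure = {name: 0.0 for name in rounds}
    cases = {name: 0.0 for name in rounds}
    for _ in range(n_imputations):
        infection = impute_infection_date(last_negative, first_positive, rng)
        # Risk stops accruing at the imputed date of HIV acquisition.
        for name, py in exposure_by_round(first_visit, infection, rounds).items():
            exposure[name] += py / n_imputations
        for name, (r_start, r_end) in rounds.items():
            if r_start <= infection < r_end:
                cases[name] += 1 / n_imputations
    return exposure, cases


# Hypothetical round intervals and visit dates, for illustration only.
example_rounds = {
    "R10": (date(2003, 9, 1), date(2005, 1, 1)),
    "R11": (date(2005, 1, 1), date(2006, 7, 1)),
}
example_exposure, example_cases = average_over_imputations(
    first_visit=date(2003, 10, 1),
    last_negative=date(2004, 6, 1),
    first_positive=date(2005, 12, 1),
    rounds=example_rounds,
)
```

Averaging over imputations means that a single incident case can contribute fractionally to two adjacent rounds, which is why incidence events per round need not be whole numbers.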

Modelling and analysis

The primary statistical objective was to estimate longitudinal age-specific HIV incidence rates by 1-year age bands across (discrete) survey rounds, separately for each gender. We used a log-link mixed-effects Poisson regression model, with individual-level exposure times specified as offset on the log scale, common baseline fixed effect and further random effects. The random effects comprised a one-dimensional smooth function on the age space, a one-dimensional smooth function on the survey round space, and an interaction term between age and survey round. The functions were specified as one-dimensional Gaussian processes. Alternative specifications, including two-dimensional functions over the participant’s age and survey round, and without interaction terms between age and survey rounds were also tried. We did not consider incidence trends in continuous calendar time because study communities were surveyed in turn, and so the incidence data within each round are structured by communities, which would require further modelling assumptions to account for. Owing to the large number of individual observations, models were fitted using maximum-likelihood estimation (MLE) with the R package mgcv (version 1.8-38)73, to each of the 50 datasets with imputed exposure times for each gender independently. Numerical convergence was examined with the gam.check function. Within- and between-sample uncertainties in parameter estimates, from the variability of the estimation procedure and the data imputation procedure, were incorporated in the age-, gender- and survey-round-specific incidence rate estimates by drawing 1,000 replicate incidence rate estimates from the MLE model mean parameter and associated standard deviation obtained on each of the 50 imputation datasets, and then calculating median estimates and 95% prediction intervals over the 1,000 × 50 Monte Carlo estimates (Fig. 1c). 
Model fits were evaluated by comparing predicted numbers of incident HIV infections to the empirical data. To assess model fit, incident cases were predicted using the Poisson model parameterized by replicate MLE incidence estimates. Overall, model fit was very good, with 98.80% (98.10–99.49) of data points inside the 95% prediction intervals across the 50 imputed datasets, and the fitted model was consistent with the available data (Extended Data Fig. 6), indicating that the data met the assumptions of the statistical model. The Akaike information criterion was used to identify the best model for each gender, and the best model was as described above (Supplementary Table 4).
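The way within- and between-imputation uncertainty is combined, by drawing 1,000 replicates per imputed dataset and summarizing the pooled 1,000 × 50 draws, can be sketched in Python as follows. The sketch assumes that replicates are drawn from a normal distribution on the log incidence rate scale, which the text does not state explicitly; the function name and the example inputs are hypothetical.

```python
import math
import random


def pooled_incidence_summary(log_rate_means, log_rate_sds, n_draws=1000, seed=1):
    """Pool Monte Carlo draws across imputed datasets for one age, gender and round.

    log_rate_means and log_rate_sds hold one MLE mean and one standard
    deviation per imputed dataset (assumed here to be on the log scale).
    Returns the median and the 2.5% and 97.5% quantiles of the pooled draws.
    """
    rng = random.Random(seed)
    draws = []
    for mu, sd in zip(log_rate_means, log_rate_sds):
        draws.extend(math.exp(rng.gauss(mu, sd)) for _ in range(n_draws))
    draws.sort()

    def quantile(p):
        return draws[min(len(draws) - 1, int(p * len(draws)))]

    return quantile(0.5), quantile(0.025), quantile(0.975)


# Hypothetical inputs: 50 imputed datasets with the same estimate of
# 1 infection per 100 person-years and a log-scale standard deviation of 0.1.
example_median, example_lower, example_upper = pooled_incidence_summary(
    [math.log(0.01)] * 50, [0.1] * 50
)
```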

Longitudinal viral phylogenetic transmission cohort

Data from the transmission cohort

Within RCCS, we also performed population-based HIV deep sequencing spanning a period of more than 6 years, from January 2010 to April 2018. The primary purpose of viral deep sequencing was to reconstruct transmission networks and identify the population-level sources of infections, thus complementing the data collected through the incidence cohort.

The RCCS viral phylogenetic transmission cohort comprises all participants with HIV for whom at least one HIV deep-sequence sample satisfying minimum quality criteria for deep-sequence phylogenetic analysis is available (Supplementary Fig. 1). For survey rounds 14 to 16 (PANGEA-HIV 1), viral sequencing was performed on plasma samples from participants with HIV who had no viral load measurement and self-reported being ART-naive at the time of the survey, or who had a viral load measurement above 1,000 copies per ml of plasma. We used this criterion because viral deep sequencing was not possible within our protocol on samples with virus less than 1,000 copies per ml of plasma, and because self-reported ART use was in this population found to be a proxy of virus suppression with reasonable specificity and sensitivity14,21. Plasma samples were shipped to University College London Hospital for automated RNA sample extraction on QIAsymphony SP workstations with the QIAsymphony DSP Virus/Pathogen Kit (catalogue number 937036, 937055; Qiagen), followed by one-step reverse transcription PCR (RT–PCR)74. Amplification was assessed through gel electrophoresis on a fraction of samples, and samples were shipped to the Wellcome Trust Sanger Institute for HIV deep sequencing on Illumina MiSeq and HiSeq platforms in the DNA pipelines core facility. Primers are publicly available74. For survey rounds 17 and 18 (PANGEA-HIV 2), viral load measurements were available for all infected participants and viral sequencing was performed on plasma samples of individuals who had not yet been sequenced and who had a viral load measurement above 1,000 copies per ml of plasma. 
Plasma samples were shipped to the Oxford Genomics Centre for automated RNA sample extraction on QIAsymphony SP workstations with the QIAsymphony DSP Virus/Pathogen Kit (937036, 937055; Qiagen), followed by library preparation with the SMARTer Stranded Total RNA-Seq kit v2 - Pico Input Mammalian (Clontech, TaKaRa Bio), size selection on the captured pool to eliminate fragments shorter than 400 nucleotides (nt) with streptavidin-conjugated beads75 to enrich the library with fragments desirable for deep-sequence phylogenetic analysis, PCR amplification of the captured fragments, and purification with Agencourt AMPure XP (Beckman Coulter), as described in the veSEQ-HIV protocol76. Sequencing was performed on the Illumina NovaSeq 6000 platform at the Oxford Genomics Centre, generating 350 to 600 base pair (bp) paired-end reads. Sequencing probes are publicly available77. A subset of samples from survey rounds 14 to 16 with low-quality read output under the PANGEA-HIV 1 procedure was re-sequenced with the veSEQ-HIV protocol. To enhance the genetic background used in our analyses, additional samples from the spatially neighbouring MRC/UVRI/LSHTM surveillance cohorts and other RCCS communities were also included. For sequencing, the following software was used: QuantStudio Real-Time PCR System v1.3, Agilent TapeStation Software Analysis 4.1.1, Clarity Version 4.2.23.287, FreezerPro 7.4.0-r14598, and LabArchives Electronic Lab Notebook 2023. We restricted our analysis to samples from 2,172 individuals that satisfied minimum criteria on read length and depth for phylogeny reconstruction and subsequent inferences. Specifically, deep sequencing reads were assembled with the shiver sequence assembly software, version 1.5.778. Next, phyloscanner version 1.8.125 was used to merge paired-end reads, and only merged reads of at least 250 bp in length were retained in order to generate 250 bp deep-sequence alignments as established in earlier work21.

Deep sequencing was performed from 2010 (survey round 14) onwards, but because sequences provide information on past and present transmission events, we also obtained information on transmission in earlier rounds and calculated sequence coverage in participants who were ever deep-sequenced at minimum quality criteria for phylogenetic analysis. Specifically, we required that individuals had a depth of ≥30 reads over at least 3 non-overlapping 250 bp genomic windows. Individuals who did not have sequencing output meeting these criteria were excluded from further analysis, and these were largely individuals sequenced only in PANGEA-HIV 1, and were primarily associated with low viral load samples76,79. In total, we deep-sequenced virus from 1,978 participants with HIV, of whom 559 were also in the incidence cohort. Supplementary Table 5 characterizes HIV deep-sequencing outcomes in more detail. No statistical methods were used to pre-determine sample sizes but our sample sizes are similar to those reported in previous publications20,27,50.
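The minimum quality criterion can be illustrated with a short Python sketch. It assumes that read depth has already been summarized per 250 bp window and is keyed by the window start coordinate; the function name, the input layout and the example values are hypothetical and do not reflect the output format of any particular software.

```python
def meets_quality_criteria(window_depths, min_depth=30, min_windows=3,
                           window_length=250):
    """Check for at least min_windows non-overlapping windows with enough depth.

    window_depths maps the start coordinate (bp) of each candidate window
    to its read depth. Windows are selected greedily from left to right,
    which maximizes the number of non-overlapping equal-length windows.
    """
    starts = sorted(s for s, depth in window_depths.items() if depth >= min_depth)
    count, next_free = 0, None
    for s in starts:
        if next_free is None or s >= next_free:
            count += 1
            next_free = s + window_length
    return count >= min_windows


# Hypothetical per-window depths for two individuals.
passes = meets_quality_criteria({800: 45, 900: 60, 1050: 31, 1300: 30, 1600: 12})
fails = meets_quality_criteria({800: 45, 900: 60, 1000: 50, 1600: 12})
```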

Reconstruction of transmission networks and source–recipient pairs

The HIV deep-sequencing pipeline provided sequence fragments that capture viral diversity within individuals, which enables phylogenetic inference of the direction of transmission from sequence data alone21,78,80. In the first step, potential transmission networks were identified, and in the second step transmission networks were confirmed and the transmission directions in the networks were characterized where possible. In this study, the first step was modified from previous protocols21 to ease computational burden, while the second step was, as before, performed with phyloscanner version 1.8.1.

In the first step81, to identify potential transmission networks, HIV consensus sequences were generated as the most common nucleotide in the aligned deep-sequence fragments that were derived for each sample. We then calculated similarity scores between all possible combinations of consensus sequences in consecutive 500 bp genomic windows rather than the entire genome to account for the possibility of recombination events and divergent virus in parts of the genome. Similarity score thresholds to identify putative, genetically close pairs were derived from data of long-term sexual partners enrolled in the RCCS cohort similarly as in refs. 21,81, and then applied to the population-based sample of all possible combinations of successfully sequenced individuals. Overall, 2,525 putative, genetically close individuals were identified, and these formed 305 potential transmission networks.
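The idea of scoring similarity window by window rather than across the whole genome can be sketched as follows. The exact similarity score used in the study is not specified here, so the sketch assumes a simple proportion of identical unambiguous bases per 500 bp window of two aligned consensus sequences; the function name and the toy sequences are hypothetical.

```python
def window_similarity(seq_a, seq_b, window=500):
    """Proportion of identical bases in consecutive windows of two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base are ignored;
    a window without any comparable position gets a score of None.
    """
    scores = []
    for start in range(0, min(len(seq_a), len(seq_b)), window):
        pairs = [
            (x, y)
            for x, y in zip(seq_a[start:start + window], seq_b[start:start + window])
            if x in "ACGT" and y in "ACGT"
        ]
        if pairs:
            scores.append(sum(x == y for x, y in pairs) / len(pairs))
        else:
            scores.append(None)
    return scores


# Toy aligned consensus sequences of 1,000 bp that differ only in the first window.
toy_a = "ACGT" * 250
toy_b = "T" * 100 + toy_a[100:]
toy_scores = window_similarity(toy_a, toy_b)
```

Scoring per window means that a pair can be flagged as genetically close in most of the genome even if one region diverges, for example after a recombination event.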

In the second step, we confirmed the potential transmission networks in phylogenetic deep-sequence analyses. We updated the background sequence alignment used in phyloscanner to a new sequence dataset that included 113 representatives of all HIV subtypes and circulating recombinant forms and 200 near full-genome sequences from Kenya, Uganda and Tanzania, obtained from the Los Alamos National Laboratory HIV Sequence Database (http://www.hiv.lanl.gov/). The deep-sequence alignment options were updated to using MAFFT (version 7.475) with iterative refinement82, and additional iterative re-alignment using consistency scores in case a large proportion of gap-like columns in the first alignment was detected. Deep-sequence phylogeny reconstruction was updated to using IQ-TREE (version 2.0.3) with GTR+F+R6 substitution model, resolving the previously documented deep-sequence phylogenetics branch length artefact20,83. Confirmatory analyses of the potential transmission networks were updated to using phyloscanner (version 1.8.1) with input argument zeroLengthAdjustment set to TRUE. From phyloscanner output, we calculated pairwise linkage scores that summarize how frequently viral phylogenetic subgraphs of two individuals were adjacent and phylogenetically close in the deep-sequence phylogenies corresponding to all 250 bp genomic windows that contained viral variants from both individuals21,25. Similarly we calculated pairwise direction scores that summarize how frequently viral phylogenetic subgraphs of one individual were ancestral to the subgraphs of the other individual in the deep-sequence phylogenies corresponding to all 250 bp genomic windows that contained viral variants from both individuals and in which subgraphs had either ancestral or descendant relationships21,25. 
Phylogenetically likely source–recipient pairs with linkage scores ≥0.5 and direction scores ≥0.5 were extracted, and only the most likely source–recipient pair with highest linkage score was retained if multiple likely sources were identified for a particular recipient. The resulting source–recipient pairs were checked further against sero-history data from both individuals where available. If sero-history data indicated the opposite direction of transmission, the estimated likely direction of transmission was set to that indicated by sero-history data.
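The scoring and selection logic described above can be sketched in Python as follows. The sketch assumes that window counts per pair have already been extracted from phyloscanner output; the function names, the input layout and the example identifiers are hypothetical.

```python
def pair_scores(n_windows, n_linked, n_a_to_b, n_b_to_a):
    """Linkage score and direction score (A to B) from window counts.

    n_windows: windows containing viral variants from both individuals.
    n_linked: windows in which the subgraphs were adjacent and close.
    n_a_to_b, n_b_to_a: windows with an ancestral relationship in each direction.
    """
    linkage = n_linked / n_windows if n_windows else 0.0
    directed = n_a_to_b + n_b_to_a
    direction_ab = n_a_to_b / directed if directed else 0.0
    return linkage, direction_ab


def select_source_recipient_pairs(candidates, threshold=0.5):
    """Keep, for each recipient, the candidate source with the highest linkage score."""
    best = {}
    for c in candidates:
        if c["linkage"] >= threshold and c["direction"] >= threshold:
            current = best.get(c["recipient"])
            if current is None or c["linkage"] > current["linkage"]:
                best[c["recipient"]] = c
    return best


# Hypothetical candidate pairs, for illustration only.
example_candidates = [
    {"source": "A", "recipient": "B", "linkage": 0.9, "direction": 0.8},
    {"source": "C", "recipient": "B", "linkage": 0.7, "direction": 0.9},
    {"source": "D", "recipient": "E", "linkage": 0.4, "direction": 0.9},
    {"source": "F", "recipient": "G", "linkage": 0.8, "direction": 0.3},
]
example_selected = select_source_recipient_pairs(example_candidates)
```

The subsequent check against sero-history data, which can reverse the inferred direction, is not part of this sketch.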

Infection time estimates

The shape and depth of an individual’s subgraph in deep-sequence phylogenies also provide information on the time since infection, and since the sequence sampling date is known thus also on the infection time84 and the age of both individuals at the time of the infection event. We used the phyloTSI random forest estimation routine with default options, which was trained on HIV seroconverter data from the RCCS and other cohorts, and uses as input the output of the phyloscanner software26. Individual-level time since infection estimates were associated with wide uncertainty (Extended Data Fig. 4a), and for this reason we refined estimates for the phylogenetically likely recipient in source–recipient pairs using the inferred transmission direction, age data, and where available longitudinal sero-history data. Specifically, we refined plausible infection ranges as indicated in the schema in Supplementary Fig. 2. Here, the dotted red rectangle illustrates the 2.5% and 97.5% quantiles of the phyloTSI infection time estimates for the phylogenetically likely recipient (x axis) and transmitting partner (y axis). We incorporated evidence on the direction of transmission by requiring that the date of infection of the phylogenetically likely recipient is after that of the transmitting partner (filled red triangle). Sero-history and demographic data were incorporated as follows. For both the recipient and the transmitting partner, the upper bound of the infection date was set as the thirtieth day prior to the first positive test of the participant85. The lower bound of the infection date was set to the largest of the following dates, the date of last negative test if available, the fifteenth birthday, or the date corresponding to 15 years prior the upper bound86. The refined uncertainty range of the infection time estimates of the phylogenetically likely transmitting partner and recipient are illustrated as the purple triangle in the schema above, and obtained as follows. 
Firstly, we defined individual-level plausible ranges, by intersecting the range of dates consistent with the phyloTSI predictions and sero-history data. If the intersection was empty, we discarded the phyloTSI estimates. Then we intersected the rectangle given by the cartesian product of the plausible intervals for source and recipient with the half-plane consistent with the direction of transmission. Finally, infection dates were sampled at random from the refined uncertainty range, so that the median infection date estimates correspond to the centre of gravity of the triangle (cross). In sensitivity analyses, we further integrated estimates of transmission risk by stage of infection87, though this had limited impact on the estimates (see ‘Sensitivity analyses’ section). In cases where the likely transmitting partner in one heterosexual pair was the recipient partner in another heterosexual pair, the above infection date refinement algorithm was applied recursively so that the refined infection date estimates were consistent across pairs. Finally, the transmission events captured by each source–recipient pair were attributed to the survey round into which the posterior median infection time estimate of the recipient fell, and in cases where the median estimate fell after the start time of a round and the end time of the preceding round, the event was attributed to the preceding round.
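The refinement of infection date ranges can be sketched for one source–recipient pair as follows. This Python illustration makes several assumptions that are not taken from the original: dates are sampled uniformly by day, the direction constraint is enforced by rejection sampling, and all function names and example dates are hypothetical. The recursive treatment of chained pairs and the attribution to survey rounds are omitted.

```python
import random
from datetime import date, timedelta


def shift_years(d, years):
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def sero_bounds(first_positive, birth_date, last_negative=None):
    """Plausible infection date range from sero-history and demographic data."""
    upper = first_positive - timedelta(days=30)
    lower_candidates = [shift_years(birth_date, 15), shift_years(upper, -15)]
    if last_negative is not None:
        lower_candidates.append(last_negative)
    return max(lower_candidates), upper


def refine_range(phylo_range, sero_range):
    """Intersect the phyloTSI range with the sero-history range.

    If the intersection is empty, the phyloTSI estimate is discarded and
    the sero-history range is used on its own.
    """
    lo, hi = max(phylo_range[0], sero_range[0]), min(phylo_range[1], sero_range[1])
    return (lo, hi) if lo <= hi else sero_range


def sample_pair(source_range, recipient_range, rng, max_tries=10000):
    """Sample infection dates with the source infected before the recipient."""
    for _ in range(max_tries):
        s = source_range[0] + timedelta(
            days=rng.randint(0, (source_range[1] - source_range[0]).days))
        r = recipient_range[0] + timedelta(
            days=rng.randint(0, (recipient_range[1] - recipient_range[0]).days))
        if s < r:
            return s, r
    return None


# Hypothetical recipient: first positive test on 1 July 2015, last negative
# test on 1 May 2012, born on 10 March 1990.
recipient_sero = sero_bounds(date(2015, 7, 1), date(1990, 3, 10), date(2012, 5, 1))
recipient_range = refine_range((date(2011, 1, 1), date(2014, 1, 1)), recipient_sero)
source_range = (date(2010, 1, 1), date(2013, 1, 1))
source_date, recipient_date = sample_pair(source_range, recipient_range, random.Random(1))
```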

In total, we identified 539 source–recipient pairs that involved participants from the 36 survey communities and further individuals from the background dataset. In 13 of the 539 source–recipient pairs, available dates of last negative tests indicated that only the opposite transmission direction was possible and in these cases the inferred direction of transmission was set to the opposite direction. The resulting pairs included 501 unique recipient partners, and for each we retained the most likely transmitting partner. To identify pairs capturing transmission events within the RCCS inland communities, we restricted analysis initially to 236 heterosexual source–recipient pairs in whom both individuals were ever resident in the 36 survey communities. Of these, 142 pairs were from men to women and 94 from women to men. Infection times were estimated for all sampled individuals and refined for the recipient partners in the 236 heterosexual source–recipient pairs. For 4 recipient partners, the phyloTSI estimates were ignored as they were incompatible with the inferred transmission direction and survey data, and infection time estimates were based on sero-history data only. The phylogenetically most likely location of both individuals at time of transmission was estimated as their location at the RCCS visit date that was closest to the posterior median infection time estimate. Using this location estimate, 233 of the 236 heterosexual source–recipient pairs were estimated to capture transmission events in RCCS inland communities and were retained for further analysis. A further six recipient partners had posterior median infection time estimates outside the observation period from September 2003 to May 2018 and were excluded, leaving for analysis 227 heterosexual source–recipient pairs that captured transmission events in RCCS inland communities during the observation period.
This excluded 88 potential source–recipient pairs from our study due to ethical considerations and prior analyses suggesting these pairs most likely represent partially sampled transmission chains (that is, ‘false positives’)21.

Transmission flow analysis

Statistical framework

We next estimated the sources of the inferred population-level HIV incidence dynamics from the dated source–recipient pairs in the viral phylogenetic transmission cohort. Overall, inference was done in a Bayesian framework using a semi-parametric Poisson flow model similar to ref. 28, which was fitted to observed counts of transmission flows indexed by transmission direction g → h (male-to-female or female-to-male), time period p (survey rounds 10–15 and 16–18) in which the recipient was likely infected, and 1-year age bands i, j of the source and recipient populations respectively, where

$$(g\to h)\in \{\text{male-to-female},\ \text{female-to-male}\}. \tag{1b}$$

The target quantity of the model is the expected number of HIV transmissions in the study population in transmission direction g → h (male-to-female or female-to-male), survey round
