The UN’s Sustainable Development Goals aim to eradicate a range of infectious diseases to achieve global well-being. These efforts require monitoring disease transmission at a resolution that differentiates between pathogen variants at the genetic/molecular level. In fact, the advantages of genetic (molecular) measures such as multiplicity of infection (MOI) over traditional metrics, e.g., R0, are being increasingly recognized. MOI refers to the presence of multiple pathogen variants within an infection due to multiple infective contacts. Maximum-likelihood (ML) methods have been proposed to derive MOI and pathogen-lineage frequencies from molecular data. However, these methods are biased.
Methods and findings
Based on a single molecular marker, we derive a bias-corrected ML estimator for MOI and pathogen-lineage frequencies. We further improve these estimators by heuristic adjustments that compensate for shortcomings in the derivation of the bias correction, which implicitly assumes that the data lie in the interior of the observational space. The finite-sample properties of the different variants of the bias-corrected estimators are investigated in a systematic simulation study. In particular, we investigate the performance of the estimators in terms of bias, variance, and robustness against model violations. The corrections successfully remove bias except for extreme parameters that likely yield uninformative data, which cannot sustain accurate parameter estimation. Heuristic adjustments further improve the bias correction, particularly for small sample sizes. The bias corrections also reduce the estimators’ variances, which coincide with the Cramér-Rao lower bound. The estimators are reasonably robust against model violations.
Applying bias corrections can substantially improve the quality of MOI estimates, particularly in areas of low transmission and in areas of high transmission, where uncorrected estimates tend to be biased. The bias-corrected estimators are (almost) unbiased and their variance coincides with the Cramér-Rao lower bound, suggesting that no further improvements are possible unless additional information is provided. Additional information can be obtained by combining data from several molecular markers, or by including information that allows stratifying the data into heterogeneous groups.
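To make the notion of finite-sample bias of an ML estimator concrete, the following is a minimal simulation sketch, not the estimator studied here. It rests on strong simplifying assumptions that are not taken from the text: MOI is assumed to follow a zero-truncated (positive) Poisson distribution with parameter `lam`, and MOI is assumed to be observed directly in each sample rather than inferred from a molecular marker. All function names are hypothetical.

```python
import math
import random


def sample_truncated_poisson(lam, rng):
    """Draw one Poisson(lam) variate conditioned on being >= 1 (toy MOI model)."""
    threshold = math.exp(-lam)
    while True:
        # Knuth's multiplication method for an ordinary Poisson variate.
        k, prod = 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        if k >= 1:  # reject zeros: an infection carries at least one lineage
            return k


def mle_lambda(mean_moi, tol=1e-10):
    """ML estimate of lam: solve lam / (1 - exp(-lam)) = sample mean by bisection."""
    if mean_moi <= 1.0:
        return 0.0  # boundary case: every sample had MOI = 1
    lo, hi = 1e-12, mean_moi  # the root lies below the sample mean
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < mean_moi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulated_bias(true_lam, n_samples, n_reps=2000, seed=1):
    """Monte Carlo estimate of E[lam_hat] - true_lam for a given sample size."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        mois = [sample_truncated_poisson(true_lam, rng) for _ in range(n_samples)]
        total += mle_lambda(sum(mois) / n_samples)
    return total / n_reps - true_lam


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the study.
    for n in (20, 50, 200):
        print(n, simulated_bias(true_lam=1.5, n_samples=n))
```

Running the script prints the simulated bias for a few sample sizes, so that its dependence on sample size can be inspected under these toy assumptions. Note the boundary case in `mle_lambda`: when every sample has MOI = 1, the estimate falls on the edge of the parameter space, which is the kind of situation in which a bias correction derived for the interior may need adjustment.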