The COVID-19 pandemic has demonstrated the shortcomings of epidemiological modelling for guiding policy decisions. Moreover, the modelling efforts produced many models with differing predictions, creating a need to compare these predictions and determine which model is most accurate. We introduce a data synthesis algorithm, CovSyn, designed to generate synthetic COVID-19 datasets that are detailed enough for benchmarking epidemiological models against a known synthetic ground truth. CovSyn uses observed infections with contact tracing, testing, and course of disease data, together with a contact network based on municipality statistics that categorises connections into household, school, workplace, healthcare, and municipality layers. The model's initial parameters and boundaries are derived from empirical data, including the first community outbreak of COVID-19 in Taiwan and clinical observations. The parameter space is explored with the Firefly algorithm to find optimal parameter values. We demonstrate CovSyn and validate its estimates by comparing state transition times, daily social contacts, and associated secondary attack rates against a structured dataset and clinical observations. Our simulations align with prior research and with this dataset. Most state transition times from 10,000 simulations are within the uncertainty ranges. Daily contact numbers and their distribution across layers match empirical findings. Our model reproduced the first COVID-19 outbreak in Taiwan, agreeing closely with the observed cumulative confirmed cases (R^2 = 0.9) at the daily, 7-day moving average, and 31-day moving average levels. Each synthetic subject includes demographic data (age, gender, occupation), course of disease (latent/incubation periods, testing, isolation, critical illness, recovery, and death dates), and contact network data including daily interactions with infected and uninfected individuals.
Our algorithm offers a valid alternative for developing and benchmarking epidemiological models to advance COVID-19 forecasting research.
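As an illustration of the kind of agreement check described above, the following is a minimal sketch of an R^2 comparison between observed and simulated cumulative cases at several smoothing windows. It is not taken from the CovSyn repository: the function names (`moving_average`, `r_squared`, `compare`) are hypothetical, R^2 is assumed to be the coefficient of determination 1 - SS_res/SS_tot, and smoothing the daily counts with a trailing window before accumulating them is an assumption about how the daily, 7-day, and 31-day levels are formed.

```python
import numpy as np


def moving_average(x, window):
    """Trailing moving average; early entries average over the days available."""
    x = np.asarray(x, dtype=float)
    csum = np.cumsum(x)
    out = np.empty_like(x)
    for i in range(len(x)):
        start = max(0, i - window + 1)
        total = csum[i] - (csum[start - 1] if start > 0 else 0.0)
        out[i] = total / (i - start + 1)
    return out


def r_squared(observed, simulated):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    ss_res = np.sum((observed - simulated) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def compare(observed_daily, simulated_daily, windows=(1, 7, 31)):
    """R^2 of cumulative cases after smoothing daily counts with each window.

    Assumption: daily counts are smoothed first, then accumulated.
    """
    scores = {}
    for w in windows:
        obs = np.cumsum(moving_average(observed_daily, w))
        sim = np.cumsum(moving_average(simulated_daily, w))
        scores[w] = r_squared(obs, sim)
    return scores
```

Calling `compare(observed_daily, simulated_daily)` with two equal-length sequences of daily confirmed cases returns a dictionary keyed by window length in days, so a benchmark can report one score per smoothing level.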
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Ministry of Science and Technology in Taiwan (grant numbers MOST 105-2218-E-006-016-MY2, 105-2911-I-006-518, 107-2634-F-006-009, and 110-2222-E-006-010) and National Science and Technology Council (grant numbers 111-2221-E-006-186, 112-2314-B-006-079, and 113-2314-B-006-069). This research was supported in part by the Higher Education Sprout Project, Ministry of Education, to the Headquarters of University Advancement at National Cheng Kung University (NCKU).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The CovSyn model, data, and documentation are fully available via GitHub: https://github.com/nordlinglab/COVID19-CovSyn