Machine learning enabled detection of COVID-19 pneumonia using exhaled breath analysis: a proof-of-concept study

The emergence of the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) in 2019 has resulted in over 6.7 million deaths worldwide, mostly due to COVID-19 pneumonia and respiratory failure [1]. SARS-CoV-2 is primarily transmitted via the respiratory route [2] and its detection relies on real-time reverse-transcriptase polymerase chain reaction (RT-PCR) on nasopharyngeal swabs (NPS) [3, 4]. The false-negative rate of this approach may be as high as 33%, particularly late in the course of the disease when the viral burden and infection are mainly in the lower airways. In this situation, lower airway sampling via bronchoscopy is required, which poses a significant risk of viral transmission to those performing the procedure [3, 5, 6]. Furthermore, access to bronchoscopy is not equitable, and the procedure may be contraindicated in sick patients. Therefore, an alternative method for lower airway sampling that is safe and easily accessible could greatly facilitate the rapid and early diagnosis of COVID-19 pneumonia and allow earlier implementation of appropriate therapy.

Exhaled breath may be an important source of lower airway sampling, as it contains several hundred metabolites from the distal airways and the lung parenchyma. Indeed, earlier breath analysis studies utilized gas chromatography-mass spectrometry (GCMS) to identify volatile organic compounds (VOCs) as biomarkers for infectious and neoplastic processes in the lung [7–9]. However, GCMS requires expensive equipment, trained personnel and precise calibration. Alternatively, laser absorption spectroscopy (LAS) offers a different approach to disease detection by utilizing a myriad of metabolites from a single exhaled breath sample [10]. Together with data-driven machine learning (ML) classification models, a vast number of identified metabolites can be processed to provide unique patterns of VOCs as 'breathprints' for the detection of COVID-19 pneumonia. In the present study, an ultra-sensitive LAS-based technique, using cavity ring-down spectroscopy (CRDS), was used to discriminate between breath samples from SARS-CoV-2 positive and negative individuals. The primary objectives were to: (1) explore the LAS-spectral breathprints identified by ML to discriminate between SARS-CoV-2 positive and negative individuals and (2) assess the discriminatory performance of this approach. The secondary objective was to examine for unique VOCs that can provide insights into the pathobiology of COVID-19 pneumonia.

2.1. Study design

This was a prospective, unblinded, observational proof-of-concept study. Patients admitted with positive SARS-CoV-2 by RT-PCR on NPS and symptoms of SARS-CoV-2 infection between 11 February and 13 August 2021 were recruited from 3 tertiary centers in Ontario, Canada. The study was approved by the research ethics committee of each participating site and registered on ClinicalTrials.gov (NCT04867213). All participants provided informed written consent. The inclusion criteria were age 18 years or older and sufficient exhaled breath collection for analysis. A similar group of healthy individuals or hospitalized patients without respiratory symptoms and with negative SARS-CoV-2 RT-PCR were enrolled as controls. Participants were excluded if they were suspected of having COVID-19 despite a negative RT-PCR test, had chronic lung disease, had smoked tobacco, cannabis or electronic cigarettes within 4 h of exhaled breath collection, or had consumed alcohol within 8 h of collection. The demographic and clinical data of participants were collected either by questionnaire or from medical health records, and were obtained at the time of exhaled breath collection.

2.2. Exhaled breath collection

Exhaled breath was collected using a proprietary exhaled breath sampler (Breathe BioMedical, Moncton, New Brunswick, Canada) which tracks CO2 levels to collect alveolar breath into Tenax TA sorbent tubes. Participants exhaled into the breath sampler while sitting or standing, without the use of a nose clip, and were instructed to breathe deeply and exhale through a single-use bacterial/viral filter (SunMed FH603003) on the sampler's mouthpiece; the procedure was repeated until 5-litre (l) samples were amassed (see figure 1(A)). CO2 was recorded via a real-time sensor to confirm that the alveolar portion of the breath sample was collected. Participants' breathing patterns were monitored throughout breath collection. Duplicate sampling of subjects was prohibited. Further information is detailed within the supplemental materials.

Figure 1. Schematic representation of the principle of the breath collection device. (A) An exhaled breath sample was collected into Tenax TA desorption tubes through a single use filter on the sampler's mouthpiece. (B) Breath sample with identifying tag is shipped to central lab for analysis. (C) Cavity ring-down spectroscopy is a highly sensitive laser spectroscopy that measures the absorption of light in a closed optical cavity using highly reflective mirrors placed at each end of the cavity. By recording the decay times of light, the absorption of light is measured to indicate the concentration of compounds. (D) Example infrared spectrum of a sample. (E) Measured spectra were used to develop a supervised machine learning classification model to discriminate SARS-CoV-2 positive from negative samples.

Breath samples were collected into Tenax TA desorption tubes and stored at −20 degrees Celsius until analysis with the exception of transport to and from the analysis site. Tenax tubes were chosen rather than sampling bags because of their strong hydrophobic nature and consistent retention of exhaled breath VOCs [11]. Breath samples were analysed within two weeks of sampling. Prior to shipment to collection sites, all sorbent tubes were conditioned and batch tested to ensure low background levels. All sorbent tubes were used within two weeks of conditioning.

2.3. Cavity ring-down spectroscopy (CRDS) measurements and machine learning techniques

Mid-infrared profiles were measured from the 5 l breath samples (desorbed at 300 degrees Celsius) using CRDS. The CRDS instrument was designed and built in-house and is a highly sensitive laser spectroscopy technique that detects trace chemicals by measuring the absorption of light in a closed optical cavity. Each full measurement took approximately 60 min. Three tunable CW lasers, 12C(16O)2, 13C(16O)2 and 12C(18O)2, provided 204 wavelengths between 9.0–11.25 µm (1100–890 cm−1). This wavelength range is in the heart of the fingerprint region, a part of the electromagnetic spectrum where organic compounds show signature infrared absorption. Specifically, isoprene, methanol and ammonia are among common breath VOCs with distinct signatures. The ringdown times at each wavelength were measured with pure nitrogen (τ0) and with sample (τ).

The path length of light interacting with the breath sample can be increased many-fold using highly reflective mirrors placed at each end of the cavity. By recording the decay times of light in the empty versus the breath-filled cavity, the absorption of light by the constituent chemicals within human breath is measured. The absorption spectrum can indicate the concentration of a compound with sensitivities in the parts-per-billion range [12].
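The relationship between the two ring-down times and the absorption can be illustrated with the standard CRDS expression α = (1/c)(1/τ − 1/τ0). The text does not state the exact expression used by the in-house instrument, so the following is only a minimal sketch under that assumption; the function name and the ring-down times are made up for illustration.

```python
# Speed of light in cm/s, so that the absorption coefficient comes out in cm^-1.
SPEED_OF_LIGHT_CM_PER_S = 2.99792458e10


def absorption_coefficient(tau_sample_s, tau_empty_s):
    """Standard CRDS relation: alpha = (1/c) * (1/tau - 1/tau0), in cm^-1.

    tau_sample_s: ring-down time with the breath sample in the cavity (seconds)
    tau_empty_s:  ring-down time with pure nitrogen in the cavity (seconds)
    """
    if tau_sample_s <= 0 or tau_empty_s <= 0:
        raise ValueError("ring-down times must be positive")
    return (1.0 / tau_sample_s - 1.0 / tau_empty_s) / SPEED_OF_LIGHT_CM_PER_S


# Illustrative (made-up) values: an absorbing sample shortens the decay time.
tau0 = 20.0e-6  # nitrogen-filled cavity, 20 microseconds
tau = 19.0e-6   # cavity with sample, 19 microseconds
alpha = absorption_coefficient(tau, tau0)
print(f"alpha = {alpha:.3e} cm^-1")
```

Repeating this calculation at each of the measured wavelengths would give one absorption coefficient per wavelength, which together form the spectrum used as a breathprint.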

2.3.1. Machine learning classification model

The measured spectra were used to develop a supervised ML classification model that discriminates SARS-CoV-2 positive from negative samples. First, any missing absorption coefficients were replaced in each spectrum using linear interpolation, and the spectra were then rescaled using vector normalization, following a previously validated approach [13–15]. Next, first-order spectral derivative sequences, each comprising 191 values, were extracted from the normalized breathprints and used as features for classification. The features that provided the most useful information were identified using a variant of the minimum redundancy maximum relevance algorithm [16], which ranks features based on their correlation to the class labels (SARS-CoV-2 positive or SARS-CoV-2 negative) and prioritizes features that provide unique information. Following this step, the number of features retained for classification was optimized using classification performance. The maximum number of allowed features was fixed at 20 to limit overfitting and model complexity. A linear support vector machine (SVM) learning approach was used for classification. This uses features from a set of training samples to construct a decision boundary, which is then used to classify future samples [13–15].
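The preprocessing steps described above (interpolation, vector normalization, first derivative) can be sketched in a few lines. This is an illustrative assumption rather than the authors' code: it interpolates over sample index instead of actual wavelength, and it uses a plain finite difference, whereas the study's derivative scheme yields 191 values from 204 wavelengths and is therefore evidently different. All function names are hypothetical.

```python
import math


def interpolate_missing(spectrum):
    """Fill None entries by linear interpolation between the nearest measured
    neighbours (by index); gaps at either end copy the nearest measured value."""
    known = [i for i, v in enumerate(spectrum) if v is not None]
    if not known:
        raise ValueError("spectrum has no measured values")
    filled = list(spectrum)
    for i, v in enumerate(spectrum):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = spectrum[right]
        elif right is None:
            filled[i] = spectrum[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = spectrum[left] + w * (spectrum[right] - spectrum[left])
    return filled


def vector_normalize(spectrum):
    """Scale the spectrum to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [v / norm for v in spectrum]


def first_derivative(spectrum):
    """Simple finite difference between neighbouring points (an assumption)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


# Made-up four-point spectrum with one missing absorption coefficient.
raw = [3.0, None, 5.0, 4.0]
filled = interpolate_missing(raw)
features = first_derivative(vector_normalize(filled))
print(filled)
print(features)
```

The resulting derivative values would then be passed to feature selection and the classifier.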

Two validation approaches were used to assess the ML-classifier's performance: a non-nested and a nested leave-one-out cross-validation (LOOCV). The non-nested approach utilized all samples for preprocessing and feature selection, while the nested approach utilized only the training set for preprocessing and feature selection to avoid leaking information from the test set. The standard non-nested LOOCV framework provides a single optimal feature set that is fixed during model training and testing, while the nested approach results in multiple optimal feature sets (one for each training set created during the cross-validation procedure). Both approaches were used, since the non-nested method tends to yield more optimistic estimates and the nested method tends to yield more pessimistic estimates. The true performance of the ML-classifier is therefore expected to lie between the estimates from the two approaches [13, 17].
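The difference between the two validation schemes comes down to where feature selection happens. The sketch below is a simplified illustration, not the study's pipeline: a class-mean gap ranking stands in for the mRMR variant, a nearest-centroid rule stands in for the linear SVM, and the data are random noise. All names are hypothetical.

```python
import random


def rank_features(X, y, k):
    """Indices of the k features with the largest gap between class means
    (a simple stand-in for the mRMR ranking used in the study)."""
    def class_mean(j, label):
        vals = [row[j] for row, lab in zip(X, y) if lab == label]
        return sum(vals) / len(vals)
    n_features = len(X[0])
    gaps = [abs(class_mean(j, 1) - class_mean(j, 0)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gaps[j], reverse=True)[:k]


def centroid_predict(X_train, y_train, x_test, features):
    """Nearest-centroid rule on the selected features (stand-in for a linear SVM)."""
    def centroid(label):
        rows = [row for row, lab in zip(X_train, y_train) if lab == label]
        return [sum(r[j] for r in rows) / len(rows) for j in features]
    def dist2(c):
        return sum((x_test[j] - cj) ** 2 for j, cj in zip(features, c))
    return 1 if dist2(centroid(1)) < dist2(centroid(0)) else 0


def loocv_accuracy(X, y, k, nested):
    """Leave-one-out accuracy. With nested=False the features are chosen once
    from ALL samples (test sample included); with nested=True they are
    re-selected inside every fold from the training samples only."""
    fixed = None if nested else rank_features(X, y, k)
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = rank_features(X_tr, y_tr, k) if nested else fixed
        correct += centroid_predict(X_tr, y_tr, X[i], feats) == y[i]
    return correct / len(X)


# Pure-noise data: the labels carry no real signal.
random.seed(0)
y = [i % 2 for i in range(20)]
X = [[random.gauss(0.0, 1.0) for _ in range(40)] for _ in range(20)]
print("non-nested:", loocv_accuracy(X, y, k=3, nested=False))
print("nested:    ", loocv_accuracy(X, y, k=3, nested=True))
```

On noise like this, any accuracy well above chance in the non-nested run reflects information leaked through feature selection, which is why that estimate is treated as optimistic.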

An iterative process of training and validation across a range of sample sizes was employed to examine how the amount of training data may impact classifier performance. Learning curves were generated by iteratively incrementing the training sample size and re-assessing the classification model, starting with ten random participants, and increasing in increments of ten randomly selected participants. This procedure was repeated ten times for each model, and the performance estimates were averaged to create the learning curves. The class sizes were balanced in each data subset until sample sizes exceeded 106, at which point only COVID-negative subjects remained.

2.4. VOC stepwise fitting analysis

For the secondary objective, a stepwise fitting method was used to fit the measured spectra to a library of compounds [13]. The library comprises reference absorption data from the Pacific Northwest National Laboratory and the high-resolution transmission molecular absorption (HITRAN) database [18]. There are a total of 502 compounds in the library, of which 133 compounds are present in human breath. Quantification analysis of all 133 VOCs was performed for each breath sample, and concentrations were compared between SARS-CoV-2 positive and negative groups by the Mann-Whitney U test. SVM classifier models were developed using the VOC concentrations as features, to assess whether this approach may offer higher-performance classifier models than using LAS-spectra as features. For feature selection, the VOCs were filtered based on their significance level from Mann-Whitney U testing (i.e. p < 0.05) and were further optimized using classification accuracy, as with the breathprint model. Only VOCs that appeared in at least 25% of samples were considered for the statistical comparisons and classification models. Isoprene and exogenous VOCs were removed from consideration as features in the SVM model. The SVM model was trained with and without ammonia to assess whether the presence of ammonia resulted in differing sensitivity or specificity of the model.
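The VOC filtering step (at least 25% presence, then Mann-Whitney p < 0.05) can be sketched as follows. This is an illustration under stated assumptions: the p-value uses the normal approximation with tie correction and no continuity correction, a concentration of 0.0 is taken to mean "not detected", the exclusion of isoprene and exogenous VOCs is not modelled, and the VOC names and values are invented.

```python
import math


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).
    Returns (U statistic of sample a, approximate p-value)."""
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    n1, n2 = len(a), len(b)
    n = n1 + n2
    ranks, tie_term, i = [0.0] * n, 0.0, 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = mid
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def select_vocs(pos, neg, alpha=0.05, min_presence=0.25):
    """pos/neg map a VOC name to its list of concentrations in each group."""
    selected = []
    for name in pos:
        values = pos[name] + neg[name]
        if sum(v > 0 for v in values) / len(values) < min_presence:
            continue  # detected in too few samples
        _, p = mann_whitney_u(pos[name], neg[name])
        if p < alpha:
            selected.append(name)
    return selected


# Invented concentrations for three hypothetical VOCs, 8 samples per group.
pos = {"voc_a": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0],
       "voc_b": [0.0] * 7 + [3.0],
       "voc_c": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]}
neg = {"voc_a": [1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5],
       "voc_b": [0.0] * 8,
       "voc_c": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]}
print(select_vocs(pos, neg))
```

Here the rarely detected compound is dropped by the presence filter and the overlapping one by the significance filter, leaving only the clearly separated compound as a candidate feature.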

2.5. Statistical analysis

Descriptive analysis was used to summarize the data. Differences in baseline characteristics between groups were assessed using Chi-squared tests (categorical variables), independent t-tests (normally distributed variables), and the Mann-Whitney U-test (non-normal continuous variables). The nested and non-nested classification performance estimates were compared using Fisher's exact test for categorical variables and an independent-samples t-test for continuous variables. A p-value <0.05 was considered statistically significant. Statistical analysis was performed using the Statistical Package for the Social Sciences (SPSS) for Windows, version 27 (IBM, Armonk, NY, USA).
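As an illustration of how Fisher's exact test works on a 2×2 table, the following sketch computes the two-sided p-value directly from the hypergeometric distribution. The study itself used SPSS; this standalone function (a hypothetical name) is only meant to show the calculation, here applied to the true-positive and false-negative counts by sex from table 2.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Table 2, SARS-CoV-2 positive group: female TP/FN = 12/5, male TP/FN = 29/7.
p_sex = fisher_exact_two_sided(12, 5, 29, 7)
print(f"p = {p_sex:.2f}")
```

The result should come out close to the 0.49 reported for this comparison in table 2.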

There were 135 patients enrolled and 115 patients provided sufficient exhaled breath samples for inclusion into the final analysis (53 SARS-CoV-2 positive and 62 controls). The number of participants excluded and the reasons for exclusion were similar between the SARS-CoV-2 positive and negative groups suggesting that COVID-19 pneumonia did not adversely impact the feasibility of exhaled breath collection (figure 2).

Figure 2. Study flow diagram.

Of the included participants, SARS-CoV-2 patients had higher mean body mass index (BMI); higher prevalence of coronary artery disease and insulin-dependent diabetes mellitus; and received a higher fraction of inspired oxygen (FiO2) therapy (table 1). Many SARS-CoV-2 patients (67%) at the time of breath sample collection had radiographic bilateral lung infiltrates and 62% required supplemental oxygen therapy. Among the SARS-CoV-2 patients, 14 had breath sample collection within 7 d of symptom onset (early) and 37 had breath collection after 7 d (late), while in two patients, the time from symptom onset to collection was unknown.

Table 1. Subject demographics.

| Variable | SARS-CoV-2 Positive (n = 53) | SARS-CoV-2 Negative (n = 62) | p-value |
|---|---|---|---|
| Male (%) | 36 (67.9) | 32 (51.6) | 0.08 |
| Mean age in years (S.D.) | 57.7 ± 17.1 | 57.5 ± 12.8 | 0.95 |
| Mean BMI (S.D.) | 29.9 ± 7.3 | 27.0 ± 6.0 | 0.02 |
| Smoking status (%): | | | |
| Current | 1 (1.9) | 3 (4.8) | |
| Ex-smoker | 11 (20.8) | 13 (21.0) | |
| Never smoker | 39 (73.6) | 42 (67.7) | 0.60 |
| Covid variant (%): | | | |
| Original | 18 (34.0) | | |
| Alpha | 30 (56.6) | | |
| Other mutation | 3 (5.7) | | |
| Unknown | 2 (3.8) | — | — |
| Radiological evidence of pneumonia (%): | | | |
| Unilateral | 7 (13.2) | | |
| Bilateral | 35 (66.0) | | |
| None | 9 (17.0) | | |
| Unknown | 2 (3.8) | — | — |
| Comorbidities (%): | | | |
| Hypertension | 19 (35.8) | 15 (24.2) | 0.17 |
| Dyslipidemia | 15 (28.3) | 15 (24.2) | 0.62 |
| Coronary artery disease | 9 (17.0) | 1 (1.6) | <0.01 |
| Asthma | 4 (7.5) | 4 (6.5) | 0.82 |
| COPD | 3 (5.7) | 2 (3.2) | 0.52 |
| Chronic kidney disease | 2 (3.8) | 1 (1.6) | 0.47 |
| IDDM | 9 (17.0) | 3 (4.8) | 0.03 |
| NIDDM | 12 (22.6) | 11 (17.7) | 0.51 |
| Medication (%): | | | |
| Dexamethasone | 36 (67.9) | | |
| Remdesivir | 4 (7.5) | | |
| Tocilizumab | 6 (11.3) | — | — |
| Requiring supplemental oxygen therapy | 33 (62.3) | 0 | <0.01 |
| Median-inspired FiO2 (IQR) | 28 (21–32) | 21.0 | <0.01 |

Definition of abbreviations: BMI = body mass index; COPD = chronic obstructive pulmonary disease; FiO2 = fraction of inspired oxygen; IDDM = insulin-dependent diabetes mellitus; IQR = interquartile range; NIDDM = non-insulin dependent diabetes mellitus; S.D. = standard deviation. Categorical variables were compared using the Chi-squared test and normally distributed continuous variables using the independent-samples t-test. Nonparametric variables were compared using the Mann-Whitney U-test.

3.1. Identification of SARS-CoV-2 in breath samples using machine learning classifier algorithm

The median LAS-spectra for the two groups are shown in figure 3. As shown, the median LAS-spectra for both groups were similar at higher wavelengths but differed significantly at the lower wavelength spectrum. The ML-classifier model derived from these breathprints achieved a non-nested LOOCV accuracy of 81.7% (77.4% sensitivity, 85.5% specificity) with 7 first derivative features. The corresponding nested LOOCV accuracy for the model was 72.2% (67.9% sensitivity, 75.8% specificity) with an average of 12.4 features selected across training sets. The receiver operating characteristic curves for the SVM scores obtained with the non-nested and nested LOOCV frameworks are depicted in figure 4. The area under the curve was 0.851 for the non-nested LOOCV approach, and 0.727 for the nested LOOCV method. Additionally, we obtained learning curves for both the nested and non-nested LOOCV scenarios for incremental increase in the sample size of data for training (figure 5). This showed that the incremental increase in accuracy for the ML-classifier model began to level off with sample size greater than 50 breath samples. Furthermore, the performance of the ML-classifier was robust and consistent across stratified subgroups using pre-specified participants' characteristics (tables 2 and 3). There was no significant association between the level of misclassification with participant characteristics such as sex, smoking status, the onset of COVID-19, SARS-CoV-2 variant type, time from symptom onset and breath sampling, BMI, age, FiO2 requirements and the presence of chronic kidney disease or diabetes mellitus.
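The reported sensitivity, specificity and accuracy can be reproduced from the confusion counts obtained by summing the female and male rows of tables 2 and 3. The short sketch below does this arithmetic; the function name is illustrative only.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Counts summed over the sex rows of table 2 (non-nested) and table 3 (nested).
non_nested = classification_metrics(tp=41, fn=12, tn=53, fp=9)
nested = classification_metrics(tp=36, fn=17, tn=47, fp=15)

for name, (sens, spec, acc) in (("non-nested", non_nested), ("nested", nested)):
    print(f"{name}: sensitivity {sens:.1%}, specificity {spec:.1%}, accuracy {acc:.1%}")
# non-nested: sensitivity 77.4%, specificity 85.5%, accuracy 81.7%
# nested: sensitivity 67.9%, specificity 75.8%, accuracy 72.2%
```

Both sets of counts add up to the 53 positive and 62 negative participants included in the analysis.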

Figure 3. Median CRDS spectra for SARS-CoV-2 positive (n = 53) and negative (n = 62) patients.

Figure 4. ROC curves for (a) the SVM scores obtained with LOOCV, and (b) the SVM scores obtained with nested LOOCV. The operating points representing an SVM score threshold of 0 are indicated.

Figure 5. Learning curves for the non-nested and nested CRDS breathprint models.

Table 2. Non-nested LOOCV classification performance by variable.

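| Variable | SARS-CoV-2 Positive (n = 53): TP | FN | p-value | SARS-CoV-2 Negative (n = 62): TN | FP | p-value |
|---|---|---|---|---|---|---|
| Sex (%): | | | | | | |
| Female | 12 (70.6%) | 5 (29.4%) | | 28 (93.3%) | 2 (6.7%) | |
| Male | 29 (80.6%) | 7 (19.4%) | 0.49 | 25 (78.1%) | 7 (21.9%) | 0.15 |
| Age (S.D.) | 59.4 ± 17.3 | 51.8 ± 15.9 | 0.20 | 57.5 ± 11.8 | 57.8 ± 19.9 | 0.95 |
| BMI (S.D.) | 30.3 ± 7.7 | 28.8 ± 5.7 | 0.54 | 27.3 ± 5.7 | 25.2 ± 7.7 | 0.34 |
| Smoking status (%): | | | | | | |
| Current | 0 (0%) | 1 (100%) | | 3 (100%) | 0 (0%) | |
| Ex-smoker | 8 (72.7%) | 3 (27.3%) | | 14 (93.3%) | 1 (6.7%) | |
| Never smoker | 31 (79.5%) | 8 (20.5%) | 0.24 | 34 (81.0%) | 8 (19.0%) | 0.65 |
| Chronic kidney disease (%) | 1 (50.0%) | 1 (50%) | 0.4 | 0 (0%) | 1 (100%) | 0.15 |
| Insulin or non-insulin dependent diabetes mellitus (%) | 15 (71.4%) | 6 (28.6%) | 0.5 | 12 (85.7%) | 2 (14.3%) | 0.99 |
| Covid variant (%): | | | | | | |
| Original | 17 (94.4%) | 1 (5.6%) | | | | |
| Alpha | 22 (73.3%) | 8 (26.7%) | | | | |
| Other mutation | 2 (66.7%) | 1 (33.3%) | 0.15 | — | — | — |
| Onset (%): | | | | | | |
| ⩽7 d | 12 (85.7%) | 2 (14.2%) | | | | |
| >7 d | 28 (75.7%) | 9 (24.3%) | 0.70 | — | — | — |
| Requiring supplemental oxygen therapy (%): | | | | | | |
| Yes | 23 (69.7%) | 10 (30.3%) | | | | |
| No | 18 (90.0%) | 2 (10.0%) | 0.10 | — | — | — |
| FiO2 | 27.9 ± 8.3 | 31.6 ± 10.6 | 0.21 | — | — | — |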

Definition of abbreviations: BMI = body mass index; FiO2 = fraction of inspired oxygen; FN = false negatives; FP = false positives; LOOCV = leave-one-out cross-validation; TN = true negatives; TP = true positives. Data are presented as mean ± standard deviation for continuous variables and % for categorical variables. Fisher's exact test was used to assess associations for categorical variables (sex, smoking, onset, variant, oxygen therapy) and a two-sample t-test was used for continuous variables (age, BMI).

Table 3. Nested LOOCV classification performance by variable.

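| Variable | SARS-CoV-2 Positive (n = 53): TP | FN | p-value | SARS-CoV-2 Negative (n = 62): TN | FP | p-value |
|---|---|---|---|---|---|---|
| Sex (%): | | | | | | |
| Female | 10 (58.8%) | 7 (41.2%) | | 24 (80%) | 6 (20%) | |
| Male | 26 (72.2%) | 10 (27.8%) | 0.36 | 23 (71.9%) | 9 (28.1%) | 0.56 |
| Age (S.D.) | 57.5 ± 18.5 | 58.2 ± 13.7 | 0.90 | 56.7 ± 12.0 | 60.1 ± 15.2 | 0.43 |
| BMI (S.D.) | 30.0 ± 7.4 | 29.8 ± 7.2 | 0.95 | 27.7 ± 5.5 | 24.7 ± 7.0 | 0.09 |
| Smoking status (%): | | | | | | |
| Current | 1 (100%) | 0 (0%) | | 2 (66.7%) | 1 (33.3%) | |
| Ex-smoker | 7 (63.6%) | 4 (36.4%) | | 12 (80%) | 3 (20%) | |
| Never smoker | 26 (66.7%) | 13 (33.3%) | 0.99 | 31 (73.8%) | 11 (26.2%) | 0.89 |
| Chronic kidney disease (%) | 1 (50.0%) | 1 (50.0%) | 0.54 | 0 (0%) | 1 (100%) | 0.24 |
| Insulin or non-insulin dependent diabetes mellitus (%) | 12 (57.1%) | 9 (42.9%) | 0.23 | 12 (57.1%) | 2 (14.3%) | 0.48 |
| Covid variant (%): | | | | | | |
| Original | 20 (66.7%) | 10 (33.3%) | | | | |
| Alpha | 14 (77.8%) | 4 (22.2%) | | | | |
| E484K mutation | 2 (66.7%) | 1 (33.3%) | 0.78 | — | — | — |
| Onset (%): | | | | | | |
| ⩽7 d | 11 (78.6%) | 3 (21.4%) | | | | |
| >7 d | 25 (67.6%) | 12 (32.4%) | 0.51 | — | — | — |
| Requiring supplemental oxygen therapy (%): | | | | | | |
| Yes | 20 (60.6%) | 13 (39.4%) | | | | |
| No | 16 (80.0%) | 4 (20.0%) | 0.23 | — | — | — |
| FiO2 | 28.1 ± 8.6 | 30.1 ± 9.5 | 0.47 | — | — | — |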

Definition of abbreviations: BMI = body mass index; FiO2 = fraction of inspired oxygen; FN = false negatives; FP = false positives; LOOCV = leave-one-out cross-validation; TN = true negatives; TP = true positives. Data are presented as mean ± standard deviation for continuous variables and % for categorical variables. Fisher's exact test was used to assess associations for categorical variables (sex, smoking, onset, variant, oxygen therapy) and a two-sample t-test was used for continuous variables (age, BMI).

3.2. VOC stepwise fitting analysis

Four compounds (2-Methyl-1-propanal; Ammonia; Phenol; and Ethene) were found to be significantly different between the SARS-CoV-2 positive and negative groups (figure 6). The SVM classifier model derived using VOC features including ammonia achieved a non-nested LOOCV accuracy of 74.7% (71.6% sensitivity, 77.4% specificity) utilizing three compounds and a nested LOOCV accuracy of 63.4% (83.0% sensitivity, 46.7% specificity) utilizing an average of 3.5 compounds across LOOCV training sets. The SVM classifier model derived using VOC features excluding ammonia achieved a non-nested LOOCV accuracy of 61.7% (62.2% sensitivity, 61.2% specificity) utilizing three compounds and a nested LOOCV accuracy of 60.0% (83.0% sensitivity, 40.3% specificity) utilizing an average of 3.0 compounds across LOOCV training sets.

Figure 6. Dot plot of the four highest ranked VOCs in each patient for both SARS-CoV-2 positive and negative groups.

This study demonstrates that a machine learning-based breathprint model using CRDS measurements may provide a valuable non-invasive option for detecting SARS-CoV-2 in exhaled breath samples. Current guidelines recommend repeating RT-PCR tests for SARS-CoV-2 in cases of high clinical suspicion or worsening symptoms, and lower airway sampling may assist further in the diagnosis. However, lower airway sampling via bronchoscopy is invasive and aerosol-generating, resulting in an elevated risk of viral transmission due to environmental contamination. The use of exhaled breath to detect SARS-CoV-2 from the lower airways holds great promise as a simple, non-invasive, and accessible technology that can be deployed widely in any setting with minimal training. It has the potential to achieve a broad reach into the community and within the healthcare environment, which would facilitate the rapid and early diagnosis of SARS-CoV-2 infections, particularly in the lower airways. Furthermore, the same technology and approach may be employed to develop unique breathprints for detecting other lower respiratory pathogens in future pandemic preparedness and response.

Our ML-breathprint classifier achieved a non-nested and nested accuracy of 81.7% (77.4% sensitivity, 85.5% specificity) and 72.2% (67.9% sensitivity, 75.8% specificity) respectively. As the nested model is known to be inherently pessimistic due to the reduced information available during feature selection, the true generalizable accuracy lies between the nested and non-nested results [19]. The WHO had previously recommended that SARS-CoV-2 tests that met the minimum performance requirement of ⩾80% sensitivity compared to the gold standard RT-PCR tests could be used to diagnose SARS-CoV-2 in suspected cases [
