Since 2015, deep learning has been increasingly used in clinical natural language processing (NLP) []. Large language models (LLMs) based on deep learning technology are widely used in numerous clinical NLP domains []. Because contextual comprehension is critical to the overall performance of NLP models, studies have focused on developing models that excel at conveying contextual information. Conventional NLP approaches involve crafting word-to-word sequence models, such as the hidden Markov model, and using limited datasets annotated with labels such as disease and medication names [-]. However, studies are increasingly focusing on fine-tuning LLMs that have been pretrained on massive unlabeled biomedical text sources, such as Medical Information Mart for Intensive Care (MIMIC-III) [] and PubMed [,]. This shift in the NLP research direction has substantially elevated the contextual understanding capabilities of models and inspired studies on clinical NLP that focus on LLM utilization. For example, studies on automated summarization [-] have effectively extracted critical phrases from diverse sources, including biomedical papers and patient records. In addition, studies on entity extraction [-] have identified major entities such as disease names and drug names. However, these studies have focused exclusively on English-language corpora.
In the multilingual clinical domain, we proposed a set of contextual understanding conditions, with a comprehensive suite of clinical NLP evaluations specifically for these conditions. The proposed approach involves comparatively assessing bidirectional encoder representations from transformers (BERT) models [] to provide guidelines for selecting the most suitable BERT model for a particular condition.
We proposed 2 hypotheses to examine 4 BERT models. First, we assumed that within the multilingual clinical domain, a language model capable of comprehending multiple languages would achieve superior performance. Second, we assumed that models with the capacity to comprehend medical contexts would demonstrate superior efficacy. We selected BERT-base [], Korean BERT (KoBERT) [], Multilingual BERT (M-BERT) [], and BERT for Biomedical Text Mining (BioBERT) [] for the study. We pretrained these models on the visit records of approximately 160,000 patients. Subsequently, we introduced a series of comprehensive downstream tasks to learn the conditions required for these models to achieve effective contextual understanding. We assumed that an effective language model thrives in contextual comprehension under the following conditions:
The model can determine whether the provided documents pertain to the same patient (tasks 1 and 2).
The model is proficient in identifying the department associated with a given document (task 3).
The model can discern the descriptions within medical records for the conditions of different patients (tasks 4 and 5).
The model can ascertain the connection among sentences (task 6).
The model can competently deduce disease names based on existing knowledge (task 7).

The rationale of the proposed conditions is the widespread adoption of BERT models in the medical domain for various applications.
BERT has been applied in medical natural language inference research to assess the relationship between 2 sequences (premise and hypothesis) with entailment, contradiction, or neutrality labels. Percha et al [] used a fine-tuned BERT to locate clinical notes relevant to query sentences. Romanov and Shivade [] created the MedNLI clinical dataset for natural language inference. They used several models and methodologies, such as bag-of-words, InferSent, and enhanced sequential inference models, to confirm the efficacy and validity of their datasets. Boukkour et al [] introduced an alternative approach to BERT tokenization, proposing a convolutional neural network–based character-based tokenizer as a replacement for WordPiece Tokenizer, which is used to pretrain BERT, to improve BERT performance on the MedNLI dataset. Kanakarajan et al [] pretrained the ELECTRA model, which is named “efficiently learning an encoder that classifies token replacements accurately,” [] using abstracts from PubMed, and evaluated its performance on the MedNLI dataset.
BERT has also been applied to classify clinical notes. Rasmy et al [] introduced Med-BERT, a BERT model pretrained on electronic health record data, to classify diabetes and pancreatic cancer datasets. This model exceeded gated recurrent units by 2‐4 in terms of area under the receiver operating characteristic curve. Zhang and Jankowski [] proposed average pooling transformer layers handling token-, sentence-, and document-level embeddings for classifying International Classification of Diseases codes. Their model outperformed the BERT-base model by 11 points.
For the reading comprehension task, BERT can be used to determine the answer span within a given text. Pampari et al [] proposed the electronic medical record question answering (emrQA) dataset to determine the answer span to a question in a clinical context. Yue et al [] compared the performances of BERT-base, BioBERT, and ClinicalBERT [] on the emrQA dataset and additional test datasets to address the problems of the emrQA dataset. Rawat et al [] used 30 logical forms to express questions in semistructured texts and identified the correct responses in the emrQA dataset. They entered clinical notes and questions and used multitask training to simultaneously predict the logical structure of the question and the text span of the answer in a clinical note. Savery et al [] introduced the MEDIQA-AnS dataset, which contains questions and corresponding answers regarding the health care concerns of patients. The correct answers to these questions, which contain valuable information about the patients, are used as summaries.
BERT can be used to extract information from clinical notes. Yang et al [] used the 2010 i2b2 [], 2012 i2b2 [], and 2018 national NLP clinical challenges (n2c2) [] datasets to compare the information extraction performances of BERT models, namely, BERT-base, ELECTRA, A Lite BERT (ALBERT) [], and Robustly Optimized BERT Pretraining Approach (RoBERTa) []. The test results revealed that RoBERTa outperformed the other models. Richie et al [] used Clinical BERT [] to extract the social determinants of patient health, namely, employment, living status, tobacco, alcohol, and drug use, together with their attributes, from the n2c2 2022 Track 2 dataset []; for instance, texts such as “works” and “unemployed” were extracted to describe employment information.
Although studies have extensively examined BERT versatility, they have focused only on English corpora. To address this limitation, we comprehensively analyzed the efficacies of BERT models in various tasks involving medical documents in both Korean and English.
The rest of the manuscript is organized as follows. The Methods section outlines the diverse tests used for BERT analysis and their application procedures. The Results section presents a summary of the outcomes of each test. The Discussion section outlines the distinctive characteristics of each BERT model and presents a thorough analysis for understanding the reasons for these characteristics. Finally, the Conclusion section summarizes the study and emphasizes its significance.
The aim of this study was to identify the BERT models that perform optimally in the bilingual (Korean and English) clinical domain. To achieve this objective, we designed 7 tasks, evaluated the performance of 4 BERT variants (BERT-base, BioBERT, KoBERT, and M-BERT) across these tasks, and assessed their relative significance.
We obtained outpatient records from 8 departments, namely, endocrinology, respiratory, cardiovascular, gastroenterology, rheumatology, nephrology, allergy medicine, and infectious medicine departments, at Seoul National University Hospital in South Korea. We collected the records of 164,460 outpatients between 2010 and 2019. The dataset comprised 2,453,934 documents, with 412,499,140 tokens generated after tokenization using white space. The distribution of tokens and documents for various departments was as follows: endocrinology (tokens: 91,352,271; docs: 496,938), respiratory (tokens: 31,556,578; docs: 195,048), cardiovascular (tokens: 114,978,554; docs: 696,061), gastroenterology (tokens: 57,755,571; docs: 416,062), rheumatology (tokens: 24,857,675; docs: 204,600), nephrology (tokens: 70,865,514; docs: 322,629), allergy medicine (tokens: 17,024,481; docs: 92,041), and infectious medicine departments (tokens: 4,108,496; docs: 30,555). Table 1 provides statistical data for the corpus. Table 2 presents the clinical note of a patient with rheumatoid arthritis.
Table 1. Statistical data of clinical notes in Seoul National University Hospital between 2010 and 2019.

| Department | Tokens, n | Documents, n |
| Endocrinology | 91,352,271 | 496,938 |
| Respiratory | 31,556,578 | 195,048 |
| Cardiovascular | 114,978,554 | 696,061 |
| Gastroenterology | 57,755,571 | 416,062 |
| Rheumatology | 24,857,675 | 204,600 |
| Nephrology | 70,865,514 | 322,629 |
| Allergy medicine | 17,024,481 | 92,041 |
| Infectious medicine | 4,108,496 | 30,555 |
| Sum | 412,499,140 | 2,453,934 |

Table 2. The example of a clinical note that was used for training bidirectional encoder representations from transformers models (for better understanding, an English translation has been added).

| Section | Contents |
| History | Korean: 3117.2.1 arthralgia r/o d/t letrozole 로 병원 방문; English (translated): 3117.2.1 arthralgia, rule out (r/o) due to letrozole. Visited hospital |

a RF: rheumatoid factor.
b ACCP: anticitrullinated protein antibody.
c ANA: antinuclear antibody.
d P/E & Lab: physical examination and laboratory.
e CBC: complete blood count.
f WNL: within normal limits.
g CRP: C-reactive protein.
Ethical Considerations
We obtained approval to use the original data collection for research purposes from the institutional review board (IRB) at Seoul National University Hospital (IRB no. C-2108-008-1242). According to the institution’s IRB policy, the data cannot be publicly disclosed due to patient privacy concerns. Instead, we provide an overview of the data in .
BERT Models
The BERT-base model is a precursor among pretrained transformer encoders []. It was trained on vast open-domain data sources, including Wikipedia and BooksCorpus []. The model is primarily focused on English text, and the composition of its training data facilitates contextual representations of English sequences.
The BioBERT model is an evolution of BERT and is pretrained on PubMed data and enriched with biomedical entities, rendering BioBERT proficient in comprehending terminologies such as disease and drug names. In this study, we used the latest iteration of BioBERT, that is, BioBERT version 1.1.
The SKT Corporation in South Korea devised the KoBERT model to enhance the comprehension and processing of the Korean language. Data from Korean Wikipedia and news articles were used to pretrain the model.
The M-BERT model was obtained from a richly varied corpus of 104 languages, enabling a contextual representation that spans both English and Korean sequences.
Pretraining
To enhance the bilingual clinical contextual understanding capabilities of the BERT models, we conducted additional pretraining with masked language modeling on an extensive dataset comprising the records of 159,460 of the 164,460 outpatients from Seoul National University Hospital. The data were preprocessed as follows. The WordPiece tokenizer was used by BERT-base, BioBERT, and M-BERT; the SentencePiece tokenizer [] was used by KoBERT. All tokenizers were case-sensitive. Subsequently, random tokens within the input sequence were replaced with [MASK] tokens. This process was repeated 10 times to yield the data required for pretraining. The pretraining task involved restoring each [MASK] token to its original token, using the data produced by this preprocessing procedure.
Multifaceted Clinical NLP Tasks
The evaluation framework encompassed 5 characteristics, which were examined through 7 distinct downstream tasks designed to assess the clinical contextual comprehension capabilities of various BERT models.
Homogeneity Determination
As seen in Figure 1, we used 2 single outpatient records per input sequence to determine document homogeneity. Each model performed binary classification, discerning whether the 2 records belonged to the same patient (task 1). We extended this examination to the section level, tasking each model with predicting homogeneity based on a smaller segment of a document (task 2). In task 2, the objective was to determine whether 2 sequences originated from the same patient record, with 1 sequence containing an assessment section and the other containing a randomly selected section.
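The pairing step for task 1 can be illustrated with a minimal sketch. The article does not describe how positive and negative pairs were sampled, so the balanced alternation below, the `records` layout (patient ID mapped to a list of documents), and the function names `build_pairs` and `format_pair` are assumptions made only for illustration.

```python
import random

def build_pairs(records, num_pairs, seed=0):
    """Build (doc_a, doc_b, label) triples; label 1 means same patient.

    `records` maps a patient ID to a list of document strings
    (hypothetical layout, not taken from the study).
    """
    rng = random.Random(seed)
    patients = list(records)
    # Only patients with at least 2 documents can form a positive pair.
    eligible = [p for p in patients if len(records[p]) >= 2]
    pairs = []
    for i in range(num_pairs):
        if i % 2 == 0:
            # Positive pair: 2 different documents of one patient.
            pid = rng.choice(eligible)
            doc_a, doc_b = rng.sample(records[pid], 2)
            pairs.append((doc_a, doc_b, 1))
        else:
            # Negative pair: 1 document from each of 2 different patients.
            pid_a, pid_b = rng.sample(patients, 2)
            pairs.append((rng.choice(records[pid_a]),
                          rng.choice(records[pid_b]), 0))
    return pairs

def format_pair(doc_a, doc_b):
    """Lay out a pair in the usual BERT two-segment input format."""
    return f"[CLS] {doc_a} [SEP] {doc_b} [SEP]"
```

In a real pipeline, the tokenizer would normally add the [CLS] and [SEP] tokens itself; `format_pair` only makes the layout of the input sequence visible.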
Figure 1. Document homogeneousness test (tasks 1 and 2). BERT: bidirectional encoder representations from transformers; CLS: classification.

Document Representativeness
As seen in Figure 2, to assess document representativeness, we devised a task that focused on department identification by using individual visit records (task 3).
Figure 2. Document representativeness test: classifying documents (task 3). BERT: bidirectional encoder representations from transformers; CLS: classification; SEP: separator.

Reading Comprehension Test
The reading comprehension test (Figure 3) extracted summarized content from a visit record. We focused on extracting the assessment section from records written in either the Subjective, Objective, Assessment, Plan (SOAP) format or the history, physical examination, laboratory, assessment, and plan format. The experiments encompassed 2 setups: 1 with section-shuffled documents (task 4) and 1 with documents in their original section order (task 5).
Figure 3. Reading comprehension test: finding the assessment section in a given document (with section shuffling: task 4; w/o section shuffling: task 5). BERT: bidirectional encoder representations from transformers; CLS: classification; SEP: separator; w/o: without.

Contextual Connections
As seen in Figure 4, we introduced a task that required the model to identify the most recent visit record among a set of 4 candidate documents when given a query document representing the oldest visit record (task 6). Because BERT models limit the length of the input sequence, the query document and the 4 candidate documents cannot be input simultaneously. To address this problem, we adopted a 2-step approach. First, each document was independently input into BERT to acquire a document embedding. Subsequently, each pair comprising the query document embedding and the kth candidate document embedding was fed into a feedforward neural network (FFNN) []. When the pair for the query and the most recent visit was input into the FFNN, the model was trained to output a prediction value of 1; this target was based on the following assumption. We postulated that the query document, which corresponded to the earliest visit among the 5 documents, and the last document, which denoted the most recent visit, would differ the most in their narratives. Consequently, we measured the cosine distance between these 2 embeddings and directed the model to output a prediction value of 1, which indicated the greatest cosine distance. By contrast, when a pair comprising the query and a nonanswer document embedding was presented to the FFNN, the model was trained to output a prediction value of 0.
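The second step of this approach, scoring each (query, candidate) embedding pair and selecting the highest-scoring candidate, can be sketched as follows. This is only an illustration under stated assumptions: the article does not give the FFNN architecture, so the concatenated input, the single ReLU hidden layer, the sigmoid output, and the names `ffnn_score` and `pick_most_recent` are all hypothetical, and the embeddings are plain Python lists rather than real BERT outputs.

```python
import math

def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between 2 vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def ffnn_score(query, candidate, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a 1-hidden-layer FFNN on a concatenated pair.

    Assumed architecture: concat -> ReLU hidden layer -> sigmoid output.
    """
    x = list(query) + list(candidate)
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))

def pick_most_recent(query, candidates, score_fn):
    """Return the index of the highest-scoring candidate and all scores."""
    scores = [score_fn(query, c) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores

# Untrained stand-in: use the cosine distance itself as the score.
query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
best, scores = pick_most_recent(query, candidates, cosine_distance)
```

Using `cosine_distance` as the scoring function mirrors the stated assumption that the most recent visit lies farthest from the query; a trained FFNN would replace it through `score_fn`.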
Figure 4. Document connectivity test: finding the last visited document (task 6). BERT: bidirectional encoder representations from transformers.

Knowledge Reasoning
The knowledge reasoning characteristic (Figure 5) evaluated the capacity of a model to deduce entities from masked text (task 7). Each model was tasked with deducing disease names from masked visit records in which the disease names had been replaced with [MASK] tokens. We used MetaMap [] to create a dataset by identifying diagnostic names. Each model, when presented with the [MASK] token and context, selected the correct disease name from 63 disease names. A comprehensive list of the entities is shown in Table S1 in .
Figure 5. Knowledge reasoning test: finding the disease name (task 7). BERT: bidirectional encoder representations from transformers; CLS: classification; SEP: separator.

Experimental Settings
We trained and evaluated 4 types of publicly available BERT models through the following process. We used the records of 159,460 of the 164,460 patients for pretraining. In the pretraining procedure, 15% of the tokens in these records were randomly selected for masking. Of the selected tokens, 80% were replaced with [MASK] tokens, 10% were replaced with random tokens, and the remaining 10% retained their original tokens. We trained the BERT models to restore the masked tokens to their original tokens.
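The masking scheme described above (15% of tokens selected; of those, 80% replaced with [MASK], 10% replaced with a random token, and 10% left unchanged; repeated 10 times) can be sketched in a few lines. This is a minimal illustration rather than the study's preprocessing code: the per-token independent selection, the toy vocabulary, and the names `mask_tokens` and `make_masked_copies` are assumptions.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Apply BERT-style masking to one token sequence.

    Returns (inputs, labels); labels hold the original token at selected
    positions and None elsewhere.
    """
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < select_prob:
            labels.append(tok)  # the model must restore this token
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)               # 80%: [MASK]
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)                # 10%: unchanged
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels

def make_masked_copies(tokens, vocab, copies=10, seed=0):
    """Create several differently masked copies of the same sequence."""
    rng = random.Random(seed)
    return [mask_tokens(tokens, vocab, rng) for _ in range(copies)]

tokens = "arthralgia r/o d/t letrozole".split()
toy_vocab = ["arthralgia", "letrozole", "fever", "cough"]  # hypothetical
copies = make_masked_copies(tokens, toy_vocab)
```

Selecting each token independently with probability 0.15 only approximates the 15% rate on short sequences; an implementation could instead sample an exact number of positions per sequence.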
After pretraining, the 4 BERT models were fine-tuned for tasks 1‐7. For fine-tuning, we used 5000 patient records that were not used in pretraining. We assigned 4000 patients to the training set and 1000 patients to the test set and then created training and evaluation data specific for each task. In each task, the 4 pretrained BERT models were trained using the training set and evaluated on the test set.
In the pretraining step, 4 NVIDIA 3090 graphics processing units (GPUs) were used in parallel for 3 epochs. After pretraining, all the models were fine-tuned using a 1080 Ti GPU, except for task 6, for which 3090 GPUs were used because this task required more computation and memory. The detailed hyperparameter settings are described in Table S2 in . The detailed experimental settings and analysis code used in this study are available on GitHub [].
In tasks 1‐3, BERT-base and BioBERT exhibited the best scores; Tables 3 and 4 present the corresponding results.
Table 3. Results of various BERT models in tasks 1 and 2. Task 1: determination of whether 2 documents are from the same patients. Task 2: determination of whether 2 sections are from the same patients.

| Model | Task 1 precision | Task 1 recall | Task 1 F1-score | Task 2 precision | Task 2 recall | Task 2 F1-score |
| BERT-base | 84.44 | 94.19 | 89.05 | 89.28 | 87.60 | 88.43 |
| BioBERT | 83.36 | 96.21 | 89.32 | 92.92 | 82.73 | 87.53 |
| KoBERT | 83.95 | 74.05 | 78.69 | 90.68 | 75.78 | 82.56 |
| M-BERT | 83.22 | 94.02 | 88.29 | 83.56 | 93.38 | 88.19 |

a BERT: bidirectional encoder representations from transformers.
b BioBERT: BERT for Biomedical Text Mining.
c KoBERT: Korean BERT.
d M-BERT: Multilingual BERT.
Table 4. Results of various BERT models in task 3. Task 3: identification of the department associated with a given document.

| Model | Accuracy |
| BERT-base | 96.75 |
| BioBERT | 97.44 |
| KoBERT | 95.38 |
| M-BERT | 96.06 |

a BERT: bidirectional encoder representations from transformers.
b BioBERT: BERT for Biomedical Text Mining.
c KoBERT: Korean BERT.
d M-BERT: Multilingual BERT.
In the homogeneity test conducted on document-level inputs (task 1), BioBERT achieved the highest F1-score, whereas in the test conducted on section-level inputs (task 2), BERT-base achieved the highest F1-score. Comparing the scores for tasks 1 and 2 revealed that BioBERT exhibited a more substantial drop in performance than the other models. By contrast, KoBERT consistently performed worse than the other BERT models. In the document representativeness test, which entailed selecting a single department from a set of 8 candidate departments, BioBERT exhibited the best performance in terms of accuracy, which was the evaluation metric.
Results of Tasks 4-7
In tasks 4‐7, M-BERT achieved the best scores (Tables 5 and 6).
Table 5. Results of various BERT models in tasks 4 and 5. Task 4: finding the assessment section with inputs that are section-shuffled. Task 5: finding the assessment section with inputs that are not section-shuffled.

| Model | Task 4 precision | Task 4 recall | Task 4 F1-score | Task 5 precision | Task 5 recall | Task 5 F1-score |
| BERT-base | 71.03 | 61.74 | 60.83 | 74.59 | 59.14 | 56.69 |
| BioBERT | 72.16 | 56.31 | 51.64 | 74.71 | 55.99 | 51.17 |
| KoBERT | 76.57 | 77.41 | 76.88 | 92.61 | 93.88 | 93.15 |
| M-BERT | 93.15 | 94.61 | 93.77 | 96.52 | 96.37 | 96.44 |

a BERT: bidirectional encoder representations from transformers.
b BioBERT: BERT for Biomedical Text Mining.
c KoBERT: Korean BERT.
d M-BERT: Multilingual BERT.
Table 6. Results of various BERT models in task 7. Task 7: determination of disease names based on existing knowledge.

| Model | hit@1 | hit@3 | hit@10 |
| BERT-base | 60.26 | 77.47 | 93.40 |
| BioBERT | 59.40 | 80.20 | 95.12 |
| KoBERT | 46.20 | 72.02 | 91.54 |
| M-BERT | 61.12 | 81.64 | 95.41 |

a BERT: bidirectional encoder representations from transformers.
b BioBERT: BERT for Biomedical Text Mining.
c KoBERT: Korean BERT.
d M-BERT: Multilingual BERT.
In the reading comprehension tests (tasks 4 and 5), the performances of the models were evaluated in terms of the F1-score, which was calculated by measuring the proportion of tokens within the predicted interval that correctly overlapped with the actual interval. M-BERT achieved the highest performance in the reading comprehension tests. In addition, the models exhibited the largest performance differences in these tests. In the context connectivity test (task 6), M-BERT exhibited the highest performance with an F1-score of 64.75, whereas all the other models achieved scores lower than 60 (BERT-base: 59.78; BioBERT: 58.39; and KoBERT: 25.62). In the knowledge reasoning test (task 7), M-BERT exhibited the best performance. The objective of this test was to predict the correct diagnosis, from 63 candidate diagnoses extracted from clinical documents, for text in which the diagnosis name had been replaced with [MASK]. For evaluation, we used hit@k (where k=1, 3, or 10). In task 7, BERT computes probabilities for the 63 diseases based on the provided context. A prediction counts as a true positive under hit@k if the k diseases with the highest probabilities include the correct disease. The final evaluation score is the number of true positives divided by the total number of sequences under assessment.
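The two evaluation measures described above can be made concrete with a short sketch. The hit@k function follows the definition given in the text. The span-level F1 is one plausible reading of the token-overlap description (precision and recall computed over the token positions of the predicted and actual intervals), which is an assumption, as are the function names and the inclusive (start, end) span convention.

```python
def hit_at_k(prob_rows, gold_indices, k):
    """Fraction of sequences whose gold class is among the top-k classes.

    `prob_rows` holds one probability list per sequence (63 entries per
    list in task 7); `gold_indices` holds the index of the correct class.
    """
    hits = 0
    for probs, gold in zip(prob_rows, gold_indices):
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        if gold in ranked[:k]:
            hits += 1
    return hits / len(gold_indices)

def span_token_f1(pred_span, gold_span):
    """Token-overlap F1 between 2 spans given as inclusive (start, end)."""
    pred = set(range(pred_span[0], pred_span[1] + 1))
    gold = set(range(gold_span[0], gold_span[1] + 1))
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

# Toy example with 3 classes instead of 63.
probs = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
gold = [2, 0]
top1 = hit_at_k(probs, gold, k=1)  # only the second sequence is a hit
top2 = hit_at_k(probs, gold, k=2)  # both sequences are hits
```

Because the top-k set grows with k, hit@k can only increase as k increases, which matches the pattern from hit@1 to hit@10 in Table 6.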
In tasks 1‐3, the BERT classification ([CLS]) embedding was the input for the FFNN. The [CLS] token, positioned at the far-left side of the input sequence, is a classification token. The embedding of this token is commonly used as a feature for classification tasks, indicating the model’s comprehension of segment-level or document-level context. In tasks 1 and 2, homogeneity was assessed at the document and section levels, respectively, and BioBERT and BERT-base demonstrated the highest performances, respectively. In task 3, BioBERT achieved the highest score. Based on these observations, we inferred that BERT-base and BioBERT would be suitable for tasks involving [CLS] embedding.
Generally, a model’s ability to understand context diminishes as the number of tokens absent from its dictionary increases. Unknown ([UNK]) tokens represent tokens absent from the model’s dictionary, and the presence of these tokens correlates with lower model performance. The higher the frequency of [UNK] tokens, the greater the challenge for the model to accurately comprehend the context. Notably, despite the limited inclusion of Korean tokens in their dictionaries, BERT-base and BioBERT excelled in tasks 1‐3 (Table S4 in ). These models, which were pretrained on English sentence patterns, performed well because outpatient visit records, which typically describe patients’ diseases, consist largely of English sentences.
Influence of Multilingual Capabilities in Reading Comprehension Tasks on Outcomes (Tasks 4 and 5)
In tasks 4 and 5, the reading comprehension ability of the model was assessed by determining the scope of the assessment section. Among the models, M-BERT demonstrated the highest performance, whereas BERT-base and BioBERT exhibited the lowest test scores. Multilingual capability was the predominant factor influencing these outcomes.
To comprehend why BERT-base and BioBERT exhibit markedly inferior performance compared with M-BERT in tasks 4 and 5, understanding the composition of the BERT model dictionaries and the function of the [UNK] token is crucial. In BERT models, a dedicated tokenizer is used to segment text into tokens. These tokens are retained if present in the model’s dictionary; otherwise, the tokens are substituted with [UNK] tokens, representing unknown entities. Consequently, a higher prevalence of [UNK] tokens indicates a diminished ability of the model to comprehend the semantic nuances of the sequence. In tasks 4 and 5, where each token’s semantic relevance determines its association with an assessment section, models with inadequate knowledge of individual tokens exhibit poor performance. The dictionaries of BERT-base and BioBERT contain minimal Korean characters, resulting in the majority of Korean tokens being replaced with [UNK] tokens. By contrast, M-BERT encompasses a comprehensive range of Korean characters in its dictionary. Therefore, BERT-base and BioBERT exhibit notably inferior performance in tasks 4 and 5 compared with M-BERT.
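The effect of the dictionary on [UNK] frequency can be illustrated with a toy WordPiece-style tokenizer. The sketch below uses the greedy longest-match-first rule that WordPiece applies to each word and maps a word to [UNK] when it cannot be fully segmented; the tiny vocabularies, the whitespace pre-splitting, and the names `wordpiece` and `unk_ratio` are assumptions for illustration and do not reproduce the actual dictionaries of the 4 models.

```python
UNK = "[UNK]"

def wordpiece(word, vocab):
    """Greedy longest-match-first segmentation of a single word.

    Non-initial pieces carry the "##" prefix. If any part of the word
    cannot be matched, the whole word becomes [UNK].
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [UNK]
        pieces.append(match)
        start = end
    return pieces

def unk_ratio(text, vocab):
    """Share of [UNK] tokens after whitespace splitting and WordPiece."""
    tokens = [p for word in text.split() for p in wordpiece(word, vocab)]
    return tokens.count(UNK) / len(tokens) if tokens else 0.0

# Hypothetical vocabularies: English-only versus English plus Korean.
english_vocab = {"visit", "##ed", "hospital"}
bilingual_vocab = english_vocab | {"병원", "방문"}

text = "visited 병원 방문"
english_ratio = unk_ratio(text, english_vocab)
bilingual_ratio = unk_ratio(text, bilingual_vocab)
```

With the English-only vocabulary, both Korean words collapse into [UNK] and lose their identity, whereas the bilingual vocabulary keeps every token; this is the mechanism the paragraph above attributes to BERT-base and BioBERT versus M-BERT.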
Relationship Between Multilingual Capability and Task Complexity (Task 7)
Task 7, which was aimed at evaluating the aptitude of a model for knowledge inference, was more complex than the other tasks. Notably, M-BERT outperformed the other models in task 7, securing hit@1, hit@3, and hit@10 scores of 61.12, 81.64, and 95.41, respectively. These results highlighted the pivotal role of the dictionary in knowledge inference. Furthermore, when processing documents in multiple languages, M-BERT outperformed BERT-base, which had been trained exclusively on a single language.
For task 6, the test results were poor. An analysis indicated that BERT models did not excel in this task because of the prevalence of outpatient medical records in the copy-and-paste format (Table S6 in ). Consequently, the significance of task 6 in this study was low.
Contributions to the Clinical Text Processing and Medical Fields
Importance of Multilingual Models
The experiment highlights the significance of using multilingual language models in processing bilingual clinical notes. The findings demonstrated that using a model capable of handling 2 languages yields superior performance compared with relying solely on a single-language model. This insight is particularly relevant for countries such as Korea and Japan, where clinical documentation typically involves a mixture of languages.
Basis for Model Selection
Furthermore, this study provides empirical evidence for choosing a proper BERT model, a choice that has not been substantiated in existing NLP research. For instance, in previous studies, such as that conducted by Kim and Lee [], M-BERT was used for tasks such as extracting disease names, symptoms, and body parts from Korean text without explicit justification. The experimental results filled this gap by showcasing the superiority of M-BERT in understanding bilingual clinical text and supporting appropriate BERT selection in future studies.
Limitations and Future Works
Limited Scope of Clinical Notes
This analysis primarily focused on outpatient visit records. Future studies should encompass a broad range of clinical notes, including surgical notes, hospitalization records, and discharge summaries. Comparing and validating the performance of BERT models across various types of clinical documentation would provide a comprehensive understanding of their effectiveness.
Single-Institution Data
This study exclusively used data from Seoul National University Hospital, which can limit the generalizability of the findings. Clinical notes can vary considerably in style and content across various health care institutions. Therefore, future studies should involve data from multiple hospitals to validate BERT model performance in various clinical settings.
More Tasks Should Be Verified
The BERT models require further validation on bilingual clinical text. Oh et al [] conducted a study to recognize protected health information in the publicly available i2b2 2014 dataset. However, we could not perform this task because manually labeled annotations are required to extract non-English entities from bilingual clinical notes. In future studies, various tasks using bilingual clinical notes should be proposed.
Conclusions
In this study, we comprehensively compared 4 BERT models, encompassing text in both English and Korean, within the multilingual clinical domain. We pretrained these models with approximately 160,000 patient records and evaluated their performances on 7 diverse downstream tasks. The experimental findings are summarized as follows.
First, the BERT-base and BioBERT models excelled in document classification tasks using [CLS] tokens. These results highlighted their superiority over M-BERT in tasks involving simple pattern recognition in word sequences. Second, the significance of having a comprehensive dictionary was evident in the reading comprehension task in which comprehensive token usage was required. The exceptional performance of M-BERT, which encompassed a broad range of Korean and English tokens, clearly confirmed the importance of the dictionary. Third, multilingual proficiency was pivotal for tasks that demanded complex reasoning. Both M-BERT and BioBERT excelled in task 7, which focused on diagnosing a multitude of candidates, and notably, M-BERT consistently outperformed BioBERT.
Our findings highlighted the suitability of BioBERT and BERT-base for tasks that relied on sequence patterns in multilingual clinical domains. In addition, M-BERT, which had an expansive dictionary and aptitude for leveraging Korean and English clinical contexts, was highly suitable for tasks involving textual content comprehension. The experimental results of the BERT models in mixed-language clinical documents provide valuable insights for future medical NLP research and appropriate BERT model selection for different types of tasks.
This study was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (no. NRF-2021R1I1A4A01042182).
The datasets generated and/or analyzed during the current study are not publicly available due to patient privacy concerns. Patient records contain personal information, and, as such, Seoul National University's institutional review board does not permit public disclosure of the data.
KK, SP, JM, and SP conceptualized the paper, developed the methodology, and prepared the original draft of the manuscript. KK contributed to software implementation and validated the findings. JYK and EYL curated the data, conducted investigations, and contributed to data analysis and interpretation. JE, KJ, YEP, EK, and JL contributed to methodology development, conducted formal analysis, and provided insights throughout the research process. JC supervised the study, managed the project administration, and contributed to reviewing and editing the manuscript.
None declared.
Edited by Christian Lovis; submitted 19.09.23; peer-reviewed by Christina Haag, Dillon Chrimes, Maria Chatzimina; final revised version received 08.07.24; accepted 17.08.24; published 30.10.24.
© Kyungmo Kim, Seongkeun Park, Jeongwon Min, Sumin Park, Ju Yeon Kim, Jinsu Eun, Kyuha Jung, Yoobin Elyson Park, Esther Kim, Eun Young Lee, Joonhwan Lee, Jinwook Choi. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 30.10.2024.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.