ORIGINAL ARTICLE | Year: 2023 | Volume: 9 | Issue: 2 | Page: 224-233

Traditional Chinese medicine synonymous term conversion: A bidirectional encoder representations from transformers-based model for converting synonymous terms in traditional Chinese medicine

Lu Zhou1, Chao-Yong Wu2, Xi-Ting Wang3, Shuang-Qiao Liu4, Yi-Zhuo Zhang4, Yue-Meng Sun4, Jian Cui5, Cai-Yan Li4, Hui-Min Yuan4, Yan Sun6, Feng-Jie Zheng4, Feng-Qin Xu7, Yu-Hang Li4
1 School of Traditional Chinese Medicine, Beijing University of Chinese Medicine; Traditional Chinese Medicine (Zhong Jing) School, Henan University of Chinese Medicine, China
2 Shenzhen Hospital of Beijing University of Chinese Medicine, China
3 School of Life Sciences, Beijing University of Chinese Medicine, China
4 School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, China
5 Department of Medicine, Columbia University Irving Medical Center, New York, USA
6 TCM Information Science Research Center, School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, China
7 Xiyuan Hospital of China Academy of Chinese Medical Sciences, Beijing, China

Date of Submission: 15-Jun-2021 | Date of Acceptance: 04-Jan-2022 | Date of Web Publication: 06-Jun-2023

Correspondence Address:
Prof. Yan Sun
TCM Information Science Research Center, School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing
China
Prof. Yu-Hang Li
School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing
China

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/2311-8571.378171



Background: The medical records of traditional Chinese medicine (TCM) contain numerous synonymous terms with different descriptions, which is not conducive to computer-aided data mining of TCM. However, there is a lack of models available to normalize synonymous TCM terms. Therefore, construction of a synonymous term conversion (STC) model for normalizing synonymous TCM terms is necessary.

Methods: Based on the neural networks of bidirectional encoder representations from transformers (BERT), four types of TCM STC models were designed: Models based on BERT and text classification, text sequence generation, named entity recognition, and text matching. The superior STC model was selected on the basis of its performance in converting synonymous terms. Moreover, three misjudgment inspection methods based on inconsistency were proposed to find incorrect conversions in the STC model results: Neuron random deactivation, output comparison of multiple isomorphic models, and output comparison of multiple heterogeneous models (OCMH).

Results: The classification-based STC model outperformed the other STC task models. It achieved F1 scores of 0.91, 0.91, and 0.83 for the symptom, pattern, and treatment STC tasks, respectively. The OCMH method showed the best performance in misjudgment inspection, with wrong detection rates of 0.80, 0.84, and 0.90 in the term conversion results for symptoms, patterns, and treatments, respectively.

Conclusion: The TCM STC model based on classification achieved superior performance in converting synonymous terms for symptoms, patterns, and treatments. The misjudgment inspection method based on OCMH showed superior performance in identifying incorrect outputs.

Keywords: Bidirectional encoder representations from transformers, misjudgment inspection, synonymous term conversion, traditional Chinese medicine


How to cite this article:
Zhou L, Wu CY, Wang XT, Liu SQ, Zhang YZ, Sun YM, Cui J, Li CY, Yuan HM, Sun Y, Zheng FJ, Xu FQ, Li YH. Traditional Chinese medicine synonymous term conversion: A bidirectional encoder representations from transformers-based model for converting synonymous terms in traditional Chinese medicine. World J Tradit Chin Med 2023;9:224-33
How to cite this URL:
Zhou L, Wu CY, Wang XT, Liu SQ, Zhang YZ, Sun YM, Cui J, Li CY, Yuan HM, Sun Y, Zheng FJ, Xu FQ, Li YH. Traditional Chinese medicine synonymous term conversion: A bidirectional encoder representations from transformers-based model for converting synonymous terms in traditional Chinese medicine. World J Tradit Chin Med [serial online] 2023 [cited 2023 Jun 7];9:224-33. Available from: https://www.wjtcm.net/text.asp?2023/9/2/224/378171

Introduction

The clinical practice of traditional Chinese medicine (TCM) has generated numerous medical records. Diagnostic and treatment terms in TCM medical records are the key points for exploring the rules of pattern differentiation and treatment. However, owing to variations in the experience and medical educational background of TCM practitioners, TCM terms with the same meaning are recorded as different literal descriptions.[1] For example, the Chinese symptom term “尿黄” (yellow urine) is recorded as “小便黄” or “小溲黄” and the treatment term “补脾益肺” (tonifying spleen and benefiting lung) as “培土生金” (strengthening earth to generate metal). Doctors with TCM backgrounds may not have difficulty understanding these synonymous terms. However, such differences hinder the data mining of pattern differentiation and treatment rules as well as international cooperation and popularization.

In 2020, the National Administration of TCM issued the document Clinic Terminology of TCM Diagnosis and Treatment Diseases[2] (hereinafter referred to as clinical terminology). This document stipulates that medical institutions at all levels should refer to clinical terminology to normalize the descriptions of clinical diagnostic and treatment text and consequently aid in the systematic extraction of diagnostic and treatment rules and promote international communication and cooperation in TCM.

Under clinical terminology, the conversion of various TCM diagnostic and treatment terms with the same meaning into a unified written description is defined as TCM synonymous term conversion (STC). In particular, the term before conversion is called the original term, and that after conversion is called the converted term. Because manual conversion is time-consuming and laborious when dealing with numerous medical records, artificial intelligence (AI) technology, which has shown remarkable progress in simulating human intelligence and performing repetitive work, can be a useful technique for converting synonymous terms.

Natural language processing (NLP) is a popular AI research field. Several methods have been used to match synonymous medical text, such as Jaccard, DNorm, and Word2Vec.[3],[4],[5],[6] Jaccard simply compares an original term and candidate converted terms according to string similarity; therefore, it cannot capture the similarity in meaning between terms. DNorm uses term frequency–inverse document frequency (TF-IDF) as the term encoding method and a weight matrix to determine the similarity in meaning between an original term and candidate converted terms. Word2Vec encodes the original term and candidate converted terms into embedding vectors, and the matching converted term is determined by the minimum cosine distance between the embedding vector of the original term and that of each candidate converted term. In general, these algorithms treat STC as a matching task by measuring the similarity between the original term and each candidate converted term. Beyond matching tasks, with the development of neural networks, bidirectional recurrent neural networks (Bi-RNNs) and bidirectional encoder representations from transformers (BERT) have demonstrated superior performance in machine translation, emotion classification, and other text processing tasks.[7],[8] These techniques may provide new concepts for exploring potentially notable STC methods.

In this study, we first constructed three benchmark datasets for symptom, pattern, and treatment terms according to clinical terminology. Subsequently, from the perspective of text classification, text sequence generation, named entity recognition (NER), and text matching, we designed four types of STC models based on BERT. The superior STC model was screened on the basis of the test results of each model in the benchmark datasets. In addition, three misjudgment inspection methods based on inconsistency and eight misjudgment inspection methods based on outlier detection were proposed to explore how to detect incorrect outputs in the STC model.

Methods

Data collection and preprocessing

Data collection and labeling

The data used in this study were collected from the platform of the Heritage Program of Chinese Well-Known Experts,[9] which contains medical records from several famous TCM practitioners. A total of 16,808 nonrepetitive TCM symptom terms, 3450 nonrepetitive TCM pattern terms, and 3732 nonrepetitive TCM treatment terms were identified. Two doctors qualified in TCM practice labeled them by referring to clinical terminology. The collected terms were defined as the original terms and their labeled terms as the converted terms, which served as the input and output data of the STC model, respectively.

An original term may be labeled with a single converted term or with multiple converted terms according to clinical terminology. Multiple converted terms were separated by commas, as shown in [Table 1]. When the meaning of an original term could be described by a specific term in clinical terminology, that specific term was labeled as the converted term. When an original term had a complex meaning that could not be described with one specific term in clinical terminology, multiple specific terms from clinical terminology were labeled as the converted term to describe the meaning of the original term.

The two TCM experts independently reviewed the labeling results of the converted terms. Inconsistent labeling results were submitted to a third expert for review, and further discussions were conducted to ensure the consistency of the labeling results. Thereafter, 1501 converted terms of TCM symptoms, 641 converted terms of TCM patterns, and 681 converted terms of TCM treatments were collected, of which 339, 113, and 137 converted terms occurred only once, respectively.

Dataset preprocessing

High-frequency converted terms are converted terms to which multiple original terms correspond. Processing such high-frequency converted terms manually is time-consuming and repetitive. Moreover, these terms provide adequate samples for modeling, which is helpful for evaluating model performance. Therefore, we used Zipf's law[10] (Equation [1]) to calculate the boundary between the high- and low-frequency converted terms and then selected the high-frequency converted terms with the corresponding original terms as the benchmark datasets for evaluating the STC model. According to Zipf's law, the frequency boundary between the high- and low-frequency results is defined as follows:

N = (−1 + √(1 + 8I₁)) / 2 (1)

where I₁ is the number of converted terms that occur only once in the labeling results and N is the boundary between the high and low frequencies.
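For illustration, the boundary values implied by the labeling counts above can be computed with a short Python sketch of Equation (1) (the function name is ours):

```python
import math

def high_frequency_boundary(i1: int) -> float:
    # Boundary between high- and low-frequency terms derived from
    # Zipf's law: N = (-1 + sqrt(1 + 8 * I1)) / 2
    return (-1 + math.sqrt(1 + 8 * i1)) / 2

# I1 = number of converted terms occurring only once (from the paper)
for name, i1 in [("symptoms", 339), ("patterns", 113), ("treatments", 137)]:
    print(f"{name}: N = {high_frequency_boundary(i1):.1f}")
```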

Following Zipf's law, we constructed a high-frequency symptom dataset (SYMDS), a high-frequency pattern dataset (PATDS), and a high-frequency treatment dataset (TMDS). These sets were further combined into a total dataset (symptom, pattern, and treatment dataset [SPTDS]). The original terms for each converted term were randomly divided into a training set (70%), development set (15%), and test set (15%), as shown in [Table 2]. In total, 230, 67, and 69 nonrepetitive converted terms corresponded to the SYMDS, PATDS, and TMDS, respectively.
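A sketch of this grouped 70/15/15 split is given below (a hypothetical helper, not the authors' code; the seed, function name, and data layout are our own assumptions):

```python
import random
from collections import defaultdict

def split_by_converted_term(pairs, seed=42):
    """Split (original_term, converted_term) pairs into 70/15/15
    train/dev/test sets, dividing the original terms of each
    converted term separately, as described in the paper."""
    random.seed(seed)
    grouped = defaultdict(list)
    for original, converted in pairs:
        grouped[converted].append(original)
    train, dev, test = [], [], []
    for converted, originals in grouped.items():
        random.shuffle(originals)
        n_train = int(0.70 * len(originals))
        n_dev = int(0.15 * len(originals))
        train += [(o, converted) for o in originals[:n_train]]
        dev += [(o, converted) for o in originals[n_train:n_train + n_dev]]
        test += [(o, converted) for o in originals[n_train + n_dev:]]
    return train, dev, test
```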

Model construction

We noted that four manual approaches are adopted to convert synonymous terms: (1) Directly write the converted terms based on conversion experience; (2) directly recognize the correct converted terms in clinical terminology; (3) select candidate converted terms from clinical terminology, form a string of candidates, and then select the converted term from the string; and (4) select candidate converted terms from clinical terminology and then examine each candidate term to determine whether it matches the original term.

In combination with NLP tasks, (1) sequence generation, (2) classification, (3) NER, and (4) matching can be used to simulate the four manual approaches. Therefore, we proposed four types of STC models based on these four concepts, where the pretraining transformer block module of BERT[11] was applied as the basic algorithm module.

Based on the sequence generation concept, we constructed two sequence generation models: STC token sequence generation (STC-TSG) and STC label sequence generation (STC-LSG). Based on the classification concept, we established three STC models: STC classification (STC-C), STC-TC (inspired by the thinking process of TCM practitioners and built on the STC-C model), and STC-CC (combined convolutional neural network and classification). The STC model based on the matching concept was called STC matching (STC-M), and the STC model based on the NER concept was called STC-CNER (combined classification and NER).

Based on the test results of the STC models, the best model was found to be STC-TC. Because the STC-TC model builds on the STC-C model, the details of the STC-C and STC-TC models are described in the following two subsections.

Traditional Chinese medicine synonymous term conversion-classification model

Considering STC as a classification task, an adjusted model based on the transformer block of BERT and a fully connected layer with a sigmoid function was established and called the STC-C model. The STC-C model was defined as follows:

In procedures (2)-(10), Tin is the token list of the original term with CLS, SEP, and PAD. For example, for the original Chinese term “寐中汗出” (night sweat), Tin = [CLS, 寐, 中, 汗, 出, SEP, PAD, PAD, PAD], where CLS is the start token, SEP is the end token, and PAD is the padding token that ensures the inputs of each training batch have the same length. Furthermore, |t| is the maximum token list length for each batch during training, and Iin is the token index list for Tin. Based on the token vocabulary of BERT (the vocabulary has 21,128 tokens, and the token indices range from 0 to 21,127), the token list Tin is converted into the token index list Iin, where i is the index of the token. In addition, Sin is a segment list that differentiates tokens belonging to the original term from PAD tokens. In Sin, the notation s represents a binary value (0 or 1): if a token is not PAD, s = 0; otherwise, s = 1.
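The construction of Tin, Iin, and Sin can be sketched in Python as follows (a toy vocabulary for illustration only; the actual model uses BERT's full 21,128-token vocabulary):

```python
def encode_term(term: str, vocab: dict, max_len: int):
    """Build the token list Tin, token index list Iin, and segment
    list Sin (s = 0 for real tokens, s = 1 for PAD) for one term."""
    tokens = ["[CLS]"] + list(term) + ["[SEP]"]
    tokens += ["[PAD]"] * (max_len - len(tokens))
    indices = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    segments = [0 if t != "[PAD]" else 1 for t in tokens]
    return tokens, indices, segments

# Toy vocabulary covering the example term 寐中汗出 (night sweat)
vocab = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102,
         "寐": 1, "中": 2, "汗": 3, "出": 4}
t_in, i_in, s_in = encode_term("寐中汗出", vocab, max_len=9)
```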

An embedding layer of size 21,128 × 768 for Iin and an embedding layer of size 2 × 768 for Sin generate Ei and Es, respectively. Ei and Es are embedding features of 768 dimensions and are combined to form the initial hidden state H0.

Starting with H0, the transformer blocks of BERT[11] were used to encode the features of each token in Tin, where l is the index of the transformer block and l ∈ [1, L] with L = 12. Further, HL denotes the final encoded features of the tokens in Tin, and HL[CLS] denotes the feature of the first token (CLS). The feature of the entire original term was encoded into this first token by the transformer block with the Pooler mode of BERT.

WN is a fully connected layer of size 768 × N, where N is the number of labels, that is, the total number of converted terms. The probability value P of each converted term was obtained using a sigmoid function. In the loss function, y is a one-hot vector corresponding to the correct converted term, where k represents the index of the converted term.
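A minimal tf.keras sketch of this classification head is given below (the label count N_LABELS is a placeholder, and the transformer encoder that produces HL[CLS] is omitted; this is an illustration, not the authors' code):

```python
import tensorflow as tf

HIDDEN = 768        # dimension of the pooled [CLS] feature
N_LABELS = 230      # placeholder: total number of converted terms

# W_N (768 x N fully connected layer) with a sigmoid output, trained
# with binary cross-entropy against the one-hot converted-term vector y
cls_feature = tf.keras.Input(shape=(HIDDEN,), name="H_L_CLS")
probs = tf.keras.layers.Dense(N_LABELS, activation="sigmoid")(cls_feature)
stc_c_head = tf.keras.Model(cls_feature, probs)
stc_c_head.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
                   loss="binary_crossentropy")
```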

Traditional Chinese medicine synonymous term conversion-TC model

As they gain labeling experience, TCM practitioners first bring to mind several candidate converted terms, which helps them locate the correct converted terms in clinical terminology. Inspired by this thought process, we proposed the STC-TC model, which performed better than the other models. The model was defined as follows:

In procedures (11)–(24), O20 is the label list of the top 20 candidate converted terms according to the output probabilities of the STC-C model, corresponding to the process of recalling candidate converted terms. E20 is the embedding matrix (20 × 768) of O20. E20 is input to a convolutional neural network (CNN) model[12] to output a feature C of 768 dimensions (by setting the output dimension of the fully connected layer of the CNN model); this corresponds to the process of deliberating over the candidate terms. The CNN model has four convolution layers, whose filters and kernel sizes are (256, 10), (256, 7), (256, 5), and (256, 3), respectively. HL[CLS] and C were concatenated as Z (1536 dimensions). Here, WN is a fully connected layer of size 1536 × N, where N is the number of labels. An example of the STC-TC model is shown in [Figure 1].
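A sketch of the STC-TC head under these settings follows (the pooling layer, activation choices, and label count are our assumptions; the paper specifies only the filter counts, kernel sizes, and output dimensions):

```python
import tensorflow as tf

HIDDEN, TOP_K, N_LABELS = 768, 20, 230   # N_LABELS is a placeholder

cls_feature = tf.keras.Input(shape=(HIDDEN,), name="H_L_CLS")
cand_embed = tf.keras.Input(shape=(TOP_K, HIDDEN), name="E_20")

# Four convolution layers with the reported filters and kernel sizes
x = cand_embed
for filters, kernel in [(256, 10), (256, 7), (256, 5), (256, 3)]:
    x = tf.keras.layers.Conv1D(filters, kernel, padding="same",
                               activation="relu")(x)
x = tf.keras.layers.GlobalMaxPooling1D()(x)
c = tf.keras.layers.Dense(HIDDEN)(x)                  # feature C (768-d)

z = tf.keras.layers.Concatenate()([cls_feature, c])   # Z (1536-d)
probs = tf.keras.layers.Dense(N_LABELS, activation="sigmoid")(z)
stc_tc_head = tf.keras.Model([cls_feature, cand_embed], probs)
```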

Model training

Two strategies were applied in TCM STC model training: “one model for one task” and “one model for three tasks.” In the “one model for one task” strategy, the SYMDS, PATDS, and TMDS were used to build three TCM STC models for symptom, pattern, and treatment conversion, respectively. In the “one model for three tasks” strategy, the integrated dataset (SPTDS) was used to build a single TCM STC model that could convert all three types of original terms.

The Chinese pretraining weights of the transformer blocks (https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip) were used in the training step. The batch size was set to 16, the dropout rate to 0.1, and the learning rate to 3e-5 (selected from 2e-5, 3e-5, and 5e-5 according to the development set). The Adam optimizer was used in this study.[13] Training was terminated when the F1 score on the development set did not increase for 20 epochs. The output threshold of the sigmoid function was selected using the threshold-moving method based on the F1 score of the model on the development set.
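The threshold-moving step might look like the following sketch (the candidate grid is our assumption; scikit-learn is used here for brevity):

```python
import numpy as np
from sklearn.metrics import f1_score

def select_threshold(y_true, y_prob, candidates=np.arange(0.05, 0.95, 0.05)):
    """Threshold-moving: choose the sigmoid cutoff that maximizes
    the F1 score on the development set."""
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        f1 = f1_score(y_true, (y_prob >= t).astype(int),
                      average="micro", zero_division=0)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```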

These models were constructed using the TensorFlow neural network framework (http://www.tensorflow.org/), and model training was accelerated using an NVIDIA GeForce RTX 2080 GPU (memory: 11 GB).

Each model was tested 10 times using the above parameters, and its performance was evaluated in terms of four indices: Accuracy, precision, recall, and F1 score.

Accuracy = correct / total (25)

Precision = TP / (TP + FP) (26)

Recall = TP / (TP + FN) (27)

F1 = 2 × Precision × Recall / (Precision + Recall) (28)

In (25)–(28), correct is the number of correct model predictions; total is the number of test data; true positive (TP) is the number of predicted outputs of the model that are consistent with the actual results; false negative (FN) is the number of correct results that the model fails to output; and false positive (FP) is the number of incorrect outputs of the model.
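Expressed as code, the four indices transcribe directly from (25)–(28):

```python
def evaluate(correct: int, total: int, tp: int, fp: int, fn: int):
    """Accuracy, precision, recall, and F1 from the counts in (25)-(28)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (correct / total,                                 # accuracy
            precision,
            recall,
            2 * precision * recall / (precision + recall))   # F1
```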

Model comparison

Various NLP-based models have been applied to medical term conversion in recent years, such as the encoder–decoder model[14] (a sequence generation model based on Bi-RNN), the Att-BiLSTM model[15] (a medical text classification model based on the attention mechanism and Bi-RNN), the Bi-LSTM-CNN model[16] (an NER model based on Bi-RNN and CNN for term normalization), the DNorm model[5] (a model for disease name normalization), the Word2Vec with cosine model[6] (also used for term normalization), and the TF-IDF with KNN model.[17]

This study used these modeling methods and STC datasets (e.g., SYMDS, PATDS, TMDS, and SPTDS) to establish the STC models for comparison. The names of the STC models, modeling methods, and inputs and outputs of the models are listed in [Table 3].

Misjudgment inspection method for the output results

In an actual application, the correspondence between the original and converted terms should be checked to ensure the accuracy of the model output. Manually checking all output results is impractical; therefore, we proposed a misjudgment inspection method. This method allows the model to flag incorrect output results, which can reduce the manual verification workload and improve work efficiency.

The design concept of misjudgment inspection based on inconsistent output results is to repeat the conversion of the original terms with perturbed or alternative models after the model completes the term conversion. If the rate of inconsistent output results exceeds a certain threshold, the output result of the model is considered potentially incorrect, and manual verification is required. To realize the misjudgment inspection function, we designed three methods: (1) randomly deactivating the neurons of the model multiple times and repeating the conversion, (2) repeating the conversion with multiple models of different structures, and (3) repeating the conversion with multiple models of the same structure. The second method changes the structure of the model and is therefore called misjudgment inspection with heterogeneous models; the third does not change the structure and is called misjudgment inspection with isomorphic models.

In addition, we designed misjudgment inspection methods based on outlier detection. A neural network model can extract feature vectors, that is, it can express the input data in the form of mathematical vectors. These feature vectors can be used both to output converted terms and to detect outliers. If the feature vector of an input is detected as an outlier, the conversion result is judged to be an incorrect output.

Misjudgment inspection based on inconsistent output results

Three methods were applied for misjudgment inspection based on inconsistent output results: Neuron random deactivation (NRD), output comparison of multiple isomorphic models (OCMI), and output comparison of multiple heterogeneous models (OCMH).

The NRD method uses a trained TCM STC model M for term conversion. The converted term o is obtained by inputting the original term. Thereafter, a series of outputs {o1, o2, …, oi} is obtained by performing NRD on M i times with the same input. Finally, the output o is compared with each output in {o1, o2, …, oi}. If the inconsistency rate is higher than 0.05, the output o of the model is considered incorrect.

The OCMI method uses a trained TCM STC model M for term conversion. The output o is obtained by inputting the original term. With the same input, a series of outputs {o1, o2, …, oi} is obtained from i additional models. The structure and training set of the i models are the same as those of M, except for the random seed used in training. Thereafter, the output o is compared with each output in {o1, o2, …, oi}. If the inconsistency rate is higher than 0.05, the output o of the model is considered incorrect.

The OCMH method uses a trained TCM STC model M for term conversion. The output o is obtained by inputting the original term. With the same input, three series of outputs are obtained from i STC-C, i STC-CNER, and i STC-CC models, respectively. All of these models were constructed using the same training process, except for the random seed used in training. The output o is then compared with each output in the three series. If the inconsistency rate is higher than 0.05, the output o of the model is considered incorrect.
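All three inconsistency-based methods reduce to the same decision rule, sketched below (the example terms and counts are illustrative only):

```python
def flag_inconsistent(o, repeated_outputs, threshold=0.05):
    """Flag the converted term o for manual review if the share of
    repeated runs (NRD, OCMI, or OCMH) that disagree with o exceeds
    the 0.05 threshold used in the paper."""
    rate = sum(out != o for out in repeated_outputs) / len(repeated_outputs)
    return rate > threshold

# 30 repeated runs, 3 of which disagree -> rate 0.10 -> flagged
flag_inconsistent("盗汗", ["盗汗"] * 27 + ["自汗"] * 3)   # True
```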

Among these methods, the TCM STC model M is the STC-C or STC-TC model, because these two models performed better than the other models, and i is an integer with a maximum value of 30 in this study.

Misjudgment inspection based on outlier detection of extracted features

If the feature of the original term is an outlier, it is likely to be misjudged by the model. To determine whether it is an outlier, we proposed the following process.

Initially, an original term S in the test set was fed into a trained TCM STC model M to obtain the output converted term O. The m original terms in the training set labeled O were selected and defined as T = {t1, t2, …, tm}. Subsequently, the features of S and T were extracted from the final transformer block of the trained model M and defined as fS and {ft1, ft2, …, ftm}, respectively. These features were then fed into an outlier detection model. When fS was detected as an outlier, the conversion of the original term S was considered a misjudgment that needed to be manually checked. Eight outlier detection methods were applied for this extracted-feature-based misjudgment inspection: Angle-based outlier detection,[18] histogram-based outlier score,[19] principal component classifier (PCA),[20] robust random cut forest,[21] one-class support vector machine,[22] isolation forest,[23] local outlier factor,[24] and density-based spatial clustering of applications with noise (DBSCAN).[25]
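A sketch of this outlier-based check with one of the eight detectors (isolation forest, here via scikit-learn; the feature shapes and sample data are illustrative):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def is_outlier(f_s, f_t):
    """Fit a detector on the features {f_t1, ..., f_tm} of the training
    terms sharing the predicted converted term, then test whether the
    feature f_s of the input term is an outlier."""
    detector = IsolationForest(random_state=0).fit(f_t)
    return detector.predict(f_s.reshape(1, -1))[0] == -1  # -1 = outlier

rng = np.random.default_rng(0)
f_t = rng.normal(size=(50, 768))     # m = 50 training-term features
f_s = rng.normal(size=768) + 6.0     # a clearly shifted input feature
print(is_outlier(f_s, f_t))          # expected: True
```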

Two metrics were used to evaluate the misjudgment inspection performance of the models: Check rate (CKR) and wrong detection rate (WDR), which were defined as follows:

CKR = (Thit + Fhit) / (Ttest + Ftest) (29)

WDR = Thit / Ftest (30)

In (29) and (30), Ttest is the number of original terms that are correctly converted, and Ftest is the number of original terms that are wrongly converted in the test dataset by a trained STC model M. Further, Thit is the number of original terms that are wrongly converted but correctly detected by the misjudgment inspection model, and Fhit is the number of original terms that are correctly converted but wrongly detected by the misjudgment inspection model.
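Expressed as code, the two metrics follow the definitions above:

```python
def check_rate(t_hit, f_hit, t_test, f_test):
    # CKR: share of all conversion results flagged for manual checking
    return (t_hit + f_hit) / (t_test + f_test)

def wrong_detection_rate(t_hit, f_test):
    # WDR: share of wrongly converted terms caught by the inspection
    return t_hit / f_test
```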

Results

Performance of the synonymous term conversion model based on the “one model for one task”

As shown in [Table 4], the F1 scores for symptom term conversion ranged between 0.85 and 0.92. For pattern term conversion, the F1 scores ranged between 0.84 and 0.92, whereas for treatment term conversion, the F1 scores ranged between 0.80 and 0.83. The STC-TC model performed better than the comparison models and the other models based on sequence generation, NER, or matching.

Table 4: Test results of the synonymous term conversion models under the “one model for one task” (mean±standard deviation)


Performance of the synonymous term conversion model based on the “one model for three tasks”

As shown in [Table 5], the F1 scores for symptom term conversion ranged between 0.85 and 0.91. The F1 scores for pattern term conversion ranged between 0.85 and 0.91, whereas those for treatment term conversion ranged between 0.77 and 0.84. The STC-TC model showed better performance than did the other models. However, the models based on the “one model for one task” performed better than did the models based on the “one model for three tasks.”

Table 5: Test results of the synonymous term conversion models under the “one model for three tasks” (mean±standard deviation)


Results of misjudgment inspection

For the extracted features of symptoms, the best WDR was obtained with DBSCAN (0.65 for STC-C and 0.66 for STC-TC), and the corresponding CKRs were 0.47 and 0.45, respectively. For patterns, the best WDR for the STC-C model was obtained with PCA (0.85), and the corresponding CKR was 0.45; the best WDR for the STC-TC model was obtained with the histogram-based outlier score (HBOS)[19] (0.91), and the corresponding CKR was 0.50. For treatments, the best WDR was obtained with DBSCAN (0.65 for STC-C and 0.64 for STC-TC), and the corresponding CKRs were 0.47 and 0.45, respectively.

[Figure 2] and [Figure 3] show the misjudgment inspection results obtained using the NRD, OCMI, and OCMH methods. We observed that the OCMH method with 18 models produced the best results. The WDRs of the STC-C model for the converted terms of symptoms, patterns, and treatments were 0.84, 0.86, and 0.86, respectively, and the corresponding CKRs were 0.32, 0.29, and 0.45, respectively. Similarly, the WDRs of the STC-TC model were 0.80, 0.84, and 0.90, and the corresponding CKRs were 0.30, 0.27, and 0.47, respectively.

Figure 2: Misjudgment inspection results of the synonymous term conversion-classification (STC-C) model. Note: i is the number of models in the OCMI or OCMH method or the number of neuron random deactivations in the NRD method. (a and d) show the WDR and CKR for symptoms, respectively; (b and e) show the WDR and CKR for patterns, respectively; (c and f) show the WDR and CKR for treatments, respectively


Figure 3: Misjudgment inspection results of the synonymous term conversion-TC (STC-TC) model. Note: i is the number of models in the OCMI or OCMH method or the number of neuron random deactivations in the NRD method. (a and d) show the WDR and CKR for symptoms, respectively; (b and e) show the WDR and CKR for patterns, respectively; and (c and f) show the WDR and CKR for treatments, respectively


Discussion

To date, the conversion of large-scale synonymous TCM terms remains challenging. AI technology can be used to intelligently convert synonymous terms into a unified written description, which has the benefits of reduced labor costs and improved consistency in medical records. To explore the superiority of the STC model, we first constructed three benchmark datasets. Thereafter, we proposed four modeling concepts, namely, classification, sequence generation, NER, and matching, which simulate the manual conversion of synonymous terms by TCM practitioners. Based on the four modeling concepts, four types of STC models, namely, (1) classification-based STC models (STC-C, STC-TC, and STC-CC), (2) sequence-generation-based STC models (STC-TSG and STC-LSG), (3) an NER-based STC model (STC-CNER), and (4) a matching-based STC model (STC-M), were constructed to screen out the superior model.

The analyses showed that the STC-TC model based on classification outperformed the other models. In addition, comparisons with other published models for converting synonymous terms were conducted.[5],[6],[14],[15],[16],[17] We found that the STC-TC model was better than the comparison models at converting synonymous terms. Among the STC models established in this study, the models based on the BERT structure outperformed the models based on other structures, such as the RNN structure. This suggests that the BERT-based STC-TC model is superior for converting synonymous TCM terms in the context of large-scale data.

Symptom, pattern, and treatment conversion are all STC tasks. They can be solved with a single model, similar to animal image classification[26] and text classification models,[27] or treated as three different tasks with three separate models. To explore the differences between these two options, we adopted two strategies: “one model for three tasks” and “one model for one task.” Our analyses indicated that the STC model based on the “one model for one task” strategy achieved better performance in the conversion of symptom, pattern, and treatment terms than did the model based on the “one model for three tasks” strategy.

Although the analyses indicated that the “one model for one task” strategy was better, we also observed that the STC-TC model could still achieve good performance under the “one model for three tasks” strategy compared with many STC models based on the “one model for one task” strategy. This suggests that a single model for multiple tasks may become advantageous as more types of synonymous terms are added and the model is appropriately adjusted.

In practice, the results output by the STC model should be reviewed by TCM practitioners to rectify incorrect outputs and ensure the quality of the data. In this process, TCM practitioners hope that the model can distinguish between correct and incorrect outputs and flag only the incorrect ones to improve efficiency. To our knowledge, this is the first time that misjudgment inspection methods for an STC model have been proposed from two perspectives: three methods based on inconsistent output detection and eight methods based on outlier detection.[18],[19],[20],[21],[22],[23],[24],[25] With regard to the WDR, the OCMH method based on inconsistent output detection showed the best results. This method uses different models for misjudgment inspection, which is similar to having several different TCM practitioners check the converted terms; this may be an important factor in increasing the possibility of detecting incorrect outputs. Furthermore, the WDR indicated that although the OCMH method performed well in detecting incorrect outputs, it could not detect all of them. Accordingly, further improving the WDR so that all incorrect outputs can be found will be a key point of future research.

Conclusion

In this study, we developed STC models with different modeling concepts and training strategies based on BERT and analyzed their performance. The STC-TC model showed superior performance in the symptom, pattern, and treatment tasks. The “one model for one task” training strategy offered more advantages than did the “one model for three tasks” strategy. Moreover, we developed the OCMH method, which achieved the best performance in terms of misjudgment inspection. Hence, the STC-TC model based on the OCMH method is the optimal model for converting synonymous terms for symptoms, patterns, and treatments. Whether this model can be used for other TCM terms is worth exploring.

Availability of data and materials

All the models used in the current study are available from the corresponding author upon reasonable request.

Financial support and sponsorship

This study was supported by the National Key R&D Program of China (2017YFC1700303).

Conflicts of interest

There are no conflicts of interest.

 

References
1. Jia LR, Liu LH, Yang S, Li JH, Gao B, Zhu L, et al. Problems and suggestions of standardization of traditional Chinese medicine terms. China Digit Med 2013;8:12-4.
2. National Administration of Traditional Chinese Medicine. National Health Commission on Printing and Distributing the Clinic Terminology of Traditional Chinese Medical Diagnosis and Treatment. Available from: http://yzs.satcm.gov.cn/zhengcewenjian/2020-11-23/18461.html. [Last accessed on 2021 May 16].
3. Zeng X, Jia Z, He Z, Chen W, Lu X, Duan H, et al. Measure clinical drug-drug similarity using Electronic Medical Records. Int J Med Inform 2019;124:97-103.
4. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. Proceedings of the International Conference on Learning Representations; 2013.
5. Leaman R, Islamaj Dogan R, Lu Z. DNorm: Disease name normalization with pairwise learning to rank. Bioinformatics 2013;29:2909-17.
6. Cho H, Choi W, Lee H. A method for named entity normalization in biomedical articles: Application to diseases and plants. BMC Bioinformatics 2017;18:451.
7. Vathsala MK, Ganga H. RNN based machine translation and transliteration for Twitter data. Int J Speech Technol 2020;23:499-504.
8. Gao Z, Feng A, Song X, Wu X. Target-dependent sentiment classification with BERT. IEEE Access 2019;7:154290-9.
9. Runshun Z, Qi X, Kun L, Shizhen F, Xiuxin J, Zhiwei J, et al. Design and application of the management platform of the “Heritage Program of Chinese Well-Known Experts” of China Academy of Chinese Medical Sciences. World Sci Technol 2016;18:761-8.
10. Pao ML. Automatic text analysis based on transition phenomena of word occurrences. J Am Soc Inf Technol 1978;29:121-4.
11. Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Vol. 1; 2019. p. 4171-86.
12. Kim Y, Jernite Y, Sontag D, Rush AM. Character-aware neural language models. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence; 2016. p. 2741-9. Available from: https://dl.acm.org/doi/10.5555/3016100.3016285. [Last accessed on 2023 Apr 11].
13. Kingma DP, Ba J. Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA; 2015. p. 1-15.
14. Luong MT, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon; 2015. p. 1412-21.
15. Chen CW, Tseng SP, Kuan TW, Wang JF. Outpatient text classification using attention-based bidirectional LSTM for robot-assisted servicing in hospital. Information (Switzerland) 2020;11:106.
16. Zhao SD, Liu T, Zhao SC, Wang F. A neural multi-task learning framework to jointly model medical named entity recognition and normalization. Proc AAAI Conf Artif Intell 2019;33:817-24.
17. Trstenjak B, Mikac S, Donko D. KNN with TF-IDF based framework for text categorization. Procedia Eng 2014;69:1356-64.
18. Kuhnt S, Rehage A. An angle-based multivariate functional pseudo-depth for shape outlier detection. J Multivar Anal 2016;146:325-40.
19. Goldstein M, Dengel A. Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm. KI-2012: Poster and Demo Track; 2012.
20. Zhao M, Li YM. Several applications of principal component analysis and corresponding R language practice. Hans J Data Min 2021;11:203-16.
21. Guha S, Mishra N, Roy G, Schrijvers O. Robust random cut forest based anomaly detection on streams. Proceedings of the 33rd International Conference on Machine Learning; 2016.
22. Ali J, Hany S, Mark R, Arjun S, Naini Ali S. A brain tumor segmentation framework based on outlier detection using one-class support vector machine. 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC'20); 2020.
23. Mikhail T, Paweł K. A probabilistic generalization of isolation forest. Inf Sci 2022;584:433-49.
24. Omar A, Raed A, Terence S, Xiaogang M. A review of local outlier factor algorithms for outlier detection in big data streams. Big Data Cognit Comput 2020;5:1.
25. Schubert E, Sander J, Ester M, Kriegel HP, Xu X. DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Trans Database Syst 2017;42:1-21.
26. Willi M, Pitman RT, Cardoso AW, Locke C, Swanson A, Boyer A, et al. Identifying animal species in camera trap images using deep learning and citizen science. Methods Ecol Evol 2019;10:80-91.
27. Aydoğan M, Karci A. Improving the accuracy using pre-trained word embeddings on deep neural networks for Turkish text classification. Physica A Stat Mech Appl 2020;541:123288.