Identifying cancer patients who received palliative care using the SPICT-LIS in medical records: a rule-based algorithm and text-mining technique

Study design and setting

The electronic medical records, including electronic doctor’s notes (eDN) and patient characteristics, were extracted from the database of Songklanagarind Hospital, the largest hospital in Southern Thailand. The data scientists of the Division of Digital Innovation and Data Analytics (DIDA), Faculty of Medicine, Prince of Songkla University, supervised by Dr. Ingviya, randomly selected 100 inpatients with a cancer diagnosis confirmed by the Cancer Registry and prepared their eDNs and patient characteristics for review. Two palliative care physicians independently reviewed the medical records of these 100 patients between February and June 2022. The records were reviewed using the Thai version of the SPICT-LIS to determine whether palliative care would have been beneficial to the patients.

Data source

The data of the study cancer patients were retrieved from the Cancer Registry of Songklanagarind Hospital, and further documents were queried from the hospital inpatient department (IPD) data prepared by DIDA as mentioned above. The eDN, patient characteristics, and vital signs were extracted and stored using the PostgreSQL relational database management system on a physical server in the DIDA Data Center. The querying and merging of text data were done through PostgreSQL.

Inclusion/exclusion criteria

All inpatients aged 18 years or older who were diagnosed with cancer at Songklanagarind Hospital using the International Classification of Diseases and Related Health Problems 10th Revision Thai Modification (ICD-10 TM) [11] and the International Classification of Diseases for Oncology (ICD-O) [12] from 2016 to 2020 were included in the initial study sample. Patients whose first admission digital note following their cancer diagnosis contained ≤ 1,000 words were excluded from the study to ensure that the study records contained enough data to assess the patients using the Thai SPICT-LIS criteria. The patient characteristics data extracted included birth date, sex, religion, ICD-10 and ICD-O codes, and cancer staging.
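The eligibility rule above can be illustrated with a short Python sketch. It is not the study's code (the analysis was done in R and PostgreSQL); the `Admission` record, its field names, and the whitespace-based word count are assumptions made for illustration. In particular, Thai text is not written with spaces between words, so a real pipeline would need a Thai tokenizer to count words.

```python
from dataclasses import dataclass

MIN_AGE_YEARS = 18          # inclusion: aged 18 years or older
EXCLUDED_WORD_LIMIT = 1000  # exclusion: first admission note of <= 1,000 words


@dataclass
class Admission:
    """Hypothetical record layout; the field names are illustrative only."""
    patient_id: str
    age_years: int
    first_note: str


def word_count(text: str) -> int:
    # Crude whitespace split. Assumption: adequate for English text only;
    # Thai text would need a dedicated tokenizer.
    return len(text.split())


def is_eligible(admission: Admission) -> bool:
    """Apply the age criterion and the note-length exclusion."""
    old_enough = admission.age_years >= MIN_AGE_YEARS
    long_enough = word_count(admission.first_note) > EXCLUDED_WORD_LIMIT
    return old_enough and long_enough
```

A note of exactly 1,000 words is excluded under this reading, because the criterion is "≤ 1,000 words".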

Data management and algorithm development

Training set

To create an initial training dataset, two palliative care physicians reviewed the complete records of 100 randomly selected patients and assessed whether the patients had any of the six general indicators suggesting that they might benefit from palliative care. Each patient was coded as 1 (might benefit) or 0 (might not benefit). When the two specialists disagreed, a consensus was reached through face-to-face discussion.

Text-mining models

Text-mining models were created to extract essential data from standard language text in the Thai and English eDN data. The models combined a tokenization technique [13] with regular expressions, that is, sequences of characters that form a search pattern [14]. Tokenization was performed with the ‘LexTo’ package, which enables tokenization of the Thai language in the R program.
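As a minimal sketch of how a regular expression can act as a search pattern over mixed Thai and English notes, consider the Python example below. It is not the study's implementation, which used R with the ‘LexTo’ package. The function names are hypothetical, the Thai term is an illustrative assumption rather than an entry from Table S1, and splitting by script is only a rough stand-in for real Thai word segmentation.

```python
import re

# Illustrative Thai term for "weight loss"; an assumption, not taken from Table S1.
THAI_WEIGHT_LOSS = "น้ำหนักลด"

# Runs of Latin letters, runs of Thai-script characters (U+0E00 to U+0E7F), or digits.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|[\u0E00-\u0E7F]+|\d+")

# One search pattern covering an English phrase and its Thai counterpart.
WEIGHT_LOSS_PATTERN = re.compile(
    r"weight\s+loss|" + re.escape(THAI_WEIGHT_LOSS), re.IGNORECASE
)


def split_by_script(text):
    """Split a mixed-language note into Latin, Thai, and numeric runs.

    This does not segment Thai words; it only separates the scripts.
    """
    return TOKEN_PATTERN.findall(text)


def mentions_weight_loss(note):
    """Return True if the note matches the weight-loss search pattern."""
    return WEIGHT_LOSS_PATTERN.search(note) is not None
```

The pattern tolerates variable spacing and capitalization in the English phrase, which is the main advantage of a regular expression over a fixed string comparison.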

Sentiment analysis

Sentiment analysis involves classifying data into categories such as positive or negative [15]. For instance, the word “pain” might be labeled as negative, whereas the phrase “no pain” could be considered positive. Text segments in the code were passed directly as input to the model. In this study, the sentiment analysis model was trained to categorize the sentiment of a given text into two groups: patients who might benefit from palliative care and those who might not.
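The “pain” versus “no pain” example can be made concrete with a small hand-written rule. The Python sketch below is not the trained model used in the study; it only illustrates how a negation cue directly before a symptom word flips the label. The cue list and the function name are assumptions, and the rule handles single-word English symptoms only.

```python
import re

# Hypothetical negation cues; a real dictionary would be larger and bilingual.
NEGATION_CUES = ("no", "denies", "without")


def symptom_polarity(text, symptom):
    """Label a single-word symptom mention in free text.

    Returns "negative" if the symptom appears at least once without a
    negation cue directly before it, "positive" if every mention is
    negated, and None if the symptom is not mentioned at all.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    target = symptom.lower()
    mentions = [i for i, tok in enumerate(tokens) if tok == target]
    if not mentions:
        return None
    for i in mentions:
        negated = i > 0 and tokens[i - 1] in NEGATION_CUES
        if not negated:
            return "negative"
    return "positive"
```

For example, `symptom_polarity("No pain today", "pain")` yields `"positive"`, while `symptom_polarity("Patient reports pain at night", "pain")` yields `"negative"`.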

Data dictionaries

A data dictionary encompassing a range of mixed Thai and English words/phrases was created using tokenization and sentiment analysis to classify patients into two groups based on whether or not they satisfied any of the six Thai SPICT-LIS general indicators. In general, words/phrases and/or sentences indicating symptoms and patient history were used to determine whether the patients had presented with any of the six general indicators. The classification and extraction of each general indicator were performed on the physicians’ free-text comments using the mixed-language data dictionary. For international readers, we translated the Thai words/phrases/sentences in the data dictionary into widely understood English terminology, presented side by side with the corresponding Thai words/sentences, as detailed in Table S1.

Rule-based algorithms

Two rule-based algorithms were created based on regular expressions, tokenization, and sentiment analysis using the R program version 4.0.3 (R Core Team, Austria) from the whole records written in a mixture of Thai and English words/phrases/sentences.

Strict and relaxed rule-based algorithms were used in this study. The strict rule-based algorithm applied a stringent set of criteria for identifying each indicator. It relied on explicit and well-defined terms, which could have led to fewer cases meeting the criteria. In contrast, the relaxed rule-based algorithm took a more inclusive approach, considering a wider variety of terms that could have indicated the presence of the condition, which could have resulted in a larger number of identified cases. For example, in the strict rule-based criteria of the Thai SPICT-LIS algorithm, only ‘significant weight loss’ was included, while in the relaxed rule-based approach, additional keywords such as ‘weight loss,’ ‘underweight,’ ‘hyposthenic build,’ and ‘thinner’ were also considered alongside significant weight loss.
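The difference between the two rule sets can be sketched in Python using the weight-loss keywords quoted above. This is an illustrative simplification, not the study's R code: the function name is hypothetical, only the English keywords are shown, and plain substring matching stands in for the full regular-expression, tokenization, and sentiment pipeline.

```python
# Keywords taken from the weight-loss example in the text (English only).
STRICT_KEYWORDS = ("significant weight loss",)
RELAXED_KEYWORDS = STRICT_KEYWORDS + (
    "weight loss",
    "underweight",
    "hyposthenic build",
    "thinner",
)


def flag_indicator(note, keywords):
    """Return 1 if any keyword occurs in the note, else 0.

    Assumption: case-insensitive substring matching with no negation
    handling, so "no weight loss" would still be flagged by the relaxed set.
    """
    text = note.lower()
    return int(any(keyword in text for keyword in keywords))
```

A note reading "Pt looks thinner, hyposthenic build" is flagged only by the relaxed set, whereas "significant weight loss in 3 months" is flagged by both, which is why the relaxed algorithm can be expected to identify at least as many cases as the strict one.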

Outcome measurements

The main outcome was whether the algorithm correctly identified patients who might benefit from palliative care as indicated by the SPICT-LIS. The instrument was originally translated into Thai using back-translation by Sripaew et al., following the WHO guidelines for the systematic adaptation of tools, and was then found to provide consistent responses with good agreement among general practitioners, with a Fleiss’ kappa of 0.93 (0.76–1.00) [7]. The six indicators of the Thai SPICT-LIS are as follows: Indicator 1: performance status is poor or deteriorating, best available treatment has limited effect; Indicator 2: depends on others for care due to increasing physical and/or mental health problems; Indicator 3: the individual’s carer requires more help and support; Indicator 4: the individual experienced significant weight loss over the last few months or remains underweight; Indicator 5: persistent symptoms despite receiving the best available treatment for underlying condition(s) and is unable to access treatment; and Indicator 6: the individual (or family) asks for palliative care and chooses to reduce, stop, or not have treatment or wishes to focus on quality of life. Patients who would possibly benefit from palliative care were those who met at least two general indicators and at least one clinical indicator [7, 16, 17]. Patients who met these criteria were defined as “should be offered palliative care as they could benefit from it”, and the others were defined as “should not be offered” palliative care, meaning that they did not meet the indicators for being offered palliative care at this time.
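The decision rule stated above (at least two general indicators and at least one clinical indicator) can be written as a short Python sketch. The function name and the 0/1 input encoding are assumptions for illustration; the number of clinical indicators is not given in this section, so the sketch accepts a sequence of any length.

```python
from typing import Sequence

N_GENERAL_INDICATORS = 6  # the six Thai SPICT-LIS general indicators


def should_be_offered(general: Sequence[int], clinical: Sequence[int]) -> bool:
    """Apply the rule: >= 2 general indicators and >= 1 clinical indicator.

    Both arguments are 0/1 flags, one per indicator (hypothetical encoding).
    """
    if len(general) != N_GENERAL_INDICATORS:
        raise ValueError("expected one flag per general indicator (six in total)")
    return sum(general) >= 2 and sum(clinical) >= 1
```

Under this rule, a patient with two general indicators but no clinical indicator is classified as “should not be offered”.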

Inter-rater reliability

Percentage agreement and kappa statistics were used to measure the inter-rater reliability between the physicians and the strict and relaxed rule-based algorithms. Cohen’s kappa was interpreted as follows: a value above 0.7 indicates good agreement; values between 0.4 and 0.7 indicate moderate agreement; and values below 0.4 indicate poor agreement [18].
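For reference, percentage agreement and Cohen’s kappa for two raters can be computed as in the Python sketch below, where kappa is (p_o − p_e) / (1 − p_e), with p_o the observed agreement and p_e the agreement expected by chance from each rater’s label frequencies. The function names are hypothetical, and the interpretation thresholds follow the cut-offs stated above; how values of exactly 0.4 or 0.7 are binned is an assumption.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items on which the two raters give the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b) / 100
    labels = set(rater_a) | set(rater_b)
    # Chance agreement from each rater's marginal label proportions.
    p_e = sum(
        (list(rater_a).count(label) / n) * (list(rater_b).count(label) / n)
        for label in labels
    )
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def interpret_kappa(kappa):
    """Map kappa to the categories used in the text."""
    if kappa > 0.7:
        return "good"
    if kappa >= 0.4:
        return "moderate"
    return "poor"
```

With hypothetical ratings in which the raters agree on 8 of 10 patients and each labels half of the patients as 1, p_o = 0.8 and p_e = 0.5, giving kappa = 0.6, which falls in the moderate range.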

Prevalence and factor association

The number of patients with cancer who should be given palliative care because they would benefit from it was compared between the palliative care specialists and both the strict and relaxed rule-based algorithms. Descriptive statistics and Fisher’s exact tests were used to compare the characteristics of patients with cancer who should be given palliative care with those of patients who should not. Factors associated with requiring palliative care were assessed using Fisher’s exact test and multiple logistic regression analysis. The multiple logistic regression model included age, sex, cancer type, cancer stage, and patient symptoms such as pain, dyspnea, anorexia, edema, dysphagia, ascites, and xerostomia. A p-value of less than 0.05 was considered statistically significant.
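As an illustration of the Fisher’s exact test used for the 2 × 2 comparisons, the Python sketch below computes a two-sided p-value from the hypergeometric distribution using only the standard library. It is not the study's R code; the function name is hypothetical, and the two-sided definition used here (summing the probabilities of all tables that are no more likely than the observed one) is one common convention, stated as an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows could be, for example, symptom present/absent and columns
    "should be offered" / "should not be offered" (hypothetical layout).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = table_probability(a)
    # Sum over tables at least as extreme (no more probable) than the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)
```

A result below the 0.05 threshold stated above would be read as a statistically significant association between the characteristic and the palliative care classification.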
