Medical linear accelerators (linacs) are the most important equipment in radiotherapy. At their core, they are particle accelerators reconfigured as medical devices, and they operate on physical principles similar to those of the high-energy accelerators used in particle physics research. However, medical devices operate in hospital environments, which pose additional operational challenges because of differences in the physical environment and in the availability of technical support. Operating medical linacs requires skilled personnel to repair, adjust, and otherwise maintain the devices. Further, medical linacs have many subsystems, all of which must operate faithfully for the device to function correctly. Given this complexity, linacs can fail in many different ways.
It is desirable to understand the nature of medical linac failures for several reasons. First, components used in medical linacs are costly, and improved knowledge of components that fail more often can be of help in projecting service and maintenance costs for medical linacs. Second, the training of the qualified technical staff able to maintain these devices can be simplified with better knowledge of failure modes since emphasis can be placed on areas that fail more frequently. Third, a better understanding of failure modes can help medical linac operators in stocking components that are more likely to be needed in maintenance, which can help reduce repair times.
There have been relatively few studies of failure modes for linacs. Wroe and colleagues1 studied downtime and failure modes for radiotherapy equipment in lower income and developed countries. Sheehy and colleagues2 performed a reliability analysis of radiotherapy equipment in lower income countries. Both studies commented on the difficulty of obtaining data and statistics of sufficient quality for their analyses. The first study relied on manual review of linac maintenance records, in both electronic and paper form, to estimate failure modes and times between failures. The second study improved on the first by building failure mode models from its results.
Modeling of linac failure modes would be greatly simplified and improved by high-quality, consistent input data for model development. However, medical linac maintenance data are often kept in generic equipment maintenance databases whose primary purpose is to keep records of maintenance, not to classify and analyze failure modes. The core information in maintenance logs is recorded as narratives written by maintenance personnel. In general, these logs describe the repair procedures and maintenance results, making them useful for analysts who evaluate linac performance and identify failures. However, the logs are colloquial, unformatted, and noisy, and may contain spelling and grammatical errors, which makes them unsuitable for general analytical tools. In particular, manual analysis of this type of data is time-consuming when the dataset is large.
This research was motivated by natural language processing (NLP) applications in the transportation domain, where researchers applied topic modeling (TM), a type of unsupervised learning algorithm, to categorize safety reports with the aim of reducing incidents.3, 4 NLP comprises a suite of methods capable of interpreting, evaluating, and generating narratives in human language. The literature on medical records analysis grew somewhat during the COVID-19 pandemic. Shah et al.5 analyzed online patient reviews on physician rating websites to examine trends in patient concerns during the pandemic. A coherence-based TM method was applied to generate topics and corresponding keywords, and the results showed that policymakers can benefit from topic analysis when responding to the COVID-19 crisis. Kaveh-Yazdy and Zarifzadeh6 investigated the top-ranked public concerns about COVID-19 in Iran. Based on the output of the TM model, the researchers found that the major concerns were PCR labs and tests, policy on the education system, and personal protective actions such as washing hands and wearing masks. In our study, the emphasis is on linac maintenance: the TM method was applied to large, unformatted linac maintenance logs to identify the most frequent failure modes.
The purpose of this work is to investigate the feasibility of using TM to analyze electronic medical linac maintenance logs. The main contribution of this article is the introduction of TM to the analysis of unstructured maintenance log data in order to find the most frequent failure modes of linacs in daily use. A further purpose is to show how different linacs perform over time by examining the trends of different failure modes. With a data-driven analysis method, it is hoped that the larger pool of current medical linac maintenance logs can be used to better understand medical linac failure modes.
2 MATERIALS AND METHODS
2.1 Linac maintenance logs
The maintenance logs used in this study were collected from several linacs at the BC Cancer center, Kelowna (Canada), under regulations set by the Canadian Nuclear Safety Commission. The linacs were in service from April 1998 until the study date. There were nine linacs in total: four of these were replaced partway through the study period, and a fifth linac was added in 2009. The linacs were installed in five different treatment rooms, labeled A–E as shown in Table 1, which also lists the dates of service, manufacturer, and model. Of the four original linacs, two were equipped with multi-leaf collimators (MLCs) and amorphous silicon-based electronic portal imaging devices (EPIDs); the other two did not have MLCs but had fluorescence imaging-based portal imaging, which was later upgraded to amorphous silicon EPIDs. The five accelerators in service from 2011 onward are modern medical linacs with MLCs, EPIDs, and kV-based on-board imaging. The dataset used in our study covers nine different linacs, was recorded from April 1998 to December 2019, and consists of 4323 entries in total.
TABLE 1. Linac specifications and service dates

| Treatment room | Manufacturer | Model | Starting service date | End service date |
|---|---|---|---|---|
| A | Elekta | SL75 | April 1998 | October 2008 |
| A | Varian | Clinac iX | July 2009 | September 2021 |
| B | Elekta | SL75 | April 1998 | December 2009 |
| B | Varian | Clinac iX | September 2010 | January 2021 |
| C | Elekta | SL 20 | July 1998 | July 2010 |
| C | Varian | TrueBeam | March 2011 | Present |
| D | Elekta | SL 20 | July 1998 | February 2011 |
| D | Varian | TrueBeam | August 2011 | Present |
| E | Varian | Clinac iX | November 2009 | Present |

The maintenance log is a collection of narrative records of linac repair and service work completed by maintenance personnel. In our study, the logs have two main parts, namely “Comments” and “Repair Description.” “Comments” briefly describes the linac status and the breakdown that occurred. “Repair Description” records the maintenance procedure, the repair action, and the related broken component. Some metadata were also recorded, such as the date of the maintenance service. Table 2 shows two maintenance log entries taken from the original dataset. Apart from “Comments,” “Repair Description,” and “Date,” the keyword “TaskKey” gives the type of maintenance service.
TABLE 2. Example entries of linac maintenance logs

| TaskKey | Comments | Repair Description | Date |
|---|---|---|---|
| Corrective | Touchguard interlock could not be removed with iView detector in place. | Adjusted touchguard microswitches at the locking pin end for proper contact. Adjusted the alignment nuts on both sides of detector for easier locking pin insertion. | April 1998, 16 |
| Major repairs | CARR/FOIL W29 cable repaired. | Error 7F when calibrating carrousel, was also getting error 70 when exiting calibration, pointing to switch S16. Cleaned all five switches and reseated connectors J82 and J83. Adjusted the Carr pot voltage to 5.05 V from 4.74 V. Replaced and adjusted S16 on the carrousel switch assy, no change. We could reproduce the fault by moving the gantry from Zero degrees to 350°. Lubed Carr chain with TriFlow. Replaced the PWM pcbA4, no change. | January 2020, 29 |

As shown in Table 2, the narratives in the logs (“Comments” and “Repair Description”) contain a wealth of information describing the health condition of the linacs and the repair actions. However, identifying frequent failure modes by reading the logs would require humans to extract key information from lengthy sentences, which is time-consuming, especially for the whole dataset. With NLP techniques, the key information, in our case the failure modes and related components of linacs, can be extracted from the logs automatically and quickly. Furthermore, temporal analysis based on the “Date” metadata was used to find the trend of specific failure modes over time. As mentioned previously, linacs from different manufacturers were replaced or added around 2010. Temporal analysis was therefore also used to examine whether there is a difference between linac models.
2.2 Topic modeling of linac maintenance logs
2.2.1 Latent Dirichlet allocation model
Building on latent semantic indexing (LSI), a document representation method, Blei et al.7 proposed the latent Dirichlet allocation (LDA) algorithm and formulated a general technique named probabilistic TM. TM is a typical unsupervised machine learning approach: it does not require a labeled dataset but constructs a model solely from the distribution of words in documents. TM can extract core information by distilling topics from messy documents.
LDA is the most commonly used algorithm for performing TM on a collection of documents. LDA constructs a three-layer architecture between documents, topics, and words by independent multinomial distributions. Each document is represented by several latent topics, and each topic is governed by a multinomial distribution over words. In our application, the “Comments” and “Repair Description” of each maintenance log entry were combined into one document, and all documents in the maintenance log dataset make up the corpus. LDA summarizes the documents with several topics by searching for similar “bags” of words that co-occur across documents. The words with the highest frequency in each topic describe the core information of that topic. In our case, the top-ranked words often point to particular failure modes, so TM can help find the most frequent failure modes by surfacing keywords in the dataset. It should be noted that a single document is often represented by several topics. This is reasonable because one maintenance service usually handles multiple failures.
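As a minimal sketch of how one log entry might be turned into a document, the snippet below joins the two narrative fields of the first entry in Table 2 and parses its date metadata. The `LogEntry` class, its field names, and the helper functions are hypothetical names introduced for illustration, and the date format string assumes the "Month Year, Day" layout that appears in Table 2.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEntry:
    """One maintenance log entry (hypothetical field names)."""
    task_key: str
    comments: str
    repair_description: str
    date: str  # assumed layout: "Month Year, Day", e.g. "April 1998, 16"


def to_document(entry: LogEntry) -> str:
    """Join the two narrative fields into one document for topic modeling."""
    return f"{entry.comments} {entry.repair_description}".strip()


def parse_date(entry: LogEntry) -> datetime:
    """Parse the date metadata (assumed format) for later temporal analysis."""
    return datetime.strptime(entry.date, "%B %Y, %d")


entry = LogEntry(
    task_key="Corrective",
    comments="Touchguard interlock could not be removed with iView detector in place.",
    repair_description="Adjusted touchguard microswitches at the locking pin end for proper contact.",
    date="April 1998, 16",
)

# The corpus is simply the list of all such documents.
corpus = [to_document(entry)]
```

Keeping the date next to the document text makes it straightforward to group topic assignments by month or year afterwards.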
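To further explain the mechanism of the LDA model, some notation and assumptions are introduced here. A document is denoted as $\mathbf{w} = (w_1, w_2, \ldots, w_N)$, a sequence of $N$ words, and the number of topics is denoted as $K$. In the notation of Blei et al.7, the generative process is as follows.
Set prior parameters: $\alpha$, $\beta$.
For each document $\mathbf{w}$, choose $\theta \sim \mathrm{Dir}(\alpha)$.
For each word $w_n$, the topic to which the word belongs, denoted as $z_n$, is drawn from $\mathrm{Multinomial}(\theta)$.
The word $w_n$ itself is a variable drawn from another distribution, $p(w_n \mid z_n, \beta)$.
FIGURE 1. Graphical representation of latent Dirichlet allocation model
The joint distribution of the topic mixture $\theta$, the topics $\mathbf{z}$, and the words $\mathbf{w}$, given $\alpha$ and $\beta$, is
$$p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta).$$
The number of topics $K$ is the most important parameter to be determined when training an LDA model. If $K$ is too small, too much information is concentrated in a single topic, making it difficult to identify the specific failure mode and map it to the responsible components. Likewise, if $K$ is too large, the information becomes too scattered, leading to some meaningless topics. However, because TM is an unsupervised method, there is no ground truth to serve as a reference for selecting the optimal number of topics. Several metrics have therefore been proposed to address this issue by evaluating the topic model.9-13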
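The generative process described above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the two toy topics, the six-word vocabulary, and the prior values are invented for illustration and do not come from the fitted model, and the Dirichlet draw is implemented with normalized Gamma samples from the standard library.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha, topic_word, vocab, rng):
    """Generate one document following the LDA generative process."""
    theta = sample_dirichlet(alpha, rng)  # document-topic proportions
    topics, words = [], []
    for _ in range(n_words):
        # Draw the topic of this word from Multinomial(theta).
        z = rng.choices(range(len(alpha)), weights=theta)[0]
        # Draw the word itself from the word distribution of topic z.
        w = rng.choices(vocab, weights=topic_word[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words


rng = random.Random(0)
vocab = ["mlc", "leaf", "motor", "fuse", "power", "blown"]  # toy vocabulary
# Toy topic-word distributions: topic 0 resembles an MLC topic,
# topic 1 resembles a power-supply topic (illustrative values only).
topic_word = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],
]
theta, topics, words = generate_document(8, [0.5, 0.5], topic_word, vocab, rng)
```

Fitting LDA inverts this process: given only the observed words, it infers the topic proportions of each document and the word distribution of each topic.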
Evaluation metrics can provide a good reference for finding the optimal number of topics $K$. Kuhn3 used a pair of trade-off metrics, namely coherence and exclusivity, and selected the outlier as the optimal $K$. Wang et al.14 used Jensen–Shannon divergence and perplexity to find the optimal $K$. A more direct way is to inspect the results of the models and judge whether they are reasonable.15 Tanguy et al.16 had a subject matter expert choose 50 as the optimal $K$ when analyzing aviation safety reports.
In this article, perplexity, divergence, and coherence were used to determine a narrow range for the optimal $K$. The final number of topics was then selected by three subject experts who checked the interpretability of the models.
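Two of these metrics can be sketched in a few lines. The snippet below computes per-word perplexity from document-topic and topic-word distributions, and a UMass-style coherence from document co-occurrence counts. The text does not state which coherence or divergence variant was used, so the UMass form here is an assumption chosen for illustration, and all function and argument names are hypothetical.

```python
import math


def perplexity(docs, doc_topic, topic_word):
    """Per-word perplexity: exp(-sum(log p(w | d)) / number of tokens).

    docs:       list of documents, each a list of word indices
    doc_topic:  doc_topic[d][k] = proportion of topic k in document d
    topic_word: topic_word[k][w] = probability of word w under topic k
    """
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            # p(w | d) mixes the topic-word probabilities by the topic proportions.
            p = sum(theta * topic_word[k][w] for k, theta in enumerate(doc_topic[d]))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def umass_coherence(top_words, docs_as_sets):
    """UMass-style coherence (assumed variant) of a topic's top words.

    Sums log((D(w_m, w_l) + 1) / D(w_l)) over ordered word pairs, where D counts
    the documents containing the given word(s). Every top word is assumed to
    occur in at least one document.
    """
    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = sum(1 for doc in docs_as_sets if top_words[l] in doc)
            d_ml = sum(
                1 for doc in docs_as_sets
                if top_words[m] in doc and top_words[l] in doc
            )
            score += math.log((d_ml + 1) / d_l)
    return score


# Toy check: two topics that are both uniform over four words give a
# perplexity close to the vocabulary size, 4.
uniform = [[0.25] * 4, [0.25] * 4]
toy_perplexity = perplexity([[0, 1, 2, 3]], [[0.5, 0.5]], uniform)
```

Lower perplexity indicates better predictive power on the evaluated text, while a higher coherence indicates that a topic's top words tend to appear in the same documents.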
The log text was preprocessed with the following steps before model training.
Tokenization: A token is the basic element in a topic model. This step breaks each sentence into individual tokens for subsequent processing and analysis.
Word cleaning: Punctuation characters, numbers, and stop words were removed. Stop words, such as prepositions, occur very frequently in most topics while contributing little to topic building. In addition, any word that occurred fewer than three times in the whole dataset was removed.
Lemmatization: Lemmatization returns the base or dictionary form of a word so that its variants are treated as a single element by TM. A more aggressive technique, stemming, was tested and found ineffective because it may merge distinct tokens into one and convert linac jargon into other words.
Lowercase conversion: Convert words to lowercase.
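The preprocessing steps above can be sketched as a small pipeline. This is a simplified illustration rather than the pipeline used in the study: the stop-word list is a tiny sample, the `LEMMAS` lookup table is a hypothetical stand-in for a real lemmatizer, and lowercasing is applied early so that stop-word matching is case-insensitive. Applying the minimum-count filter after lemmatization is also an assumption.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "on", "at", "to", "for", "with", "and", "in", "was"}
# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"adjusted": "adjust", "replaced": "replace", "switches": "switch"}


def preprocess(docs, min_count=3):
    """Tokenize, clean, lemmatize, and drop rare words from raw documents."""
    tokenized = []
    for doc in docs:
        # Tokenization: keep alphabetic runs only, which drops punctuation and digits.
        tokens = re.findall(r"[A-Za-z]+", doc)
        # Lowercase conversion.
        tokens = [t.lower() for t in tokens]
        # Word cleaning: remove stop words.
        tokens = [t for t in tokens if t not in STOP_WORDS]
        # Lemmatization via the lookup table; unknown words are kept as they are.
        tokens = [LEMMAS.get(t, t) for t in tokens]
        tokenized.append(tokens)
    # Remove words that occur fewer than min_count times in the whole dataset.
    counts = Counter(t for doc in tokenized for t in doc)
    return [[t for t in doc if counts[t] >= min_count] for doc in tokenized]


docs = [
    "Replaced the MLC motor.",
    "MLC motor replaced, leaf stuck.",
    "Adjusted MLC motor; replaced fuse 12.",
]
cleaned = preprocess(docs)
```

In this toy corpus only "replace", "mlc", and "motor" reach the three-occurrence threshold, so the rarer words are filtered out.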
3 EXPERIMENTS AND RESULTS
This section has three parts. First, the process of selecting the optimal number of topics $K$ using the metrics described in Section 2 is demonstrated. Then, the method of interpreting the topics produced by the LDA model is demonstrated and the most frequent failure modes of linacs are summarized. Finally, temporal analysis is used to identify the trends of some failure modes.
The number of topics is the most significant parameter in building a good LDA model. To find the optimal number of topics $K$, the three metrics described in Section 2 were used to evaluate LDA models with different values of $K$. Figure 2 illustrates the trends of divergence, perplexity, and coherence as the number of topics increases from 5 to 70.
FIGURE 2. Metrics used to select the optimal number of topics
The divergence reaches a lower level, indicating a potentially good LDA model, when the number of topics is between 25 and 50. However, the perplexity increases once the number of topics exceeds 25. Since a model with a smaller perplexity has better predictive power for new data, these two metrics form a trade-off for this problem, and together they narrow the initial range of the optimal number of topics $K$ to 25 to 35. The third metric, coherence, was then used to refine this interval: in the figure, the coherence of the model with 28 topics stands out from the others in the range from 25 to 35. With the help of subject matter expert interpretation of the models with $K$ ranging from 25 to 30, 28 was selected as the optimal $K$, and the following analysis is based on the model with 28 topics.
The LDA model outputs 28 topics, each a cluster of words drawn from the linac maintenance log dataset. However, the model does not produce an explicit label or meaning for each topic.3 A post-analysis is therefore required to identify the core information of each topic from its top-ranked words. With this topic interpretation procedure, the specific failure modes and the related components or subsystems of the linac can be identified and summarized.
The most straightforward way to find what a topic represents is to rank words by their frequency within the topic and look for the common narrative among them, which reveals the underlying failure mode and subsystem. The problem with this approach is that words with a high overall frequency across the corpus show up in many topics. These words may mask words that have a lower probability but contribute a great deal to interpreting the topic. Therefore, another statistical metric, called lift, was introduced to rank the top words within topics. The lift is defined as “the probability of word occurrence conditional on topic divided by the probability of word occurrence across the corpus.” This metric highlights words whose probability within a topic is high relative to their probability across the corpus.
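The lift definition quoted above translates directly into code. The sketch below uses invented toy probabilities, not values from the fitted model, to show how ranking by lift can differ from ranking by within-topic frequency; the function and variable names are hypothetical.

```python
def lift(topic_word_probs, corpus_word_probs):
    """Lift of each word: p(word | topic) divided by p(word) across the corpus."""
    return {w: p / corpus_word_probs[w] for w, p in topic_word_probs.items()}


# Toy probabilities for illustration only.
p_word_given_topic = {"replaced": 0.30, "mlc": 0.25, "leaf": 0.20}
p_word_in_corpus = {"replaced": 0.10, "mlc": 0.02, "leaf": 0.01}

lifts = lift(p_word_given_topic, p_word_in_corpus)
by_frequency = sorted(p_word_given_topic, key=p_word_given_topic.get, reverse=True)
by_lift = sorted(lifts, key=lifts.get, reverse=True)
```

In this toy case "replaced" is the most frequent word in the topic but has the lowest lift, because it is also common across the whole corpus.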
To combine the frequency and the lift metric more flexibly, the visualization and analysis package pyLDAvis20 was used to sort the top-ranked words. It introduces a parameter $\lambda$ ($0 \le \lambda \le 1$) that weights the within-topic probability of a word against its lift.
Take topic 3 as an example of how the meaning of a specific topic is determined. When we set $\lambda = 0.2$, the ranking favors words that are distinctive to the topic. As shown in Figure 3, the top four words are “mlc, leaf, motor, stuck,” which clearly point to a failure of the MLC, with the failure mode “mlc leaf stuck.” When we then set $\lambda = 0.8$, the ranking is driven more by word probability, so words that are also frequent across the corpus rise. In Figure 3, the weight of the word “replaced” increases, indicating the repair action “mlc leaf motor replaced.” We can therefore find the failure modes and the corresponding repair actions by setting $\lambda$ to 0.2 and 0.8, respectively. It should be noted that the words in a topic are not purely related to one failure mode or one component. What we seek are the dominant words in a topic, which we then relate to specific failure modes.
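The effect of the pyLDAvis weighting parameter $\lambda$ can be reproduced with the relevance measure that the package uses, $r(w, t \mid \lambda) = \lambda \log p(w \mid t) + (1 - \lambda) \log \frac{p(w \mid t)}{p(w)}$. The sketch below is a standalone illustration that does not call pyLDAvis itself; the probabilities are invented toy values and the function names are hypothetical.

```python
import math


def relevance(p_w_given_t, p_w, lam):
    """Relevance of a word to a topic for a weight lam between 0 and 1.

    lam = 1 ranks purely by within-topic probability; lam = 0 ranks purely by lift.
    """
    return lam * math.log(p_w_given_t) + (1 - lam) * math.log(p_w_given_t / p_w)


def rank_words(topic_probs, corpus_probs, lam):
    """Return the words of one topic sorted by decreasing relevance."""
    return sorted(
        topic_probs,
        key=lambda w: relevance(topic_probs[w], corpus_probs[w], lam),
        reverse=True,
    )


# Toy probabilities for illustration only.
topic_probs = {"mlc": 0.25, "leaf": 0.20, "replaced": 0.30}
corpus_probs = {"mlc": 0.05, "leaf": 0.04, "replaced": 0.10}

failure_view = rank_words(topic_probs, corpus_probs, lam=0.2)
repair_view = rank_words(topic_probs, corpus_probs, lam=0.8)
```

With these toy values, "replaced" is ranked last at $\lambda = 0.2$ and first at $\lambda = 0.8$, which mirrors the shift from component words to repair-action words described for topic 3.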
FIGURE 3. Top-ranked words within topic 3 and corresponding failure mode
All 28 topics were examined by the same process to identify the dominant failure modes in each topic. However, not all topics relate to specific failure modes; some consist of words describing general maintenance work. For example, the top-ranked words in topic 6 are “pm day routine carried maintenance,” which point to routine maintenance records. Under this topic, the co-occurring verbs and nouns are “cleaned, checked” and “rim, iview, and processor,” respectively. These words indicate that the main work in routine maintenance is cleaning and lubrication. The topics with identified failure modes were then categorized into related subsystems following Wroe and colleagues.1 The LDA model also gives the overall proportion of words in the corpus that belong to each topic. Under the assumption that a topic is dominated by a few top-ranked words, the proportion of a topic reflects the frequency of the failure modes it represents. Table 3 displays the topics with their top-ranked words and identified failure modes. It is worth noting that topic 14 appears in two subsystems because it exhibits two different failure modes; a possible explanation is that these two failure modes co-occurred many times in service. The identified topics and related failure modes are summarized below.
TABLE 3. Keywords within topics and identified failure modes of linacs

| Subsystem | Topic (words frequency) | Keywords within topic | Identified failure modes |
|---|---|---|---|
| Electrical | Topic 2 (6.5%) | Fuse supply power tube generator relay rectifier blown outage black bridge magnetron replaced checked | Fuse blown; power outage; bridge rectifier, magnetron, modulator failure |
| | Topic 3 (6.4%) | MLC leaf motor stuck moving emitter reflector replaced pushing initialized cleaned | MLC motor failure |
| | Topic 21 (2.2%) | Reflector reference line detector lost locked verified adjusted calibration reset | MLC leaf lost reflector; reflector out of calibration |
| Control | Topic 4 (5.9%) | Console physics keyswitch sound timing frame board replaced check | Keyswitch replaced; buzzing sound from board |
| | Topic 8 (4.4%) | PCB controller carriage program fitting driver change clear tightened | Replaced controller PCB; tightened up connection |