Image Analysis Using the Fluorescence Imaging of Nuclear Staining (FINS) Algorithm

Development of the FINS Algorithm

The FINS algorithm works by initially computing the nuclear regions and then counting signals within this segment for the protein of interest. To compute the nuclear regions, we use a convex segmentation approach using a fitting term based on a thresholding of the image. It is designed to mimic the manual counting processes currently applied. We then iteratively search these regions in the Ki67 and γH2AX channels based on thresholds calculated from the image data. The thresholding parameters are set to a default setting within the code as calculated from our development data but can be adjusted by the user. In the following, we refer to the image data from the DAPI, Ki67, and γH2AX channels as \(_(x)\), \(_(x)\), and \(_(x)\), respectively, and we normalise them such that the intensities are scaled between 0 and 1. This allows for parameters to be defined in a consistent manner. The code is supplied in the Supplementary Information and can be adapted for additional uses.

Nuclear Region Computation

When manually counting nuclei in the DAPI channel, the task is essentially a thresholding problem: count any segment where the fluorescence intensity is above a certain threshold. However, for the γH2AX and Ki67 counts, we also need to compute the boundaries of the counted DAPI regions. This presents two problems in the context of automating this procedure. The first is how to automatically select the threshold such that the algorithm can perform consistently across different cell types or multiple image types, etc. The second, more challenging issue, is how to account for noise in the image. The process that determines the boundaries of the nuclei in the DAPI channel is crucial as these regions are used to find proteins within the Ki67 and γH2AX channels.

We use a segmentation method, based on a variational approach, to partition the image into two regions: foreground (nuclei) and background. Generally, in the continuous setting, for an image \(z\left(x\right)\in \left[\mathrm\right]\) in the domain \(\Omega \subset ^\), the task is to compute disjoint subregions \(_\) and \(_\), such that \(_\cup _=\Omega\), based on some partitioning criteria on the data \(z\left(x\right)\) [5]. In this setting, we construct a data-fitting term using the optimal threshold \(_\) based on Otsu thresholding [6]. To select this parameter, we divide the histogram of image intensities into two regions. This approach is an automatic clustering method which determines an optimal threshold value to minimise intra-class variance. This has been implemented efficiently in Matlab with the function “multithresh”. We use the convex relaxation approach of Chan et al. [7] and Goldstein and Osher [8], where binary labelling of the foreground and background is determined based on minimising the following energy functional:

$$\underset\}}}}\left\_g\left(x\right)\left|\nabla\;u\left(x\right)\right|\;dx+\;\lambda _f\left(x\right)u\left(x\right)\;dx\right\}$$

(1)

This involves total variation (TV) regularisation weighted by an edge function, \(g(x)\), and some data fidelity term, \(f(x)\). Equation (1) is a formulation that is designed to assign zeros (“background”) and ones (“foreground”) to each part of the image, such that the total value of the terms is minimised. The data term explicitly connects the functionality to the image data, and the TV term promotes smooth interfaces in the solution. Minimisation of this energy with respect to \(u\) is a well-understood problem. One possibility is to relax the binary constraint such that intermediate values of \(u(x)\) are permitted [7, 8]. Given that we are looking to find a regularised version of a thresholding procedure, we define an intensity fitting term in (1) as follows:

where \(_(x)\) is the image data in the DAPI channel. Edges are not particularly well-defined in this context, such that we can set \(g\left(x\right)= 1\). According to these choices, the problem then consists of a two-phase variational segmentation problem that we consider in a conventional manner, relaxing the constraint on \(u\), defined by the following:

$$\underset]}}}\left\_\left|\nabla\;u\left(x\right)\right|\;dx+\;\lambda\;__\left(x\right)u\left(x\right)\;dx\right\}$$

(3)

Here, the difference between (1) and (3) is that \(g\left(x\right)\) and \(f\left(x\right)\) have been defined, and the problem is now convex. This allows a global minimiser to be found. In this case, we use the Split Bregman approach to compute a minimiser [8, 9]. Many alternative methods are applicable in this case, such as the dual formulation or Chambolle and Pock’s algorithm [10,11,12,13]. We have found that the fastest method to obtain a solution is to define the initialisation, \(_\left(x\right)\), as follows, as this is in close proximity to the global minimiser of (3) by definition.

$$u_\left(x\right)=\left\1,\;}\;x\in\;_\left(x\right)<0\\\;0,\;}\;x\in\;_\left(x\right)\ge\;0\end\right\}$$

(4)

However, we note that for fixed \(_(x)\), the solution is independent of initialisation. We fix the value of the fitting parameter, \(\lambda =20\), as we have found it to be appropriate for images of this type. We define the solution as \(^(x)\). Based on the work of Chan et al. [7], Bresson et al. [11], and others, this will be approximately binary, such that any thresholding of the function will be a global minimiser of the original problem. We define the computed foreground region as follows:

$$_=\left\^\left(x\right)>\beta\;\}$$

(5)

We select \(\beta =0.5\) (although other values, such that \(\beta \in (\mathrm)\), would yield a similar result). In the following, we use the binary form of the solution, \(^\), denoted \(_\), to compute the counts of the nuclear proteins. The definition of (5) refers to the areas of the DAPI channel that are considered nuclei.

Nuclear Protein Counts

To simulate the manual counting procedure, we use the region calculated in the previous section, \(_\). This region will provide a space in which we can search for signals from nuclear proteins. However, unlike DAPI, we do not need to compute the regions of positive Ki67 or γH2AX signals. Instead, we only need to count the nuclei with the signal present. We treat the Ki67 and γH2AX channels in an identical way but describe the method here in general terms using Ki67 as our example. For this channel, we refer to the image data as \(_(x)\). Using Otsu’s method, we determine a threshold \(_\) on the entire image. For each disconnected region \(_^\in _\), we determine whether \(_\left(x\right)>_\) for any \(x\in _^\). If so, this nucleus is considered to be positive for Ki67. If not, then it is negative for Ki67 (illustrated in Fig. 3). This is calculated for \(i = 1, \dots , n\), such that the maximum count for Ki67 is \(n\) (i.e., the total number of nuclei calculated by the process of determining the nuclear regions in the DAPI channel). For cases where Otsu’s method does not provide an adequate threshold, we defined a floor on the parameter \(_\) such that \(_= _=0.1\) to enable the image to be processed. This process is repeated for the γH2AX channel (with \(_\left(x\right)\) and \(_\)).

Fig. 3figure 3

Example of the FINS processing. (left) cropped overlay image from set B. (right) corresponding FINS output. Binary regions represent \(_\) indicating five nuclei in this region. Red and green contours indicate Ki67 and γH2AX signals (respectively) which are contained within a nucleus

Validation of the FINS Algorithm

In this section, we present results for the proposed algorithm, FINS, compared with the manual counts of researchers 1–7 (where available). We are interested in two aspects of the results of the proposed algorithm: the reliability of the counts for each channel in comparison with the manual data and the improvement in the time taken to compute a result. The results consist of three distinct datasets, which we call sets A, B, and C, respectively. Set A consists of images 1–10 (human primary endothelial cells; HAoECs), set B consists of images 11–20 (human primary dermal fibroblasts; HDFs), and set C consists of images 21–30 (human primary chondrocytes; HCHs). For each image, we have a manual count from the Ki67, γH2AX, and DAPI channels. Researchers 1–5 provided counts for all three sets, and researchers 6 and 7 provided additional counts for set B. We also have data on the time taken to count each image for set A from researchers 1–4. Counts from the FINS algorithm can be reviewed with the user interface, but for more rigorous comparisons, the counts computed by FINS are not reviewed or adjusted by any researcher.

Count Comparisons

We present a visualisation of the results in Figs. 4, 5, and 6. Each figure displays every manual count for each dataset as well as the result achieved by the proposed algorithm. The results are split into Ki67, γH2AX, and DAPI counts. The count computed by the algorithm is connected by a dashed line to distinguish it from the rest of the data.

Fig. 4figure 4

Ki67 (red), γH2AX (green), and DAPI (blue) count results for set A. Researcher counts are indicated by a filled circle. Algorithm counts are indicated by an empty circle and are joined by a dashed line for clarity

Fig. 5figure 5

Ki67 (red), γH2AX (green), and DAPI (blue) count results for set B. Researcher counts are indicated by a filled circle. Algorithm counts are indicated by an empty circle and are joined by a dashed line for clarity

Fig. 6figure 6

Ki67 (red), γH2AX (green), and DAPI (blue) count results for set C. Researcher counts are indicated by a filled circle. Algorithm counts are indicated by an empty circle and are joined by a dashed line for clarity

Figure 4 concerns set A. It should be noted that the FINS algorithm tends to be at the higher end of the range of the researchers for the DAPI channel, but this appears acceptable and mitigated for in context. For images 1, 5, 8, and 10, the FINS count is higher than the maximum of the researcher counts. For images 1 and 5, this is a small difference, with FINS being 1.79% and 2.17% over the maximum researcher count. However, for image 8 it is 12.7% over, and for image 10 it is 8.7%. For image 8, the researcher counts’ range is 39–55, with FINS computing 62. The algorithm is closer to the counts of researchers 1, 2, 3, and 5 (DAPI = 55, 51, 54, and 51, respectively) than researcher 2 is to the others (DAPI = 39). In this image, there are a lot of borderline cells meaning the researcher counting 39 nuclei is probably using a subjective method, perhaps omitting cells that are at the border of the image. In contrast, FINS is overcounting likely due to some detritus in this particular image. Nuclear size can often vary, meaning that to enable successful counts of these differently sized nuclei, there is an unavoidable risk of some cell detritus being counted. However, the FINS user interface allows for post-analysis inspection to mitigate this issue. For image 10, FINS is closer (DAPI = 50) to the counts of researchers 1, 2, 3, and 5 (DAPI = 46, 44, 45, and 46, respectively) than researcher 2 is (DAPI = 37). In context, this demonstrates that FINS is similar to the manual counts. Ki67 is within the range of the researchers’ counts for 80% of images. For γH2AX cells, three images are below the range of researcher counts: image 1 is 2.5% below, image 2 is 5.71%, and image 4 is 6.06% (image 10 is above the range by 2.17%). These are relatively small differences, but we should note that when combined with the minor over-counting for DAPI this could have implications for the conclusions from the data, i.e., the ratio of cells stained for γH2AX to DAPI may be lower than the true percentage due to the γH2AX undercount and DAPI overcount. If this is an unacceptable margin of error to a researcher, then it is easy to mitigate against by using the user interface’s review window.

Figure 5 shows the results for set B. FINS performs comparably with the manual counts. For Ki67, FINS is within the range of the researchers’ count for all images. For γH2AX, FINS is within the range for 90% of images. The minor trend of over-counting in the DAPI channel in set A does not seem to have been repeated here in set B as FINS is within the researcher count range for 80% of images and is under the mean count seven times and over three times. It should be noted that for this data set, we have an additional two users providing manual counts. The Ki67 and γH2AX numbers are very low for set B which creates a large potential for unreliability (even a small amount of over-counting would be very significant here), but FINS appears to be reliable when the number of cells in a channel is low. For the γH2AX data, there are four images (11, 12, 14, 15) where all users agree that there are no γH2AX stained-cells present, and FINS also computes a zero for these cases.

Set C, shown in Fig. 6, has consistent data. For Ki67 and γH2AX, the FINS results are within the researchers’ count range for 90% of images. As previously discussed, variation between images can arise due to many reasons, such as the image capture settings, so it may be that the minor over-counting trend in the DAPI channel in set A was a result of the nature of the images, as this trend is not observed in Set B or Set C. For DAPI, we observed no trend to over-count (FINS is under the mean count six times, over twice, equal twice), but only 70% are within the researchers’ range. However, it is clear that there is a broad consensus between the algorithm and the users (image 24 even has all agreeing precisely). Expressed as percentages of the mean, the researcher counts show a range of 6.09% for image 21, 9.04%, for image 28, and 2.96% for image 30. For these three cases, the FINS results are only slightly below the lowest count (3, 3, and 1, respectively); FINS computes a count which is 94.7% of the mean count for image 21, 91.9% for image 28, and 97.6% for image 30. On the whole, the algorithm is consistently accurate for each channel.

Counts from each channel are not useful in isolation, so we now investigate the performance of the algorithm in the context of these connections. We have introduced a quantitative measure of count similarity when Ki67 and γH2AX counts are related back to the DAPI count. In Figs. 7 and 8, we plot count data for DAPI versus Ki67 (set B) and γH2AX (set C), respectively. This shows a visual representation of how FINS counts compare against the researcher counts for these datasets. For each image, we computed the centroid of the researchers’ counts and calculated the distance each researcher’s count is from this point. We can then use the maximum of these distances to give a quantitative measure of concordance. The percentage concordance for FINS is then defined as the percentage of instances when the algorithm’s count distance from this centroid is below the maximum of the researchers’ distances. We can calculate a similar metric for each researcher, e.g., percentage concordance for researcher 1 would be based on distances from a centroid computed using counts from researchers 2–5. We should note that this does give FINS an inherent slight advantage with this metric as FINS compares against five counts (whereas the researchers compare against four counts) and the maximum distance (hence percentage concordance also) is likely to increase slightly when more counts are present. Nonetheless, it gives us a measure of the extent to which the algorithm counts are similar to the researchers’ counts. The results are presented in Table 1 and Figs. 9 and 10.

Fig. 7figure 7

DAPI versus Ki67 counts for set B. FINS count is indicated by an asterisk “*”. Filled circles indicate researcher counts. Images 11–20 are indicated by dark blue, dark green, yellow, purple, light green, pink, light blue, orange, red, and black, respectively

Fig. 8figure 8

DAPI versus γH2AX counts for set C. FINS count is indicated by a *. Filled circles indicate researcher counts. Images 21-30 are indicated by dark blue, dark green, yellow, purple, light green, pink, light blue, orange, red and black respectively

Table 1 Percentage concordance comparison between FINS and researcher counts for sets A, B, and CFig. 9figure 9

Distance from the centroid of the Ki67 and DAPI results from other counters. The maximum distance from the centroid of all researchers’ results is indicated in red. The distance from the centroid for the results from FINS is indicated in blue. The distance from the centroid for the results from each individual researcher is shown as grey lines. Images are from sets A–C

Fig. 10figure 10

Distance from the centroid of the γH2AX and DAPI results from other counters. The maximum distance from the centroid of all researchers’ results is indicated in red. The distance from the centroid for the results from FINS is indicated in blue. The distances from the centroid for the results from each individual researcher are shown as grey lines. Images are from sets A–C

The results from Table 1 support the idea that FINS count performance is similar to that of the researchers. A percentage concordance of 86.7 for DAPI versus Ki67 counts is higher than any of the researchers. For DAPI versus γH2AX counts, the percentage concordance is 80, with only researchers 3 and 5 higher at 83.3. Figures 9 and 10 show how these results are calculated for FINS. The blue line here is the distance of the FINS count from the centroid of the researchers’ counts. The maximum distance of the researchers’ data is indicated by the red line (the range is shaded grey). Here, we can visualise the consistency of the count data. It is worth stating that the maximum distance tends to be higher for set B due to having additional researcher data. When FINS is above this maximum distance, this is not to say that it is unacceptable in these cases. In the same way, a researcher count being furthest away from the centroid does not invalidate their data, FINS occasionally (as shown by the percentage concordance) being furthest away is not a negative. In fact, with the algorithm designed to function like a researcher count, we would expect it to be sometimes furthest away from the centroid.

We now include a visualisation of the results that highlight the relationship between DAPI and Ki67 counts; Fig. 7 shows the results for set B. It allows us to observe to what extent the results of the algorithm “cluster” with the manual counts using an important biological metric. Figure 7 shows that images 12, 13, 14, and 15 have data that is quite spread (although it should be noted that there are seven researchers’ counts for this data). This demonstrates the inherent variability in the data, highlighting the challenge of counting the cells consistently. We can see that the algorithm performance in these cases is reasonable based on the distances in Fig. 9, where FINS is below the maximum distance. For the remaining images, where the data are more clustered, the results are very strong. Over image sets A, B, and C, there are four cases where FINS is above the maximum distance. For images 8 and 21, they are over by less than 1. Images 21 and 28 are 2.85 and 5.95, respectively, over the maximum distance. For image 28, the researchers’ range for DAPI is 64–70, and for Ki67 is 2–10. The FINS count here is DAPI = 61 and Ki67 = 16. This particular image has 11 cells that are cropped by the border of the image. It is subjective as to whether or not to count them, and in this case, it appears most researchers have counted them. However, FINS has not counted them as the signal from them is not likely to have exceeded the threshold in the same way as the other cells in the image. This is arguably preferable because if a full nucleus cannot be observed and no nuclear protein staining is seen, we cannot be sure if the nuclear protein is stained but is located outside of the image border. Whilst the undercount of DAPI is acceptable, its appearance in tandem with the Ki67 overcount could be of concern. However, the Ki67 overcount appears to be explicable as the result of high levels of autofluorescence. High levels of background autofluorescence are discussed previously, and whilst FINS may not be able to deal with such irregularity in an image as well as a researcher could, it is more likely to be consistent when faced with a difficult image, and the risk of this is mitigated by the ability to review data with the user interface.

Similar results were also obtained for DAPI versus γH2AX counts for set C, shown in Fig. 8. Again, this is useful because it allows us to assess the similarity of the counts between manual acquisition and the proposed algorithm. Here, we can see that the FINS count clusters very tightly with the manual counts for the majority of images, confirmed by the distances in Fig. 10. These figures allow us to see clearly that the algorithm is very consistent for DAPI in these cases. Overall, the trend is very similar to the DAPI versus Ki67 comparison above. It is difficult to distinguish the algorithm’s performance from the manual data. There are anomalous cases, but this is true for the researchers’ data as well. For DAPI versus γH2AX (Fig. 8), there are six cases where FINS is above the maximum distance. Images 2, 8, 21, 28, and 30 are less than 1 over the maximum distance, indicating a trivial difference. However, image 26 is 7.79 over the limit. We can see from Fig. 10 that there is a near-consensus in the DAPI count (the FINS count is within the range of researchers’ counts), indicating that the error is primarily in the γH2AX count which is 54 for this image (the range of the researchers here is 11–46). However, FINS is closer to the counts of researchers 1, 3, 4, and 5 (mean count = 33) than researcher 2. This particular image has high levels of background autofluorescence meaning that it has inherently more subjectivity over which cell can be identified as being stained for γH2AX. This causes the wide variation in user counts but arguably proves the need for the algorithm: by its nature, it is more likely to behave consistently than researchers when faced with difficult images. Overall, these visualisations provide further support for the reliability of FINS.

Time Comparisons

We can see from Table 2 that the mean time per image for the algorithm is 1.06 s. Figure 11 shows this in comparison to researchers 1–4, with a split axis used because of the scales involved. It shows the variability of the time taken for each user and emphasises the difference between the algorithm and counting cells manually. These results are very positive in the sense that the algorithm takes approximately 1% of the time a manual process takes. Coupled with the accuracy performance discussed above, this is a potentially transformative development. Batches of images that would take hours to process now take minutes to get a result automatically.

Table 2 Summary of data for set AFig. 11figure 11

Time in seconds taken to analyse ten images from set A. The red dashed line indicates a break in the y-axis. Researchers 1–4 are indicated by grey, dark blue, light blue, and pink lines, respectively. The FINS algorithm is in green

Comments (0)

No login
gif