Double trouble – identifying rating inconsistencies due to double ratings of the “Show backbone” study

Recruitment and study design

This study analysed a population-based random sample of women and an opportunity sample of female nurses, geriatric nurses and care workers with heavy lifting and carrying tasks, including patient transfers. Only women were included in the study, as over 80% of care workers in German healthcare and nursing as well as in geriatric care are female (Destatis 2022).

The women in the database were recruited through two different approaches. All subjects were aged between 40 and 60 years. First, women in this age range were recruited from the general population through a random sample drawn from the residents’ registration offices. Second, participants were recruited from care professions, i.e., nurses, geriatric nurses and carers, in roughly equal proportions at two locations in northern Germany: Clinical Centre 1 (located in Berlin) and Clinical Centre 2 (located in Halle/Saale).

All participants, those from the residents’ registration offices as well as the voluntarily registered women from Clinical Centre 1 and Clinical Centre 2, were contacted and asked to complete an initial questionnaire to ensure their suitability for participation in the study. This questionnaire inquired about any previous illnesses or operations on the spine; women who had already been in treatment for spine-related illness were not eligible and were therefore excluded. If a woman was suitable, a further questionnaire on lifestyle and medical history was sent, and an appointment for the MR images was proposed. The lifestyle questionnaire was not relevant to the study presented here; it was used for further investigation of lifestyle and disc degeneration.

Data collection

The participants underwent MRI of the cervical and lumbar spine. All the MR images were acquired using Philips Ingenia 3 T scanners. The MRI of each region consisted of sagittally acquired T1 and T2 images, a transversal T2 image and a coronally acquired STIR image. All sequences were acquired with a slice thickness of 3 mm, an SCQ voxel size of 0.8 mm and a recon voxel size of 0.5 mm.

Each MR image was then assessed independently by two radiologists of the site that had scanned the participant, i.e. without knowledge of the colleague’s assessment. No special prior training was given, as we aimed for a real-life picture of diagnostic quality. A report refers to one rating of an MR image, with the associated ordinal number (report 1, report 2) indicating its sequence. The term rater denotes the radiologist performing the assessment. At the lumbar spine, the segments L1/L2 to L5/S1 were investigated; at the cervical spine, the segments C2/C3 to C7/T1 were assessed (Figs. 1, 2, 3, 4 and 5).

Fig. 1

Close-up MR scan of the lumbar spine showing disc protrusion in L5/S1

Fig. 2

Close-up MRI scan of the cervical spine showing loss of disc signal and height and extensive spondylophytes in segments C5-7 and C3-4

Fig. 3

Close-up MR scan of the lumbar spine; the image illustrates spinal canal stenosis in the L4/5 region

Fig. 4

Close-up MR scan of the lumbar spine; the image illustrates an osteochondrosis at L3/4

Fig. 5

An axial T2-weighted image of right-sided recessal nerve root compression at L4/5

The following evaluation criteria were assessed:

Disc height (mm).

Disc signal, categorical.

Disc dislocation, categorical.

Herniation (mm).

Nerve root compression, categorical.

Spondylophytes, categorical.

Retrospondylophytes, categorical.

Osteochondrosis, categorical.

Size of spondylophytes and retrospondylophytes (mm).

Spinal stenosis, categorical.

The disc height was measured using the sagittal T1 image. In this image, the slice with the greatest height of the disc was used. The disc was measured in millimeters as the distance between both vertebrae at the center of the disc (Fig. 2).

The disc signal was categorized as follows: “typically layered” (three distinct layers of hyper- and hypointense parts of the disc), “inhomogeneous” (disc layers are not clearly definable) and “black disc” (signal loss of all parts of the disc, Fig. 2; principles shown in Fig. 6).

Fig. 6

Characteristics of the disc signal: typically layered (a), inhomogeneous (b), and black disc (c)

Disc dislocation was defined as one of four possible outcomes: “none”, “bulging” (broad-based prominence of the disc), “protrusion” (smaller-based protrusion of disc material which still has clear contact to the disc, Fig. 1) or “sequestration” (disc material that has lost contact to the disc itself; principles shown in Fig. 7).

We also measured the extent of disc dislocation. To do this, the distance between the furthermost part of the dislocated disc material and the assumed natural margin of the disc was measured in millimeters.

Fig. 7

Grading of disc dislocation: bulging (a), extrusion (b), and sequestration (c)

Following this, it was assessed whether nerve roots were compromised by herniated disc material. If so, the finding was categorized as “contact” (the nerve root only touches the disc material), “dislocated” (the nerve root is displaced) or “compression” (the nerve root is clearly compressed) (Fig. 5, principles shown in Fig. 8).

Fig. 8

Grading of nerve root compression: contact (a) and compression (b)

Spondylophytes (bone spurs on the circumference of the vertebral body) were classified as “none”, “isolated” (spondylophytes are present but do not join each other) or “fused” (a situation in which spondylophytes of both vertebral endplates form a continuous conjunction; see Fig. 9).

Fig. 9

Grading of spondylophytes: none (a), isolated (b), and fused (c)

Retrospondylophytes (bone spurs on the dorsal circumference of the vertebral body) were classified in the same way as spondylophytes.

We then measured the extent of the spondylophytes. For this purpose, the distance between the assumed “natural” margin of the vertebra and the furthermost tip of the spondylophyte was used. The same procedure was performed for the retrospondylophytes. Both measurements were taken in millimeters. When taking the measurements, only spondylophytes with bone signal were taken into account.

Osteochondrosis was classified according to the Modic scale (Fig. 4, characteristics of Modic I and Modic II shown in Fig. 10) [22].

Fig. 10

Osteochondrosis: Mixed Modic I/II, T1 (a), Mixed Modic I/II, T2 (b), Modic II, T1 (c), Modic II, T2 (d)

Spinal stenosis was described as either present or absent (Fig. 3, principles shown in Fig. 11). It was rated as present if either a relative or an absolute central spinal stenosis was found.

Fig. 11

Spinal stenosis: absent (a), relative (b), and absolute stenosis (c) of the spinal canal

To evaluate the difference between two independent raters, we used a double-structured reporting system to estimate how reliable imaging reports on cervical and lumbar spine MR images are when based on these criteria. Here, double-structured reporting means that both raters reported the given criteria in a digital template. The template predetermined which imaging signs had to be reported and in what manner, for example, as specific measurements or according to a classification system.
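The structured template described above can be pictured as a record with one fixed field per criterion and a closed set of allowed values for each categorical field. The following is only a minimal sketch under stated assumptions, not the template actually used in the study: the class name `SegmentReport`, all field names, and the "none" value for nerve root findings are hypothetical, and the Modic grading is left as free text because the text does not list its categories.

```python
from dataclasses import dataclass

# Category labels follow the text; "none" for nerve_root is an assumption.
ALLOWED = {
    "segment": ("L1/L2", "L2/L3", "L3/L4", "L4/L5", "L5/S1",
                "C2/C3", "C3/C4", "C4/C5", "C5/C6", "C6/C7", "C7/T1"),
    "disc_signal": ("typically layered", "inhomogeneous", "black disc"),
    "disc_dislocation": ("none", "bulging", "protrusion", "sequestration"),
    "nerve_root": ("none", "contact", "dislocated", "compression"),
    "spondylophytes": ("none", "isolated", "fused"),
    "retrospondylophytes": ("none", "isolated", "fused"),
}
# Measurements in millimeters must not be negative.
MM_FIELDS = ("disc_height_mm", "herniation_mm",
             "spondylophyte_mm", "retrospondylophyte_mm")


@dataclass(frozen=True)
class SegmentReport:
    """One rater's structured findings for one spinal segment (hypothetical)."""
    segment: str
    disc_height_mm: float
    disc_signal: str
    disc_dislocation: str
    herniation_mm: float
    nerve_root: str
    spondylophytes: str
    retrospondylophytes: str
    osteochondrosis: str          # Modic grading, kept as free text here
    spondylophyte_mm: float
    retrospondylophyte_mm: float
    spinal_stenosis: bool         # present / absent

    def __post_init__(self):
        # Reject any categorical value outside the predefined options.
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {allowed}")
        for name in MM_FIELDS:
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")


# Illustrative values only; not data from the study.
example = SegmentReport(
    segment="L5/S1", disc_height_mm=8.0, disc_signal="black disc",
    disc_dislocation="protrusion", herniation_mm=4.0, nerve_root="contact",
    spondylophytes="isolated", retrospondylophytes="none",
    osteochondrosis="Modic II", spondylophyte_mm=2.5,
    retrospondylophyte_mm=0.0, spinal_stenosis=False,
)
```

Forcing both raters into the same closed vocabulary is what makes their two reports directly comparable criterion by criterion.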

Two experienced radiology specialists with at least 10 years’ experience in radiology reviewed the images independently. Their assessments were subsequently compared; each center reviewed the studies it performed.

Statistical analysis

For all included participants with double findings of the cervical spine (CS) and lumbar spine (LS), graphical representations of the findings (report 2 vs. report 1) are displayed, separated by determination of agreement (concordance). In addition, Cohen’s κ is reported for the categorical data [23,24,25]; it quantifies the observed agreement while taking into account the matches to be expected purely by chance. Cohen’s κ was interpreted according to Landis & Koch: values below zero indicate poor agreement, values from 0 to 0.2 slight, values above 0.2 up to 0.4 fair, values above 0.4 up to 0.6 moderate, values above 0.6 up to 0.8 substantial, and values above 0.8 almost perfect agreement [26].

Continuous variables are illustrated in scatterplots, with the arithmetic mean of the percentage difference (PD) noted in the graphics as a measure of the variability between the two reports. Differences of 2 mm for the lumbar spine and of 1 mm for the cervical spine are indicated by a gray area delimited by dashed lines. We also report the mean difference between both reports as bias with the associated standard deviation (SD), and the mean absolute difference (MAD) as a measure of the absolute size of the deviations in millimeters.

In the following, we present the comparison of the ten parameters listed in the methodology, shown according to scale level either as scatterplots (continuous variables) or exclusively in tabular form (categorical variables).
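To make the chance correction concrete, the sketch below computes unweighted Cohen's kappa for two lists of categorical ratings and maps the result onto the Landis & Koch labels given above. It is an illustration in Python, not the authors' SPSS or R code; the function names `cohens_kappa` and `landis_koch` and the example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(report1, report2):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(report1) != len(report2) or not report1:
        raise ValueError("both reports must be non-empty and of equal length")
    n = len(report1)
    # Observed agreement: share of cases where both raters chose the same label.
    observed = sum(a == b for a, b in zip(report1, report2)) / n
    # Chance agreement: product of the raters' marginal frequencies per label.
    c1, c2 = Counter(report1), Counter(report2)
    expected = sum(c1[label] * c2[label] for label in c1.keys() | c2.keys()) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined if both raters used one label only
    return (observed - expected) / (1.0 - expected)


def landis_koch(kappa):
    """Verbal label for kappa using the thresholds stated in the text."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.2:
        return "slight"
    if kappa <= 0.4:
        return "fair"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "substantial"
    return "almost perfect"


# Illustrative disc dislocation ratings for ten segments; not study data.
report_1 = ["none", "none", "none", "none", "bulging",
            "bulging", "bulging", "protrusion", "protrusion", "none"]
report_2 = ["none", "none", "none", "bulging", "bulging",
            "bulging", "protrusion", "protrusion", "protrusion", "none"]
kappa = cohens_kappa(report_1, report_2)
print(round(kappa, 3), landis_koch(kappa))
```

In this example the raters agree on 8 of 10 segments (observed agreement 0.80), while their marginal frequencies alone would produce an agreement of 0.35, so kappa is 0.45 / 0.65, roughly 0.69.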

The statistical analyses were conducted using SPSS [27] and the statistical software package R [28].
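For the continuous measurements, the summary figures named in the statistical analysis (bias with SD, MAD, mean PD, and the 2 mm / 1 mm bands) can be sketched in a few lines. This is an illustrative Python sketch, not the authors' SPSS or R code, and it rests on assumptions the text does not settle: bias is taken as report 2 minus report 1, SD is the sample standard deviation of those differences, and PD is taken as the absolute difference divided by the mean of the two measurements, skipping pairs where both are zero. The function names and example values are hypothetical.

```python
from statistics import mean, stdev


def agreement_summary(report1, report2):
    """Bias, SD, MAD and mean percentage difference for paired mm measurements."""
    if len(report1) != len(report2) or len(report1) < 2:
        raise ValueError("need at least two paired measurements")
    pairs = list(zip(report1, report2))
    diffs = [b - a for a, b in pairs]          # assumption: report 2 minus report 1
    # Assumption: PD = |difference| relative to the pair's mean, in percent.
    pds = [100.0 * abs(b - a) / ((a + b) / 2.0) for a, b in pairs if a + b > 0]
    return {
        "bias": mean(diffs),
        "sd": stdev(diffs),                    # sample SD of the differences
        "mad": mean(abs(d) for d in diffs),
        "mean_pd": mean(pds) if pds else float("nan"),
    }


def share_within(report1, report2, tolerance_mm):
    """Fraction of pairs whose absolute difference is within the tolerance band."""
    pairs = list(zip(report1, report2))
    return sum(abs(b - a) <= tolerance_mm for a, b in pairs) / len(pairs)


# Illustrative lumbar disc heights in mm; not study data.
heights_1 = [10.0, 8.0, 6.0, 4.0]
heights_2 = [11.0, 8.0, 5.0, 6.0]
summary = agreement_summary(heights_1, heights_2)
print(summary)
print(share_within(heights_1, heights_2, tolerance_mm=2.0))  # lumbar band from the text
```

Reporting bias and MAD side by side is useful because positive and negative differences cancel in the bias but not in the MAD: a bias near zero with a large MAD points to unsystematic disagreement rather than one rater measuring consistently larger.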
