Dynamic X-ray Microtomography vs. Laser-Doppler Vibrometry: A Comparative Study

Ethical Approval

The study protocol was approved by the local ethical committee of Bern (Kantonale Ethikkommission Bern, KEK-BE 2016–00887) and the local ethical committee of the Paul Scherrer Institute (Ethikkommission Nordwest-und Zentralschweiz, 2017–00805), as well as the Mass General Brigham Institutional Review Board (#2022P001306).

Dynamic Synchrotron-based X-ray Microtomography

Three fresh-frozen human temporal bones (TBs) from anonymous donors were provided by the Eaton Peabody Laboratories, Mass Eye and Ear, Boston, MA, USA. The donor of TB1 was a 51-year-old white male (right ear), the donor of TB2 was an 89-year-old black female (right ear), and the donor of TB3 was a 58-year-old white male (left ear). The raw data are stored as B-Fresh1, B-Fresh2, and B-Fresh3 in the PSI petabyte archive system (a tape-based long-term storage system at the Swiss National Supercomputing Centre CSCS in Lugano, Switzerland). For simplicity, we have renamed the samples in this article: TB1 corresponds to the raw data of B-Fresh1, TB2 to the raw data of B-Fresh2, and TB3 to the raw data of B-Fresh3. For better readability, we refer to dynamic synchrotron-based phase-contrast X-ray microtomography simply as dynamic microtomography henceforth.

Sample Preparation

The three fresh-frozen TBs were dissected as follows: Laterally, the concha was removed, conserving the bony and cartilaginous external auditory canal. Posteriorly and superiorly, the air cells of the mastoid portion were removed entirely until the tegmen tympani and the antrum. Inferiorly, the soft tissue was removed until the internal carotid artery, the jugular bulb, and the insertion of the Eustachian tube. The middle ear is, therefore, entirely open through its physiological ventilation routes. Medially, the petrous part of the temporal bone was removed until the bony capsule of the labyrinth. The semicircular canals and the internal auditory canal were skeletonized. Finally, the sample included an intact external auditory canal, middle and inner ear, and had a size of approximately 5 × 2 cm. The surrounding temporal bone was reduced to a maximum thickness of 1 mm to minimize the X-ray absorption.

An earplug was sewn into the external auditory canal. During the image acquisition, the specimen was placed in a custom-made cylindrical holder (diameter of 25 mm) and mounted on the rotation stage at the TOMCAT beamline. To prevent the samples from drying out during the acquisition, they were wrapped in neuro-patties soaked in a sterile saline solution, and the top of the holder was sealed with a plastic film (see Fig. 1).

Fig. 1

Experimental setup at the TOMCAT beamline. The sample is wrapped in saline-soaked neuro-patties to keep it from drying, fixed in a custom-made sample holder, and mounted onto the rotation stage. The top is sealed with a plastic film, showing two tubes coming out (see zoom-in, bottom left of (a)). The black one is the tip of the earplug, which is sewn on one end to the external ear canal and connected on the other end, via a silicone tube, to the subwoofer (a) or to ER3C Insert Earphones from Etymotic coupled to an amplifier (b). The transparent tube with the red tip is inserted into the probe microphone during calibration. The X-ray beam comes from the left (indicated with a yellow arrow). The HR setup consists of the in-house built GigaFRoST camera [27] and a 4 × magnification high numerical aperture macroscope from Optique Peter [28]

Sound Stimulation and Calibration

Before scanning each sample, we calibrated the sound stimulation with a clinical probe microphone (ER7C, Etymotic Research) by measuring the exact voltage needed to reach the desired dB SPL in the auditory canal at a particular frequency. We measured at 256 and 512 Hz with 110 dB and 120 dB SPL. We used a subwoofer with an inverted cone attached for 256 Hz and ER3C Insert Earphones from Etymotic coupled to an amplifier for 512 Hz, and connected either of them to a sine wave generator (MeasComp USB daq Module MC1608 USB-1608G SKU: 6069–410-059). A silicone tube connected the sound stimulation unit to the earplug we sewed to the external auditory canal.
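The relation between drive voltage and sound level used in such a calibration can be illustrated with a short Python sketch. It is not the calibration procedure of the study: it assumes a linear transducer (a tenfold voltage increase raises the level by 20 dB), and the calibration point of 0.1 V producing 100 dB SPL is a hypothetical value chosen for illustration.

```python
P_REF = 20e-6  # reference pressure for dB SPL, in pascals


def spl_to_pascal(level_db):
    """Convert a sound pressure level in dB SPL to RMS pressure in Pa."""
    return P_REF * 10 ** (level_db / 20)


def voltage_for_target(v_measured, spl_measured_db, spl_target_db):
    """Scale a drive voltage to reach a target level.

    Assumes the transducer is linear, so a level change of X dB
    corresponds to a voltage factor of 10**(X / 20).
    """
    return v_measured * 10 ** ((spl_target_db - spl_measured_db) / 20)


# Hypothetical calibration point: 0.1 V produced 100 dB SPL at the probe microphone.
v_110 = voltage_for_target(0.1, 100.0, 110.0)  # voltage for 110 dB SPL
v_120 = voltage_for_target(0.1, 100.0, 120.0)  # voltage for 120 dB SPL
```

In practice, the voltage would be re-measured with the probe microphone at each frequency, since real transducers deviate from linearity at high levels.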

Image Acquisition and Reconstruction

Dynamic synchrotron-based X-ray phase-contrast microtomography was conducted at the TOMCAT beamline (X02DA) within the Swiss Light Source (Paul Scherrer Institute, Switzerland). A multi-scale strategy was implemented to accommodate the dimensions of human TBs. Initially, a low-resolution (LR) setup was employed to capture overview scans of the sample, followed by local high-resolution (HR) scans of the middle ear. An in-house developed Fiji plugin, utilizing the 3D reconstructed LR dataset as input, facilitated the determination of spatial coordinates for regions of interest to be imaged with the HR setup [29]. The LR overview scans covered a field-of-view (FOV) of approximately 29 × 12.5 mm², using a half-acquisition technique, which entails a 360° rotation rather than the standard 180° in tomography acquisitions. The setup comprised a PCO 5.5 Edge camera coupled with a 1:1 microscope positioned 3 m from the sample, resulting in an effective pixel size of 5.8 µm. Scan parameters were adjusted to minimize radiation exposure, with a 30-ms exposure time and 1000 projections spanning 360°.

The dynamic HR acquisitions were performed with a custom-made in-house fast read-out system consisting of the GigaFRoST camera [27], a LuAG:Ce scintillator with a thickness of 150 µm, and a 4 × magnification high numerical aperture macroscope from Optique Peter [28]. These components were configured at a propagation distance of 250 mm, yielding an effective pixel size of 2.75 µm [28]. The FOV achieved was approximately 11 × 3.3 mm² using the “half-acquisition” method. LR and HR acquisitions used a polychromatic beam filtered with a 5-mm Sigradur and a 4-mm glass filter. Additional filtration for LR acquisitions included a 15-mm Sigradur and a 75-µm molybdenum filter to minimize sample dose. The resulting average energy for both setups was approximately 24 keV.

Given the assumption of periodic vibration in the middle ear, with a frequency matching that of the sound stimulation, each motion cycle occurs much more rapidly than the time required for a complete set of angular projections in tomography acquisition (which typically entails several thousand images). To accommodate this, dynamic tomograms were constructed by gathering a substantial number of projections across multiple consecutive motion cycles while the rotation stage slowly rotated. A total of 40,000 projections were captured during a single 360° rotation for each scan.

With the maximum FOV in our HR setup (2016 × 1200 pixels), which was fit to the actual beam size, the maximum frame rate of the GigaFRoST camera is 2 kHz before the data transfer saturates [27]. This corresponds to a minimum exposure period of 0.5 ms between consecutive image acquisitions. Consequently, the exposure period was always kept at or above 0.5 ms to prevent saturation of the read-out system while maintaining a consistent FOV across all frequencies. The exposure time, i.e., the effective photon collection duration, was adjusted based on the frequency of sound stimulation. It always stayed within one-tenth of the sound stimulation period to prevent image blurring due to motion. Thus, exposure time decreased with increasing sound stimulation frequency, from 0.3 ms for 256 Hz to 0.19 ms for 512 Hz. Because the exposure time is shorter than the exposure period, there is a “dark time” between acquisitions during which the camera does not collect photons. The total scan time corresponds to the number of projections times the exposure period. Exposure periods were set to 0.5 ms for 256 Hz and 0.7 ms for 512 Hz, resulting in overall scan times of 20 s at 256 Hz and 28 s at 512 Hz to collect the 40,000 projection images. The longer exposure period for the faster oscillation experiments (512 Hz) is necessary to ensure the oscillations are sampled accurately.
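The timing constraints described above can be checked with a few lines of Python. This is only an arithmetic sketch using the values stated in the text; the function name and the returned keys are illustrative and not part of any acquisition software.

```python
def timing(freq_hz, exposure_time_ms, exposure_period_ms,
           n_projections=40_000, min_period_ms=0.5):
    """Summarize the acquisition timing for one stimulation frequency."""
    stim_period_ms = 1000.0 / freq_hz
    return {
        "stim_period_ms": stim_period_ms,
        # Exposure time must stay within one-tenth of the stimulation period.
        "blur_ok": exposure_time_ms <= stim_period_ms / 10,
        # Exposure period must not fall below the read-out limit (2 kHz).
        "readout_ok": exposure_period_ms >= min_period_ms,
        # Time per frame during which no photons are collected.
        "dark_time_ms": exposure_period_ms - exposure_time_ms,
        # Total scan time = number of projections x exposure period.
        "scan_time_s": n_projections * exposure_period_ms / 1000.0,
    }


r256 = timing(256.0, exposure_time_ms=0.3, exposure_period_ms=0.5)
r512 = timing(512.0, exposure_time_ms=0.19, exposure_period_ms=0.7)
```

For 256 Hz the stimulation period is about 3.91 ms, so the 0.3 ms exposure time lies below the 0.39 ms limit; for 512 Hz the period is about 1.95 ms and the 0.19 ms exposure time lies just below the 0.195 ms limit.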

To correct for the X-ray beam inhomogeneities and dark current of the camera, the projections were first dark- and flat-field corrected. Sinograms were then computed for each set of projections and reconstructed using the filtered back-projection Gridrec algorithm [30], with the Sarepy algorithm for ring removal [31].
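For readers unfamiliar with dark- and flat-field correction, the commonly used per-pixel normalization (raw − dark) / (flat − dark) is sketched below in plain Python. This is the textbook form and an assumption here; the exact correction implemented in the reconstruction pipeline is not specified in the text. The pixel values are hypothetical, and a real pipeline would operate on full detector arrays with an array library.

```python
def flat_field_correct(projection, dark, flat, eps=1e-6):
    """Return the transmission image (projection - dark) / (flat - dark).

    All inputs are 2D lists of equal shape. `eps` guards against division
    by zero where the flat field equals the dark field.
    """
    corrected = []
    for p_row, d_row, f_row in zip(projection, dark, flat):
        corrected.append([
            (p - d) / max(f - d, eps)
            for p, d, f in zip(p_row, d_row, f_row)
        ])
    return corrected


# Hypothetical 2 x 2 detector patch with an inhomogeneous beam profile.
projection = [[60.0, 110.0], [35.0, 210.0]]
dark = [[10.0, 10.0], [10.0, 10.0]]
flat = [[110.0, 210.0], [60.0, 410.0]]
transmission = flat_field_correct(projection, dark, flat)
```

Although the raw counts differ strongly between pixels, the corrected transmission is uniform, which is the purpose of the normalization.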

Two signals were collected during the image acquisition: the sinusoidal signal (or gating signal) transmitted from the signal generator to the sound unit and the camera exposure signal, giving the exact time of each image acquisition. These two signals allowed us to associate each image with a specific phase of the sine stimulation, corresponding to a specific phase of the vibration of the middle ear. The gating signal period was decomposed into ten different time windows called phases pj, with p0 being the reference phase taken at the ascending zero-crossing point of the sinusoidal curve. A post-gating algorithm was applied to the 40,000 raw projections to sort them into the correct phases and build ten post-gated tomograms of approximately 4000 projections each. These 4000 projections were evenly distributed over the full 360° rotation of the sample, so that each post-gated tomogram allowed a specific phase of the middle ear movement cycle to be reconstructed in 3D.
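The sorting step can be illustrated with a minimal Python sketch. It is not the post-gating algorithm used in the study: it assumes ideal, evenly spaced exposure timestamps, a perfectly stable stimulation frequency, and phase windows that each cover one-tenth of the cycle starting at the ascending zero crossing. The function names are hypothetical.

```python
import math


def phase_bin(t_s, freq_hz, n_phases=10, t_zero_s=0.0):
    """Return the phase window index (0 .. n_phases - 1) for a timestamp.

    `t_zero_s` is the time of an ascending zero crossing of the gating signal.
    """
    cycles = (t_s - t_zero_s) * freq_hz
    fraction = cycles - math.floor(cycles)  # position within the cycle, 0 <= f < 1
    return min(int(fraction * n_phases), n_phases - 1)


def post_gate(timestamps_s, freq_hz, n_phases=10):
    """Sort projection indices into phase windows, preserving angular order."""
    bins = [[] for _ in range(n_phases)]
    for index, t in enumerate(timestamps_s):
        bins[phase_bin(t, freq_hz, n_phases)].append(index)
    return bins


# 40,000 projections acquired with a 0.5 ms exposure period at 256 Hz.
timestamps = [i * 0.5e-3 for i in range(40_000)]
bins = post_gate(timestamps, 256.0)
sizes = [len(b) for b in bins]
```

With these idealized timestamps, each phase window receives roughly 4000 projections whose indices, and hence rotation angles, are spread over the whole scan.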

Data Analysis

A detailed description of the analysis pipeline is given by Schmeltz and Ivanovic et al. [26]. Note that the pipeline developed to analyze the dynamic synchrotron-based X-ray microtomography data extracts the motion in all three directions. To allow for a more accurate comparison of the two techniques, we adapted the dynamic microtomography pipeline to also extract displacements in only one direction. We tried to match the direction in which the motion was extracted as closely as possible to the direction in which the LDV measurements were taken. For the umbo, this is along the direction perpendicular to the plane of the tympanic annulus. For the stapes, it is perpendicular to the stapes footplate.

To assess the movement of the ossicular chain in response to sound stimulation, we assumed that the ossicles act as independent rigid bodies. According to this presumption, their three-dimensional motion over time can be characterized by rigid transformations composed of a rotation followed by a translation, i.e., all points within a given ossicle undergo identical transformations. Therefore, analyzing only a subset volume (SV) of an ossicle is sufficient to deduce the transformation of the entire ossicle. As previously mentioned, each motion cycle was divided into ten distinct time intervals denoted phases pj. The intensity-based registration algorithm imregtform from MATLAB was employed to perform a 3D registration of the SV of an ossicle imaged at phase p0 with the corresponding SV imaged at the other phases pj, where j ∈ [1, 9], to estimate the geometric transformation aligning the two phases without the need for segmentation or manual placement of landmarks. It effectively uses all available information by considering the unaltered intensity of every image pixel, thereby enabling sub-voxel registration [32]. Three SVs were manually selected for each ossicle. The transformations of all SVs were then averaged to obtain an average transformation per phase for each ossicle.

After obtaining the mean transformations for all phases pj across the three ossicles, the sinusoidal displacement of any region of interest (ROI) within an ossicle could be determined by calculating the projection onto the z-direction of the displacement vector applied to that point. To compare to the LDV measurement points, two ROIs (the umbo and the posterior crus of the stapes) were manually chosen using Fiji from the reconstructed data stack captured at phase p0. To assess the sensitivity of the displacement computation to the manual ROI selection, five points were selected around the ROI to compute the standard deviation of displacement estimations (Fig. 2).
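The projection of a rigid-body displacement onto a measurement direction can be sketched in Python as follows. This is an illustration under stated assumptions, not the MATLAB pipeline of the study: the rotation matrix, translation, and point coordinates (in voxels) are hypothetical, and the helper names are introduced here only for the example.

```python
import math
import statistics


def apply_rigid(rotation, translation, point):
    """Apply a rotation (3x3 nested list) followed by a translation."""
    return [
        sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
        for i in range(3)
    ]


def projected_displacement(rotation, translation, point, direction):
    """Displacement of `point` under the transform, projected onto `direction`."""
    norm = math.sqrt(sum(c * c for c in direction))
    unit = [c / norm for c in direction]
    moved = apply_rigid(rotation, translation, point)
    return sum((m - p) * u for m, p, u in zip(moved, point, unit))


# Hypothetical transform for one phase: small rotation about x plus a z shift.
theta = 0.001  # radians
R = [[1.0, 0.0, 0.0],
     [0.0, math.cos(theta), -math.sin(theta)],
     [0.0, math.sin(theta), math.cos(theta)]]
t = [0.0, 0.0, 0.2]
z_axis = [0.0, 0.0, 1.0]

# ROI plus four neighbouring points, mimicking the spread estimate around the ROI.
points = [[0, 100, 0], [1, 100, 0], [-1, 100, 0], [0, 101, 0], [0, 99, 0]]
values = [projected_displacement(R, t, p, z_axis) for p in points]
roi_displacement = values[0]
spread = statistics.stdev(values)
```

Repeating this for each phase pj yields one displacement value per phase, which traces out the sinusoidal motion of the ROI along the chosen direction.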

Fig. 2

Direction of displacement extraction. The ossicles are displayed in yellow, surrounded by the petrous bone and the tympanic membrane, both in gold. White arrows indicate the directions of movement extraction from the dynamic microtomography data (in the z-direction). Green arrows approximate the direction of the laser beam. As indicated in the figure, for this TB, we measured at the posterior crus of the stapes at an angle of 45°. We can then apply a cosine correction to derive the stapes motion along the z-direction, aligned with the stapes footplate. We cannot approximate the angle properly for the umbo; therefore, the difference in direction between the LDV and dynamic microtomography measurements seems to be larger there

To ensure that the extracted transformations corresponded to vibrations of the stimulated ossicles and not to vibrations of the entire sample in the sample holder, we applied the pipeline to a portion of the petrous bone. These values set the noise limit for our analyses.

Laser-Doppler Vibrometer

Sample Preparation

The same three specimens that underwent dynamic microtomography were refrozen and returned to Eaton Peabody Laboratories, Mass Eye and Ear, Boston, MA, USA. Further preparation of the temporal bones consisted of opening the facial recess to confirm the normality of the middle ear structures and to gain access to the stapes for laser-Doppler vibrometry (LDV) measurements [33]. A small part of the anterior–superior wall of the ear canal was opened and later replaced by a transparent plastic window to allow LDV measurement of umbo displacement (see Fig. 3).

Fig. 3

Experimental LDV setup at the Eaton Peabody Laboratories. The sample is fixed on an air-isolation table within a soundproof booth. Part of the ear canal wall is replaced by a transparent plastic window that allows LDV measurement of the umbo. Reflective beads are placed at the area of interest (umbo, posterior crus of the stapes, and petrous bone). In addition, a probe microphone is inserted into the ear canal through a hole in the plastic window to monitor ear canal sound pressure (Pec) levels near the tympanic membrane (TM). LDV measurement of the stapes displacement is performed through the glass-covered opening in the facial recess (see zoom-in, on the bottom right). We connected the sound stimulation to the yellow earplug, similar to the dynamic microtomography measurements. The laser beam (indicated with a yellow arrow) is manually focused on the reflective beads previously placed in the area of interest

Laser Measurements

Small retro-reflective tape pieces (approximately 100 µm × 100 µm × 60 µm thick) were affixed to the lateral surface of the tympanic membrane (TM) at the umbo and to the posterior crus of the stapes [8]. Each sample was securely positioned on an air-isolation table within a soundproof booth. LDV measurements were initially conducted at the umbo and then transitioned to the stapes without altering the setup. Additionally, vibrations of the petrous bone near the oval window were recorded to assess the noise floor and stimulus artifact in the measured displacements; all of the umbo and stapes motion measurements we report are at least 20 dB above the driven vibration of the petrous bone. For sound stimulation, a speaker (Radio Shack) equipped with a plastic tube was tightly sealed to the opening of the ear canal to deliver sound to the external ear. At the same time, a calibrated probe microphone (PCB 377C10) monitored sound pressures in the ear canal (Pec) within a distance of less than 2 mm from the TM surface. The hardware of the stimulus and recording system, as well as its software control, have been detailed previously by Ravicz and Rosowski (2012) [34]. The primary stimulus was a sequence of 50 pure tones with frequencies logarithmically spaced between 200 and 20,000 Hz, during which the stimulus voltage to the loudspeaker remained constant at 0.5 V.
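A sequence of logarithmically spaced tone frequencies such as the one described above can be generated as sketched below. The sketch assumes that both end frequencies (200 Hz and 20,000 Hz) are included in the 50 tones; the text does not state this, and the function name is illustrative.

```python
def log_spaced_tones(f_start_hz, f_stop_hz, n_tones):
    """Return n_tones frequencies with a constant ratio between neighbours."""
    ratio = (f_stop_hz / f_start_hz) ** (1.0 / (n_tones - 1))
    return [f_start_hz * ratio ** k for k in range(n_tones)]


tones = log_spaced_tones(200.0, 20_000.0, 50)
step_ratio = tones[1] / tones[0]  # constant factor between adjacent tones
```

Under this assumption, adjacent tones differ by a factor of about 1.1, i.e., roughly 25 tones per decade.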

Data Analysis

Fourier transforms of the recorded microphone and LDV time waveforms described the complex (magnitude and phase angle) sinusoidal sound pressure and velocity at the stimulus frequency. The velocities were converted into displacements by dividing the complex velocity by (2 π × frequency × i) and then normalized by the complex sound pressure at the stimulus frequency (Pec). We report the displacement magnitude normalized by the sound pressure for a single stimulus of 0.5 V. In addition, we cosine corrected the displacement for the angle of the measuring beam to account for the expected piston-like movement of the stapes footplate.
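The conversion from complex velocity to pressure-normalized displacement can be written compactly in Python. This is a sketch under assumptions: the cosine correction is taken to mean division by the cosine of the angle between the laser beam and the assumed piston axis, and the numerical values (velocity, pressure, frequency, angle) are hypothetical.

```python
import cmath
import math


def normalized_displacement(velocity, pressure, freq_hz, beam_angle_deg=0.0):
    """Convert a complex velocity to displacement per unit sound pressure.

    velocity: complex velocity at the stimulus frequency (m/s)
    pressure: complex ear canal sound pressure Pec (Pa)
    beam_angle_deg: angle between laser beam and motion axis (assumed known)
    """
    # Displacement = velocity / (i * 2 * pi * f) for sinusoidal motion.
    displacement = velocity / (2j * math.pi * freq_hz)
    # Cosine correction: the beam sees only the component along its direction.
    displacement /= math.cos(math.radians(beam_angle_deg))
    return displacement / pressure


# Hypothetical values: 1 mm/s velocity, 2 Pa pressure, 500 Hz, 45 degree beam angle.
d = normalized_displacement(1e-3 + 0j, 2.0 + 0j, 500.0, beam_angle_deg=45.0)
magnitude = abs(d)         # m/Pa
phase = cmath.phase(d)     # radians, relative to the sound pressure
```

Dividing by i × 2π × frequency shifts the phase by −90° relative to the velocity, so for an in-phase velocity and pressure the normalized displacement lags the pressure by a quarter cycle.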
