Optimal Scale-Invariant Wavelet Representation and Filtering of Human Otoacoustic Emissions

DPOAE Representation and Filtering

The most studied DPOAE is associated with the cubic intermodulation DP generated at the frequency fDP = 2f1 − f2 by two nearby frequencies f1 and f2. These signals are the linear superposition of two backward signals, the backward wave generated in the “overlap” region by nonlinear distortion of the two primary waves at frequencies f1 and f2, and the fraction of the forward intracochlear distortion product (IDP) wave that is amplified and reflected by roughness near the fDP resonant place. In the conventional frequency domain representation, the DPOAE response is characterized by oscillations of amplitude and phase measured across a wide frequency range, associated with the linear superposition of these two components of different group delays. These characteristic response patterns are named DPOAE “fine structure” [48, 49]. Although the name suggests the presence of some additional resonant mechanism producing the observed spectral peaks, it was soon recognized that the fine-structure pattern is just due to alternate constructive-destructive interference between two components whose phase difference is a monotonic function of frequency. Indeed, the superposition of two (almost) constant-amplitude spectra with different phase gradient delays yields complex spectra with oscillating amplitude and phase.
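The interference argument above can be checked numerically. The following is a minimal sketch, not the authors' analysis: the amplitudes and delays are illustrative values chosen here, and the two components are idealized as having exactly constant amplitude and constant delay. It shows that the summed spectrum has an oscillating amplitude whose peak spacing is the inverse of the delay difference.

```python
import numpy as np

# Illustrative values only (assumptions, not taken from the measurements in the text).
A_DIST, TAU_DIST_MS = 1.0, 0.0   # near-zero-delay component
A_REFL, TAU_REFL_MS = 0.5, 8.0   # longer-delay component

f_khz = np.linspace(1.0, 2.0, 2001)

# kHz times ms is dimensionless, so each phase is -2*pi*f*tau.
spectrum = (A_DIST * np.exp(-2j * np.pi * f_khz * TAU_DIST_MS)
            + A_REFL * np.exp(-2j * np.pi * f_khz * TAU_REFL_MS))
amplitude = np.abs(spectrum)

# Interior local maxima of the amplitude: the fine-structure peaks.
is_peak = (amplitude[1:-1] > amplitude[:-2]) & (amplitude[1:-1] > amplitude[2:])
peak_freqs_khz = f_khz[1:-1][is_peak]
spacing_khz = np.diff(peak_freqs_khz).mean()
expected_khz = 1.0 / (TAU_REFL_MS - TAU_DIST_MS)

print(f"amplitude range: {amplitude.min():.3f} to {amplitude.max():.3f}")
print(f"mean peak spacing: {spacing_khz:.4f} kHz (1/delay difference: {expected_khz:.4f} kHz)")
```

In this idealized case the amplitude swings between the difference and the sum of the two component amplitudes, so both the amplitude ratio and the delay difference are encoded in the visible pattern rather than displayed directly.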

The DPOAE fine structure was also considered a promising diagnostic tool, because its presence or absence seemed correlated with peripheral hearing function [50]. Classification of the different fine-structure peak shapes was undertaken, as well as the measurement of characteristic observable parameters, such as the amplitude and the pseudo-period of the oscillations [51]. A DPOAE complex spectrum with fine structure is a typical example of a signal represented in a non-optimal domain: its main physical features, which are described by a small number of parameters, are hidden, or, better, encoded, in the visualized pattern.

A very similar and familiar example is the representation of the superposition of two signals of nearby frequencies f1 and f2 in the time and in the frequency domain. In that case, the well-known beating patterns are visible in the time domain, but, if one is interested in the physical mechanisms generating that response, the frequency domain provides the optimal representation, immediately displaying two lines of measurable frequency and bandwidth, two parameters with an immediate interpretation in terms of physical models. In the non-optimal time-domain representation, the time periods corresponding to the carrier frequency (f1 + f2)/2 and to the modulation frequency f2 − f1 can be directly observed, while the “encoded” f1 and f2 must be computed, and the encoded amplitude ratio must be computed from the depth of the modulation. Choosing the “wrong” representation domain cannot destroy information, as long as linear reversible transformations are used, but the relevant physical information may be made more or less easily accessible and comparable to theoretical predictions. The representation issue is not a trivial one, because most physical phenomena are summarized by a set of observable quantities that is much smaller than the number of degrees of freedom of the recorded signal (the time samples of the waveform or the frequency bins of the complex spectrum). This is true both for the experimental data and for the theoretical model simulations. The optimal representation allows an immediate visualization of the values of the relevant observable quantities. Adding noise strengthens this argument, because the patterns in which the physical information is encoded may become more difficult to recognize in the “wrong” representation domain.
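As a worked version of the beating example, consider two tones of amplitudes \(A_1 \ge A_2\) (symbols introduced here for illustration). For equal amplitudes, the standard sum-to-product identity gives

\[
\cos(2\pi f_1 t) + \cos(2\pi f_2 t) = 2\cos\!\left(2\pi \frac{f_2 - f_1}{2}\, t\right)\cos\!\left(2\pi \frac{f_1 + f_2}{2}\, t\right),
\]

so the carrier oscillates at \(f_c = (f_1 + f_2)/2\) and the envelope \(\left|2\cos(\pi (f_2 - f_1) t)\right|\) repeats with period \(1/(f_2 - f_1)\), i.e., at the modulation frequency \(f_m = f_2 - f_1\). The encoded frequencies are then recovered as

\[
f_1 = f_c - \frac{f_m}{2}, \qquad f_2 = f_c + \frac{f_m}{2}.
\]

For unequal amplitudes, the envelope ranges between \(E_{\max} = A_1 + A_2\) and \(E_{\min} = A_1 - A_2\), so the amplitude ratio follows from the modulation depth:

\[
\frac{A_2}{A_1} = \frac{E_{\max} - E_{\min}}{E_{\max} + E_{\min}}.
\]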

Returning to the DPOAE fine structure, in the frequency-domain representation, the immediately visible modulation amplitude and pseudo-period of the interference pattern encode the relevant physical information, as in the beating example. In the “right” domain, which in this case is the t-f domain, the dependence of the amplitude of each DPOAE component on frequency and phase gradient is immediately visualized in well-separated regions of the time-frequency plane, as shown in Fig. 1 (left panel). The complex spectra of the distortion and reflection components may be effectively unmixed by time-frequency domain filtering (right panel). The DPOAE response shown in Fig. 1 was obtained using linear chirp stimuli of levels (L1, L2) = (61, 55) dB and frequencies f1(t) and f2(t), with a constant ratio r = 1.22, such that the resulting DPOAE response at 2f1(t) − f2(t) swept the frequency range 1–5 kHz at a constant rate of 0.8 kHz/s. The response waveform is decomposed into 50% overlapping windowed frames of 50 ms duration, which are Fourier analyzed to get the amplitude and phase of the current DPOAE frequency component. The speed of the chirp is optimized to match the frequency resolution of the individual spectrum (20 Hz) to the frequency interval between nearby frames. With this Fourier analysis method, either linear or logarithmic chirps may be used, whereas, in the case of the least-squares fit (LSF) analysis method [52], logarithmic chirps, respecting the scaling symmetry, would be preferable.
The quasi-hyperbolic cutoff solid lines are described by \(\tau(f) = c_i\, t_0\, f^{-b}\), where the coefficients t0 and b are positive constants (typically t0 = 10–15 ms and b = 0.7–1) derived from measurements of the SFOAE and TEOAE phase gradient delay [18], and the coefficients ci = (− 0.5, 0.5, 1.5) are used to select the nonlinear distortion component (between c1 and c2), the first reflection component (between c2 and c3) and the multiple intracochlear reflection components (above c3). In Figs. 1 and 2, t0 = 12 ms and b = 0.8 were used. A single filtering operation allows selecting a specific source across a wide frequency interval. Note that a value of b slightly lower than unity is generally necessary, reflecting the slow scaling symmetry breaking associated with the dependence of tuning on frequency.
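The numbers in this paragraph can be tied together with a short sketch. It assumes that the cutoff lines have the power-law form c·t0·f^(−b) with f expressed in kHz (the unit of f is an assumption made here so that t0 is the delay at 1 kHz), and it uses the quoted values t0 = 12 ms, b = 0.8 and ci = (−0.5, 0.5, 1.5). The function names and the label "below c1" are hypothetical, introduced only for this illustration.

```python
import numpy as np

T0_MS = 12.0            # t0 quoted in the text for Figs. 1 and 2
B = 0.8                 # b quoted in the text for Figs. 1 and 2
C = (-0.5, 0.5, 1.5)    # cutoff coefficients c1, c2, c3 quoted in the text

def cutoff_delay_ms(f_khz, c, t0_ms=T0_MS, b=B):
    """Cutoff delay c * t0 * f**(-b). Taking f in kHz is an assumption."""
    return c * t0_ms * np.asarray(f_khz, dtype=float) ** (-b)

def classify(f_khz, delay_ms):
    """Name the region of the delay-frequency plane that contains one point."""
    c1, c2, c3 = (float(cutoff_delay_ms(f_khz, c)) for c in C)
    if delay_ms < c1:
        return "below c1"            # outside the three bands named in the text
    if delay_ms < c2:
        return "distortion"
    if delay_ms < c3:
        return "first reflection"
    return "multiple reflections"

def frame_step_hz(chirp_rate_hz_per_s, frame_s, overlap):
    """Frequency swept by the chirp between the starts of consecutive frames."""
    return chirp_rate_hz_per_s * frame_s * (1.0 - overlap)

# Cutoff delays at a few frequencies
for f in (1.0, 2.0, 4.0):
    lines = [float(cutoff_delay_ms(f, c)) for c in C]
    print(f"{f:.0f} kHz cutoffs (ms):", ", ".join(f"{v:.2f}" for v in lines))

# Chirp design check: 50 ms frames, 50% overlap, 0.8 kHz/s sweep rate
resolution_hz = 1.0 / 0.050
step_hz = frame_step_hz(800.0, 0.050, 0.5)
print(f"frame resolution: {resolution_hz:.1f} Hz, step between frames: {step_hz:.1f} Hz")
```

With these values the frequency step between consecutive frames equals the frequency resolution of a single 50 ms frame, which is the matching condition described in the text.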

Fig. 1

Left: wavelet time-frequency representation of the DPOAE response of a 60-year-old subject. The single-reflection and distortion components are clearly recognizable within the hyperbolic curves, while longer-delay multiple-reflection components are visible in the upper part of the plot. Symmetric negative-delay components are also visible in the low-frequency range. A compressively nonlinear intensity map is used to enhance the low-amplitude details of the distribution. Right: original DPOAE spectrum, noise, and filtered distortion and reflection components. Note the lower noise floor after filtering. The audiometric hearing level (in negative dB HL) is also shown

Fig. 2

Frequency-domain (left) and wavelet time-frequency representation (right) of a weak DPOAE response. The zero-latency distortion component is clearly visible in the wavelet representation, and in the filtered distortion spectrum (red line), despite the low SNR of the total DPOAE over the whole frequency range

As anticipated in the introduction, in the intermodulation distortion product OAE (DPOAE) case, a constant ratio between the frequencies of the stimuli is necessary to preserve the scaling symmetry of the experiment, and to be able to predict the frequency dependence of the group delay for the two DPOAE components. Indeed, as the generation region of the 2f1 − f2 intracochlear distortion product (IDP) is near x(f2), it moves in a scaling-symmetric way with the frequency 2f1 − f2 only if the ratio f2/f1 is kept constant, because in this case the ratio fDP/f2 is also constant. On the other hand, a paradigm in which the generation place is fixed, such as the f2-fixed paradigm, also permits, to some extent, prediction of the expected frequency dependence of the phase-gradient delay for a scale-invariant cochlea [31, 53].
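The constancy of fDP/f2 follows directly from the definition of the distortion product. Writing \(r = f_2/f_1\), so that \(f_1 = f_2/r\),

\[
\frac{f_{DP}}{f_2} = \frac{2 f_1 - f_2}{f_2} = \frac{2}{r} - 1 ,
\]

which depends on \(r\) only. For the ratio \(r = 1.22\) quoted for Fig. 1, this gives \(f_{DP}/f_2 \approx 0.64\) at every point of the sweep.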

The optimal representation of the different OAE components provided by the wavelet transform permits effective filtering [54], capable of unmixing OAE components of different physiological meaning and dramatically improving the signal-to-noise ratio (SNR). Providing a visual representation in which these components are separated allows one to optimize both analysis and filtering. Indeed, the optimal compromise between frequency and time resolution, which depends on a single free parameter (the relative bandwidth Δf/f of the mother wavelet spectrum), can be easily found by visually inspecting the t-f plots. The same applies to the choice of the width of the hyperbolic filtering regions that allow one to select OAEs associated with a specific generation mechanism and/or relative displacement (with respect to the resonant place), which may also be found using adaptive algorithms [18]. In small mammals, the shorter OAE delay, measured in number of cycles to use a scaling unit (or the broader relative bandwidth Δf/f), may make it difficult to design an effective filtering procedure. Emissions from different mechanisms may overlap along the delay dimension, but, as a general rule, the mother wavelet relative bandwidth should be increased to match that of the typical animal response, for improved source unmixing. To some extent, the same filtering purpose may also be achieved without time-frequency representation and analysis, using IFT time-domain filtering techniques [55] with variable delay windows in different frequency subranges, and/or least-squares fit (LSF) procedures applied to swept-tone OAE acquisition [52, 56], with different chirp rates and window lengths used to optimally select different delay components.
The wavelet technique has the advantage, with respect to the IFT option, of performing a single filtering operation over the whole frequency range, and, with respect to the LSF option, of providing a visual representation in the t-f plane of how the filtering regions actually match, or do not match, the source distribution.
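The idea of unmixing components by delay within frequency-dependent bands can be illustrated with a minimal sketch. This is not the wavelet implementation used in the cited works: as a stand-in, each analysis frequency is given a Gaussian spectral window of constant relative bandwidth (`rel_bw`, a hypothetical parameter playing the role of Δf/f), the windowed spectrum is taken to the delay domain by an inverse FFT, and only delays between two power-law cutoffs are kept. The cutoff form c·t0·f^(−b) with f in kHz, the synthetic two-component spectrum and all names are assumptions made for the illustration.

```python
import numpy as np

T0_MS, B = 12.0, 0.8   # values quoted in the text for Figs. 1 and 2

def band_filter(f_khz, spectrum, c_lo, c_hi, rel_bw=0.1):
    """At each frequency, estimate the part of `spectrum` whose delay lies
    between c_lo*t0*f**-b and c_hi*t0*f**-b (uniform frequency grid assumed)."""
    n = f_khz.size
    df = f_khz[1] - f_khz[0]
    delays_ms = np.fft.fftfreq(n, d=df)          # delay axis: 1/kHz = ms
    out = np.empty(n, dtype=complex)
    for k, fc in enumerate(f_khz):
        # Gaussian window with constant relative bandwidth, equal to 1 at fc
        window = np.exp(-0.5 * ((f_khz - fc) / (rel_bw * fc)) ** 2)
        packet = np.fft.ifft(spectrum * window)  # wave packet in the delay domain
        scale = T0_MS * fc ** (-B)
        keep = np.flatnonzero((delays_ms >= c_lo * scale) & (delays_ms < c_hi * scale))
        # k-th bin of the forward FFT of the masked packet, computed directly
        out[k] = np.sum(packet[keep] * np.exp(-2j * np.pi * k * keep / n))
    return out

# Synthetic test spectrum (assumed amplitudes): zero-delay "distortion" plus a
# "reflection" whose group delay is t0 * f**(-b).
f_khz = np.linspace(1.0, 5.0, 801)
distortion = np.ones_like(f_khz, dtype=complex)
phase_cycles = T0_MS * f_khz ** (1 - B) / (1 - B)   # integral of the delay over f
reflection = 0.5 * np.exp(-2j * np.pi * phase_cycles)
total = distortion + reflection

dist_est = band_filter(f_khz, total, -0.5, 0.5)
refl_est = band_filter(f_khz, total, 0.5, 1.5)

mid = (f_khz > 1.6) & (f_khz < 3.5)                 # stay away from the band edges
dist_err = np.max(np.abs(dist_est[mid] - distortion[mid]))
refl_err = np.max(np.abs(refl_est[mid] - reflection[mid]))
print(f"max error, distortion estimate: {dist_err:.2e}")
print(f"max error, reflection estimate: {refl_err:.2e}")
```

The sketch also makes the resolution trade-off concrete: a narrower `rel_bw` improves frequency resolution but broadens each wave packet in delay, which eventually makes it spill across the cutoff lines.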

The optimal scale-invariant wavelet representation of OAEs also allows one to visually appreciate details of the OAE response that are not visible in the frequency domain and may be embedded in noise in the time and frequency domains. This is shown in Fig. 2, where the weak DPOAE response of a subject affected by Parkinson’s disease is shown in the frequency and time-frequency domains. Although the SNR in the frequency domain is very low over the whole frequency range, the zero-latency component emerges from the homogeneous noise background in the time-frequency representation. Such data would typically be rejected by a standard statistical analysis based on the SNR, either global or averaged over frequency bands. The fraction of rejected data should be minimized, particularly when, as in this case, each measurement session may be a source of stress for a sensitive patient. In such patients, the usual way of improving the SNR by extending the integration time may be impracticable, because of either excessive stress or lack of stationarity of the response.

Recently, a DPOAE generation mechanism has been proposed [57] involving reflection by roughness associated with the spatial modulation of the strength of the cochlear amplifier. For such a mechanism, a peculiar time-frequency signature was theoretically predicted [58], i.e., the occurrence of symmetric positive- and negative-delay components in the delay-frequency plane. The time-frequency representation is well suited to testing this hypothesis, and time-frequency filtering may be an effective way to measure the relative weight of such components.

SFOAE and TEOAE Representation and Filtering

To some extent, similar considerations also apply to the SFOAE and TEOAE representations. These OAEs are mostly generated by a single mechanism, coherent reflection, but a fine structure is observed also in the SFOAE and TEOAE spectra. In this case, for each frequency, there is interference among components generated by the same mechanism in different cochlear regions, and among components associated with multiple intracochlear reflections. These interference phenomena may also add extra complexity to the DPOAE fine structure, which is mostly due to the interference between the components coming from the two different mechanisms. For SFOAEs and TEOAEs, the source distribution is strongly inhomogeneous, and the sources are not point-like. Indeed, for each frequency, the coherent reflection filtering (CRF) mechanism [31, 59] selects different regions of coherent reflection localized within the spatial width of the response peak, yielding reflected wave packets with different group delays and bandwidths. As a consequence, the time-frequency representation of these OAE responses consists of a collection of spots, of given delay and frequency widths [60]. Neglecting intracochlear reflections, different spots at the same frequency correspond to differently spatially localized sources. The possibility of separating in the t-f domain two such sources at the same frequency is related to the intrinsic spatial width of the sources, to the steepness of the delay-position function and to the time resolution of the t-f analysis. Indeed, the delay width of a source is related to its spatial width and to the variation of the slope of the BM response phase (group delay) within that region. 
Near the peak of the BM response, this variation may be large; therefore, regions of small spatial width and close to each other may appear as separate spots of different delay in the t-f plane, whereas more basal sources, generated within regions where the group delay is slowly varying, may appear as a single spot localized along the time axis even if they are associated with a coherent source extended over a wider spatial region. Optimizing the frequency-time resolution balance of the mother wavelet could help, but the overlap between the intrinsic delay widths of the basal spots cannot be overcome. These concepts were formalized in [60] by defining a local reflectivity source.

Localization of the OAE source of a given frequency must be interpreted in a scale-invariant way, as relative to the peak of the BM response of that frequency, taking into account the finite-width spot-like nature of the contributions to the OAE response from coherent reflection. The t-f representation shows the distribution of the sources in this scale-invariant way, with sources sharing the same spatial shift relative to the best place of that frequency distributed along hyperbolic lines in the t-f plane (see Fig. 3). In this representation, the distribution of the OAE sources along the scale-invariant coordinates (same physics at different frequencies moving along the hyperbolas, different physical phenomena moving orthogonal to them) is immediately visible.

Fig. 3

Time-frequency representation of an SFOAE response (from [18]), with hyperbolic curves delimiting across frequency a set of scaling equivalent regions with the same physics (e.g., nonlinear distortion below \(\tau_\), absent in the SFOAE case, single reflection from a region slightly basal to the peak (defined here as short-latency, or SL) between \(\tau\) and \(\tau_\), single reflection from the peak (long-latency, or LL) between \(\tau_\) and \(\tau_=2\tau_\), double reflection (double-latency, or DL) between \(\tau_\) and \(\tau_=2\tau_\), multiple reflections above \(\tau_\)). With respect to Fig. 1, the dotted cutoff line is added, which separates first reflection component coming from different regions (SL and LL) within the peak region, which are separated along the delay dimension because in that region the phase gradient delay is rapidly varying with the source position

Sisto et al. [18] applied the wavelet filtering procedure described in Fig. 3 to TEOAE and SFOAE spectra from the same ear, showing that the first-reflection components coming from more basal sub-regions (SL, below the dashed line in Fig. 3) have systematically steeper I/O functions than the LL components, consistent with a less compressive behavior of the BM response basal to the resonant peak. Double reflections show a strongly compressive behavior, because the cochlear nonlinear gain is experienced twice along their TW path.

In time-frequency representations, SSOAEs emerge from the hyperbolic SFOAE (or TEOAE) pattern as an amplitude-modulated vertical bright line, with an envelope of decaying amplitude in the first tens of ms, eventually reaching a stationary amplitude for long delays, as shown in [18]. In Fig. 4, such a behavior is shown using a log scale for the frequency axis, for two different choices of the trade-off between time and frequency resolution of the wavelet analysis. The modulation period corresponds to the round-trip delay between each intra-cochlear reflection, confirming the interpretation [61] of SSOAEs as due to the concurrence of coherent (in a standing-wave sense) intracochlear reflections and round-trip gain larger than unity (in the long-delay, low-amplitude limit). On the other hand, in the case of narrow-band SOAEs, other t-f methods may be more effective than the wavelet transform, as previously mentioned.

Fig. 4

SSOAE and multiple reflections in two wavelet representations of the same TEOAE response to a 60 dB click stimulus. In the right panel, the intrinsic frequency resolution of the analysis is improved by a factor of two, and correspondingly, the time resolution is worsened by the same factor. Logarithmic units are used for the frequency axis to show the invariance across frequencies of the intrinsic relative frequency resolution Δf/f

Another useful feature of the t-f OAE representation is related to the detection of artifacts. Jedrzejczak et al. [62] proposed using the localization of the ringing artifact within a specific region of the t-f plane to filter TEOAEs recorded with a linear acquisition paradigm. The SFOAE residuals yielded by the suppression or compression method [63] may be “contaminated” by artifacts associated with transitory changes of the probe stimulus level effectively reaching the cochlea. If slow chirps lasting several seconds are used for the probe and suppressor stimuli, a fluctuation of the middle ear transmission during one of the probe chirps due, e.g., to swallowing, would cause a spurious zero-delay residual contribution localized along the frequency axis over a frequency interval corresponding to the perturbed chirp fraction. As the probe level is typically much higher than the SFOAE residual, and as the differential SFOAE acquisition paradigms (both compression and suppression) are based on linear differences, even a small fraction of the probe amplitude would give a significant contribution to the average residual. Such spurious contributions could be easily identified in the t-f domain from their null group delay, so coupling a nonlinear acquisition method to time-frequency filtering in the first-reflection hyperbolic band would reject both noise and such artifacts without rejecting the good data coming from the time fractions of the chirp that were not perturbed. Again, the t-f representation and filtering help improve the reliability and the SNR of the OAE data.
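The size of such an artifact can be estimated with simple decibel arithmetic. The sketch below uses hypothetical numbers (a 40 dB probe and a 1% transient change of the amplitude reaching the cochlea); neither value comes from the text, and the function name is introduced only for illustration.

```python
import math

def spurious_residual_db(probe_level_db, fractional_change):
    """Level of the zero-delay residual left in a linear difference when the
    effective probe amplitude changes by `fractional_change` (0.01 = 1%)."""
    return probe_level_db + 20.0 * math.log10(abs(fractional_change))

# Hypothetical example: 40 dB probe, 1% transient change in transmission
artifact_db = spurious_residual_db(40.0, 0.01)
print(f"spurious residual level: {artifact_db:.1f} dB")
```

Because a 1% amplitude change corresponds to 40 dB below the probe, the spurious residual in this example sits at a level that a genuine SFOAE residual could easily have, which is why its null group delay is the more useful criterion for rejecting it.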

OAE Group Delay, Cochlear Tuning, and Stimulus Level

Contrary to expectations based on simplistic active filter interpretations of the OHC amplification, the group delay of OAEs generated by coherent reflection does not vary as rapidly with stimulus level as does the gain of the response [64]. In other words, the inverse proportionality relation between response amplitude and bandwidth that is typical of a resonant passive filter is not observed in the OAE response. Moreover, an accurate time-frequency analysis of the SFOAE and TEOAE response at different stimulus levels shows that elementary components (wave-packets or “spots” in the t-f domain) of almost constant delay are present in the response and that increasing the stimulus level only increases the relative weight of the more basal sources, associated with shorter delay spots [18]. Indeed, CRF components are localized at a specific place, and the relative intensity of the corresponding spots depends on the shape and width of the BM resonant peak. As a consequence, the average delay of the first reflection component (obtained by filtering within a hyperbolic band that excludes multiple delay spots associated with multiple intracochlear reflections) may be used as a stable measure of cochlear tuning [65]. The objective estimate of tuning is an important application of OAE research [3, 64, 66], which is particularly sensitive to noise and to systematic errors related to the interference between sources of different group delay, including the multiple intracochlear reflections that in some cases (when the round-trip gain equal to unity condition is reached for a level above the noise floor) are classified as SSOAEs. The t-f representation and filtering option is generally necessary to improve the quality and stability of OAE-based estimates of tuning [54, 65, 67]. 
In particular, the delay-frequency function of each OAE component may be more effectively computed as the average of the delay \(\tau\) within the corresponding filtering band, weighted, for each frequency, by the wavelet coefficient WTx(\(\omega , \tau\)) squared. This procedure yields a much more stable estimate of the cochlear delay (and tuning), with respect to taking the derivative of the phase-frequency function.
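A minimal sketch of the weighted-average delay at a single frequency is given below. It assumes that the wavelet coefficients at that frequency are available as an array sampled on a delay axis, and that the weight is the squared modulus of the coefficient. The function name, the synthetic coefficients and the band limits are illustrative assumptions.

```python
import numpy as np

def weighted_delay_ms(delays_ms, coeffs, lo_ms, hi_ms):
    """Mean delay within [lo_ms, hi_ms), weighted by the squared modulus of
    the coefficients at one analysis frequency."""
    delays_ms = np.asarray(delays_ms, dtype=float)
    weights = np.abs(np.asarray(coeffs)) ** 2
    keep = (delays_ms >= lo_ms) & (delays_ms < hi_ms)
    total = weights[keep].sum()
    if total == 0.0:
        return float("nan")          # no energy inside the band
    return float((delays_ms[keep] * weights[keep]).sum() / total)

# Synthetic coefficients at one frequency: a first-reflection spot near 8 ms
# and a weaker multiple-reflection spot near 17 ms (illustrative values).
delays = np.linspace(0.0, 20.0, 201)
coeffs = (np.exp(-0.5 * (delays - 8.0) ** 2)
          + 0.8 * np.exp(-0.5 * (delays - 17.0) ** 2))

in_band = weighted_delay_ms(delays, coeffs, 4.0, 12.0)    # first-reflection band only
unfiltered = weighted_delay_ms(delays, coeffs, 0.0, 20.1)  # whole delay axis
print(f"band-limited delay: {in_band:.2f} ms, unfiltered delay: {unfiltered:.2f} ms")
```

Restricting the average to the first-reflection band keeps the longer-delay spot from biasing the estimate, and no phase unwrapping or numerical differentiation is involved.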

The fact that group delay is a slowly varying function of the stimulus level also has the important consequence that the same t-f hyperbolic filtering regions can be used at all stimulus levels for unmixing distortion components, as well as first- and multiple-reflection components, as shown for SFOAEs and TEOAEs in [18]. The individual variation of the OAE group delay among adult subjects of different ages is also remarkably small [68], allowing one to use the same filtering regions in cross-sectional studies involving large populations.
