Classifying Neuronal Cell Types Based on Shared Electrophysiological Information from Humans and Mice

Data

The mouse neuronal data contain whole-cell current clamp recordings from identified fluorescent Cre-positive neurons, or nearby Cre-negative neurons, in acute brain slices derived from adult mice. Mouse cells are identified using transgenic mouse lines harboring fluorescent reporters, along with driver lines that enrich for cell classes based on marker genes. The human neuronal data, in contrast, are obtained from donated ex vivo brain tissue from neurosurgical and postmortem sources. These data are less abundant than the mouse data and are challenging to obtain; they are available thanks to the generosity of tissue donors. The Allen Cell Types Database (ACTD) contains electrophysiological recordings from 1920 mouse cells and 413 human cells. Each whole-cell current clamp recording captures the response to a stimulation and is sampled at 200 kHz (before 2016) or 50 kHz (after 2016).
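As a practical note, the cell metadata underlying these counts can be retrieved programmatically through the AllenSDK. The snippet below is a minimal, illustrative sketch rather than the pipeline used in this work; the manifest path and the metadata field names (e.g., dendrite_type) are assumptions that should be checked against the current AllenSDK documentation.

```python
# Minimal sketch (assumes the AllenSDK is installed, e.g., `pip install allensdk`).
from allensdk.core.cell_types_cache import CellTypesCache
from allensdk.api.queries.cell_types_api import CellTypesApi

# The cache downloads metadata and stores it under the given manifest path (illustrative path).
ctc = CellTypesCache(manifest_file="cell_types/manifest.json")

# Query cell records separately for each species.
mouse_cells = ctc.get_cells(species=[CellTypesApi.MOUSE])
human_cells = ctc.get_cells(species=[CellTypesApi.HUMAN])
print(f"mouse cells: {len(mouse_cells)}, human cells: {len(human_cells)}")

# Each record is a dict of metadata; the dendrite-type label used later
# ('spiny' / 'aspiny' / 'sparsely spiny') is expected under a key such as
# 'dendrite_type' (exact field names may differ across AllenSDK versions).
print(mouse_cells[0].get("dendrite_type"))
```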

The dataset contains four stimulation conditions that evoke AP responses from the neurons. The first type, noise stimulation, comprises square current injections with noise pulses. The second type, ramp stimulation, involves gradually increasing the intensity of a square current injection at a rate slower than the neuron’s time constant. The third type, long square stimulation, employs square pulses of extended duration to induce a response from the neuron. Finally, the fourth type, short square stimulation, delivers brief square pulses designed to elicit a single AP from the neuron, providing a simple protocol for producing an AP.
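To make the stimulus/response structure of a single recording concrete (cf. Fig. 2 below), the following sketch pulls one sweep of one cell through the AllenSDK, continuing from the CellTypesCache object created above. The specimen id and sweep selection are placeholders, and the sweep dictionary keys and stimulus names are assumptions based on the AllenSDK interface.

```python
# Minimal sketch, continuing from the `ctc` and `mouse_cells` objects above.
specimen_id = mouse_cells[0]["id"]  # placeholder specimen, not a cell used in the paper

# Sweep metadata indicates which protocol each sweep used
# (e.g., 'Noise 1', 'Ramp', 'Long Square', 'Short Square'; exact names may vary).
sweeps = ctc.get_ephys_sweeps(specimen_id)
noise_sweeps = [s for s in sweeps if "Noise" in s["stimulus_name"]]

# Download the NWB file for this cell and extract one stimulus/response pair.
data_set = ctc.get_ephys_data(specimen_id)
sweep = data_set.get_sweep(noise_sweeps[0]["sweep_number"])

stimulus = sweep["stimulus"]      # injected current trace
response = sweep["response"]      # recorded membrane potential trace
rate_hz = sweep["sampling_rate"]  # 200 kHz or 50 kHz, depending on recording date
```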

The classification of neurons into their broad types in humans and mice, and of mouse cells into their specific neuronal subclasses, relies on analyzing 41 electrophysiological tabular features. Although neuronal morphology-related features are attached to some neurons in the dataset, morphological features were not used for classification: all 41 tabular features are based solely on electrophysiological behavior, so that the presented method can be used in real-time clinical applications, where morphological features are lacking. Nevertheless, the morphology of a neuron is still expressed through its electrophysiological features, since morphology affects the propagation of the electrophysiological signal in the neuronal branching trees (Ofer et al., 2017). The features are extracted from APs captured under various stimulations, such as AP width, height, and threshold, along with features related to AP trains, such as firing rate. Each whole-cell current clamp recording is based solely on membrane potential measurements. Recordings are performed using a range of stimulus protocols to characterize the intrinsic properties of the neurons: short pulses (3 ms current injections used to find the action potential threshold within 10 pA), long steps (1 s current injections from -110 pA to rheobase +160 pA, in 20 pA increments), slow ramps (25 pA per second, terminated after a series of action potentials is acquired), and naturalistic noise (pink noise scaled to three amplitudes: 0.75, 1, and 1.5 times rheobase). For an electrophysiological description of all features and stimulations used, please refer to the Allen Cell Electrophysiology Overview documentation (see Footnote 3).
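The precomputed electrophysiological features can likewise be assembled into the tabular form described above. The sketch below joins cell metadata with the feature table using pandas; the join keys and the example feature columns are assumed names, shown only to illustrate the structure of the 41-feature table.

```python
# Minimal sketch: build a cell-by-feature table from the precomputed ACTD ephys features.
import pandas as pd

cells_df = pd.DataFrame(mouse_cells + human_cells)    # metadata from the sketch above
features_df = pd.DataFrame(ctc.get_ephys_features())  # precomputed ephys features

# Join metadata (species, dendrite type, transgenic line) with the ephys features on the
# cell/specimen identifier (column names are assumptions and may differ between versions).
table = cells_df.merge(features_df, left_on="id", right_on="specimen_id", how="inner")

# Example AP-related columns; the classification models use 41 such features in total.
example_cols = ["peak_v_long_square", "threshold_v_long_square",
                "upstroke_downstroke_ratio_long_square", "f_i_curve_slope"]
print(table[["species", "dendrite_type"] + example_cols].head())
```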

The Allen Institute has identified the dendritic morphology of each neuron by categorizing it as aspiny, sparsely spiny, or spiny. This was done by examining slides of the neuron’s dendrites under a microscope at 20X or 63X magnification. These dendritic types roughly correspond to interneurons (aspiny and sparsely spiny) and pyramidal or spiny stellate neurons (spiny).

Aspiny dendrites are characterized by the absence of spiny protrusions, the lack of a pronounced apical dendrite and/or an axon that emerges from the soma or dendrite at odd angles, and extensive local branching. Sparsely spiny dendrites are defined by the presence of infrequent to moderately frequent spiny protrusions (approximately one spine per 10 microns), the lack of a pronounced apical dendrite and/or an axon that emerges from the soma or dendrite at odd angles, extensive local branching, and/or projections up to layer 1.

Spiny dendrites are defined by the presence of frequent spiny protrusions (approximately one spine per 1-2 microns), an axon that descends perpendicularly down to the white matter with sparse, proximal branching occurring at right angles to the primary axonal branch, and/or a pronounced primary apical dendrite (Allen Institute for Brain Science, 2015).

As described above, the dataset contains four types of stimulation: noise, ramp, long square, and short square. Figure 2 shows an example of a neuron’s electrophysiological response to a noise-type stimulation.

Fig. 2

Top: noise-pulse stimulation with square current injections scaled to three amplitudes (0.75, 1, and 1.5 times the rheobase, the minimum current required to depolarize a nerve given an infinite duration of stimulation), with a coefficient of variation (CV) equal to 0.2. Bottom: the cell’s response to the stimulation

Neurons can be classified into two categories: GABAergic and Glutamatergic. GABAergic neurons are further divided into four subclasses based on their expressed Cre lines: Pvalb (Parvalbumin) positive, Vip (Vasoactive intestinal peptide) positive, Sst (Somatostatin) positive, and 5-hydroxytryptamine receptor 3A (Htr3a) positive / Vip negative. Glutamatergic neurons, on the other hand, are classified based on their laminar locations and the location to which they project their axons, as highlighted in Tremblay et al. (2016).

Using the ACTD, researchers have defined five transcriptomic-electrophysiological subclasses, comprising four major GABAergic subclasses and one Glutamatergic subclass. The subclasses are specified in Fig. 3, following Rodríguez-Collado and Rueda (2021) and Tremblay et al. (2016).

Fig. 3

Cre-lines composing the defined subclasses, based on Rodríguez-Collado and Rueda (2021)

After the preprocessing stage, which is visualized in Fig. 4, we are left with 1424 mouse samples and 299 human samples for further analysis. These samples will be used to train, validate, and test our classification models. Among the mouse samples, 700 samples are classified as Glutamatergic and correspond to spiny neurons, while the other 724 samples are classified as GABAergic and pertain to aspiny neurons. In the case of human samples, we have 231 spiny neurons and 68 aspiny neurons available for analysis. The distribution of inhibitory and excitatory cells in humans and mice is illustrated in Fig. 5.
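For completeness, a conventional way to produce such a stratified train/validation/test partition is sketched below; the 70/15/15 ratios, the column names, and the feature_columns list are illustrative assumptions rather than the exact configuration used here.

```python
# Minimal sketch of a stratified train/validation/test split (ratios are assumed, not the paper's).
from sklearn.model_selection import train_test_split

X = table[feature_columns].to_numpy()  # `feature_columns`: the 41 ephys features (hypothetical list)
# Label convention matching Eq. (1) below: 0 = excitatory (spiny), 1 = inhibitory (aspiny).
y = (table["dendrite_type"] == "aspiny").astype(int).to_numpy()

# First hold out a test set, then split the remainder into training and validation sets.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.15 / 0.85, stratify=y_trainval, random_state=0)
```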

Fig. 4

Electrophysiological features pipeline. Data is first obtained from the ACTD; then, features are extracted into a tabular format as described in Section 1 of the supplementary material. Similar transgenic lines are merged, and the data is split into training, validation, and test sets. The data is then normalized and classified as described in Fig. 4

Fig. 5

Distribution of dendrite type in mouse vs. human data

When we examine the Cre-line subclasses present in the GABAergic mouse samples, we find four types: 231 neurons belong to the Pvalb subclass, 199 neurons are Htr3a positive and Vip negative, 173 neurons fall into the Sst subclass, and 121 neurons are Vip positive. A graphical representation of this distribution is shown in Fig. 6.

Fig. 6

Distribution of Cre-line subclasses among the GABAergic mouse samples

The extracted tabular AP features can be found in the supplementary material.

Classification Models

Artificial neural networks (ANNs) are a class of machine learning models inspired by biological neural networks. ANNs rely on matrix multiplications followed by nonlinear activation functions to learn complex relations between input and output. They are composed of artificial neurons connected through edges. Each edge typically carries a weight that adjusts the strength of the signal at that connection, and the weights are 'learned' through an optimizer such as Stochastic Gradient Descent (SGD) (Ruder, 2016).

Over the last decade, numerous neural network architectures have been developed for diverse applications (Liu et al., 2017). In this paper, we focus on fully connected neural networks, also referred to as multi-layer perceptrons (MLPs) or simply neural networks (NNs) (Krogh, 2008). We also use a recently proposed type of NN designed for tabular data, the Locally SParse Interpretable Network (LSPIN) (Yang et al., 2022).
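As a concrete point of reference, a fully connected network of the kind described above can be written in a few lines of PyTorch. This is a generic sketch; the layer sizes, activation, and learning rate are illustrative and not the architecture used in our experiments.

```python
# Minimal MLP sketch in PyTorch (layer sizes and hyperparameters are illustrative).
import torch
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(41, 64),   # 41 electrophysiological input features
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 2),    # two output classes (e.g., spiny vs. aspiny)
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(mlp.parameters(), lr=1e-2)  # weights 'learned' via SGD

def train_step(x_batch, y_batch):
    """One gradient step: forward pass, loss, backpropagation, weight update."""
    optimizer.zero_grad()
    loss = criterion(mlp(x_batch), y_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```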

We focus on two classification tasks that rely solely on electrophysiological features. In the first task, we use a joint model to classify neurons from humans and mice according to their dendrite type (spiny vs. aspiny). In the second task, we classify mouse neurons into their respective cell classes based on marker genes (multi-label classification): Pvalb, Sst, Vip, Htr3a+/Vip−, and Glutamatergic neurons. For the first task, we introduce a domain adaptation component to handle measurements from humans and mice using a joint model based on information shared by the two domains. For the second task, we use an NN with a sample-specific feature selection mechanism, namely LSPIN, to reduce model overfitting in low-sample-size data and obtain an interpretable model.

Domain-adaptive Classification

In this section, we aim to design a model that classifies human neuronal samples into neuronal types, a task made difficult by the shortage of human data: human neuronal samples are scarce and difficult to obtain. To overcome this issue, we use both mouse and human samples to establish a shared distribution of similar characteristics from both domains. The underlying assumption is rooted in the biological similarity between mouse and human neurons, both originating from mammalian brain tissue. We classify neuronal samples effectively by utilizing information common to samples from both species. However, conventional ANNs may exhibit sub-optimal performance in such a scenario due to the domain shift arising from the distinct distributions of mouse and human neuronal data. We overcome this limitation by introducing an adversarial domain adaptation scheme, namely DANN (Ganin et al., 2016), designed to mitigate the influence of this domain shift. This scheme aligns the distributions of mouse and human neuronal data, enhancing the model's ability to classify human neuronal samples accurately.

We consider \(X \subseteq \mathbb{R}^D\) the input space and \(Y = \{0, 1\}\) the output space, where 0 denotes an excitatory cell and 1 an inhibitory cell. We define S to be the source distribution over \(X \times Y\) and \(D_S\) to be the corresponding marginal distribution, such that \(S = \{(\varvec{x}_i, y_i)\}_{i=1}^{n} \sim D_S\). Similarly, we define T to be the target distribution over \(X \times Y\) and \(D_T\) to be the corresponding marginal distribution, such that \(T = \{(\varvec{x}_i, y_i)\}_{i=n+1}^{N} \sim D_T\), where n is the number of source samples and N is the total number of samples. We aim to learn a classifier \(\eta : X \rightarrow Y\) for which the target risk in Eq. (1) is low while maintaining a low source risk:

$$\begin{aligned} R_{D_T}(\eta ) = \underset{(\varvec{x}, y) \sim D_T}{\Pr }\left( \eta (\varvec{x}) \ne y\right) . \end{aligned}$$

(1)

Since there may be a shift between \(D_S\) and \(D_T\), training a naive model based on Eq. (1) can be biased towards the more abundant domain \(D_S\). To alleviate such bias, Ganin et al. (2016) introduced a technique called DANN that combines representation learning (i.e., deep feature learning) and unsupervised domain adaptation in an end-to-end training process. DANN jointly optimizes two adversarial losses, minimizing the loss of a label classifier while maximizing the loss of a domain classifier. Training with both losses can be considered a form of adversarial neural network regularization: on the one hand, the network needs to classify the data into the correct labels; on the other hand, its predictions must be based on features that cannot discriminate between the source and target domains. In our setting, mouse cells, which are more abundant, are considered the source distribution, and human cells serve as the target distribution.

The prediction loss and domain loss are respectively defined as:

$$\begin{aligned} L_y^i(\theta _f, \theta _y) = L_y(G_y(G_f(\varvec{x}_i;\theta _f);\theta _y), y_i),\\ L_d^i(\theta _f, \theta _d) = L_d(G_d(G_f(\varvec{x}_i;\theta _f);\theta _d), d_i), \end{aligned}$$

where \(\theta _f, \theta _y, \theta _d\) are the parameters of the feature extractor, label classifier, and domain classifier, respectively, \(G_f, G_y, G_d\) are the function outputs of the feature extractor, label classifier, and domain classifier, respectively, and \(d_i\) is the domain label of sample i, as illustrated in Fig. 7.
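To make the three components concrete, the sketch below defines a feature extractor \(G_f\), a label classifier \(G_y\), and a domain classifier \(G_d\) as small PyTorch modules. The hidden-layer sizes are illustrative assumptions, not the configuration reported in this work.

```python
# Minimal sketch of the three DANN components (hidden sizes are illustrative assumptions).
import torch.nn as nn

feature_extractor = nn.Sequential(   # G_f with parameters theta_f
    nn.Linear(41, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
)
label_classifier = nn.Sequential(    # G_y with parameters theta_y: excitatory vs. inhibitory
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 2),
)
domain_classifier = nn.Sequential(   # G_d with parameters theta_d: mouse (source) vs. human (target)
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 2),
)
```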

Overall, training the model consists of optimizing Eq. (2):

$$\begin{aligned} E(\theta _f, \theta _y, \theta _d) = \frac{1}{n} \sum _{i=1}^{n} L_y^i(\theta _f, \theta _y) - \frac{1}{N} \sum _{i=1}^{N} L_d^i(\theta _f, \theta _d), \end{aligned}$$

(2)

by finding the saddle point \(\hat{\theta }_f, \hat{\theta }_y, \hat{\theta }_d\) such that:

$$\begin{aligned} (\hat{\theta }_f, \hat{\theta }_y) = \arg \min _{\theta _f, \theta _y} E(\theta _f, \theta _y, \hat{\theta }_d), \\ \hat{\theta }_d = \arg \max _{\theta _d} E(\hat{\theta }_f, \hat{\theta }_y, \theta _d). \end{aligned}$$

(3)

To optimize over Eq. (3), we can use gradient descent, which relies on the following update rules:

$$\begin{aligned} \theta _f \leftarrow \theta _f - \mu \left( \frac{\partial L_y^i}{\partial \theta _f} - \lambda \frac{\partial L_d^i}{\partial \theta _f}\right) , \\ \theta _y \leftarrow \theta _y - \mu \frac{\partial L_y^i}{\partial \theta _y}, \\ \theta _d \leftarrow \theta _d - \mu \lambda \frac{\partial L_d^i}{\partial \theta _d}. \end{aligned}$$

where \(\mu\) is the learning rate.
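In practice, these update rules are typically implemented with a gradient reversal layer: the layer acts as the identity in the forward pass and multiplies the gradient by \(-\lambda\) in the backward pass, so a single optimizer step reproduces the saddle-point behavior of Eq. (3). The PyTorch sketch below, continuing from the modules defined above, is a generic illustration of this mechanism rather than our exact training code; here \(\lambda\) scales only the reversed gradient flowing into \(G_f\), a common implementation choice.

```python
# Minimal gradient-reversal sketch; lambda_ is the trade-off coefficient from the update rules.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the incoming gradient by -lambda_ backward."""
    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambda_ * grad_output, None

criterion = nn.CrossEntropyLoss()
params = (list(feature_extractor.parameters()) + list(label_classifier.parameters())
          + list(domain_classifier.parameters()))
dann_optimizer = torch.optim.SGD(params, lr=1e-2)

def dann_step(x_src, y_src, x_tgt, lambda_=1.0):
    """One joint step: label loss on mouse (source) data, domain loss on both domains."""
    dann_optimizer.zero_grad()
    feats_src = feature_extractor(x_src)
    feats_tgt = feature_extractor(x_tgt)

    # Label loss L_y: only source (mouse) samples carry cell-type labels here.
    loss_y = criterion(label_classifier(feats_src), y_src)

    # Domain loss L_d: source = 0, target = 1; gradients into G_f are reversed.
    feats_all = torch.cat([feats_src, feats_tgt])
    d_labels = torch.cat([torch.zeros(len(x_src), dtype=torch.long),
                          torch.ones(len(x_tgt), dtype=torch.long)])
    loss_d = criterion(domain_classifier(GradReverse.apply(feats_all, lambda_)), d_labels)

    (loss_y + loss_d).backward()  # with the reversal, this realizes the min-max updates of Eq. (3)
    dann_optimizer.step()
    return loss_y.item(), loss_d.item()
```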

Using the aforementioned NN architecture, domain adaptation is achieved by forcing predictions to rely on features that cannot discriminate between mouse and human samples. Final classification decisions are thus made using discriminative features that are domain (organism) invariant. We assume that a good representation for cross-domain transfer is one for which an algorithm cannot distinguish between the two domains (Farahani et al., 2021; Rozner et al., 2023).

Fig. 7

The architecture of the DANN. During forward propagation (solid lines), input data may come from humans or mice. The feature extractor (gray) with weights \(\theta _f\) feeds both the domain classifier (red) with weights \(\theta _d\) and the label classifier (green) with weights \(\theta _y\). During backpropagation (dotted lines), the domain gradient is multiplied by a negative constant, while the label gradient remains positive. The model cannot differentiate between human and mouse samples but is nonetheless forced to classify cell types for both domains. When optimized, the feature extractor embeds common information from human and mouse samples such that the domain classifier cannot distinguish samples from the two domains. At that stage, the label classifier predicts the cell types of input samples from these embeddings of common information in a domain-invariant manner

Multi-label Classification Using Locally Sparse Networks

Collecting whole-cell current clamp recordings is labor-intensive; for instance, the ACTD contains only 1920 mouse cells and 413 human cells. The low number of samples makes it challenging to train an over-parametrized NN while avoiding overfitting. To overcome this limitation, we adopt a recently proposed method for fitting ANN models to low-sample-size data. Specifically, the method is designed to handle low-sample-size, heterogeneous tabular biological data such as whole-cell current clamp recordings of neurons from various brain areas in mice. In this section, we use the Locally SParse Interpretable Network (LSPIN) (Yang et al., 2022), an intrinsically interpretable model that exposes the features it relies on during inference. We use LSPIN to predict five distinct neuronal types: four GABAergic subclasses and one Glutamatergic class. We achieve state-of-the-art results with the proposed method, surpassing other machine learning models and prior works in this field.

LSPIN is a locally sparse neural network in which the local sparsity is learned to identify the subset of the most relevant features for each sample. LSPIN comprises two neural networks trained in tandem: a gating network that predicts the sample-specific sparsity patterns, and a prediction network that classifies the neuron type. By forcing the model to select a subset of the most informative features for each sample, we reduce overfitting in low-sample-size data. Another benefit of this model is that by identifying the most informative features locally, we obtain an interpretation of the predictions.

Given labeled observations \(\{\varvec{x}^{(i)}, y^{(i)}\}_{i=1}^{N}\), where \(\varvec{x}^{(i)} \in \mathbb{R}^D\), \(x_d^{(i)}\) is the dth feature of the ith sample, and \(y^{(i)}\) is the label of the ith sample, we want to learn a global prediction function \(\varvec{f}\) and a set of parameters \(\{\varvec{\mu }^{(i)}\}_{i=1}^{N}\) such that \(\mu _d^{(i)}\) governs the behavior of the local stochastic gate \(z_d^{(i)} \in [0, 1]\) that sparsifies (for each instance i) the set of features propagated into the prediction model \(\varvec{f}\). Stochastic gates (Yamada et al., 2020) are continuously relaxed Bernoulli variables that are highly effective for the sparsification of ANNs. They were previously used for several applications, including feature selection (Shaham et al., 2022; Jana et al., 2021) and sparse Canonical Correlation Analysis (Lindenbaum et al., 2021).

Each stochastic gate (for feature d and sample i) is defined based on the threshold function in Eq. (4):

$$\begin{aligned} z_d^{(i)} = \max (0, \min (1, 0.5 + \mu _d^{(i)} + \epsilon _d^{(i)})), \end{aligned}$$

(4)

where \(\epsilon _d^{(i)} \sim \mathcal{N}(0, \sigma ^2)\), and \(\sigma\) is fixed to a constant during training and equals 0 during inference. The sample-specific parameters \(\varvec{\mu }^{(i)} \in \mathbb{R}^D, i = 1, \ldots , N\), are predicted by the gating network \(\varvec{g}\) such that \(\varvec{\mu }^{(i)} = \varvec{g}(\varvec{x}^{(i)}|\varvec{W})\), where \(\varvec{W}\) are the weights of the gating network. These weights are learned simultaneously with the weights of the prediction network by minimizing the loss in Eq. (5):
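A minimal sketch of Eq. (4) in PyTorch is shown below: Gaussian noise is injected only during training, and the hard thresholding is implemented with a clamp, so gate values stay in [0, 1] while remaining differentiable with respect to \(\mu\) in the unclipped region. The value of \(\sigma\) is an illustrative choice.

```python
# Minimal sketch of the stochastic gates in Eq. (4); sigma is held constant during training.
import torch

def stochastic_gates(mu, sigma=0.5, training=True):
    """mu: (batch, D) sample-specific gate parameters predicted by the gating network g."""
    eps = sigma * torch.randn_like(mu) if training else torch.zeros_like(mu)
    return torch.clamp(0.5 + mu + eps, min=0.0, max=1.0)  # z = max(0, min(1, 0.5 + mu + eps))
```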

$$\begin{aligned} \mathbb{E}\left[ \mathcal{L}(\varvec{f}(\varvec{x}^{(i)} \odot \varvec{z}^{(i)}), y^{(i)}) + \mathcal{R}(\varvec{z}^{(i)})\right] , \end{aligned}$$

(5)

where \(\mathcal{L}\) is a desired loss (e.g., cross-entropy), \(\odot\) represents the Hadamard product (element-wise multiplication), and \(\mathcal{R}(\varvec{z}^{(i)})\) is a regularizer term defined in Eq. (6):

$$\begin{aligned} \mathcal{R}(\varvec{z}^{(i)}) = \lambda _1||\varvec{z}^{(i)}||_0 + \lambda _2 \sum _j K_{ij}||\varvec{\mu }^{(i)} - \varvec{\mu }^{(j)}||_2^2, \end{aligned}$$

(6)

where \(K_{ij} \ge 0\) is a user-defined kernel (e.g., radial basis function). The architecture of the LSPIN model is illustrated in Fig. 8.
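The objective in Eqs. (5)-(6) can be sketched as follows. Because \(||\varvec{z}^{(i)}||_0\) is not differentiable, the sketch replaces it with its expectation under the Gaussian noise, \(\sum _d \Phi ((\mu _d^{(i)} + 0.5)/\sigma )\), the standard relaxation used with stochastic gates (Yamada et al., 2020); the RBF kernel on the inputs, the coefficient values, and the network names gating_net and prediction_net are illustrative assumptions.

```python
# Minimal sketch of the LSPIN objective (Eqs. 5-6); gating_net and prediction_net stand for
# small MLPs like the one defined earlier, and stochastic_gates is the function sketched above.
import torch
import torch.nn.functional as F

def lspin_loss(x, y, gating_net, prediction_net,
               sigma=0.5, lam1=1e-2, lam2=1e-2, kernel_bandwidth=1.0):
    mu = gating_net(x)                              # sample-specific gate parameters mu^(i)
    z = stochastic_gates(mu, sigma, training=True)  # gates z^(i) from Eq. (4)
    logits = prediction_net(x * z)                  # prediction on the locally sparsified features
    pred_loss = F.cross_entropy(logits, y)          # the desired loss L (cross-entropy here)

    # Differentiable surrogate for ||z||_0: expected number of open gates per sample.
    normal = torch.distributions.Normal(0.0, 1.0)
    expected_open = normal.cdf((mu + 0.5) / sigma).sum(dim=1).mean()

    # Similarity term: samples close in input space (RBF kernel K_ij) are encouraged
    # to select similar gate parameters mu^(i), as in the second term of Eq. (6).
    K = torch.exp(-torch.cdist(x, x) ** 2 / (2 * kernel_bandwidth ** 2))
    similarity = (K * torch.cdist(mu, mu) ** 2).sum() / x.shape[0]

    return pred_loss + lam1 * expected_open + lam2 * similarity
```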

Fig. 8

The architecture of LSPIN. The data \(\{\varvec{x}^{(i)} = [x_1^{(i)}, x_2^{(i)}, \ldots , x_D^{(i)}]\}_{i=1}^{N}\) is fed simultaneously to a gating network \(\varvec{g}\) and to a prediction network \(\varvec{f}\). The gating model learns to sparsify the set of features propagating to the prediction model, leading to sample-specific (local) sparsification. Therefore, it can handle extreme cases of low-sample-size data and lead to interpretable predictions.
