Cortical ensemble activity discriminates auditory attentional states

Selective attention modulates sensory cortical activity. It remains unclear how auditory cortical activity represents stimuli that differ in behavioral relevance. We designed a cross-modality task in which mice made decisions to obtain rewards based on attended visual or auditory stimuli. We recorded auditory cortical activity in behaving mice attending to, ignoring, or passively hearing auditory stimuli. Engaging in the task bidirectionally modulated neuronal responses to the auditory stimuli in both the attended and ignored conditions compared to passive hearing. Neuronal ensemble activity in response to stimuli under attended, ignored, and passive conditions was readily distinguishable. Furthermore, ensemble activity under attended and ignored conditions was in closer states than under the passive condition, and the two shared a component of attentional modulation that drove them in the same direction in the population activity space. Our findings suggest that the ignored condition is very different from the passive condition, and that auditory cortical sensory processing is modulated differently under ignored, attended, and passive conditions.


Introduction
Sensory perception is highly modulated by attention [20]. At different modes and levels of engagement in behavioral tasks, attention may modulate sensory cortical processing, including spontaneous activity [1,4,10,22], stimulus-evoked activity [5,8,9,19], and population dynamics [3,6,29]. Sound representations in the auditory cortex change in response to the activation of neuromodulatory systems that regulate attention [2,11,15,16]. Depending on behavioral contexts, the same stimulus can be a target that requires attention or a distractor that should be ignored. Here, we examine whether auditory cortical neurons, at both single-cell and population levels, respond to stimuli differently when they are targets versus when they are distractors in a cross-modality attention task, and how ensemble neuronal activities differ under these attentional conditions.

Animals
Animal procedures were approved by the Stony Brook University Animal Care and Use Committee and carried out in accordance with the National Institutes of Health standards. Experiments were conducted using male C57BL/6 mice (Charles River Laboratories). Mice were housed with free access to food, but water was restricted after the initiation of behavioral training. On training days, water was available during task performance (2.5 μL for each correct trial); on non-training days, water bottles were provided to the mice for at least 1 h per day.

Behavior
Experiments were conducted in a dark, single-walled, sound-attenuating training chamber (22 cm × 15 cm). The chamber contained three nose pokes, each of which consisted of an infrared LED/infrared phototransistor pair connected to the Bpod system (Sanworks, LLC) for response detection. The activation of a central nose poke was required for trial initiation. One speaker embedded in the wall delivered auditory cues or distractors. The sound intensities at three different positions in the behavior chamber were calibrated monthly and were within ±1 dB of one another. Two white LEDs were mounted in the two reward nose pokes for the "visual task." Water rewards were controlled by the Bpod system and delivered from the wall-mounted nose pokes.
Freely moving mice were trained to perform a set of two-alternative forced choice (2AFC) tasks, as previously described [30,33]. Each trial was initiated when the mouse inserted its nose into the center port of a three-port operant chamber. After a delay period (200-300 ms; uniform distribution), a 100-ms stimulus was presented, indicating which nose poke (left or right) would be rewarded with water. Mice then selected the left or right goal port based on the sensory stimuli. The mouse had to stay in the center port until the 100-ms stimulus finished, which prevented movement from influencing the measurement of the auditory response. If the mouse withdrew from the center port before the end of the auditory stimulus, no reward was given, and these trials were excluded from our analysis. Every mouse was trained to perform sound blocks and light blocks in randomized sequences.
Auditory stimuli consisted of a pseudorandom, 100-ms stream of 30-ms pure overlapping tones presented at 200 Hz. Eighteen possible tone frequencies were logarithmically spaced from 5 to 40 kHz. For each trial, either the low stimulus (5 to 10 kHz) or the high stimulus (20 to 40 kHz) was selected as the target, and the mice were trained to report low or high by choosing the correct port for the water reward. Correct responses were rewarded with water (2.5 μL for each correct trial), and error trials were punished with a 4-s time out. The sound intensity was calibrated to 60 dB.
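The frequency grid described above can be sketched as follows. This is only an illustration: the function name, the inclusion of both endpoints, and the rule that a tone belongs to a band when it falls inside the stated limits are assumptions, not details given in the text.

```python
def log_spaced_frequencies(f_min_hz, f_max_hz, n_tones):
    """Return n_tones frequencies evenly spaced on a log axis (endpoints included)."""
    ratio = f_max_hz / f_min_hz
    return [f_min_hz * ratio ** (i / (n_tones - 1)) for i in range(n_tones)]


# Eighteen tones from 5 to 40 kHz, as stated in the text.
freqs = log_spaced_frequencies(5_000.0, 40_000.0, 18)

# Assumed band membership: a tone is "low" or "high" if it lies within the band limits.
low_band = [f for f in freqs if f <= 10_000.0]    # low stimulus, 5 to 10 kHz
high_band = [f for f in freqs if f >= 20_000.0]   # high stimulus, 20 to 40 kHz
```

Under these assumptions, adjacent tones are separated by a constant frequency ratio, and the tones between 10 and 20 kHz belong to neither target band.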
Training mice to perform the task took about 4 weeks, with one 2-3 h session per day. We first trained mice on a task in which only visual cues were presented. After one week, most mice reached above 90% accuracy. We then trained the mice on the sound block ('attending to sound'), which most mice learned within a week. Next, we trained the mice on the light block ('ignoring sound'), which also took about 1 week. Finally, we trained mice to perform the sound block and light block on the same day with a dummy miniscope mounted on their head, to make sure they could switch tasks on the recording day.
Calcium imaging procedure
AAV9-calmodulin protein kinase II (CaMKII)-GCaMP6f (University of Pennsylvania Vector Core) was injected into the auditory cortex at the following stereotaxic coordinates: 2.92 mm caudally from bregma, 4.2 mm laterally from midline, and at a 2.25-mm depth from the depth of the bregma. One week after the injection, a prism probe (diameter: 1.0 mm; length: approximately 4.3 mm; pitch: 0.5; numerical aperture: 0.5; Inscopix) was implanted 0.2 mm laterally from the injection site. Three weeks later, a base plate was implanted after checking the calcium signal.
Images were acquired at 20 frames per second using Inscopix nVista. At the beginning of each imaging session, the protective cap was removed from the previously implanted base plate, and the microscope was attached. The imaging field of view (maximal size, 1 × 1 mm) was then selected by adjusting the focus. Focal planes were 150-200 μm away from the prism. During recording, the LED output power of the microscope was set at 30% of the maximum. The time stamps of the behavior events were exported from the Bpod system to the microscope for synchronization.
All task and passive sessions were recorded on the same day to prevent a population shift across days. The camera remained attached to the animal's head during the entire recording period, across all recording blocks. Recording began 15 min after the start of the session, which allowed the mice to switch strategies from those of previous tasks. Each task session was recorded for 10-15 min. The mice had a 1-h gap with free access to food between the recording sessions to recover their motivation. The 1-h gap also limited photobleaching of the calcium signals from continuous imaging over a long period. We left the camera attached to the animals during the gaps to prevent potential shifts of the field of view between blocks. After the task sessions, the mice had a 1-h gap with free access to water and food to let them lose motivation before the passive sound response was recorded. The passive sound response was recorded for 5 min in the same chamber as the task session. The 100-ms cloud of tones was presented every 3-4 s during the passive session.
The acquired images were spatially downsampled by a factor of 2 to reduce the size of the video. We concatenated the calcium imaging videos from different blocks of the same recording session and performed motion correction [26] using Mosaic (version 1.2; Inscopix, Palo Alto, CA). After motion correction, we trimmed the video by blocks and extracted the spatial and temporal components (Z-scored ΔF/F) of the recorded neurons with an extended constrained non-negative matrix factorization (CNMF-E) algorithm [21,31]; the minimal correlation was set to 0.95 and the minimal peak-to-noise ratio was set to 10 during the initialization step of CNMF-E. Cell registration was applied to the spatial components, which allowed the same neuron to be tracked across sessions based on spatial correlation and center distance [23].

Data analysis and statistics
Criteria for sound responses: We applied a one-sided bootstrap test to determine whether a neuron responded significantly to the stimulus, using 10,000 bootstrap samples. The p-value is defined as the probability of observing a calcium signal in the tested frame at least as large as the one detected, under the assumption that the tested frame does not differ from the average intensity of the baseline frames. A neuron was identified as a sound-responsive neuron if, for a given trial type, at least two consecutive frames within 500 ms after sound onset had p-values smaller than 0.01 in the bootstrap analysis. The baseline was selected between 200 and 500 ms before sound onset in the passive block, and between 200 and 500 ms before trial initiation in the task block. Owing to the signal-to-noise ratio of one-photon Ca2+ imaging and the properties of the calcium trace extraction algorithm, we focused our single-neuron analysis on the excitatory enhanced responses.
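One way such a responsiveness criterion could be implemented is sketched below. The text does not specify the resampling scheme, so this sketch assumes that trials are resampled with replacement and that the test statistic is the mean per-trial difference between the tested frame and the baseline; both function names are hypothetical.

```python
import random


def bootstrap_p_value(frame_values, baseline_values, n_boot=10_000, rng=None):
    """One-sided bootstrap p-value for 'tested frame exceeds baseline'.

    frame_values[i] and baseline_values[i] are assumed to come from the same trial i.
    Trials are resampled with replacement (an assumption, not stated in the text).
    """
    if rng is None:
        rng = random.Random(0)
    diffs = [f - b for f, b in zip(frame_values, baseline_values)]
    n = len(diffs)
    not_above = 0
    for _ in range(n_boot):
        resampled_mean = sum(rng.choice(diffs) for _ in range(n)) / n
        if resampled_mean <= 0:
            not_above += 1
    return not_above / n_boot


def has_consecutive_significant(p_values, alpha=0.01, run_length=2):
    """True if at least run_length consecutive frames have p < alpha."""
    run = 0
    for p in p_values:
        run = run + 1 if p < alpha else 0
        if run >= run_length:
            return True
    return False
```

At 20 frames per second, the 500-ms window after sound onset corresponds to 10 frames, so `has_consecutive_significant` would be applied to a list of 10 per-frame p-values for each neuron and trial type.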
The modulation index between two states, A and B, is computed from a value measured in each state: the peak intensity of the average calcium trace within 0-500 ms after sound onset. When the task state is compared with the passive state, A represents the task state and B represents the passive state. When the attended state is compared with the ignored state, A represents the ignored state and B represents the attended state.
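The equation for the modulation index is not reproduced in this text. A conventional form for an index of this kind, given here as an assumption and not as the formula actually used, is the normalized difference of the two peak values:

```latex
\mathrm{MI} = \frac{\mathrm{Value}_A - \mathrm{Value}_B}{\mathrm{Value}_A + \mathrm{Value}_B}
```

Under this assumed form, a positive index means the peak response is larger in state A, a negative index means it is larger in state B, and the index lies between -1 and 1 when both values are positive.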
An SVM with a linear kernel was used for all decoders. We used the calcium signal intensity from the single frame right after the sound offset to prevent the influence of other behavioral factors such as movement. The same number of trials (49.00 ± 2.61) from different blocks were randomly selected to balance the decoder. Two-thirds of the data were used for training/validation and the remaining one-third of the data were used for testing. The model was regularized with an L1 penalty to prevent overfitting where the regularization parameter was selected by 5-fold cross-validation. Significant decoding accuracies were determined by comparing the accuracy of the real data with the shuffled data in which behavioral data were shuffled relative to each neuronal activity. All quantified decoding was performed in the full dimensional space. To visualize the high dimensional ensemble activity, we performed principal component analysis on population responses across different attentional conditions (20 trials were selected randomly for each state), then single trial data were projected onto the first three principal components.
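The trial balancing and the two-thirds/one-third split described above might look like the following sketch. The function name, the per-condition (stratified) split, and the fixed seed are assumptions for illustration; the SVM training step itself is omitted.

```python
import random


def balanced_split(trials_by_condition, train_fraction=2 / 3, seed=0):
    """Subsample every condition to the same trial count, then split train/test.

    trials_by_condition maps a condition label to a list of trial feature vectors
    (for example, one calcium intensity per neuron from the frame after sound offset).
    Returns (train, test), each a list of (features, label) pairs.
    """
    rng = random.Random(seed)
    n_per_condition = min(len(trials) for trials in trials_by_condition.values())
    n_train = round(n_per_condition * train_fraction)
    train, test = [], []
    for label, trials in trials_by_condition.items():
        # Random selection without replacement, same count for every condition.
        chosen = rng.sample(trials, n_per_condition)
        train += [(x, label) for x in chosen[:n_train]]
        test += [(x, label) for x in chosen[n_train:]]
    return train, test
```

Splitting within each condition keeps both the training and test sets balanced, so a decoder that ignores the neuronal data should score close to 50% on a two-condition comparison.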
The modulation vector from the passive to the attended state (v_PA) is defined as the normal vector of the SVM decision boundary separating the passive and attended states, pointing from the passive to the attended state, and the modulation vector from the passive to the ignored state (v_PI) is defined as the normal vector of the decision boundary separating the passive and ignored states, pointing from the passive to the ignored state. The angle θ between the two modulation vectors is calculated using the following formula:
\[ \theta = \arccos\left( \frac{v_{PA} \cdot v_{PI}}{\lVert v_{PA} \rVert \, \lVert v_{PI} \rVert} \right) \]
The chance level of θ was calculated from the decoder trained using shuffled data.
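The angle between two modulation vectors follows from their normalized dot product. A minimal sketch, with a hypothetical function name and plain Python lists standing in for the decoder weight vectors:

```python
import math


def angle_between_deg(v1, v2):
    """Angle in degrees between two vectors, from the normalized dot product."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    cos_theta = dot / (norm1 * norm2)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

For example, `angle_between_deg([1, 0], [0, 1])` describes two orthogonal modulation vectors, and `angle_between_deg([1, 0], [-1, 0])` describes two opposing ones.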

Results
Behavioral and recording paradigms for mice attending to or ignoring the same auditory stimuli
To determine whether auditory cortical neurons respond differently to the same stimuli under attended or ignored conditions, we designed a cross-modality attention task (Fig. 1a-c). We first trained mice to perform a two-alternative forced choice (2AFC) sensory discrimination task. In brief, a freely moving mouse was placed in a dark, sound-proof chamber. Each trial was self-initiated by the mouse poking its nose into the center port to trigger a sound and/or light stimulus. In the sound block, a stream of pure tones with different frequencies was presented as the cue. The mouse learned to associate the frequency of pure tones (high versus low) with an action (going to the left or right port) for a water reward (Fig. 1a). In the light block, the LED light on top of either the left or right port was turned on as a cue, and a stream of pure tones with different frequencies was simultaneously presented as a distractor (i.e., their frequencies were not associated with the reward port). The mouse learned to go to the lit port for the water reward and ignore the auditory distractor (Fig. 1b). Well-trained mice performed with average accuracies of 87.0 ± 1.6% in sound blocks and 91.2 ± 1.5% in light blocks (Fig. 1d, left panel). In light blocks, because tone streams with either low or high frequencies were randomly assigned to each trial, there were concordant trials, in which the tone frequency and the reward port indicated by the light had the same association as in the sound blocks, and discordant trials, in which the tone frequency and the reward port had the opposite association. Mice performed with accuracies of 99.4 ± 0.3% in concordant trials and 85.6 ± 2.2% in discordant trials (Fig. 1d, right panel). To exclude the contribution of sound information in the light block, all subsequent light-block analyses include only the discordant trials.
Together the behavioral results showed that mice learned to attend to the auditory targets in sound blocks and ignore the auditory distractors in light blocks (Fig. 1e).
We next recorded neuronal activity in the primary auditory cortex of well-trained mice using in vivo Ca2+ imaging. We expressed GCaMP6f, an ultrasensitive Ca2+ sensor protein [7], in the primary auditory cortex by stereotaxic injection of adeno-associated virus (AAV) and then implanted a prism lens above the injection site for Ca2+ imaging using miniaturized fluorescence microscopy, as described previously [12,24] (Fig. 2a & b). GCaMP6f expression was controlled by the CaMKII promoter; therefore, we monitored excitatory neurons in the primary auditory cortex. Four weeks after viral infection, we imaged Ca2+ activity from these mice while they performed tasks in sound blocks and light blocks, or while they passively listened to auditory stimuli (example video in Additional file 1). The Ca2+ signals from one recording session are shown in Fig. 2c-e. We detected 3379 neurons from 5 mice in 12 recording sessions. Among the detected neurons, 248 showed robust responses to the tone-cloud stimuli in the passive block (bootstrap, p < 0.01). These neurons are referred to as stimulus-responsive neurons below for cross-block analysis. We identified 155 stimulus-responsive neurons responding to the tone-cloud stimuli in the sound block and 130 stimulus-responsive neurons responding to the tone-cloud stimuli in the light block. The smaller numbers of responsive neurons identified in task blocks compared to the passive block are consistent with the previous finding that task engagement suppresses overall responses in the auditory cortex [19].

Population activity of auditory cortical neurons differentiates attentional states
To study the attentional modulation of neuronal activity from individual stimulus-responsive neurons, we compared the peak intensity of the average calcium trace of each neuron in a 0-500-ms time window from the onset of sound in three contexts: in the sound block when mice attended to the auditory stimuli, in the light block when mice ignored the auditory stimuli, and in a passive session when mice passively heard the stimuli (Fig. 3a & b). To avoid day-to-day variations, we performed comparisons of the three contexts from sessions recorded on the same day. From stimulus-responsive neurons, we observed both enhancement and suppression of evoked responses under the attended condition compared to the passive condition. The same bidirectional modulation of the stimulus-evoked responses was also observed under the ignored condition (Fig. 3a & b). The similarities in the modulation index distributions of the attended vs. passive and ignored vs. passive comparisons suggest that engaging in the task modulates cortical neuronal activity under both attended and unattended conditions. The auditory cortical neuronal activity also displayed bidirectional differences in evoked responses between attended and ignored conditions (Fig. 3b). However, the averaged individual modulation indexes of these comparisons are around 0 (Fig. 3b inset).

Fig. 1 Behavioral paradigm for studying auditory cortical activity at different attentional states. a Illustration of the behavioral task in the sound block, in which auditory stimuli were associated with rewards. b Illustration of the behavioral task in the light block, in which visual (but not auditory) stimuli were associated with rewards. c Passive block in which mice passively listened to an auditory stimulus. d Mouse performance in sound blocks and light blocks. Left panel: correct response rates from all completed trials in sound blocks and light blocks (n = 5 mice; 12 sessions for each block). Right panel: correct response rates of auditory-visual concordant trials and auditory-visual discordant trials in light blocks (n = 5 mice; 12 sessions); p = 3.96 × 10⁻⁵, paired-sample t test. Error bars: mean ± SEM. e Task performance of mice (n = 5 mice, 12 sessions) in sound blocks (0.87 ± 0.02) and light blocks (only discordant trials are included: 0.86 ± 0.02); p = 0.64, paired-sample t test; data are presented as the mean ± SEM
We asked whether cortical ensemble activity in response to targets could be distinguished from that in response to distractors, as well as from that under passive conditions. To quantify the differences in ensemble activity between attended, ignored, and passive conditions, we employed a support vector machine (SVM) as a decoder to analyze ensemble activity (see Methods). The decoder preserves the structure of the neuronal activity, taking into account individual neuronal activities without averaging, and considers trial-to-trial variations. The decoder accuracy reflects how well the ensemble activity can distinguish different conditions (attended vs. passive, ignored vs. passive, attended vs. ignored). The decoder accuracy was significantly above the chance level for all pairings after the onset of sound stimuli, and the separation can be visualized in a dimensionality-reduced space (Fig. 4a & b). The results of engaged versus passive states (attended vs. passive, ignored vs. passive) indicated that the stimulus-responsive ensemble responds to the same auditory stimuli differently, with or without task engagement. Furthermore, the decoding results of the attended vs. ignored conditions showed that the stimulus-responsive ensemble differentially responds to the same auditory stimuli, depending on whether they are targets or distractors in the task context.
We further analyzed neuronal ensemble activity by applying the decoder to all recorded neurons. The decoder accuracies of the entire population were significantly higher than the chance level after the onset of sound stimuli (Fig. 4c), indicating that ensemble activity from the entire recorded neuronal population can distinguish attentional states. Interestingly, the decoder performance from the entire population was significantly higher than that from the stimulus-responsive population (Fig. 4d), suggesting that stimulus-nonresponsive neuronal activity also contributes to state separation. We thus performed the same analyses of the stimulus-nonresponsive population. Indeed, decoder accuracies were significantly higher than the chance level after the onset of sound stimuli (Fig. 4e). The fact that stimulus-nonresponsive neurons can distinguish the three attentional states suggests that these neurons are also modulated by task engagement and selective attention.

Neuronal ensemble activity is in closer states between attended and ignored conditions
Decoder accuracies from the three neuronal populations in distinguishing attended (or ignored) from passive conditions were higher than those for attended vs. ignored (Fig. 5a), suggesting that the activity patterns are closer between the attended and ignored sessions than between the attended (or ignored) and passive sessions. We next determined whether selective attention during the performance of a task in the sound and light blocks modulated the ensemble activity in the same direction. Hypothetically, if attention modulates the cortical responses to targets and distractors in opposing ways, the modulation vector from the passive to the attended state will form an angle of 180° with the one from the passive to the ignored state (Fig. 5b, left panel); if the attentional modulation of targets and distractors is in the same direction, the angle will be 0° (Fig. 5b, right panel); and if the attentional modulation of targets and distractors is independent, the angle will be 90° (Fig. 5b, middle panel). Our analysis showed that the angle between the modulation vectors of the stimulus-responsive ensemble was 47.32 ± 5.48°, which is significantly different from that of the shuffled data (Fig. 5c, left panel, p = 4.35 × 10⁻⁴). The angles from the stimulus-nonresponsive ensemble and the entire ensemble were 56.78 ± 1.83° (Fig. 5c, middle panel, p = 5.67 × 10⁻⁸) and 54.78 ± 2.08° (Fig. 5c, right panel, p = 1.19 × 10⁻⁷), respectively. Furthermore, the decoder trained on data from attended versus passive conditions classified trials from the ignored condition as attended more often than as passive (Fig. 5d). These results indicate that the modulations of cortical ensemble activity under attended and ignored conditions share components that drive the population activity in the same direction.
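The cross-condition test in the last step, applying an attended/passive decoder to ignored trials, can be illustrated with a linear decision rule. The weights, bias, and trial values below are made-up numbers for a two-neuron toy case, not fitted parameters or recorded data, and the function name is hypothetical.

```python
def fraction_classified_positive(trials, weights, bias):
    """Fraction of trials that fall on the positive side of a linear boundary."""
    n_positive = 0
    for x in trials:
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        if score > 0:
            n_positive += 1
    return n_positive / len(trials)


# Hypothetical attended-vs-passive decoder: positive score means "attended".
weights = [1.0, 1.0]
bias = -1.0

# Hypothetical ignored-condition trials (activity of two neurons per trial).
ignored_trials = [[0.9, 0.8], [0.7, 0.6], [0.2, 0.1], [1.0, 0.5]]

fraction_attended = fraction_classified_positive(ignored_trials, weights, bias)
```

In this toy case three of the four ignored trials fall on the attended side of the boundary, which is the kind of asymmetry the analysis looks for.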

Discussion
Our study showed that the same auditory stimuli elicited different cortical activities when mice were performing the 2AFC task compared to when they were passively listening to the stimuli (Fig. 3a & b). This finding suggests that engaging in the task induces attentional modulation of both attended and unattended sensory cortices. Multisensory spread of attention during modality-specific attention behavior has been reported in human studies [17,32], but the mechanisms responsible remain elusive. Both cholinergic innervation from the basal forebrain and noradrenergic innervation from the locus coeruleus to the neocortex are known to modulate cortical sensory representations in a behavior-dependent manner [13,14,18,27]. Such neural inputs may be potential candidates for the circuitry mechanisms of the multisensory spread of attentional modulation.
Our results demonstrated that auditory cortical neurons respond to the same stimuli differently, depending on whether they are targets or distractors (Figs. 3 & 4). In addition to cross-modality modulation by engaging in behavioral tasks, there is sensory-selective modulation of the attended and ignored modalities. It would be interesting to further examine whether the tonotopic layout of the auditory cortex is also modulated. Furthermore, the circuitry mechanisms underlying such modulations remain unclear. Both cholinergic and noradrenergic signals, and the regulatory inputs from both the parietal and prefrontal cortices [25,28], may play essential roles here and require further study.
Our analysis also showed that the ensemble activity of stimulus-nonresponsive neurons distinguished the attended, ignored, and passive states (Fig. 4e). Both the multisensory spread and modality-specific attentional modulation of these stimulus-nonresponsive neurons may change the local connections of stimulus-responsive neurons, which in turn may modulate the sensory processing that is important for relevant behaviors. Further studies should examine whether the modulation of stimulus-nonresponsive neurons is general across sensory cortices or extends to other cortical areas.

Fig. 4 (caption, partial) ... ignored versus attended (purple, p = 1.9 × 10⁻³, p = 1.50 × 10⁻⁵). Colored circles are from experimental data, and grey circles are from shuffled data. Statistical analyses were performed in a paired-sample t test between experimental and shuffled data (n = 12). All data represent the mean ± SEM. d Comparisons of decoder performance between the stimulus-responsive population and the entire population after stimulus onset (attended versus passive, red, p = 3.10 × 10⁻⁴; ignored versus passive, blue, p = 7.75 × 10⁻⁴; ignored versus attended, purple, p = 7.20 × 10⁻⁴; paired-sample t test, n = 12). e Decoder performance of the stimulus-nonresponsive population in distinguishing two attentional states after the cue onset: attended versus passive (red, p = 4.68 × 10⁻⁷), ignored versus passive (blue, p = 5.08 × 10⁻⁷), ignored versus attended (purple, p = 1.43 × 10⁻⁵). Colored circles are from experimental data, and grey circles are from shuffled data. Statistical analyses were performed in a paired-sample t test between experimental data and shuffled data (n = 12). Data are presented as mean ± SEM
Our analysis suggests that cortical neuronal ensemble activity is in closer states under attended and ignored conditions when compared to passive conditions, and that the attentional modulation under attended and ignored conditions shares a similar direction. Engaging in the task and selectively paying attention to an auditory or visual cue may differentially modulate the auditory cortical neurons. Task engagement may contribute to the same directional components in the attentional modulation of both attended and ignored modalities, whereas attending to one sensory modality may independently modulate the attended and ignored modalities.

Fig. 5 Attending and ignoring modulate population activity in the auditory cortex in the same direction. a Comparisons of decoder performance between the three attentional states after stimulus onset. Stimulus-responsive population (left); stimulus-nonresponsive population (middle); entire population (right). Statistical analyses were performed in a one-way repeated measures analysis of variance followed by Tukey's test. n.s., not significant; **, p < 0.01; ***, p < 0.001. b Hypothetical models of the directions of attentional modulation using an example of two neurons. Black dots: passive states; blue dots: ignored states; red dots: attended states. c Distribution of the angle between the normal vectors of the separating hyperplanes. Stimulus-responsive population (left); stimulus-nonresponsive population (middle); entire population (right). Yellow bars are from experimental data, and grey bars are from shuffled data. d The proportion of ignored trials classified as attended or passive states by the attended/passive decoder, using the stimulus-responsive population (left); stimulus-nonresponsive population (middle); entire population (right). Statistical analyses were performed in a two-sample t test. *, p < 0.05; ***, p < 0.001

Additional file
Additional file 1. Calcium imaging in auditory cortex.