Sparse Coding and Decorrelation in Primary Visual Cortex During Natural Vision
References
H. B. Barlow, in Sensory Communication, W. A. Rosenblith, Ed. (MIT Press, Cambridge, MA, 1961), pp. 217–234; Neural Comput. 1, 295 (1989);
; Vision Res. 23 3311 (1997).
P. Foldiak and M. P. Young, in The Handbook of Brain Theory and Neural Networks, M. A. Arbib, Ed. (MIT Press, Cambridge, MA, 1995), pp. 895–898.
E. P. Simoncelli and O. Schwartz, in Advances in Neural Information Processing Systems 11, M. I. Jordan, M. J. Kearns, S. A. Solla, Eds. (MIT Press, Cambridge, MA, 1998).
pp. 153–154.
We generated saccadic eye scan paths by using a model of natural macaque eye movements. We acquired eye movement data from a scleral search coil during free-viewing experiments (8, 17). The model chose random saccade directions from a uniform distribution of angles. We chose saccade lengths randomly from a distribution based on a B-spline fit to the measured distribution of free-viewing saccade lengths. The eye velocity versus time profile for each saccade was obtained from a lookup table of B-spline fits to actual velocity/time profiles (as a function of saccade length). We chose fixation durations from a Gaussian distribution (mean 350 ms; standard deviation 50 ms).
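The scan-path model described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `simulate_scan_path` and the list `HYPOTHETICAL_LENGTHS_DEG` are invented here, saccade lengths are resampled from made-up placeholder values rather than from a B-spline fit to measured data, and the velocity/time lookup table is omitted, so each saccade is treated as an instantaneous jump.

```python
import math
import random

def simulate_scan_path(n_saccades, length_samples_deg, seed=0):
    """Return (positions, durations_ms) for a simulated scan path.

    positions: list of (x, y) fixation locations in degrees, starting at (0, 0).
    durations_ms: one fixation duration per position.
    """
    rng = random.Random(seed)
    positions = [(0.0, 0.0)]
    for _ in range(n_saccades):
        # Direction: uniform over all angles, as described in the text.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        # Length: resampled from placeholder values (ASSUMPTION; the text
        # uses a B-spline fit to measured free-viewing saccade lengths).
        length = rng.choice(length_samples_deg)
        x, y = positions[-1]
        positions.append((x + length * math.cos(angle),
                          y + length * math.sin(angle)))
    # Fixation durations: Gaussian, mean 350 ms and SD 50 ms (from the text).
    durations_ms = [rng.gauss(350.0, 50.0) for _ in positions]
    return positions, durations_ms

HYPOTHETICAL_LENGTHS_DEG = [1.0, 2.0, 4.0, 8.0]  # made-up values for illustration
positions, durations_ms = simulate_scan_path(20, HYPOTHETICAL_LENGTHS_DEG)
```

A fuller version would replace the resampling step with draws from the fitted length distribution and interpolate eye position within each saccade from the velocity profiles.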
We extracted image patches from 1280 × 1024 pixel images obtained from a high-resolution commercial photo-CD library (Corel Inc.). Images included nature scenes, man-made objects, animals, and humans. To avoid aliasing artifacts that might result from displaying movies on a monitor with a 72-Hz refresh rate, we used an antialiasing algorithm in which each 13.8-ms frame of a movie was constructed by averaging 14 images representing the position of the CRF at about 1-ms resolution.
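The temporal antialiasing step can be illustrated with a short sketch. It assumes that each sub-image is a small 2-D array of pixel intensities (here a list of rows) and that averaging is a plain pixelwise mean; the function names are hypothetical, and the handling of an incomplete final group is a choice made here, not something the text specifies.

```python
def average_subframes(subframes):
    """Pixelwise mean of a list of equally sized 2-D images (lists of rows)."""
    n = len(subframes)
    rows, cols = len(subframes[0]), len(subframes[0][0])
    return [[sum(img[r][c] for img in subframes) / n for c in range(cols)]
            for r in range(rows)]

def antialiased_movie(subframes, per_frame=14):
    """Combine ~1-ms sub-images into display frames of `per_frame` images each.

    With per_frame=14, each output frame stands for one 13.8-ms video frame.
    An incomplete final group is dropped (ASSUMPTION).
    """
    usable = len(subframes) - len(subframes) % per_frame
    return [average_subframes(subframes[i:i + per_frame])
            for i in range(0, usable, per_frame)]

# Toy input: 28 one-pixel sub-images whose intensity rises by 1 per millisecond.
toy_subframes = [[[float(i)]] for i in range(28)]
frames = antialiased_movie(toy_subframes)
```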
All animal procedures were approved by the University of California Berkeley Animal Care and Use Committee and conformed to or exceeded all relevant National Institutes of Health and U.S. Department of Agriculture standards. Single neuron recordings were made from two awake behaving macaque monkeys ( Macaca mulatta ) with extracellular electrodes. Additional details about recording and surgical procedures are given in [
]. All data reported here were taken under conditions of excellent single-unit isolation. Eye position was monitored with a scleral search coil and trials were aborted if the eye deviated from fixation by more than 0.35°. Movie duration varied from 5 to 10 s. During recording sessions each movie was divided into 5-s segments; segments were then shown in and around the CRF on successive trials while the animal performed a fixation task for a juice reward. Each trial consisted of a stimulus of a single size with differently sized stimulus conditions randomly interleaved across trials.
A well-established and useful description of how sparsely a neuron responds across stimuli is given by its activity fraction, $A = (\sum_i r_i / n)^2 / \sum_i (r_i^2 / n)$. For further discussion see [
]. Our sparseness statistic is a convenient rescaling of $A$ that ranges from 0% to 100%: $S = (1 - A)/(1 - A_{\min}) = (1 - A)/(1 - 1/n)$.
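The activity fraction and the sparseness statistic defined above can be computed as in the sketch below. It assumes that r_i is the neuron's mean response to the i-th stimulus and n is the number of stimuli; the function names and the error handling are choices made for this illustration.

```python
def activity_fraction(responses):
    """A = (sum(r_i) / n)^2 / (sum(r_i^2) / n)."""
    n = len(responses)
    mean = sum(responses) / n
    mean_sq = sum(r * r for r in responses) / n
    if mean_sq == 0:
        raise ValueError("activity fraction is undefined when all responses are zero")
    return (mean * mean) / mean_sq

def sparseness(responses):
    """S = (1 - A) / (1 - 1/n), as a fraction between 0 and 1."""
    n = len(responses)
    if n < 2:
        raise ValueError("at least two responses are needed")
    return (1.0 - activity_fraction(responses)) / (1.0 - 1.0 / n)

dense = sparseness([5.0, 5.0, 5.0, 5.0])   # equal response to every stimulus
sparse = sparseness([0.0, 0.0, 0.0, 8.0])  # response to a single stimulus only
```

A neuron that responds equally to every stimulus has A = 1 and S = 0, whereas a neuron that responds to only one of n stimuli has A = 1/n and S = 1 (that is, 100%).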
Throughout this report we measured significance with randomization tests using 1000 random permutations of the relevant data. For further discussion see [B. F. J. Manly Randomization and Monte-Carlo Methods in Biology (Chapman & Hall New York 1991)].
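A generic randomization test with 1000 permutations might look like the sketch below. The test statistic (a one-sided difference of group means) and the label-shuffling scheme are assumptions chosen for illustration; the note does not say which statistic was permuted in each analysis.

```python
import random

def randomization_test(group_a, group_b, n_permutations=1000, seed=0):
    """One-sided p-value for mean(group_b) > mean(group_a) under label shuffling."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        if mean(pooled[n_a:]) - mean(pooled[:n_a]) >= observed:
            at_least_as_large += 1
    # Count the observed arrangement as one of the possible permutations.
    return (at_least_as_large + 1) / (n_permutations + 1)

# Toy data: two clearly separated groups of 20 values each.
low = [0.1 * i for i in range(20)]
high = [5.0 + 0.1 * i for i in range(20)]
p_value = randomization_test(low, high)
```

With 1000 permutations the smallest attainable p-value under this convention is 1/1001, which is consistent with reporting results as P ≤ 0.001.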
If responses are averaged within a fixation, sparseness declines from 41 to 23%, 52 to 34%, 61 to 42%, and 62 to 45% for stimuli one, two, three, and four times the size of the CRF, respectively.
The boundaries of the CRF were estimated with bar and grating stimuli whose characteristics were controlled interactively. For 38 of 61 neurons we confirmed these manual estimates by reverse correlation on responses evoked by a dynamic sequence of small white squares distributed in and around the CRF (square positions were chosen randomly for each frame). Reliable CRF estimates were obtained with 150 to 300 s of data (30 to 60 behavioral fixation trials). Generally there is excellent agreement between the CRF profile estimates obtained with the two methods. Our CRF estimates ranged from about 20 to 50 min of arc which is entirely consistent with the range of receptive field diameters obtained in awake behaving macaques by other researchers; for example see [
Animals viewed high-resolution natural images digitized on commercial photo-CDs (Corel Corp.) and shown at a resolution of 1280 × 1024 pixels. Images were shown for 10 s each. Neural responses and eye position were recorded continuously during this free viewing (8). Natural vision movies that simulated these specific free-viewing episodes were constructed by using the eye position records to determine the position of the recorded CRF during free viewing. In six cells the diameter of the reconstructed movies was four times the CRF and in 11 cells it was three times the CRF. These data have been combined in this report.
Each free-viewing episode produced a single spike train evoked by a unique pattern of exploratory eye movements. In contrast, natural vision movies were repeated many times. To obtain comparable sparseness estimates for these data, we separately analyzed the spike train evoked by each repetition of the natural vision movie. The average of this distribution of sparseness values was then compared with the single sparseness value obtained from the free-viewing data. To ensure matched stimulus conditions, we made all comparisons on a movie-by-movie basis. Note that sparseness values based on single spike trains are biased upward because of the discrete nature of spike generation.
The random sinusoidal grating sequence was similar to that used by D. L. Ringach, M. J. Hawken, and R. Shapley [Nature 387, 281 (1997)]. The orientation, spatial frequency, and phase of the grating were chosen randomly on each video frame (at 72 Hz). Gratings were shown at a Michelson contrast of 0.5. Before analysis, stimuli were binned into 10° orientation steps and 6 to 12 spatial frequency steps. Responses were analyzed by parametric reverse correlation on orientation and spatial frequency, averaging over phase. The mean responses across stimulus bins (at the peak response latency) were used to estimate the sparseness statistic.
Several theoretical studies of sparse population coding have reported the kurtosis of the distribution of responses observed across a set of linear filters with respect to a particular stimulus ensemble (2, 20). This measure is not directly applicable to our data because the responses of area V1 neurons are asymmetric: cells typically exhibit low spontaneous rates, and appropriate stimuli elevate these rates. To estimate kurtosis, we converted each response distribution to a symmetric distribution by reflecting the data about the origin. The resulting symmetric distributions are unimodal with zero mean and decrease smoothly to zero. Our kurtosis statistic is well behaved and directly comparable to the results of theoretical studies.
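The reflection step can be sketched as follows. The sketch assumes that kurtosis means excess kurtosis (fourth moment divided by the squared second moment, minus 3, so that a Gaussian scores 0); the note does not state which convention was used, and the function name is hypothetical.

```python
def reflected_kurtosis(responses):
    """Excess kurtosis of the responses after reflection about the origin."""
    # Reflecting about zero makes the distribution symmetric with mean exactly 0,
    # so raw moments and central moments coincide.
    symmetric = list(responses) + [-r for r in responses]
    n = len(symmetric)
    m2 = sum(x ** 2 for x in symmetric) / n
    m4 = sum(x ** 4 for x in symmetric) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all responses are zero")
    return m4 / (m2 * m2) - 3.0

flat = reflected_kurtosis([1.0, 1.0, 1.0, 1.0])    # equal responses
peaked = reflected_kurtosis([0.0] * 9 + [10.0])    # one large response among ten
```

Under this convention, a neuron with mostly near-zero responses and occasional large ones yields a high positive value, and a neuron with uniform responses yields a negative one.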
Let $P_1$ and $P_2$ be the PSTH response vectors for a pair of neurons. Then $\cos(\theta) = P_1 \cdot P_2 / (\|P_1\| \, \|P_2\|)$, where $\|P_n\|$ is the norm of the appropriate vector. This measure is sensitive to changes across the basis dimensions of the movie time stream and is insensitive to differences in absolute rate.
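A minimal sketch of this similarity measure is given below. It assumes that each PSTH is a list of firing rates in matching time bins and that the norm is Euclidean; the function name and the toy PSTH values are illustrative only.

```python
import math

def cosine_similarity(p1, p2):
    """Cosine of the angle between two PSTH response vectors."""
    if len(p1) != len(p2):
        raise ValueError("PSTH vectors must have the same number of time bins")
    dot = sum(a * b for a, b in zip(p1, p2))
    norm1 = math.sqrt(sum(a * a for a in p1))
    norm2 = math.sqrt(sum(b * b for b in p2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("similarity is undefined for an all-zero PSTH")
    return dot / (norm1 * norm2)

psth_a = [2.0, 0.0, 8.0, 1.0]
psth_b = [20.0, 0.0, 80.0, 10.0]  # same time course at ten times the rate
psth_c = [0.0, 5.0, 0.0, 0.0]     # active only when psth_a is silent

same_shape = cosine_similarity(psth_a, psth_b)
no_overlap = cosine_similarity(psth_a, psth_c)
```

Two PSTHs with the same time course but different overall rates give a value of 1, which illustrates the insensitivity to absolute rate; PSTHs that are never active in the same bins give 0.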
It is difficult to choose a scalar measure of response similarity appropriate for all situations; see [
]. To validate our results we performed two alternative versions of the population decorrelation analysis. For each neuron pair we also computed both the linear correlation coefficient and the neural discrimination index of Di Lorenzo. In both cases nCRF stimulation leads to significant decorrelation ( P ≤ 0.001). To ensure that the slightly different stimulus sizes do not influence our results we also performed all similarity analyses on a data set restricted to neuron pairs with identical CRF sizes (and thus identical stimulation). Under these conditions the decorrelating effect of the nCRF remains significant ( P ≤ 0.001).
The compound grating stimulus consisted of a CRF conditioning grating and a probe grating. We set the conditioning grating's orientation and spatial frequency to the neuron's preferred values [as determined by reverse correlation on responses to a dynamic grating sequence (19) presented in the CRF]. The phase of the conditioning grating varied randomly with each video frame. Both gratings were presented at a Michelson contrast of 0.5 and their edges were blended into one another and into the background. We performed reverse correlation on the position of the probe grating within the nCRF annulus (collapsing over all other parameters). To measure baseline responses we presented interleaved trials containing only the conditioning grating.
M. S. Lewicki and B. A. Olshausen, in Advances in Neural Information Processing Systems 10, M. I. Jordan, M. J. Kearns, S. A. Solla, Eds. (MIT Press, Cambridge, MA, 1997), pp. 815–821.