Journal of VLSI signal processing systems for signal, image and video technology
Featured scientific publications
* Data is for reference only
A novel fault tolerance technique for recursive least squares minimization
Journal of VLSI signal processing systems for signal, image and video technology - Vol. 1 - pp. 181-188 - 1989
Existing fault tolerance schemes have often been ignored by systolic array designers because they are too costly and unwieldy to implement. With this in mind, we have developed a new technique specially tailored for recursive least squares minimization that emphasizes simplicity. We propose a new decoding scheme that allows for error detection while wasting no precious processor cycles and preserving the basic structure of the systolic array. We will show that errors can be detected by examining a single scalar. The technique can be implemented with negligible algorithmic modification and little additional hardware. The simplicity of our method invites its use in future systolic arrays.
Blind Stochastic Feature Transformation for Channel Robust Speaker Verification
Journal of VLSI signal processing systems for signal, image and video technology - Vol. 42 - pp. 117-126 - 2006
To improve the reliability of telephone-based speaker verification systems, channel compensation is indispensable. However, it is also important to ensure that the channel compensation algorithms in these systems suppress channel variations and enhance interspeaker distinction. This paper addresses this problem with a blind feature-based transformation approach in which the transformation parameters are determined online without any a priori knowledge of channel characteristics. Specifically, a composite statistical model formed by the fusion of a speaker model and a background model is used to represent the characteristics of enrollment speech. Based on the difference between the claimant's speech and the composite model, a stochastic matching type of approach is proposed to transform the claimant's speech to a region close to the enrollment speech. Therefore, the algorithm can estimate the transformation online without the necessity of detecting the handset types. Experimental results based on the 2001 NIST evaluation set show that the proposed transformation approach achieves significant improvement in both equal error rate and minimum detection cost as compared to cepstral mean subtraction and Znorm.
A Low Power Architecture for HASM Motion Tracking
Journal of VLSI signal processing systems for signal, image and video technology - Vol. 37 - pp. 111-127 - 2004
This paper proposes a low-power VLSI architecture for motion tracking that can be used in online video applications such as MPEG and VRML. The proposed architecture uses a hierarchical adaptive structured mesh (HASM) concept that generates a content-based video representation. The developed architecture shows a significant reduction in power consumption that is inherent in the HASM concept. The proposed architecture consists of two units: a motion estimation unit and a motion compensation unit. The motion estimation (ME) architecture generates a progressive mesh code that represents a mesh topology and its motion vectors. ME reduces the power consumption because it (1) implements a successive splitting strategy to generate the mesh topology, which allows a pipelined implementation of the processing elements; (2) approximates the mesh node motion vectors using the three-step search algorithm; and (3) uses parallel units that reduce the power consumption at a fixed throughput. The motion compensation (MC) architecture processes a reference frame, mesh nodes, and motion vectors to predict a video frame, using affine transformation to warp the texture with different mesh patches. MC reduces the power consumption because it (1) uses a multiplication-free algorithm for the affine transformation and (2) uses parallel threads, each of which implements a pipelined chain of scalable affine units to compute the affine transformation of each patch. The architecture has been prototyped using a top-down low-power design methodology. The performance of the architecture has been analyzed in terms of video construction quality, power, and delay.
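The abstract names the three-step search but does not say how the architecture applies it to mesh nodes. As a rough illustration only, the Python sketch below shows the textbook three-step search for block matching. It assumes a sum-of-absolute-differences cost, a square block, and frames stored as lists of rows. The names `sad` and `three_step_search` and all parameters are hypothetical and not taken from the paper.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
    return total


def three_step_search(cur, ref, bx, by, size=4, step=4):
    """Estimate a motion vector with step sizes 4, 2, 1 (range +/-7).

    Each step tests the 3x3 grid of candidates around the current best
    vector and keeps the cheapest one, then halves the step size.
    This is a heuristic: it can settle in a local minimum.
    """
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, size)
    while step >= 1:
        cx, cy = best
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                u, v = cx + dx, cy + dy
                # Skip candidates whose displaced block leaves the frame.
                inside = (0 <= bx + u and bx + u + size <= width and
                          0 <= by + v and by + v + size <= height)
                if not inside:
                    continue
                cost = sad(cur, ref, bx, by, u, v, size)
                if cost < best_cost:  # strict: ties keep the earlier vector
                    best, best_cost = (u, v), cost
        step //= 2
    return best
```

The search evaluates at most 25 candidates instead of the 225 an exhaustive ±7 search would need, which is the kind of saving that makes it attractive for low-power hardware.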
Computing Moments by Prefix Sums
Journal of VLSI signal processing systems for signal, image and video technology - - 2005
Moments of images are widely used in pattern recognition because, in suitable form, they can be made invariant to variations in translation, rotation and size. However, computing discrete moments directly from their definition requires many multiplications, which limits the speed of computation. In this paper we express the moments as a linear combination of higher-order prefix sums, obtained by iterating the prefix sum computation on previous prefix sums, starting with the original function values. Thus the p′th moment
$$m_p = \sum_{x=1}^{N} x^p f(x)$$
can be computed by O(N · p) additions followed by p multiply-adds. The prefix summations can be realized in time O(N) using p + 1 simple adders, and in time O(p log N) using parallel prefix computation and O(N) adders. The prefix sums can also be used in the computation of two-dimensional moments for any intensity function f(x,y). Using a simple bit-serial addition architecture, 13 full adders and some shift registers suffice to realize the 10 image moment computations up to order 3
$$(m_{00}, m_{01}, m_{10}, m_{11}, m_{02}, m_{20}, m_{12}, m_{21}, m_{03}, m_{30})$$
for a 512 × 512 size image at the TV rate. In 1986 Hatamian published a computationally equivalent algorithm, based on a cascade of filters performing the summations. Our recursive derivation allows for explicit expressions and recursive equations for the coefficients used in the final moment calculation. Thus a number of alternative forms for the moment computation can be derived, based on different sets of prefix sums. It is also shown that similar expressions can be obtained for the moments introduced by Liao and Pawlak in 1996, forming better approximations to the exact geometric moments, at no extra computational cost.
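The idea of replacing multiplications with iterated summations can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's derivation: it runs the prefix sums over the reversed samples, so the last entry of the k-th iterated sum equals $\sum_x \binom{x+k-2}{k-1} f(x)$, and it hard-codes the final coefficients for p ≤ 3 from the identities $x^2 = 2\binom{x+1}{2} - x$ and $x^3 = 6\binom{x+2}{3} - 6\binom{x+1}{2} + x$. The function names are hypothetical.

```python
from itertools import accumulate


def moments_by_prefix_sums(f):
    """Return [m_0, m_1, m_2, m_3] for samples f(1), ..., f(N).

    The loop uses additions only (four passes of prefix sums);
    the multiplications are confined to the final combination.
    """
    sums = list(reversed(f))  # assumption: summing from x = N down to x = 1
    totals = []
    for _ in range(4):
        sums = list(accumulate(sums))          # one more level of prefix sums
        totals.append(sums[-1] if sums else 0)
    t1, t2, t3, t4 = totals
    # t1 = sum f(x), t2 = sum x f(x),
    # t3 = sum C(x+1, 2) f(x), t4 = sum C(x+2, 3) f(x)
    return [t1, t2, 2 * t3 - t2, 6 * t4 - 6 * t3 + t2]


def moments_direct(f):
    """Reference implementation straight from the definition."""
    return [sum(x ** p * v for x, v in enumerate(f, start=1)) for p in range(4)]
```

For `f = [3, 1, 4, 1, 5]` both functions return `[14, 46, 184, 808]`. The prefix-sum version does O(N · p) additions and a constant number of multiply-adds at the end.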
FPGA Implementation of a Pipelined On-Line Backpropagation
Journal of VLSI signal processing systems for signal, image and video technology - Vol. 40 - pp. 189-213 - 2005
The paper describes the implementation of a systolic array for a multilayer perceptron with a hardware-friendly learning algorithm. A pipelined modification of the on-line backpropagation algorithm is shown and explained. It better exploits the parallelism because both the forward and backward phases can be performed simultaneously. The neural network performance for the proposed modification is discussed and compared with the standard so-called on-line backpropagation algorithm in typical databases and with the various precisions required. Although the preliminary results are positive, subsequent theoretical analysis and further experiments with different training sets will be necessary. For this reason our VLSI systolic architecture—together with the combination of FPGA reconfiguration properties and a design flow based on generic VHDL—can create a reusable, flexible, and fast method of designing a complete ANN on a single FPGA and can permit very fast hardware verifications for our trials of the Pipeline On-line Backpropagation algorithm and the standard algorithms.
Architecture of an Image Rendering Co-Processor for MPEG-4 Visual Compositing
Journal of VLSI signal processing systems for signal, image and video technology - Vol. 31 - pp. 157-171 - 2002
The TANGRAM VLSI co-processor is intended as a building block for use in system-on-chip (SOC) designs for the versatile MPEG-4 multimedia standard. It is designed to perform the computation-intensive final step of MPEG-4 video decoding: compositing of scenes at the display. This includes warping and alpha blending of multiple full-screen video textures in real time. TANGRAM consists of a RISC control processor and multiple powerful arithmetic units that perform rendering calculations directly in hardware. This hybrid architecture enables adaptation to changes in algorithms or support for different video formats in software. Communication with a host CPU and video decoding hardware is done via the very common PI-bus on-chip interface. TANGRAM directly interfaces with the ITU-R601/656 digital video output. VHDL implementation and synthesis for a 0.35 μm standard-cell library provide an estimate of 100 MHz achievable clock frequency (worst-case), 52 mm² overall area and 1 W power dissipation. TANGRAM has sufficient performance for rendering of MPEG-4 Main Profile@Layer3 scenes (ITU-R 601).
On partitioning and fault tolerance issues for neural array processors
Journal of VLSI signal processing systems for signal, image and video technology - Vol. 6 - pp. 85-94 - 1993
In this article, we study time-efficient scheduling and fault-tolerant design of partitioned array processors for neural networks. First, we apply the locally-sequential-globally-parallel (LSGP) partitioning scheme to decompose large neural network algorithms so that they can be mapped onto array processors of smaller size. Then we derive an optimal latency schedule, i.e., a schedule that, for the same decomposition, outperforms any other schedule in terms of overall execution time. We further propose an algorithm-based fault tolerance (ABFT) method to guarantee higher reliability for the array processor implementation.
Configurable array logic circuits for computing network error detection codes
Journal of VLSI signal processing systems for signal, image and video technology - - 1993
Configurable Array Logic (CAL) has a basic architecture which is a cellular array with nearest neighbor connections. The cells in the array are dynamically programmable using transistor switches controlled by static RAM cells. Each cell can realize any two-input Boolean operation or act as a simple latch, as well as providing routing for pass-through connections to allow non-neighbor inter-cell connections. In this article, we demonstrate the versatility of the CAL technology by presenting efficient CAL circuits for computing all of the major error detection codes now in use for worldwide computer networking; these include CCITT, IEEE, Internet and ISO standard codes. The circuits, each having a version which comfortably fits on to a single 32×32 cell CAL chip, are appropriate for use as hardware accelerators to help computers deal with the ever increasing rates of data transmission over networks. The first class of error detection codes described are the cyclic redundancy codes (CRCs), which are in virtually universal use for bit serial transmission over physical links. The other class of error detection codes described are the modulo 2^n − 1 checksums, which are in common use for byte transmission over networks and inter-networks.
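The two code families named in the abstract can be illustrated in software. The Python sketch below is not a model of the CAL circuits. It shows a bit-serial CRC that shifts one message bit per step through a 16-bit register, and a 16-bit ones' complement sum, which is arithmetic modulo 2^16 − 1. The choice of the CCITT polynomial x^16 + x^12 + x^5 + 1 (0x1021), a zero initial register, MSB-first bit order, and the 16-bit word size are assumptions made for the example; the function names are hypothetical.

```python
def crc16_bitserial(data: bytes, poly: int = 0x1021, init: int = 0x0000) -> int:
    """Bit-serial CRC-16, one message bit per step, MSB first.

    Mirrors a shift register with XOR feedback taps given by `poly`.
    """
    reg = init
    for byte in data:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            feedback = ((reg >> 15) & 1) ^ bit  # register MSB XOR incoming bit
            reg = (reg << 1) & 0xFFFF
            if feedback:
                reg ^= poly
    return reg


def ones_complement_sum16(data: bytes) -> int:
    """Sum 16-bit big-endian words with end-around carry (mod 2^16 - 1).

    Odd-length input is padded with one zero byte. Note that 0x0000 and
    0xFFFF both represent zero in this arithmetic.
    """
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total
```

With these parameters, appending the two CRC bytes (high byte first) to a message makes the CRC of the extended message zero, which is how a receiver can check a frame with the same shift register.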
Rapid Prototyping of Application-Specific Signal Processors (RASSP) In-Progress Report
Journal of VLSI signal processing systems for signal, image and video technology - Vol. 15 - pp. 29-47 - 1997
The goal of the DARPA/Tri-Service-sponsored Rapid Prototyping of Application-Specific Signal Processors (RASSP) program is to reduce the cost and time to develop and manufacture signal processors by at least a factor of four. Lockheed Martin Advanced Technology Laboratories' (ATL) approach to reaching this goal is based on three thrusts: methodology, model-year architecture, and infrastructure (enterprise). The Advanced Technology Laboratories' RASSP team, composed of an alliance of companies, implemented the first baseline RASSP system, which advances today's state of the art by a factor of >2X. The team used the methodology and tools to demonstrate cost and design-cycle improvements on the benchmark virtual prototype, and developed a hardware/software system that demonstrated first-pass success. Additional developments underway will provide further benefits and will demonstrate 4X improvements in cost and time to market. This paper updates the team's progress halfway through the program, and highlights the impact of using the RASSP concepts on the design of a SAR processor, a Navy standard processor upgrade, and a CNI application.
Static scheduling for synthesis of DSP algorithms on various models
Journal of VLSI signal processing systems for signal, image and video technology - Vol. 10, No. 3 - pp. 207-223 - 1995
Total: 707