

# Haar Dwt of Delay Optimized High Performance Ladner Fischer Adder

Sandhya Aluru MV<sup>1</sup>, P. Murali Krishna S<sup>2</sup>

M. Tech Student<sup>1</sup>, Assistant Professor<sup>2</sup>

<sup>1&2</sup>Department of Embedded Systems & VLSI Design, Sri Krishnadevaraya University College of Engineering And Technology, Anantapur, India

**Article Info** Volume 9, Issue 5, Page Number: 254-261

**Publication Issue** September-October 2022

### Article History

Accepted: 10 Sep 2022, Published: 26 Sep 2022

### ABSTRACT

This paper implements the Haar discrete wavelet transform (HDWT) using an optimized Ladner-Fischer adder with a parallel-prefix structure. We consider approximation in this design because it is a recent idea and an important option for balancing accuracy against parameter efficiency. Image transformation is a crucial step prior to image processing and analysis, and the HDWT is a low-complexity pre-processing filter appropriate for extremely energy-constrained image processing systems. This paper presents an optimized HDWT hardware design based on the Ladner-Fischer algorithm for high-performance image processing.

**Keywords -** Image/Video Processing, Optimized Controller, Optimized Haar Wavelet Transform, Optimized Kogge–Stone Adder/Subtractor

### I. INTRODUCTION

Digital image processing (DIP) is used in almost every field of modern society, including medicine, astronomy, entertainment and computer vision. Video processing extends digital image processing to a sequence of still images that changes at a very fast rate in the proper order, which gives the viewer the illusion that the objects in the frame are moving. In video, each still image is known as a frame, and the rate at which frames change is measured in frames per second (fps). For a good-quality image, the number of pixels in the image must be high; similarly, for video, both the number of pixels per frame and the frame rate must be high.

Multimedia files are large and consume a lot of disk space. Their size makes it time-consuming to move them from place to place over networks or to distribute them over the Internet. Compression shrinks files, making them more practical to store and share. It works by removing repetitious or redundant information, effectively summarizing the contents of a file in a way that preserves as much of the original meaning as possible. Compression techniques are widely used to reduce the volume of multimedia data sent over wireless channels. The efficacy of a transformation scheme can be gauged directly by its ability to pack the input data into as few coefficients as possible. This allows the quantizer to discard coefficients with relatively small amplitudes without introducing visual distortion in the reconstructed image.

**Copyright:** © the author(s), publisher and licensee Technoscience Academy. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited



#### Fig 1. Image/Video Transmission System

Multimedia data processing, which encompasses almost every aspect of daily life such as communication, broadcasting, data search, advertisement and video games, has become an integral part of our lifestyle. The most significant multimedia applications involve images or video, which require computationally intensive data processing. Moreover, as the use of mobile devices grows rapidly, there is a growing demand for multimedia applications that run on these portable devices. A typical image/video transmission system is shown in Fig 1.

#### Wavelet Transform

A major disadvantage of the Fourier Transform is that it captures only global frequency information, meaning frequencies that persist over the entire signal. This kind of signal decomposition does not serve all applications well; one example is electrocardiography (ECG), where signals have short intervals of characteristic oscillation. An alternative approach is the Wavelet Transform, which decomposes a function into a set of wavelets.

#### Wavelet

A wavelet is a wave-like oscillation that is localized in time. Wavelets have two basic properties: scale and location. Scale (or dilation) defines how "stretched" or "squished" a wavelet is; this property is related to frequency as defined for waves. Location defines where the wavelet is positioned in time (or space).

#### Haar wavelet transform

Wavelet-based image compression has been used to check the quality of the compressed image under different thresholding techniques. In that work, the basic Haar wavelet transform performs the compression and the entire algorithm is implemented through software-based simulation; the results show that soft thresholding produces a better-quality image in terms of PSNR than hard thresholding. A hardware-based discrete wavelet transform architecture has also been reported. To reduce its memory requirements, a novel diagonal scan method is used to read the input image, and a recursive pyramid hierarchical approach is used to design efficient filter banks.

Multilevel decomposition of an image through the discrete wavelet transform is an important step in image compression. In one approach, the decomposition is performed by hardware architectures derived from the fast Haar wavelet transform, which reduces the hardware utilization required to decompose the input image. Image fusion using wavelet decomposition has also been implemented for satellite images; that architecture is inefficient in terms of hardware utilization because it uses built-in blocks without proper optimization.

A hardware-based architecture has been proposed to implement watermarking with the Haar wavelet as its main component, using a modified lifting scheme to reduce design complexity. ECG signals have been denoised using the Haar wavelet transform and universal thresholding, with the entire architecture designed through built-in functions. Image compression for JPEG images has been studied using the Haar transform, DCT and run-length encoding separately; the Haar wavelet transform showed a good compression ratio in terms of image size and a better PSNR than the existing techniques.

A watermarking architecture has been designed using the Haar DWT and DCT algorithms, with the conventional algorithms modified to process videos frame by frame. A tunable VLSI architecture for the DWT used the distributed arithmetic technique along with some degree of parallelism to achieve area and memory efficiency. An efficient integer wavelet transform architecture has been used for QRS detection in ECG signals, where the wavelet transform mainly de-noises the signal; Haar wavelet, zero-crossing detector, threshold and decision blocks implement the entire architecture.

#### Working of Wavelet

The basic idea is to compute how much of a wavelet is present in a signal at a particular scale and location. For those familiar with convolution, that is exactly what this is: the signal is convolved with a set of wavelets at a variety of scales.

In other words, we pick a wavelet of a particular scale. Then we slide this wavelet across the entire signal, i.e., vary its location, and at each time step we multiply the wavelet and the signal. The result of this multiplication gives a coefficient for that wavelet scale at that time step. We then increase the wavelet scale and repeat the process.
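The slide-multiply-repeat procedure described above can be sketched in a few lines of Python. This is only an illustration: the helper names `haar_wavelet` and `slide`, the unit-energy scaling and the step-edge test signal are assumptions made for the example, not part of the proposed design. Strictly speaking the sketch computes an inner product at each shift (a correlation), which equals a convolution with the time-reversed wavelet.

```python
import math

def haar_wavelet(scale):
    """Haar wavelet sampled on 2*scale points: +1 on the first half, -1 on the second, unit energy."""
    norm = math.sqrt(2 * scale)
    return [1 / norm] * scale + [-1 / norm] * scale

def slide(signal, wavelet):
    """Multiply-and-sum the wavelet against the signal at every location where it fits."""
    n, m = len(signal), len(wavelet)
    return [sum(signal[k + i] * wavelet[i] for i in range(m)) for k in range(n - m + 1)]

# A step edge between samples 3 and 4 (hypothetical test signal).
signal = [4.0, 4.0, 4.0, 4.0, 8.0, 8.0, 8.0, 8.0]

# One list of coefficients per scale; each entry corresponds to one location.
coeffs = {scale: slide(signal, haar_wavelet(scale)) for scale in (1, 2)}
for scale, values in coeffs.items():
    print(scale, [round(v, 3) for v in values])
```

The coefficients are zero wherever the signal is flat and become non-zero only where the wavelet overlaps the edge, which is the time localization that the Fourier Transform lacks.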

In this work, we present the first approximate HDWT VLSI hardware architectures that combine coefficient approximation and truncation. We investigate the approximate HDWT at design time, demonstrating the reduction in circuit area and power dissipation with a consequent trade-off in peak signal-to-noise ratio (PSNR). Despite the lower PSNR, this proposal meets the required quality at the application level, with a slight improvement in accuracy, while reducing the energy required to process image signals. The contributions of this paper are as follows:

- 1. An HDWT matrix approximation capable of fulfilling the service quality required by the application and of producing a multiplierless hardware architecture.
- 2. A pruning of the approximate HDWT matrix, reducing the HDWT hardware architecture to just a few parallel additions.
- 3. A demonstration that our approximate HDWT proposal can sustain a higher level of truncation than the original HDWT (i.e., it efficiently processes an input signal with a lower quantization level).
- 4. A discussion of the hardware performance of our approximate HDWT proposal and its benefits in circuit area and power dissipation, while still processing the image signal with high quality.

### II. EXISTING METHOD

#### Kogge–Stone adder

The Kogge–Stone adder, shown in Fig 2, is a modified version of the carry look-ahead adder. The modification reduces the delay in generating the carry signal for large adders. This adder produces its output faster than other existing adders with a small area overhead. Its operation is divided into three parts: pre-processing, carry lookahead network and post-processing.

1. Pre-processing: In this stage, the propagate (p) and generate (g) signals are computed for each pair of input bits of 'A' and 'B'. The logical equations of this block can be written as

$$p_i = A_i \oplus B_i \quad (1)$$

$$g_i = A_i \,\&\, B_i \quad (2)$$

where $i$ is the bit index, which runs over the length of the adder.

2. Carry Lookahead Network: The carry of the corresponding bits is computed separately in this stage, which increases the maximum operating speed of this adder. This stage uses the propagate (p) and generate (g) signals to determine the corresponding carry signal. The logical equations of this stage are given as

$$p_{i:j} = p_{i:k+1} \,\&\, p_{k:j} \quad (3)$$

$$g_{i:j} = g_{i:k+1} \mid (p_{i:k+1} \,\&\, g_{k:j}) \quad (4)$$

where $j$ and $k$ are intermediate integer indices ($i > k \ge j$) used to combine the signals of adjacent bit groups.

3. Post-processing: The final sum of the corresponding bits is calculated in this stage. The logical equation for this stage is

$$S_i = p_i \oplus c_{i-1} \quad (5)$$

where $c_{i-1}$ is the carry generated by the previous adder block.



Fig 2: Structure of Kogge–Stone adder
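To make the three stages concrete, here is a bit-level software model of a Kogge–Stone adder in Python. It is a minimal sketch for checking the logic, not the hardware description used in this work; the function name `kogge_stone_add`, the default 8-bit width and the explicit carry-in are assumptions of the example.

```python
def kogge_stone_add(a, b, width=8, carry_in=0):
    """Add two unsigned integers with a Kogge-Stone prefix network (software model)."""
    # Pre-processing: per-bit propagate (XOR) and generate (AND) signals.
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]

    # Carry lookahead network: after the stage with distance d,
    # (G[i], P[i]) describe the bit group i .. max(0, i - 2*d + 1).
    G, P = g[:], p[:]
    d = 1
    while d < width:
        G_next, P_next = G[:], P[:]
        for i in range(d, width):
            G_next[i] = G[i] | (P[i] & G[i - d])   # group generate
            P_next[i] = P[i] & P[i - d]            # group propagate
        G, P = G_next, P_next
        d *= 2

    # Carry out of bit i, folding in the external carry-in.
    c = [G[i] | (P[i] & carry_in) for i in range(width)]

    # Post-processing: sum bit = propagate XOR carry from the previous bit.
    s = [p[i] ^ (c[i - 1] if i > 0 else carry_in) for i in range(width)]
    total = sum(bit << i for i, bit in enumerate(s))
    return total, c[width - 1]

total, carry_out = kogge_stone_add(200, 100)
print(total, carry_out)   # 8-bit sum and carry-out of 200 + 100
```

The number of lookahead stages grows with the logarithm of the width, which is why the carry delay stays small for large adders, at the cost of many prefix cells and long wires.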

In many cases of image analysis, it is necessary to convert the input image into the frequency domain to overcome various issues that occur in time-domain analysis. Normally, Fourier transforms such as the DFT, FFT and STFT are used, where the input data is treated as a combination of complex sinusoids. In most real-time scenarios, the input data is unbounded and the information is spread over the whole time axis of the signal, making it difficult to model through regular Fourier transforms.

To overcome this type of problem, windowing methods are used. The windowed version of the Fourier Transform, known as the windowed Fourier Transform, is given in Eq. (6) as

$$X(\tau,\omega) = \int_{-\infty}^{\infty} w(t-\tau) \cdot x(t) \cdot e^{-j\omega t} \, dt \quad (6)$$

where $w(\cdot)$ is a window function of appropriate size and $\omega$ is the angular frequency. $X(\tau, \omega)$ is the Fourier Transform of $x(t)$ multiplied by the window $w(\cdot)$ shifted by an amount $\tau$; it is named the Short-Time Fourier Transform (STFT).
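As a rough numerical illustration of Eq. (6), the integral can be replaced by a sum over the samples covered by the window. The sketch below is a discrete analogue under stated assumptions: the function name `stft`, the rectangular 16-sample window and the two-tone test signal are all chosen for the example and do not come from this work.

```python
import cmath
import math

def stft(x, window, tau, omega):
    """Discrete sketch of Eq. (6): sum over n of window[n - tau] * x[n] * exp(-j*omega*n)."""
    total = 0j
    for m, w in enumerate(window):          # m = n - tau runs over the window support
        n = tau + m
        if 0 <= n < len(x):
            total += w * x[n] * cmath.exp(-1j * omega * n)
    return total

N = 16                                       # window length (assumed)
w_low = 2 * math.pi * 2 / N                  # tone present in the first half
w_high = 2 * math.pi * 4 / N                 # tone present in the second half
x = [math.cos(w_low * n) for n in range(32)] + [math.cos(w_high * n) for n in range(32, 64)]
window = [1.0] * N                           # rectangular window

early = (abs(stft(x, window, 0, w_low)), abs(stft(x, window, 0, w_high)))
late = (abs(stft(x, window, 32, w_low)), abs(stft(x, window, 32, w_high)))
print(early, late)
```

With the window placed at the start of the signal only the low tone responds, and with the window placed in the second half only the high tone responds, so the shift $\tau$ recovers the time information that a single global Fourier Transform would lose.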

Because a single window is used, the resolution of the analysis is the same at all locations in the time-frequency plane. The wavelet transform instead varies the window size, so the resolution in both the time and frequency domains can be changed.

In the case of wavelets, it is possible to design the wavelet function h(t) such that the set of translated and scaled versions of h(t) forms an orthogonal basis in which the input signal can be expanded. For the Haar wavelet,

$$h(t) = \begin{cases} 1, & 0 \le t < \frac{1}{2} \\ -1, & \frac{1}{2} \le t < 1 \\ 0, & \text{otherwise} \end{cases}$$

Any wavelet transform uses two filter banks, a low-pass and a high-pass filter, which generate the low- and high-frequency coefficients of the input image, respectively. For such partitioning, let us consider

$$W_4 = \begin{bmatrix} H \\ G \end{bmatrix}$$

where $H$ is the low-pass filter coefficient matrix for the Haar wavelet and $G$ is the high-pass filter coefficient matrix for the Haar wavelet.
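For a four-sample input, the stacked matrix of H over G can be written out and checked numerically. The sketch below assumes the orthonormal Haar scaling, with every non-zero entry equal to $\pm 1/\sqrt{2}$; the names `W4`, `matvec` and `transpose` and the sample vector are illustrative only.

```python
import math

c = 1 / math.sqrt(2)          # orthonormal Haar coefficient (assumed scaling)

H = [[c, c, 0, 0],
     [0, 0, c, c]]            # low-pass rows: scaled pairwise sums
G = [[c, -c, 0, 0],
     [0, 0, c, -c]]           # high-pass rows: scaled pairwise differences
W4 = H + G                    # stack H on top of G

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def transpose(M):
    """Return the transpose of M as a list of rows."""
    return [list(col) for col in zip(*M)]

x = [9.0, 7.0, 3.0, 5.0]
y = matvec(W4, x)                      # [low0, low1, high0, high1]
x_back = matvec(transpose(W4), y)      # the matrix is orthogonal, so its transpose inverts it
print([round(v, 4) for v in y], [round(v, 4) for v in x_back])
```

Because the rows are orthonormal, the inverse transform needs no separate matrix: multiplying by the transpose recovers the input.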

### **III. PROPOSED METHOD**

Wavelet transforms can assume a scale anywhere on the real line, making them feasible for most practical and computationally expensive problems. The discrete WT translates into successive sums and multiplications. Fig.1d shows the top-level hardware block diagram of the Haar transform with four decomposition levels. It consists of M = 4 blocks called processing modules (PM), which determine the coefficients of the WT. There is also a control block responsible for synchronizing the operations between the PMs; it ensures that the registers in the M = 4 blocks are loaded at the correct time. The other schemes in Fig.1 describe the architectural exploration of the HDWT with M = 4.



Fig 3. Block diagram of DWT

The architectures are composed of registers, adders, subtractors, multipliers and shifters. Fig.3.1 depicts the hardware of the original HDWT (O-HDWT) without modifications to the matrix. In total, the original architecture has four subtractors, two multipliers and six adders, and its critical path consists of a subtractor and a multiplier. One input of each multiplier is the H coefficient, a constant equal to $1/\sqrt{2}$. Therefore, to optimize the HDWT baseline version as far as possible, we implemented these multipliers using efficient multiple constant multiplication (MCM). The optimized MCM for the H coefficient was generated automatically by the Spiral tool using the Hcub algorithm.

$$\begin{bmatrix} h_0 & h_1 & 0 & 0 & \cdots \\ g_0 & g_1 & 0 & 0 & \cdots \\ 0 & 0 & h_0 & h_1 & \cdots \\ 0 & 0 & g_0 & g_1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$

Fig 4: Transform matrix of vector coefficients

#### **Optimized Haar Wavelet Transform**

The proposed hardware architecture of the Optimized Haar Wavelet Transform is shown in Fig. 5. It consists of Pre-processing, Reset Controller, Data Format Conversion, Optimized Controller, Moving Window Architecture, Optimized Kogge-Stone Adder/Subtractor, Buffer, Shifter and D\_FF blocks. First, the Pre-processing block converts the input video into a finite number of frames of standard size ($256 \times 256$). The Data Format Conversion block then converts the pixel values of those frames into a user-defined format to increase data accuracy, and the Moving Window Architecture block uses them to generate $2 \times 2$ overlapped sub-matrices. The Optimized Kogge-Stone Adder/Subtractor blocks process these sub-matrix pixel values to generate all four sub-bands (LL, LH, HL and HH). Among these, the HL, LH and HH bands produce some negative coefficients, which are removed by the Buffer block. The intermediate signals are then shifted by separate Shifter blocks to apply the corresponding division factors.



Fig. 5 Proposed architecture of optimized Haar wavelet transform

In the non-separable Haar Wavelet Transform, however, the sub-matrices must be non-overlapped. The Optimized Controller and D\_FF (D flip-flop) blocks therefore work together to discard the intermediate values generated from the overlapped pixels, which also implements the downsampling by 2 of the intermediate values. The extra output signals clk\_out and rst\_out are used for synchronization. To generate a nearly accurate result, sufficient bit widths in Q-notation are used at the intermediate stages. In future, a high-resolution camera could be interfaced with the FPGA so that real-time high-speed video is captured and processed directly using this architecture with some modifications.
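The data flow described above (adders and subtractors on each $2 \times 2$ block, followed by a shift) can be modelled in software as follows. This is a behavioural sketch under several assumptions: blocks are taken directly in non-overlapped form, the shift is one bit (a division by 2, matching the orthonormal 2-D Haar scaling), the assignment of the names LH and HL to the horizontal and vertical differences is one common convention, and the Buffer block's handling of negative coefficients is not modelled. The function name `haar2d_level` and the sample frame are hypothetical.

```python
def haar2d_level(image, shift=1):
    """One level of a non-separable 2-D Haar transform on non-overlapping 2x2 blocks."""
    rows, cols = len(image), len(image[0])
    LL, LH, HL, HH = [], [], [], []
    for r in range(0, rows - 1, 2):              # step 2: downsample by 2 vertically
        ll, lh, hl, hh = [], [], [], []
        for c in range(0, cols - 1, 2):          # step 2: downsample by 2 horizontally
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll.append((a + b + d + e) >> shift)  # sum of the block (low-low)
            lh.append((a - b + d - e) >> shift)  # left column minus right column
            hl.append((a + b - d - e) >> shift)  # top row minus bottom row
            hh.append((a - b - d + e) >> shift)  # diagonal difference
        LL.append(ll); LH.append(lh); HL.append(hl); HH.append(hh)
    return LL, LH, HL, HH

frame = [[10, 12, 20, 20],
         [14, 16, 20, 20],
         [ 0,  0,  8,  4],
         [ 0,  0,  4,  8]]
LL, LH, HL, HH = haar2d_level(frame)
print(LL, LH, HL, HH)
```

Only additions, subtractions and shifts appear in the inner loop, which is why an optimized adder/subtractor dominates the hardware cost of this transform.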

#### **INVERSE DWT (IDWT)**

In the IDWT process, the reconstructed image is obtained from the wavelet details and averages using either the matrix multiplication method or linear equations. For the matrix multiplication method, the scaling function coefficients are h0 = 1, h1 = 1 and the wavelet function coefficients are g0 = 1, g1 = -1.

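The IDWT process can be performed using the following linear equations (7) and (8).

$$s_i = a_i + d_i \quad (7)$$

$$s_{i+1} = a_i - d_i \quad (8)$$

where $a_i$ and $d_i$ are the average and detail coefficients and $s_i$, $s_{i+1}$ are the reconstructed samples.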

A single wavelet transform step using a matrix algorithm involves the multiplication of the signal vector by a transform matrix, which is an N × N operation (where N is the data size for each transform step). In contrast, the linear equations need only N operations. In practice, matrices are not used to calculate the wavelet transform: the matrix form is both computationally inefficient and impractical in its memory consumption.
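A short Python sketch shows why the linear-equation form needs only on the order of N operations per step. The inverse step follows the sum and difference form given for the IDWT; the forward step (pairwise average and half-difference) is an assumption chosen so that the two steps invert each other exactly. The function names and the sample signal are illustrative.

```python
def fdwt_step(s):
    """Forward step (assumed form): pairwise averages a and half-differences d."""
    a = [(s[i] + s[i + 1]) / 2 for i in range(0, len(s), 2)]
    d = [(s[i] - s[i + 1]) / 2 for i in range(0, len(s), 2)]
    return a, d

def idwt_step(a, d):
    """Inverse step: even sample = a + d, odd sample = a - d."""
    s = []
    for ai, di in zip(a, d):
        s.append(ai + di)
        s.append(ai - di)
    return s

signal = [32, 10, 20, 38, 37, 28, 38, 34]
a, d = fdwt_step(signal)
restored = idwt_step(a, d)
print(a, d, restored)
```

Each output value costs one addition or subtraction (plus a halving in the forward direction), so no transform matrix has to be stored or multiplied.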

The first step of the forward transform (FDWT) for an eight-element signal is shown below. Here the signal is multiplied by the forward transform matrix with Haar filter coefficients, and the result is then reordered so that the averages $a_i$ precede the coefficients $c_i$.

$$\begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 \end{bmatrix} \cdot \begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \\ s_4 \\ s_5 \\ s_6 \\ s_7 \end{bmatrix} = \begin{bmatrix} a_0 \\ c_0 \\ a_1 \\ c_1 \\ a_2 \\ c_2 \\ a_3 \\ c_3 \end{bmatrix} \Rightarrow \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix}$$

Fig 6. Inverse transform matrix of vector coefficients



Fig 7: Steps involved in hardware implementation.

#### Ladner Fischer adder

The Ladner-Fischer adder is a parallel prefix adder used to perform addition. Its structure resembles a tree, and it is used for high-performance addition. The Ladner-Fischer adder consists of black cells and gray cells. Each black cell consists of two AND gates and one OR gate. Each gray cell consists of one AND gate and one OR gate, because it produces only the group generate signal. A multiplexer is a combinational circuit with multiple inputs and a single output.

The proposed Ladner-Fischer adder speeds up binary addition, and its tree-like structure supports high-performance arithmetic operations. In a ripple carry adder, each bit must wait for the carry from the previous bit. In parallel prefix adders, instead of waiting for the carry propagation of the first addition, the idea is to overlap the carry propagation of the first addition with the computation in the second addition, and so forth, since repetitive additions will be performed by a multi-operand adder. Research on binary arithmetic elements motivates the development of such devices. Field programmable gate arrays (FPGAs) have become popular in recent years because they improve the speed of microprocessor-based applications such as mobile DSP and telecommunication.
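The tree structure can be illustrated with a bit-level software model. The sketch below uses the minimum-depth member of the Ladner-Fischer family (the same prefix graph often attributed to Sklansky); that choice, the function name `ladner_fischer_add` and the 8-bit default width are assumptions of the example, since the exact prefix graph of the proposed adder is not spelled out here.

```python
def ladner_fischer_add(a, b, width=8):
    """Add two unsigned integers with a minimum-depth Ladner-Fischer prefix tree (software model)."""
    # Per-bit propagate and generate signals.
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]

    # Prefix tree: before the stage with block size `span`, (G[i], P[i]) cover
    # bit i down to the start of the span-sized block that contains i.
    G, P = g[:], p[:]
    span = 1
    while span < width:
        for i in range(width):
            if (i // span) % 2 == 1:            # i lies in the upper half of a 2*span block
                j = (i // span) * span - 1      # top bit of the lower half (not modified this stage)
                # Black cell: updates both G and P. When i < 2*span the group already
                # reaches bit 0, so hardware would use a gray cell that updates G only.
                G[i] = G[i] | (P[i] & G[j])
                P[i] = P[i] & P[j]
        span *= 2

    # G[i] is now the carry out of bit i (carry-in assumed to be 0).
    s = [p[i] ^ (G[i - 1] if i > 0 else 0) for i in range(width)]
    total = sum(bit << i for i, bit in enumerate(s))
    return total, G[width - 1]

total, carry_out = ladner_fischer_add(173, 91)
print(total, carry_out)   # 8-bit sum and carry-out of 173 + 91
```

Compared with a Kogge–Stone network of the same width, this tree uses fewer prefix cells in the same number of stages, but the cell at the top of each lower half drives many others (high fan-out).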

### IV. RESULTS AND DISCUSSION

The algorithm was first developed in MATLAB using the matrix multiplication method. We tested it with an image as the input file and also with 8 randomly chosen image coefficients for MATLAB simulation. After achieving satisfactory results in MATLAB, we proceeded to the next stage, in which the code is translated into Verilog HDL. Developing the algorithm in Verilog HDL differs in some respects; the main difference is that, unlike MATLAB, Verilog HDL does not support many built-in functions such as convolution, max, mod, flip and many more.



So, while implementing the algorithm in Verilog HDL, the linear equations of the FDWT and IDWT are used.



**RTL** Schematic

Simulation Results:



Evaluation of Area, Delay report:

|          | Area    | Delay   |
|----------|---------|---------|
| Existing | 476/217 | 2.590ns |
| Proposed | 476/213 | 2.556ns |
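
As a rough reading of the delay column above (the area entries are left aside because their units are not stated), the relative delay reduction of the proposed design works out to:

$$\frac{2.590\,\text{ns} - 2.556\,\text{ns}}{2.590\,\text{ns}} = \frac{0.034}{2.590} \approx 0.013 = 1.3\%$$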

### V. CONCLUSION

Initially, we analyzed the DWT and its functionality using ModelSim. The proposed algorithm is based on simple linear operations implemented with shifters, adders and subtractors, which saves external memory bandwidth and computational complexity. We also illustrated the performance of the DWT algorithm in numerical simulations; our model reduces the delay from 2.590 ns to 2.556 ns, while the complexity is much lower than that of the direct matrix-multiplier-based approach.


### Cite this article as :

Sandhya Aluru MV, P. Murali Krishna S, "Haar Dwt of Delay Optimized High Performance Ladner Fischer Adder", International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN : 2395-602X, Print ISSN : 2395-6011, Volume 9 Issue 5, pp. 254-261, September-October 2022. Journal URL : https://ijsrst.com/IJSRST229548

