# VLSI Implementation of Ternary Operand Binary Addition Using Parallel Prefix Adder for Area Efficiency 

PruthviRaj. Ravi, Dr. Rajasekhar. Karumuri<br>Department of ECE, University College of Engineering Kakinada (A), Kakinada, Andhra Pradesh, India.

Article Info<br>Volume 9, Issue 6<br>Page Number : 453-461

## Publication Issue

November-December-2022

## Article History

Accepted: 01 Dec 2022
Published : 12 Dec 2022


#### Abstract

Cryptographic applications and pseudo-random generators perform modular arithmetic, in which three-operand addition is the fundamental operation. The carry save adder (CSA) is one of the most widely used adders for three-operand addition [5]. However, the carry is propagated in the final stage of the CSA, which increases the delay. Parallel prefix adders are therefore employed to get around this; they improve latency but occupy more area. Parallel prefix adders can also be used to build three-operand adders. An existing high-speed, hardware-efficient adder technique performs three-operand addition in four stages to improve both latency and area. Because the Han-Carlson adder is used in its third stage, that adder is not as area efficient as it could be. To overcome this, in this paper we replace the Han-Carlson parallel prefix adder with the Sklansky adder.


Keywords: Carry Save Adder, Sklansky Adder, Han-Carlson Adder, Adder.

## I. INTRODUCTION

Three-operand addition is the most commonly used operation in modular arithmetic, which is widely used in cryptographic applications. To maintain optimum system performance while retaining physical security, cryptographic algorithms must be implemented in hardware [1]. In these applications, the three-operand adder is the basic building block of the modular arithmetic [4]. Hence, designing an efficient three-operand adder is the need of the day.

With the rapid advancement of data communication and internet services, data privacy has become an important issue. Cryptographic applications are widely used to provide security, and the three-operand adder is the fundamental unit of these applications and of modular arithmetic. As a result, the choice of this building block determines how well modular arithmetic and cryptographic applications perform overall: the performance of the top module depends on this foundational module.

When the addition is performed on two operands, that is, two $n$-bit input numbers, the adder is referred to as a two-operand adder. Different types of adders are available for two-operand addition, such as ripple carry adders (RCA), parallel prefix adders (PPA), and carry skip adders (CSKA) [3]. Consider the ripple carry adder. It is the most commonly used adder for two-operand addition and is designed simply by cascading 1-bit full adder cells. The 4-bit RCA is depicted in the figure below.


Fig. 1 4-bit RCA
The major drawback of this adder is its critical path delay. The second full adder has to wait until the first full adder has produced its carry. Similarly, in an $n$-bit addition the last full adder has to wait for the preceding $(n-1)$ full adders. Hence the delay of the ripple carry adder is high and grows with the operand width.
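The carry dependency described above can be illustrated with a small behavioural model. This is only a sketch in Python, not the hardware description used in this work; the function names `full_adder` and `ripple_carry_add` are chosen here for illustration.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (cin & a)
    return s, cout


def ripple_carry_add(a, b, n=4, cin=0):
    """Add two n-bit integers by chaining n full adders, LSB first.

    Each loop iteration needs the carry of the previous one, which is
    the serial dependency that makes up the critical path of the RCA.
    """
    carry = cin
    total = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


# 11 + 6 = 17: the 4-bit sum is 0001 and the carry-out is 1.
print(ripple_carry_add(0b1011, 0b0110))  # prints (1, 1)
```

The loop-carried variable `carry` plays the role of the wire between neighbouring full adder cells in Fig. 1.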

Consider the carry skip adder. In this adder, the carry is either propagated or skipped, depending on the skip logic. When the carry is skipped, the delay is effectively reduced [7]. However, the carry is skipped in only one case. In all remaining cases the carry is propagated, which significantly impacts the performance of the adder and of the application in which it is used. The block diagram of the carry skip adder is shown below.

From Fig. 2, it can be observed that carry propagation is skipped only if the propagate signals generated by all the individual full adders are logic high; in all other cases the carry is propagated. The performance of the CSKA is improved compared to the RCA.


Fig. 2 Logic diagram of Carry Skip Adder (CSKA)
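The skip condition can be sketched for a single block as follows. This is a behavioural illustration under the assumption that bits are given as lists with the least significant bit first; the name `carry_skip_block` is hypothetical and not taken from the paper.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip block; bit lists are LSB first.

    Returns (sum_bits, carry_out, skipped).
    """
    # Per-bit propagate signals of the block.
    p = [a ^ b for a, b in zip(a_bits, b_bits)]

    # Ordinary ripple path through the block.
    carry = cin
    sum_bits = []
    for a, b, pi in zip(a_bits, b_bits, p):
        sum_bits.append(pi ^ carry)
        carry = (a & b) | (pi & carry)

    # Skip path: only when every propagate signal is high does the
    # multiplexer forward cin directly to the block output.
    skipped = all(p)
    cout = cin if skipped else carry
    return sum_bits, cout, skipped


# 5 + 10 with cin = 1: all propagate bits are 1, so the carry is skipped.
print(carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1))
```

In hardware the benefit is that `cout` becomes available without waiting for the ripple path; the model only reproduces the logic values, not the timing.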

Consider the parallel prefix adder. In a parallel prefix adder, the entire operation is performed in three stages to produce the final sum output. They are:

1. Pre-processing.
2. Generation of carry.
3. Final processing.

Pre-processing: In this stage, the propagate and generate signals are computed from the input operands $A$ and $B$.

$$
\begin{align*}
& P_{i}=A_{i} \oplus B_{i} \tag{1}\\
& G_{i}=A_{i} \cdot B_{i} \tag{2}
\end{align*}
$$

Generation of carry: In this stage, the carry for each bit is generated from the propagate and generate signals. The operations in this stage are performed in parallel. The expressions for the group propagate and group generate signals are shown below.

$$
\begin{align*}
& P_{(i: k)}=P_{(i: j)} \cdot P_{(j-1: k)} \tag{3}\\
& G_{(i: k)}=G_{(i: j)}+\left(G_{(j-1: k)} \cdot P_{(i: j)}\right) \tag{4}
\end{align*}
$$

This carry is generated by the use of various cell structures known as Gray cells, Black cells, and Buffer cells. Final carry is calculated by using these mentioned cells. Various parallel prefix architectures can be designed by varying this carry generation architecture.
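As a rough illustration of how the three cell types relate to equations (3) and (4), the cells can be modelled as small functions. The function and argument names below are assumptions made for this sketch; `hi` denotes the more significant group $(i:j)$ and `lo` the less significant group $(j-1:k)$.

```python
def black_cell(g_hi, p_hi, g_lo, p_lo):
    """Black cell: computes both group generate (4) and group propagate (3)."""
    g = g_hi | (p_hi & g_lo)
    p = p_hi & p_lo
    return g, p


def gray_cell(g_hi, p_hi, g_lo):
    """Gray cell: computes only the group generate (4)."""
    return g_hi | (p_hi & g_lo)


def buffer_cell(g, p):
    """Buffer cell: passes its inputs through unchanged."""
    return g, p


# The upper group propagates and the lower group generates a carry,
# so the combined group generates; it does not propagate as a whole.
print(black_cell(0, 1, 1, 0))  # prints (1, 0)
```

A gray cell is a black cell without the AND gate for the group propagate, which is why it is the cheaper of the two.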


Fig. 3 Black Cells, Gray Cells and Buffer cell used for carry generation stage
But the speed of this adder is not improved as effectively.

Final processing stage: This stage generates the final sum from the propagate and carry values produced in the previous stages. Equation (5) gives the Boolean expression, where $G_{(i-1: 0)}$ is the carry into bit $i$.

$$
S_{i}=P_{i} \oplus G_{(i-1: 0)} \tag{5}
$$


Fig. 4 Overall architecture of parallel prefix adder.

Different applications, such as the linear congruential generator (LCG), the modified dual coupled linear congruential generator (MDCLCG), and the coupled variable input linear congruential generator, use three-operand addition as the fundamental unit [8]. Among these LCG variants, the MDCLCG is considered the most secure, and it is therefore widely used. Its security improves with increasing bit width. However, as the operand size increases, the delay and area also increase linearly. Since the three-operand adder is the fundamental unit, improving it in terms of power, delay, and area improves the overall performance of the MDCLCG as well [6].
Three-operand binary addition can be performed either with two two-operand parallel prefix adders or with a single three-operand adder. In various cryptographic and pseudo-random bit generator (PRBG) methods, the CSA is most commonly recommended for three-operand addition.


Fig. 5 Block diagram of n-bit CSA

The modified dual CLCG (MDCLCG) and other cryptographic implementations on Internet of Things (IoT)-based hardware systems perform noticeably worse overall because of the delay in the final stage of the carry save adder (CSA) [2].


Fig. 6 Carry propagation chain in CSA

In the carry save adder, the intermediate carries are not propagated, but the carry is propagated in the final stage. Hence the final carry-propagation stage of the CSA significantly impacts the performance.
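The two phases of the CSA can be sketched in a short behavioural model: a carry-free compression of three operands into a partial sum and a partial carry, followed by a final stage in which the carry ripples. The function name `carry_save_add3` and its interface are assumptions made for this illustration.

```python
def carry_save_add3(a, b, c, n):
    """Three-operand addition of n-bit integers in carry-save style."""
    mask = (1 << n) - 1

    # Carry-save stage: every bit position works independently,
    # so no carry is propagated here.
    s = (a ^ b ^ c) & mask
    cy = ((a & b) | (b & c) | (c & a)) & mask

    # Final stage: s and the shifted carries are added with a ripple
    # carry chain, which is where the carry propagation delay arises.
    shifted = cy << 1
    carry = 0
    total = 0
    for i in range(n + 1):
        x = (s >> i) & 1
        y = (shifted >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (y & carry) | (carry & x)
    total |= carry << (n + 1)
    return total


print(carry_save_add3(5, 6, 7, n=4))  # prints 18
```

The result occupies up to $n+2$ bits, since the sum of three $n$-bit operands can exceed $2^{n+1}$.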

Using parallel prefix adders reduces the delay, which is the CSA's primary flaw, but increases the area. For three-operand binary addition, two two-operand parallel prefix adders may be used. A three-operand adder built from two parallel prefix adder modules is shown below.


Fig. 7 Block diagram of three operand adder using two parallel prefix adders

## II. RELATED WORKS

Previously, the carry save adder (CSA) was the most commonly used multi-operand (three-operand) adder. In this adder the carry is not propagated in the intermediate stages, but it is propagated in the final stage. Because of this carry propagation in the final stage of the CSA, the delay is high.

The multi-operand (here, three-operand) adder is the fundamental component of cryptographic applications and pseudo-random bit generation applications. Since the delay of a CSA-based three-operand adder is high, it impacts the overall performance of these applications. Hence the CSA is not considered the best choice. To overcome this delay drawback, parallel prefix adders are used.

The conventional parallel prefix adders have three stages. The computations within each stage are carried out in parallel, which improves performance with respect to delay.

Two parallel prefix adders are needed to construct a three-operand adder in this way. Although the latency is improved, area becomes the main problem. As a result, the existing method uses a new parallel architecture consisting of four stages to create a three-operand adder. The third stage, which performs the propagate and generate logic, uses a parallel prefix adder; in the existing work, the Han-Carlson adder is used in this stage. The Han-Carlson adder block schematic is shown below.


Fig. 8 16-bit Han-Carlson adder

Although the carries are calculated in parallel in this architecture, it can be observed from the above figure that the black cell count is high. Hence the hardware efficiency of that adder is low. If the basic adder occupies a large area, little space remains on the chip for extra logic or additional features. Hence, designing an area-efficient three-operand adder is the need of the day.

## III. IMPLEMENTATION

A novel parallel prefix architecture employing four stages is proposed. They are:

1. Bitwise addition.
2. Base Logic.
3. Propagate and Generate logic (PG).
4. Final addition.

Previously, the carry save adder (CSA) was one of the most commonly used architectures for multi-operand addition. The problem with using the CSA in high-performance applications is that the carry is propagated in its final stage, which significantly increases the delay. To overcome this, two parallel prefix adders can be used to perform three-operand addition. Although the delay is improved, the area consumed is large. To further improve the performance in terms of area and speed, a four-stage parallel prefix architecture is used, in which the propagate and generate stage employs the Han-Carlson adder. Although the performance is enhanced, the area is still large. To overcome this, the Han-Carlson adder in the propagate and generate stage is replaced with the Sklansky adder in our proposed method. The logical expressions for the four stages are shown below.
Stage-1: Bitwise Addition:

$$
\begin{aligned}
& s_{i}=a_{i} \oplus b_{i} \oplus c_{i} \\
& c y_{i}=a_{i} \cdot b_{i}+b_{i} \cdot c_{i}+c_{i} \cdot a_{i}
\end{aligned}
$$

Stage-2: Base Logic:

$$
\begin{gathered}
G_{i: i}=G_{i}=s_{i} \cdot c y_{i-1} \\
G_{0: 0}=G_{0}=s_{0} \cdot c_{in} \\
P_{i: i}=P_{i}=s_{i} \oplus c y_{i-1} \\
P_{0: 0}=P_{0}=s_{0} \oplus c_{in}
\end{gathered}
$$

Stage-3: PG (Generate and Propagate) Logic:

$$
\begin{gathered}
G_{i: j}=G_{i: k}+P_{i: k} \cdot G_{k-1: j}, \\
P_{i: j}=P_{i: k} \cdot P_{k-1: j}
\end{gathered}
$$

Stage-4: Final addition:

$$
\begin{gathered}
S_{i}=\left(P_{i} \oplus G_{i-1: 0}\right), S_{0}=P_{0}, \\
\text { Cout }=G_{n: 0}
\end{gathered}
$$

Figure 9(a) shows the block diagram of the novel three-operand binary adder.


Fig.9(a) Block diagram of proposed three-operand adder
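The four stages can be put together in a bit-level behavioural model. The following Python sketch follows the stage equations given above and uses a Sklansky-style prefix network in the third stage; it is an illustration under our own assumptions (the function name `three_operand_add`, the list-based bit representation, and the in-place prefix update are not taken from the synthesized design).

```python
def three_operand_add(a, b, c, n, cin=0):
    """Return a + b + c + cin for n-bit operands using the four stages."""
    # Stage 1: bitwise addition (one full adder per bit, no carry chain).
    s = [((a >> i) ^ (b >> i) ^ (c >> i)) & 1 for i in range(n)]
    maj = (a & b) | (b & c) | (c & a)
    cy = [(maj >> i) & 1 for i in range(n)]

    # Stage 2: base logic over n + 1 positions.
    # Position i pairs s_i with cy_{i-1}; position 0 pairs s_0 with cin,
    # and position n has s_n = 0.
    s_ext = s + [0]
    cy_prev = [cin] + cy
    g = [x & y for x, y in zip(s_ext, cy_prev)]
    p = [x ^ y for x, y in zip(s_ext, cy_prev)]

    # Stage 3: Sklansky prefix network. After the last level,
    # G[i] holds the group generate G_{i:0}.
    G, P = g[:], p[:]
    m = n + 1
    level = 0
    while (1 << level) < m:
        for i in range(m):
            if (i >> level) & 1:
                # Last position of the neighbouring lower block;
                # it is not modified at this level.
                j = ((i >> level) << level) - 1
                G[i] = G[i] | (P[i] & G[j])
                P[i] = P[i] & P[j]
        level += 1

    # Stage 4: final addition, S_0 = P_0 and S_i = P_i xor G_{i-1:0}.
    sum_bits = [p[0]] + [p[i] ^ G[i - 1] for i in range(1, m)]
    cout = G[n]
    return sum(bit << i for i, bit in enumerate(sum_bits)) | (cout << m)


print(three_operand_add(5, 6, 7, n=4))  # prints 18
```

Swapping the loop in stage 3 for another prefix topology, such as Han-Carlson, would leave stages 1, 2, and 4 unchanged, which is why only the third stage differs between the existing and proposed adders.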

The bitwise addition stage uses the inputs $a, b, c$ to generate the partial sum and partial carry using two XOR gates and three AND gates. Consider an example. Let us assume $a=1$, $b=1$ and $c=0$. Since $s=a$ xor $b$ xor $c$, the partial sum and carry will be $\mathrm{s}=0$ and $\mathrm{cy}=1$.

The generated partial sum and carry are given as inputs to the base logic, which is the second stage of the proposed architecture. The base logic produces the intermediate propagate and generate outputs from the partial sum and carry signals generated in the first stage. Continuing the example used for the bitwise addition, where $\mathrm{s}=0$ and $\mathrm{cy}=1$, these values are taken as the inputs of the base logic. The outputs are shown below.

$$
\mathrm{P}=\mathrm{s} \text { xor } \mathrm{cy}
$$

$$
\mathrm{P}=0 \text { xor } 1=1
$$

$$
\mathrm{G}=\mathrm{s} \text { and } \mathrm{cy},
$$

$$
\mathrm{G}=0 \text { and } 1=0
$$



Fig. 9(b) Gate-level architectures for bitwise addition, base logic, and final addition.

In the third stage, the propagate and generate signals produced at the output of the second stage are given as inputs to the propagate and generate block, which uses them to generate the final carries. This block is built from gray cells, black cells, and buffers. The Han-Carlson adder was previously used in this stage. Since it consumes more area, our method replaces it with the Sklansky adder to improve area efficiency.


Fig. 10 Proposed three-operand adder with Sklansky Adder

Any parallel prefix adder architecture contains only gray cells, black cells, and buffer cells. A black cell calculates both propagate and generate. A gray cell calculates only generate. A buffer cell passes its input unchanged to its output. Let us consider an example of the operations in the propagate and generate stage. The block diagram of the 16-bit Sklansky adder is shown below.


Fig. 11 16-bit Sklansky adder
From Fig. 3, it can be observed that the black cell has a higher transistor count than the gray cell. It can be observed from Figs. 8 and 11 that the black cell count is higher in the Han-Carlson adder than in the Sklansky adder. Since the black cell has a larger gate count and the Han-Carlson adder has a larger number of black cells, the Han-Carlson adder occupies more area than the Sklansky adder. Hence, the Han-Carlson adder is not as area efficient as the Sklansky adder.
Calculation procedure for black cell:

$P=P_{n}$ and $P_{n-1}$

Suppose that $P_{n-1}=1$ and $P_{n}=0$. The output will be $\mathrm{P}=0$.

Now $G=G_{n}$ or $\left(P_{n}\right.$ and $\left.G_{n-1}\right)$

Suppose that $G_{n}=1$, $P_{n-1}=0$, and $P_{n}=1$. Since $G_{n}=1$, the output will be $G=1$ irrespective of $G_{n-1}$.

These ( $\mathrm{P}, \mathrm{G}$ ) values are then passed as inputs to either a buffer or a gray cell, according to the structure. Here $P_{n}$ and $G_{n}$ are the present values, and $P_{n-1}$ and $\mathrm{G}_{\mathrm{n}-1}$ are the previous-stage values.
Calculation procedure for gray cell:

$G=G_{n}$ or $\left(P_{n}\right.$ and $\left.G_{n-1}\right)$

Suppose that $G_{n}=1$, $P_{n-1}=0$, and $P_{n}=1$.
The output will be $G=1$. This $G$ is either taken directly as the output carry, if no buffer cells follow, or passed through buffer cells before it is used as the carry bit. A buffer passes on unchanged the value it receives from a black cell or a gray cell. In the proposed adder, $C_{in}$ is also taken into account in the three-operand addition. To generate the final sum, the propagate signal of each bit is XORed with the carry signal from the previous bit.
The final addition result is calculated using the expressions shown below.

$$
\begin{gathered}
S_{i}=\left(P_{i} \oplus G_{i-1: 0}\right) \\
S_{0}=P_{0} \\
\text { Cout }=G_{n: 0}
\end{gathered}
$$

Let us consider an example of producing the final sum. Let the initial carry input be $c_{in}=0$ and let the propagate value $p$ generated from the first input bit $a_{0}$ be 1; then the sum will be

$$
\begin{aligned}
& S=p \text { xor } \operatorname{cin} \\
& S=1 \text { xor } 0=1
\end{aligned}
$$

## IV. RESULTS AND DISCUSSIONS

The RTL block diagram, technology schematic, and simulation results of the suggested three-operand adder are shown below. The results show improved performance in terms of area for the proposed Sklansky-based three-operand adder.



Fig 11 RTL of proposed three operand adder


Fig 12: Technology schematic of proposed three operand adder


Fig 13: Simulation output of proposed Adder Method.

| Adder | Area | Delay (ns) |
| :---: | :---: | :---: |
| Existing (Han-Carlson adder) | 368 | 7.5 |
| Proposed (Sklansky adder) | 213 | 6.8 |

Table 1: Comparison table for Existing and Proposed methods
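The relative improvement implied by Table 1 can be worked out directly from the reported figures. The short Python calculation below is only an arithmetic aid; the helper name `percent_reduction` is introduced here for illustration, and the area values are used in the unit reported in the table.

```python
def percent_reduction(existing, proposed):
    """Relative reduction of `proposed` with respect to `existing`, in percent."""
    return 100.0 * (existing - proposed) / existing


# Values taken from Table 1.
area_saving = percent_reduction(368, 213)
delay_saving = percent_reduction(7.5, 6.8)

print(f"Area reduction:  {area_saving:.1f} %")   # prints 42.1 %
print(f"Delay reduction: {delay_saving:.1f} %")  # prints 9.3 %
```

On these figures, the proposed adder reduces area by about 42 % and delay by about 9 % relative to the existing Han-Carlson-based design.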


Fig. 14 Comparison of area between existing and proposed Adder

## V. CONCLUSION

To increase area efficiency, a four-stage parallel prefix architecture is devised in this study, using the Sklansky adder in the propagate and generate stage. Compared with other parallel-prefix three-operand adders, such as the existing Han-Carlson-based design, the results demonstrate the higher hardware efficiency of the proposed adder. The design is synthesized and simulated using the Xilinx ISE tool.

## VI. REFERENCES

[1]. M. M. Islam, M. S. Hossain, M. K. Hasan, M. Shahjalal, and Y. M. Jang, "FPGA implementation of high-speed area-efficient processor for elliptic curve point multiplication over prime field," IEEE Access, vol. 7, pp. 178811-178826, 2019.
[2]. Z. Liu, J. Großschädl, Z. Hu, K. Järvinen, H. Wang, and I. Verbauwhede, "Elliptic curve cryptography with efficiently computable endomorphisms and its hardware implementations for the Internet of Things," IEEE Trans. Comput., vol. 66, no. 5, pp. 773-785, May 2017.
[3]. P. Ramanathan and P. T. Vanathi, "A novel power delay optimized 32-bit parallel prefix adder for high speed computing," International Journal of Recent Trends in Engineering, vol. 2, November 2009.
[4]. B. Parhami, Computer Arithmetic: Algorithms and Hardware Design. New York, NY, USA: Oxford Univ. Press, 2000.
[5]. J. Chen and J. E. Stine, "Enhancing parallel prefix structures with carry save notion," Proceedings of the 51st IEEE International Midwest Symposium on Circuits and Systems, Knoxville, pp. 354-357, August 2008.
[6]. F. Frustaci, M. Lanuzza, P. Zicari, S. Perri, and P. Corsonello, "Designing high speed adders in power constrained environments," IEEE Transactions on Circuits and Systems, vol. 56, pp. 172-176, February 2009.
[7]. P. Devi, A. Girdher, and B. Singh, "Improved carry select adder with reduced area and low power consumption," International Journal of Computer Application, vol. 3, no. 4, June 2010.
[8]. Singh, DilipKumar, "Design of area and power efficient modified carry select adder," International Journal of Computer Applications, vol. 33, no. 3, pp. 14-18, Nov. 2011.

## Cite this article as :

PruthviRaj. Ravi, Dr. Rajasekhar. Karumuri, "VLSI Implementation of Ternary Operand Binary Addition Using Parallel Prefix Adder for Area Efficiency", International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN : 2395-602X, Print ISSN : 2395-6011, Volume 9 Issue 6, pp. 453-461, November-December 2022. Available at doi : https://doi.org/10.32628/IJSRST229660
Journal URL : https://ijsrst.com/IJSRST229660

