

EECS151/251A Spring 2019 Digital Design and Integrated Circuits

Instructors: John Wawrzynek and Arya Reais-Parsi

Lecture 21: Multiplier Circuits

#### Warmup

• Recall long multiplication of base-10 by hand:

12 x 56

• In base-2 (binary), we do the same thing:

011 x 101
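As a worked check of both warmup problems, each multiplier digit selects one shifted copy of the multiplicand (or zero), and the rows are then summed. The layout below is one conventional way to write it out:

$$
\begin{array}{rr}
       & 12 \\
\times & 56 \\ \hline
       & 72 \\
+      & 600 \\ \hline
       & 672
\end{array}
\qquad\qquad
\begin{array}{rr}
       & 011 \\
\times & 101 \\ \hline
       & 011 \\
       & 0000 \\
+      & 01100 \\ \hline
       & 01111
\end{array}
$$

The binary result $01111_2 = 15_{10}$ agrees with $3 \times 5$.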

#### **Multiplication**



$\cdots \quad a_1b_0 + a_0b_1 \quad a_0b_0 \;\leftarrow\; \text{Product}$

Many different circuits exist for multiplication. Each one has a different balance between speed (performance) and amount of logic (cost).

### "Shift and Add" Multiplier



- Cost $\propto n$, $T = n$ clock cycles.
- What is the critical path for determining the min clock period?

- Sums each partial product, one at a time.
- In binary, each partial product is a shifted version of A, or 0.

Control Algorithm:

1. $P \leftarrow 0$, $A \leftarrow$ multiplicand, $B \leftarrow$ multiplier
2. If LSB of B == 1, add A to P; else add 0.
3. Shift [P][B] right by 1.
4. Repeat steps 2 and 3 n-1 more times.
5. [P][B] holds the product.
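The control algorithm above can be sketched in Verilog as follows. This is a minimal sketch for unsigned operands, not the circuit from the lecture figure: the module name, the `start`/`done` handshake, and the parameter `N` (assumed to be at least 2) are assumptions made for illustration. The adder's carry-out is shifted into the top of P, and `done` is undefined until the first `start` pulse.

```verilog
// Hypothetical module: unsigned shift-and-add multiplier, N cycles per product.
module shift_add_mult #(parameter N = 8) (
  input  wire           clk,
  input  wire           start,    // high for one cycle to load the operands
  input  wire [N-1:0]   a_in,     // multiplicand
  input  wire [N-1:0]   b_in,     // multiplier
  output wire [2*N-1:0] product,  // valid when done == 1
  output reg            done
);
  reg [N-1:0] A, P, B;
  reg [$clog2(N+1)-1:0] count;    // iterations remaining

  // Step 2: add A (or 0) to P; the extra bit keeps the carry-out.
  wire [N:0] sum = {1'b0, P} + (B[0] ? {1'b0, A} : {(N+1){1'b0}});

  assign product = {P, B};        // Step 5: [P][B] holds the product

  always @(posedge clk) begin
    if (start) begin
      // Step 1: P <- 0, A <- multiplicand, B <- multiplier
      P     <= {N{1'b0}};
      A     <= a_in;
      B     <= b_in;
      count <= N;
      done  <= 1'b0;
    end else if (count != 0) begin
      // Step 3: shift {carry, P, B} right by one
      P     <= sum[N:1];
      B     <= {sum[0], B[N-1:1]};
      count <= count - 1;
      done  <= (count == 1);      // last iteration finishes on this edge
    end
  end
endmodule
```

In this sketch the adder sits between the P register and itself, so the add is the natural candidate for the critical path that sets the minimum clock period.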

#### "Shift and Add" Multiplier

#### Signed Multiplication:

Remember for 2's complement numbers MSB has negative weight:

$$X = \sum_{i=0}^{n-2} x_i 2^i - x_{n-1} 2^{n-1}$$

ex: $-6 = 11010_2 = 0 \cdot 2^0 + 1 \cdot 2^1 + 0 \cdot 2^2 + 1 \cdot 2^3 - 1 \cdot 2^4$

• Therefore, for multiplication:

  - a) subtract the final partial product
  - b) sign-extend the partial products

• Modifications to the shift & add circuit:

  - a) adder/subtractor
  - b) sign-extender on the P shift register
#### Convince yourself

• What's -3 x 5?

1101 x 0101
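One way to work this out, applying the two rules above (sign-extend each partial product to 8 bits, subtract the final one):

$$
\begin{array}{rrl}
       & 1101     & (-3) \\
\times & 0101     & (5) \\ \hline
       & 11111101 & y_0 = 1:\ -3,\ \text{sign-extended} \\
       & 00000000 & y_1 = 0 \\
       & 11110100 & y_2 = 1:\ -3 \cdot 2^2 = -12 \\
-      & 00000000 & y_3 = 0:\ \text{final partial product, subtracted} \\ \hline
       & 11110001 & = -15
\end{array}
$$

The carry out of the 8-bit sum is discarded, leaving $11110001_2 = -15_{10}$.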

# Outline



- Combinational multiplier
- Latency & Throughput
  - Wallace Tree
  - Pipelining to increase throughput
- Smaller multipliers
  - Booth encoding
  - Serial, bit-serial
  - Two's complement multiplier



#### Unsigned Combinational Multiplier

#### Array Multiplier

Single cycle multiply: Generates all n partial products simultaneously.
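A behavioral sketch of this idea, assuming hypothetical module and port names (`array_mult`, `a`, `b`, `p`): every partial product `a & b[i]` is formed in parallel, and the shifted rows are summed in one combinational block. How the adders are arranged is left to the synthesis tool here; the physical array structure is not modeled.

```verilog
// Hypothetical module: unsigned single-cycle (combinational) multiplier.
module array_mult #(parameter N = 4) (
  input  wire [N-1:0]   a,   // multiplicand
  input  wire [N-1:0]   b,   // multiplier
  output reg  [2*N-1:0] p    // product
);
  integer i;
  always @* begin
    p = {(2*N){1'b0}};
    for (i = 0; i < N; i = i + 1)
      // Partial product i: a ANDed with bit b[i], shifted left by i places.
      p = p + (({(2*N){b[i]}} & {{N{1'b0}}, a}) << i);
  end
endmodule
```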





#### **Carry-Save Addition**

- Speeding up multiplication is a matter of speeding up the summing of the partial products.
- "Carry-save" addition can help.
- Carry-save addition passes (saves) the carries to the output, rather than propagating them.

• Example: sum three numbers, $3_{10} = 0011$, $2_{10} = 0010$, $3_{10} = 0011$:

$$
\begin{array}{rll}
3_{10}    & 0011            & \\
+\ 2_{10} & 0010            & \\ \hline
c         & 0100 = 4_{10}   & \text{carry-save add} \\
s         & 0001 = 1_{10}   & \\
3_{10}    & 0011            & \\ \hline
c         & 0010 = 2_{10}   & \text{carry-save add} \\
s         & 0110 = 6_{10}   & \\ \hline
          & 1000 = 8_{10}   & \text{carry-propagate add}
\end{array}
$$

- In general, *carry-save* addition takes in 3 numbers and produces 2.
  - Sometimes called a "3:2 compressor": 3 input signals into 2 in a potentially lossy operation
- Whereas, carry-propagate takes 2 and produces 1.


• With this technique, we can avoid carry propagation until final addition
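A carry-save adder is one full adder per bit position with no connection between positions. The sketch below assumes hypothetical names (`csa`, `x`, `y`, `z`, `s`, `c`) and makes the carry word one bit wider than the inputs so that no carry is dropped:

```verilog
// Hypothetical module: W-bit 3:2 carry-save adder, x + y + z == s + c.
module csa #(parameter W = 4) (
  input  wire [W-1:0] x, y, z,
  output wire [W-1:0] s,   // per-bit sum (XOR of the three inputs)
  output wire [W:0]   c    // per-bit carries, moved up one position
);
  assign s = x ^ y ^ z;
  // Majority function per bit; appending a 0 gives the carries weight 2.
  assign c = {(x & y) | (x & z) | (y & z), 1'b0};
endmodule
```

With `x = 0011`, `y = 0100`, `z = 0001` this gives `s = 0110` and `c = 00010`, matching the second carry-save step of the example.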


#### **Carry-save Circuits**



- When adding sets of numbers, carry-save can be used on all but the final sum.
- Standard adder (carry propagate) is used for final sum.
- Carry-save is fast (no carry propagation) and cheap (same cost as ripple adder)
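As an illustration of these points, the following sketch sums four numbers with two carry-save levels and a single carry-propagate add at the end. The module and signal names are hypothetical, and the final `+` stands in for whatever carry-propagate adder is actually used:

```verilog
// Hypothetical module: four-operand adder, carry-save on all but the final sum.
module add4_csa #(parameter W = 8) (
  input  wire [W-1:0] x0, x1, x2, x3,
  output wire [W+1:0] sum
);
  // Zero-extend the operands so that no carry is lost.
  wire [W+1:0] a = x0, b = x1, c = x2, d = x3;

  // Carry-save level 1: a + b + c -> s1 + c1
  wire [W+1:0] s1 = a ^ b ^ c;
  wire [W+1:0] c1 = ((a & b) | (a & c) | (b & c)) << 1;

  // Carry-save level 2: s1 + c1 + d -> s2 + c2
  wire [W+1:0] s2 = s1 ^ c1 ^ d;
  wire [W+1:0] c2 = ((s1 & c1) | (s1 & d) | (c1 & d)) << 1;

  // The only carry-propagate addition.
  assign sum = s2 + c2;
endmodule
```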



#### Array Multiplier using Carry-save Addition



#### Array Multiplier Again



#### **Carry-save Addition**

CSA is associative and commutative. For example:

$$(((X_0 + X_1) + X_2) + X_3) = ((X_0 + X_1) + (X_2 + X_3))$$



- A balanced tree can be used to reduce the logic delay.
- It doesn't matter where you add the carries and sums, as long as you eventually do add them.
- This structure is the basis of the *Wallace Tree Multiplier*.
- Partial products are summed with the CSA tree. Fast CPA (ex: CLA) is used for final sum.
- Multiplier delay $\propto \log_{3/2} N + \log_2 N$
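To see where the $\log_{3/2}$ term comes from, count operands per level, assuming each level groups the operands in threes and each 3:2 carry-save adder replaces three of them with two. For $N = 8$ partial products:

$$8 \;\to\; 6 \;\to\; 4 \;\to\; 3 \;\to\; 2$$

That is 4 carry-save levels before the final carry-propagate add, compared with $8 - 2 = 6$ levels if the carry-save adders were chained one after another.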

# Increasing Throughput: Pipelining



Throughput = $1/(4\,t_{PD,FA})$ instead of $1/(8\,t_{PD,FA})$



#### Smaller Combinational Multipliers

### **Booth Recoding: Higher-radix mult.**

Idea: If we could use, say, 2 bits of the multiplier in generating each partial product we would halve the number of columns and halve the latency of the multiplier!




### **Booth recoding**



A "1" in this bit means the previous stage needed to add 4\*A. Since this stage is shifted by 2 bits with respect to the previous stage, adding 4\*A in the previous stage is like adding A in this stage!
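A behavioral sketch of radix-4 Booth recoding, using the standard recoding table rather than the exact circuit on the slide. Each step examines two multiplier bits plus the top bit of the previous pair and selects one of 0, ±A, ±2A, so an N-bit multiplier needs only N/2 partial products. The module name, port names, and the requirement that `N` be even are assumptions of this sketch.

```verilog
// Hypothetical module: signed multiplier with radix-4 Booth recoding (N even).
module booth_radix4_mult #(parameter N = 8) (
  input  wire signed [N-1:0]   a,   // multiplicand
  input  wire signed [N-1:0]   b,   // multiplier (two's complement)
  output reg  signed [2*N-1:0] p    // product
);
  reg signed [2*N-1:0] a_ext, pp;
  reg        [N:0]     b_pad;       // multiplier with a 0 appended below bit 0
  integer k;

  always @* begin
    a_ext = a;                      // sign-extend the multiplicand
    b_pad = {b, 1'b0};
    p     = 0;
    for (k = 0; k < N/2; k = k + 1) begin
      // Bits {B[2k+1], B[2k], B[2k-1]} of the multiplier select the multiple.
      case ({b_pad[2*k+2], b_pad[2*k+1], b_pad[2*k]})
        3'b001, 3'b010: pp = a_ext;            // +A
        3'b011:         pp = a_ext <<< 1;      // +2A
        3'b100:         pp = -(a_ext <<< 1);   // -2A
        3'b101, 3'b110: pp = -a_ext;           // -A
        default:        pp = 0;                // 000 and 111: add 0
      endcase
      p = p + (pp <<< (2*k));       // each stage is shifted by 2 bits
    end
  end
endmodule
```

The lowest bit of each 3-bit group is the bit the text describes: when it is 1, the previous stage chose a negative multiple (4A - 2A or 4A - A), and the owed 4\*A shows up here as an extra +A.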





#### **Bit-serial Multiplier**

• Bit-serial multiplier ($n^2$ cycles, one bit of result per n cycles):



• Control Algorithm:

```
repeat n cycles {        // outer (i) loop
    repeat n cycles {    // inner (j) loop
        shiftA, selectSum, shiftHI
    }
}
```

Note: The occurrence of a control signal x means x = 1. The absence of x means x = 0.



#### **Signed Multipliers**

## **Combinational Multiplier (signed!)**





#### 2's Complement Multiplication (Baugh-Wooley)

Step 1: The operands are two's complement, so the high-order bit has weight $-2^{N-1}$. We must sign-extend the partial products and subtract the last one.

|   |      |      |      |      | X3   | X2   | X1   | X0   |
|---|------|------|------|------|------|------|------|------|
| * |      |      |      |      | Y3   | Y2   | Y1   | Y0   |
|   | X3Y0 | X3Y0 | X3Y0 | X3Y0 | X3Y0 | X2Y0 | X1Y0 | X0Y0 |
| + | X3Y1 | X3Y1 | X3Y1 | X3Y1 | X2Y1 | X1Y1 | X0Y1 |      |
| + | X3Y2 | X3Y2 | X3Y2 | X2Y2 | X1Y2 | X0Y2 |      |      |
| - | X3Y3 | X3Y3 | X2Y3 | X1Y3 | X0Y3 |      |      |      |
|   | Z7   | Z6   | Z5   | Z4   | Z3   | Z2   | Z1   | Z0   |

Step 2: don't want all those extra additions, so add a carefully chosen constant, remembering to subtract it at the end. Convert subtraction into add of (complement + 1).

Step 3: add the ones to the partial products and propagate the carries. All the sign extension bits go away!

|   |      |                   |                   | $\overline{X3Y0}$ | X2Y0 | X1Y0 | X0Y0 |
|---|------|-------------------|-------------------|-------------------|------|------|------|
| + |      |                   | $\overline{X3Y1}$ | X2Y1              | X1Y1 | X0Y1 |      |
| + |      | $\overline{X3Y2}$ | X2Y2              | X1Y2              | X0Y2 |      |      |
| + | X3Y3 | $\overline{X2Y3}$ | $\overline{X1Y3}$ | $\overline{X0Y3}$ |      |      |      |
| + |      |                   |                   | 1                 |      |      |      |
| - | 1    | 1                 | 1                 | 1                 |      |      |      |

An overbar denotes a complemented bit. The columns are Z6 (left) through Z0 (right).

Step 4: finish computing the constants...
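One way to finish the constants, assuming an 8-bit result Z7..Z0 so that arithmetic is modulo $2^8$: the subtracted 1s sit in columns Z6 through Z3 and the added 1 sits in column Z3, so

$$-(2^6 + 2^5 + 2^4 + 2^3) + 2^3 = -112 \equiv 256 - 112 = 144 = 2^7 + 2^4 \pmod{2^8}$$

The two constant rows therefore collapse into a single 1 in column Z7 and a single 1 in column Z4.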

Result: multiplying 2's complement operands takes just about the same amount of hardware as multiplying unsigned operands!
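The result can be checked with a small 4x4 sketch. The module and signal names are hypothetical, and the constant `8'b1001_0000` is the collapsed form of the two constant rows under 8-bit arithmetic (-120 + 8 = -112, which is 144 modulo 256). Rows are plain unsigned sums; only the six terms containing exactly one sign bit are complemented.

```verilog
// Hypothetical module: 4x4 two's complement multiplier, Baugh-Wooley style.
module baugh_wooley_4x4 (
  input  wire [3:0] x,   // two's complement operand
  input  wire [3:0] y,   // two's complement operand
  output wire [7:0] z    // two's complement product
);
  // Partial-product rows, each already placed in its column position.
  wire [7:0] pp0 = {4'b0, ~(x[3] & y[0]), x[2] & y[0], x[1] & y[0], x[0] & y[0]};
  wire [7:0] pp1 = {3'b0, ~(x[3] & y[1]), x[2] & y[1], x[1] & y[1], x[0] & y[1], 1'b0};
  wire [7:0] pp2 = {2'b0, ~(x[3] & y[2]), x[2] & y[2], x[1] & y[2], x[0] & y[2], 2'b0};
  wire [7:0] pp3 = {1'b0, x[3] & y[3], ~(x[2] & y[3]), ~(x[1] & y[3]), ~(x[0] & y[3]), 3'b0};

  // Constant: a 1 in column Z7 and a 1 in column Z4.
  wire [7:0] k = 8'b1001_0000;

  // No sign extension and no subtraction remain, only unsigned adds.
  assign z = pp0 + pp1 + pp2 + pp3 + k;
endmodule
```

For `x = 1101` (-3) and `y = 0101` (5), the rows are 5, 16, 20 and 56; adding 144 gives 241, which is `11110001`, or -15.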

# 2's Complement Multiplication



#### **Example**

• What's -3 x -5?

1101 x 1011
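Worked with the sign-extend-and-subtract view from Step 1 (partial products sign-extended to 8 bits, the last one subtracted because $y_3$ has negative weight):

$$
\begin{array}{rrl}
       & 1101     & (-3) \\
\times & 1011     & (-5) \\ \hline
       & 11111101 & y_0 = 1:\ -3 \\
       & 11111010 & y_1 = 1:\ -3 \cdot 2 = -6 \\
       & 00000000 & y_2 = 0 \\
-      & 11101000 & y_3 = 1:\ -3 \cdot 2^3 = -24,\ \text{subtracted} \\ \hline
       & 00001111 & = 15
\end{array}
$$

Check: $-3 - 6 - (-24) = 15$.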

### **Multiplication in Verilog**

You can use the "\*" operator to multiply two numbers:

```
wire [9:0] a,b;
wire [19:0] result = a*b; // unsigned multiplication!
```

If you want Verilog to treat your operands as signed two's complement numbers, add the keyword signed to your wire or reg declaration:

```
wire signed [9:0] a,b;
wire signed [19:0] result = a*b; // signed multiplication!
```

Remember: unlike addition and subtraction, you need different circuitry if your multiplication operands are signed vs. unsigned. The same is true of the `>>>` (arithmetic right shift) operator. To get signed operations, all operands must be signed.

```
wire signed [9:0] a;
wire [9:0] b;
wire signed [19:0] result = a*$signed(b);
```

To make a signed constant: `10'sh37C`
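The effect of mixing signed and unsigned operands can be seen in a small simulation-only sketch. The module name and the 4-bit operand values are chosen here for illustration (they reuse the -3 x 5 example):

```verilog
// Hypothetical demo module: signed vs. mixed vs. cast multiplication.
module signed_mult_demo;
  wire signed [3:0] a  = -4'sd3;  // 1101
  wire signed [3:0] b  =  4'sd5;  // 0101
  wire        [3:0] bu =  4'd5;   // same bits as b, but declared unsigned

  wire signed [7:0] p_signed = a * b;            // both signed: -3 * 5
  wire signed [7:0] p_mixed  = a * bu;           // bu makes the whole expression unsigned: 13 * 5
  wire signed [7:0] p_cast   = a * $signed(bu);  // the cast restores a signed multiply

  initial begin
    #1;
    $display("%0d %0d %0d", p_signed, p_mixed, p_cast);  // prints: -15 65 -15
    $finish;
  end
endmodule
```

In `p_mixed`, one unsigned operand is enough to make Verilog zero-extend `a` instead of sign-extending it, so `1101` is read as 13.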