# <u>EECS150 - Digital Design</u> <u>Lecture 21 - Multipliers & Shifters</u>

## April 9, 2013 John Wawrzynek

Spring 2013

EECS150 - Lec21-mult-shift

Page 1

#### **Multiplication**

|            |            |            | $a_3$      | $a_2$      | $a_1$             | $a_0$    | $\leftarrow$ Multiplicand |
|------------|------------|------------|------------|------------|-------------------|----------|---------------------------|
|            |            |            | $b_3$      | $b_2$      | $b_1$             | $b_0$    | $\leftarrow$ Multiplier   |
|            |            |            | $a_3b_0$   | $a_2b_0$   | $a_1b_0$          | $a_0b_0$ | Partial products          |
|            |            | $a_3b_1$   | $a_2b_1$   | $a_1b_1$   | $a_0b_1$          |          |                           |
|            | $a_3b_2$   | $a_2b_2$   | $a_1b_2$   | $a_0b_2$   |                   |          |                           |
| $a_3b_3$   | $a_2b_3$   | $a_1b_3$   | $a_0b_3$   |            |                   |          |                           |
| $\ldots$   |            |            |            |            | $a_1b_0 + a_0b_1$ | $a_0b_0$ | $\leftarrow$ Product      |

Many different circuits exist for multiplication. Each one has a different balance between speed (performance) and amount of logic (cost).
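As a rough behavioral sketch (in Python rather than hardware, with the hypothetical name `multiply_partial_products`), the partial-product view of multiplication can be modeled as follows. Each row is the multiplicand ANDed with one multiplier bit and weighted by that bit's position.

```python
def multiply_partial_products(a, b, n=4):
    """Multiply two unsigned n-bit values by summing shifted partial products."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    product = 0
    for i in range(n):
        b_i = (b >> i) & 1           # multiplier bit i
        partial = a if b_i else 0    # row i: every a_j AND-ed with b_i
        product += partial << i      # row i carries weight 2**i
    return product


print(multiply_partial_products(13, 11))  # 13 * 11
```

How the rows are summed (one per cycle, or all at once in an array) is what distinguishes the circuits in the rest of the lecture.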




### "Shift and Add" Multiplier

Signed Multiplication:

Remember that for 2's complement numbers the MSB has negative weight:

$$X = \sum_{i=0}^{n-2} x_i 2^i - x_{n-1} 2^{n-1}$$

ex: $-6 = 11010_2 = 0 \cdot 2^0 + 1 \cdot 2^1 + 0 \cdot 2^2 + 1 \cdot 2^3 - 1 \cdot 2^4 = 0 + 2 + 0 + 8 - 16 = -6$

- Therefore for multiplication:
  - a) subtract final partial product
  - b) sign-extend partial products
- Modifications to shift & add circuit:
  - a) adder/subtractor
  - b) sign-extender on P shift register
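The two modifications can be illustrated with a small Python model. This is a behavioral sketch, not the lecture's circuit: the names `to_signed` and `signed_shift_add` are made up here, and Python's unbounded integers stand in for the sign-extender.

```python
def to_signed(value, n):
    """Interpret the low n bits of value as a 2's complement number."""
    value &= (1 << n) - 1
    return value - (1 << n) if value & (1 << (n - 1)) else value


def signed_shift_add(a_bits, b_bits, n):
    """Signed n-bit multiply: add partial products, subtract the final one."""
    a = to_signed(a_bits, n)         # negative ints behave as sign-extended values
    product = 0
    for i in range(n):
        if (b_bits >> i) & 1:
            partial = a << i
            if i == n - 1:
                product -= partial   # MSB of the multiplier has weight -2**(n-1)
            else:
                product += partial
    return product


print(signed_shift_add(0b11010, 0b00011, 5))  # -6 * 3
```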


#### **Bit-serial Multiplier**

• Bit-serial multiplier (n<sup>2</sup> cycles, one bit of result per n cycles):



#### Array Multiplier

Single cycle multiply: Generates all n partial products simultaneously.



### **Carry-Save Addition**

- Speeding up multiplication is a matter of speeding up the summing of the partial products.
- "Carry-save" addition can help.
- Carry-save addition passes (saves) the carries to the output, rather than propagating them.
- Example: sum three numbers, $3_{10} = 0011$, $2_{10} = 0010$, $3_{10} = 0011$

$$\begin{array}{rrll}
3_{10} & 0011 & & \\
+\;2_{10} & 0010 & & \\ \hline
c & 0100 & = 4_{10} & \text{carry-save add} \\
s & 0001 & = 1_{10} & \\
+\;3_{10} & 0011 & & \\ \hline
c & 0010 & = 2_{10} & \text{carry-save add} \\
s & 0110 & = 6_{10} & \\ \hline
 & 1000 & = 8_{10} & \text{carry-propagate add}
\end{array}$$

- In general, carry-save addition takes in 3 numbers and produces 2.
- Whereas, carry-propagate takes 2 and produces 1.
- With this technique, we can avoid carry propagation until the final addition.
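The 3 + 2 + 3 example can be replayed with a bitwise Python model of a carry-save adder. This is a sketch under the usual full-adder equations (sum is the XOR of the three inputs, carry is their majority); the function name `carry_save_add` is chosen here for illustration.

```python
def carry_save_add(x, y, z):
    """3:2 compression: return (carry, sum) with carry + sum == x + y + z."""
    s = x ^ y ^ z                             # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1    # per-bit carry, moved to the next weight
    return c, s


c1, s1 = carry_save_add(0b0011, 0b0010, 0)    # 3 + 2
c2, s2 = carry_save_add(c1, s1, 0b0011)       # ... + 3
total = c2 + s2                               # the only carry-propagate add
print(bin(c1), bin(s1), bin(c2), bin(s2), total)
```

Only the last line needs a carry chain; both carry-save steps are a single full-adder delay regardless of word width.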





### **Carry-save Addition**

CSA is associative and commutative. For example:

$$(((X_0 + X_1) + X_2) + X_3) = ((X_0 + X_1) + (X_2 + X_3))$$



- A balanced tree can be used to reduce the logic delay.
- This structure is the basis of the Wallace Tree Multiplier.
- Partial products are summed with the CSA tree. Fast CPA (ex: CLA) is used for final sum.
  - Multiplier delay $\propto \log_{3/2} N + \log_2 N$
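A minimal Python sketch of the tree reduction follows. It groups operands three at a time per level, which is one common way to arrange a Wallace-style tree and is an assumption here, as are the names `csa_tree_sum` and `wallace_multiply`. Operands are assumed non-negative.

```python
def carry_save_add(x, y, z):
    """3:2 compression: return (carry, sum) with carry + sum == x + y + z."""
    return ((x & y) | (x & z) | (y & z)) << 1, x ^ y ^ z


def csa_tree_sum(operands):
    """Reduce operands with levels of CSAs; return (total, number_of_csa_levels)."""
    terms = list(operands)
    levels = 0
    while len(terms) > 2:
        next_terms = []
        full_groups = len(terms) // 3
        for g in range(full_groups):
            c, s = carry_save_add(*terms[3 * g:3 * g + 3])
            next_terms += [c, s]               # 3 operands become 2
        next_terms += terms[3 * full_groups:]  # leftovers pass to the next level
        terms = next_terms
        levels += 1
    return sum(terms), levels                  # final carry-propagate add


def wallace_multiply(a, b, n):
    """Unsigned n-bit multiply: partial products summed by the CSA tree."""
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    return csa_tree_sum(partials)[0]


print(csa_tree_sum(range(1, 9)))   # 8 operands
print(wallace_multiply(13, 11, 4))
```

Each level shrinks the operand count by roughly a factor of 3/2, which is where the $\log_{3/2} N$ term in the delay comes from.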


#### **Constant Multiplication**

- Our discussion so far has assumed both the multiplicand (A) and the multiplier (B) can vary at runtime.
- What if one of the two is a constant?

Y = C \* X

• "Constant Coefficient" multiplication comes up often in signal processing and other hardware. Ex:

 $y_i = \alpha y_{i-1} + x_i$

where $\alpha$ is an application-dependent constant that is hard-wired into the circuit.

• How do we build an array-style (combinational) multiplier that takes advantage of the constancy of one of the operands?


### Multiplication by a Constant

- If the constant C in C\*X is a power of 2, then the multiplication is simply a shift of X.
- Ex: 4\*X



- What about division?
- What about multiplication by non-powers of 2?
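One possible answer to the division question, sketched in Python and not taken from the lecture: dividing by a power of 2 is a fixed right shift, but for negative 2's complement values an arithmetic right shift rounds toward negative infinity, while truncating division rounds toward zero. Python's `>>` on negative integers behaves like an arithmetic shift, so it can illustrate the difference. The function names are hypothetical.

```python
def times_4(x):
    return x << 2          # multiply by 2**2: shift left by 2


def divide_by_4_shift(x):
    return x >> 2          # arithmetic shift right: rounds toward -infinity


def divide_by_4_truncate(x):
    return int(x / 4)      # truncating division: rounds toward zero


for x in (20, 7, -7):
    print(x, times_4(x), divide_by_4_shift(x), divide_by_4_truncate(x))
```

The two division results agree for non-negative inputs and for exact multiples of 4; they differ for other negative inputs such as -7.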

#### Multiplication by a Constant

In general, a combination of fixed shifts and addition:
 Ex: 6\*X = 0110 \* X = (2<sup>2</sup> + 2<sup>1</sup>)\*X




### Multiplication by a Constant

• Another example: C = 23<sub>10</sub> = 010111



- In general, the number of additions equals the number of 1's in the constant minus one.
- Using carry-save adders (for all but one of these) helps reduce the delay and cost, but the number of carry-save adders is still the number of 1's in C minus 2, plus one final carry-propagate adder.
- Is there a way to further reduce the number of adders (and thus the cost and delay)?
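The adder count can be checked with a short Python model of a shift-and-add constant multiplier. This is an illustrative sketch for non-negative constants; `shift_positions` and `constant_multiply` are names invented here.

```python
def shift_positions(c):
    """Bit positions of the 1's in the non-negative constant c."""
    return [i for i in range(c.bit_length()) if (c >> i) & 1]


def constant_multiply(c, x):
    """Return (c * x, adders_needed) using one fixed shift per 1 in c."""
    terms = [x << i for i in shift_positions(c)]   # fixed shifts are only wiring
    adders = max(len(terms) - 1, 0)                # k terms need k - 1 additions
    return sum(terms), adders


print(constant_multiply(4, 9))    # power of 2: shift only
print(constant_multiply(6, 5))    # 0110: one adder
print(constant_multiply(23, 7))   # 010111: three adders
```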


#### **Multiplication using Subtraction**

- Subtraction is ~ the same cost and delay as addition.
- Consider C\*X where C is the constant value 15<sub>10</sub> = 01111.
   C\*X requires 3 additions.
- We can "recode" 15

from $01111 = (2^3 + 2^2 + 2^1 + 2^0)$

to $1000\overline{1} = (2^4 - 2^0)$

where $\overline{1}$ means negative weight.

• Therefore, 15\*X can be implemented with only one subtractor.




## **Canonic Signed Digit Representation**

- CSD represents numbers using 1, $\overline{1}$, & 0 with the least possible number of non-zero digits.
  - Strings of 2 or more non-zero digits are replaced.
  - Leads to a unique representation.
- To form CSD representation might take 2 passes:
  - First pass: replace all occurrences of 2 or more 1's:

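  - Second pass: same as above, plus replace $0\overline{1}10$ by $00\overline{1}0$
- Examples:

| Binary       | After first pass                    | After second pass                                 |
|--------------|-------------------------------------|---------------------------------------------------|
| 011101 = 29  | $100\overline{1}01$ = 32 - 4 + 1    |                                                   |
| 0010111 = 23 | $001100\overline{1}$                | $010\overline{1}00\overline{1}$ = 32 - 8 - 1      |
| 0110110 = 54 | $10\overline{1}10\overline{1}0$     | $100\overline{1}0\overline{1}0$ = 64 - 8 - 2      |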

• Can we further simplify the multiplier circuits?
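For experimentation, CSD digits can also be generated in software. The sketch below does not implement the two-pass string replacement described above; it uses the standard non-adjacent-form recurrence (choose digit $+1$ or $-1$ from the value modulo 4), which yields the same kind of representation with no two adjacent non-zero digits. The names `to_csd` and `csd_multiply` are hypothetical, and the constant is assumed non-negative.

```python
def to_csd(c):
    """Signed digits (-1, 0, +1) of non-negative c, least significant first."""
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c % 4)    # +1 when c = 1 (mod 4), -1 when c = 3 (mod 4)
            c -= d             # leaves a multiple of 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def csd_multiply(c, x):
    """Return (c * x, adders_or_subtractors_needed) using CSD digits of c."""
    digits = to_csd(c)
    result = sum(d * (x << i) for i, d in enumerate(digits))
    nonzero = sum(1 for d in digits if d)
    return result, max(nonzero - 1, 0)


for c in (15, 29, 23, 54):
    print(c, to_csd(c)[::-1])  # most significant digit first
```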


### "Constant Coefficient Multiplication" (KCM)

Binary multiplier: $Y = 231*X = (2^7 + 2^6 + 2^5 + 2^2 + 2^1 + 2^0)*X$



- CSD helps, but the multipliers are limited to shifts followed by adds.
  - CSD multiplier: $Y = 231*X = (2^8 - 2^5 + 2^3 - 2^0)*X$



- How about shift/add/shift/add ...?
  - KCM multiplier: $Y = 231*X = 7*33*X = (2^3 - 2^0)*(2^5 + 2^0)*X$



- No simple algorithm exists to determine the optimal KCM representation.
- Most approaches use an exhaustive search method.
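The three decompositions of 231 can be compared directly in a small Python sketch (function names are illustrative only). Each `+` or `-` corresponds to one adder or subtractor; the shifts are wiring.

```python
def binary_231(x):
    # 2^7 + 2^6 + 2^5 + 2^2 + 2^1 + 2^0: five adders
    return (x << 7) + (x << 6) + (x << 5) + (x << 2) + (x << 1) + x


def csd_231(x):
    # 2^8 - 2^5 + 2^3 - 2^0: three adders/subtractors
    return (x << 8) - (x << 5) + (x << 3) - x


def kcm_231(x):
    # shift/add then shift/subtract: two adders/subtractors in total
    t = (x << 5) + x       # 33 * x
    return (t << 3) - t    # 7 * (33 * x)


print(binary_231(3), csd_231(3), kcm_231(3))
```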

## Fixed Shifters / Rotators

- "Fixed" shifters "hardwire" the shift amount into the circuit.
- Ex: verilog: X >> 2
  - (right shift X by 2 places)
- Fixed shift/rotator is nothing but wires! So what?

[Figure: wiring of inputs $x_7 \ldots x_0$ to outputs $y_7 \ldots y_0$ for three fixed operations: Logical Shift, Rotate, Arithmetic Shift]

## <u>Variable Shifters / Rotators</u>

- Example: X >> S, where S is unknown when we synthesize the circuit.
- Uses: shift instruction in processors (ARM includes a shift on every instruction), floating-point arithmetic, division/multiplication by powers of 2, etc.
- One way to build this is a simple shift-register:
  - a) Load word, b) shift enable for S cycles, c) read word.



- Worst case delay O(N), not good for processor design.
- Can we do it in O(logN) time and fit it in one cycle?


## Log Shifter / Rotator

• Log(N) stages, each shifts (or not) by a power of 2 places.
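A behavioral Python sketch of the staged structure follows, assuming a right shift, a power-of-2 word width, and that bit $k$ of the shift amount S selects stage $k$. The function name `log_shift_right` and the `mode` parameter are inventions of this example; the three modes mirror the logical, arithmetic, and rotate operations.

```python
def log_shift_right(x, s, width=8, mode="logical"):
    """Shift or rotate x right by s using log2(width) fixed-shift stages."""
    mask = (1 << width) - 1
    x &= mask
    sign = x >> (width - 1)                    # MSB, replicated by arithmetic shifts
    stages = (width - 1).bit_length()          # log2(width) for power-of-2 widths
    for k in range(stages):
        if (s >> k) & 1:                       # stage k mux select = bit k of S
            amount = 1 << k                    # this stage's fixed shift: 2**k places
            low = x >> amount
            if mode == "rotate":
                high = (x << (width - amount)) & mask
            elif mode == "arithmetic" and sign:
                high = mask & ~(mask >> amount)   # top `amount` bits set to 1
            else:
                high = 0
            x = low | high
    return x


x = 0b10110101
for mode in ("logical", "arithmetic", "rotate"):
    print(mode, format(log_shift_right(x, 2, mode=mode), "08b"))
```

Each stage is a row of 2-to-1 muxes over a fixed (wired) shift, so the delay grows with the number of stages rather than with the shift amount.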





## <u>"Improved" Shifter / Rotator</u>

• How about this approach? Could it lead to even less delay?



- What is the delay of these big muxes?
- Look at a transistor-level implementation?

### **Barrel Shifter**



### **Connection Matrix**



Generally useful structure:

- N<sup>2</sup> control points.
- What other interesting functions can it do?

### <u>Cross-bar Switch</u>



Nlog(N) control signals.

- Supports all interesting permutations
- All one-to-one and one-to-many connections.
- Commonly used in communication hardware (switches, routers).
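One way to picture the cross-bar, sketched in Python under the assumption that each of the N outputs has its own $\log_2 N$-bit select choosing which input drives it (the slide's figure is not available, so this organization and the names `crossbar` and `control_bits` are guesses for illustration):

```python
def crossbar(inputs, select):
    """Output j is driven by inputs[select[j]]."""
    return [inputs[s] for s in select]


def control_bits(n):
    """N selects of log2(N) bits each, for power-of-2 N."""
    return n * (n - 1).bit_length()


data = ["a", "b", "c", "d"]
print(crossbar(data, [3, 2, 1, 0]))   # one-to-one: reverse permutation
print(crossbar(data, [1, 2, 3, 0]))   # one-to-one: rotate by one position
print(crossbar(data, [0, 0, 0, 0]))   # one-to-many: broadcast input 0
print(control_bits(8))
```

Because every output chooses independently, any permutation and any broadcast pattern is expressible, including the shifts and rotates of the earlier shifter designs.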
