Initial Architecture

Given the assumptions proposed above, an initial multiplier scheme was envisioned, centered on the pipelined, signed, Booth Radix-4 16 Bit Multiplier [Lem1]. This multiplier would calculate the 16 Bit partial products necessary for 32 & 64 Bit multiplication. Based on the clocking scheme in [Lem1], it was deemed necessary to multiply the 200 MHz clock to a higher frequency for greater performance. In the initial architecture, multiplying the clock by a factor of six (6) was judged to fit best with the pipeline speed of the multiplier. This multiplication would supply a 1.2 GHz clock to the "inter-pipelined" multiplier. Comparison of latency and throughput for the 16 Bit multipliers placed the optimum point at four (4) multipliers, with an extra three (3) cycles of latency, to produce sixteen (16) 16 Bit products. Once a multiplier scheme was chosen, the remaining portions of the design were determined.
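The way 16 Bit products build up a wider multiplication can be illustrated with a short software sketch. It is not the hardware design itself: it assumes unsigned operands, and the names `split_limbs`, `partial_products`, and `multiply` are hypothetical. It only shows that a 32 Bit multiplication needs four 16 Bit products and a 64 Bit multiplication needs sixteen.

```python
LIMB_BITS = 16
LIMB_MASK = (1 << LIMB_BITS) - 1


def split_limbs(value, width):
    """Split an unsigned `width`-bit value into 16-bit limbs, least significant first."""
    return [(value >> shift) & LIMB_MASK for shift in range(0, width, LIMB_BITS)]


def partial_products(a, b, width):
    """Return (shift, product) pairs, one per 16 x 16 multiplication."""
    a_limbs = split_limbs(a, width)
    b_limbs = split_limbs(b, width)
    return [
        (LIMB_BITS * (i + j), a_limb * b_limb)
        for i, a_limb in enumerate(a_limbs)
        for j, b_limb in enumerate(b_limbs)
    ]


def multiply(a, b, width):
    """Recombine the shifted partial products into the full-width result."""
    return sum(product << shift for shift, product in partial_products(a, b, width))
```

The number of partial products grows with the square of the number of 16 Bit limbs per operand (1, 4, 16 for 16, 32, and 64 Bit operands), which is one plausible reading of the "squared" ratio mentioned in the text.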

In order to support the "squared" ratio multiplication, thirty-two (32) 16 Bit registers had to be included to accept the necessary data for just 16 Bit multiplication. Following these registers, floating-point (FP) magnitude translation had to be performed for 32 & 64 Bit FP multiplication. This requires only masking the non-mantissa bits with the proper value, either 000000001 or 000000000001. These translated or non-translated 16 Bit values are then sorted via the Sailboat Allocation Muxing scheme, so named because of its resemblance to a sailboat. After proper allocation, the numbers are passed on to the multipliers and converted to unsigned numbers if necessary. At this stage, the values are either locked into the output registers or continue on to 32 or 64 Bit multiplication.
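The magnitude translation can be sketched in software as follows. This is an illustration under stated assumptions, not the circuit: it assumes the IEEE 754 single and double layouts (1 sign bit plus 8 or 11 exponent bits, which matches the 9 and 12 bit mask values above), it covers normal numbers only, and the function names are hypothetical.

```python
import struct


def masked_significand_32(x):
    """Overwrite the upper 9 bits (sign + exponent) of a single with 000000001."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits & 0x007FFFFF) | (1 << 23)


def masked_significand_64(x):
    """Overwrite the upper 12 bits (sign + exponent) of a double with 000000000001."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits & 0x000FFFFFFFFFFFFF) | (1 << 52)
```

Under these assumptions the trailing 1 in each mask value acts as the implicit leading one of the significand, and only the most significant 16 Bit word of each operand is changed by the translation.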

For 32 & 64 Bit multiplication, the correct partial products only needed to be compressed via Wallace Trees and then added. Floating-point multiplication would also require conversion (shifting, rounding, & exponent addition). These results would also be stored in the output registers. All of this was combined into the final Tri-Multiplier design (Figure 3).
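The compress-then-add step can be sketched with carry-save (3:2) compression, the building block of a Wallace tree. This is a word-level software illustration, not the gate-level tree of the design. The function names are hypothetical, and the inputs are assumed to be non-negative partial products that have already been shifted into position.

```python
def compress_3_to_2(x, y, z):
    """Carry-save adder: reduce three addends to a sum word and a carry word."""
    partial_sum = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return partial_sum, carry


def wallace_reduce(addends):
    """Compress addends in groups of three until two rows remain, then add them."""
    rows = list(addends)
    while len(rows) > 2:
        next_rows = []
        full_groups = len(rows) - len(rows) % 3
        for i in range(0, full_groups, 3):
            next_rows.extend(compress_3_to_2(*rows[i:i + 3]))
        next_rows.extend(rows[full_groups:])  # leftover rows pass through unchanged
        rows = next_rows
    return sum(rows)  # single carry-propagate addition at the end
```

Each compression level avoids carry propagation, so only the final two-row addition needs a full carry-propagate adder.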


Figure 3