## CS250 DISCUSSION #3

Christopher Yarp

Spring 2016

Slides by Colin Schmidt with modifications by Christopher Yarp

Std. Cell Slides adapted from Ben Keller

## COURSE LIBRARIES

- Synopsys 32nm Educational Library
  - "saed": Synopsys Armenia Educational
- Can use without an NDA (not a real process)
- Includes standard cells, SRAMs, memory compiler
- Take a look: ~cs250/stdcells/synopsys-32nm/vendor

## IMPORTANT COMPLIANCE INFORMATION

- DO NOT post course materials online
- Synopsys libraries are for educational use only
- Synopsys tools are very, very proprietary
- Even methodology scripts should stay in your private repo
- The Makefiles and tcl scripts are considered proprietary

## SYNTHESIS REPORTS

- Synthesizing your RTL to gates with Design Compiler produces multiple reports
- You can find the reports in build/vlsi/dc-syn/current-dc/reports
  - QOR (quality of results)
  - Timing
  - Area
  - Power
  - Clock-gating
  - Reference
  - Resources

## QOR

- General overview of the results
  - Timing summaries
  - Cell counts
  - Areas
  - Etc.

| Timing Path Group 'REGOUT' | Value |
|----------------------------|-------|
| Levels of Logic:           | 20.00 |
| Critical Path Length:      | 0.98  |
| Critical Path Slack:       | 0.13  |
| Critical Path Clk Period:  |       |
| Total Negative Slack:      | 0.00  |
| No. of Violating Paths:    | 0.00  |
| Worst Hold Violation:      | 0.00  |
| Total Hold Violation:      | 0.00  |
| No. of Hold Violations:    | 0.00  |

| Timing Path Group 'clk'    | Value |
|----------------------------|-------|
| Levels of Logic:           | 22.00 |
| Critical Path Length:      | 1.06  |
| Critical Path Slack:       | 0.09  |
| Critical Path Clk Period:  |       |
| Total Negative Slack:      | 0.00  |
| No. of Violating Paths:    | 0.00  |
| Worst Hold Violation:      | 0.00  |
| Total Hold Violation:      | 0.00  |
| No. of Hold Violations:    | 0.00  |

| Cell Count                 | Value |
|----------------------------|-------|
| Hierarchical Cell Count:   | 36    |
| Hierarchical Port Count:   | 15692 |
| Leaf Cell Count:           | 21063 |
| Buf/Inv Cell Count:        | 5698  |
| CT Buf/Inv Cell Count:     | 0     |
| Combinational Cell Count:  | 18023 |
| Sequential Cell Count:     | 3040  |
| Macro Count:               | 0     |

| Area                       | Value        |
|----------------------------|--------------|
| Combinational Area:        | 45291.002973 |
| Noncombinational Area:     | 20570.924212 |
| Buf/Inv Area:              | 10445.572885 |
| Net Area:                  | 0.000000     |
| Net XLength:               | …111.72      |
| Net YLength:               | 302961.09    |
| Cell Area:                 | 65861.927185 |
| Design Area:               | 65861.927185 |
| Net Length:                | …072.81      |

| Design Rules               | Value |
|----------------------------|-------|
| Total Number of Nets:      | 22252 |
| Nets With Violations:      | 106   |
| Max Trans Violations:      | 0     |

## TIMING

- Shows multiple path groups
  - We care about "clk"
- Detailed analysis of timing
  - Top N critical paths
  - Broken down by delay
- The go-to report for optimizing critical paths
- If slack is negative, you did not meet timing
- Unit is typically ns

- Startpoint: ctrl/read_reg_3_ (rising edge-triggered flip-flop clocked by clk)
- Endpoint: ctrl/clk_gate_mem_s_reg/latch (gating element for clock clk)
- Path Group: clk
- Path Type: max

| Point | Fanout | Cap | Trans | Incr | Path |
|-------|--------|-----|-------|------|------|
| clock clk (rise edge) | | | | 0.0000 | 0.0000 |
| clock network delay (ideal) | | | | 0.0000 | 0.0000 |
| ctrl/read_reg_3_/CLK (DFFX1_RVT) | | | 0.0000 | 0.0000 | 0.0000 r |
| ctrl/read_reg_3_/Q (DFFX1_RVT) | | | 0.0399 | 0.1021 | 0.1021 f |
| ctrl/T134[3] (net) | 7 | 5.1830 | | 0.0000 | 0.1021 f |
| ctrl/add_x_7_U126/A2 (NAND2X0_RVT) | | | 0.0399 | 0.0000 * | 0.1021 f |
| ctrl/add_x_7_U126/Y (NAND2X0_RVT) | | | 0.0459 | 0.0466 | 0.1488 r |
| ctrl/add_x_7_n101 (net) | 2 | 2.1347 | | 0.0000 | 0.1488 r |
| ctrl/add_x_7_U116/A1 (NOR2X0_RVT) | | | 0.0459 | 0.0000 * | 0.1488 r |
| ctrl/add_x_7_U116/Y (NOR2X0_RVT) | | | 0.0247 | 0.0590 | 0.2078 f |
| ctrl/add_x_7_n93 (net) | 2 | 2.9734 | | 0.0000 | 0.2078 f |
| ctrl/add_x_7_U95/A2 (NAND2X2_RVT) | | | 0.0247 | 0.0001 * | 0.2079 f |
| ctrl/add_x_7_U95/Y (NAND2X2_RVT) | | | 0.0184 | 0.0533 | 0.2611 r |
| ctrl/add_x_7_n76 (net) | 2 | 3.3858 | | 0.0000 | 0.2611 r |
| ctrl/U778/A (INVX1_RVT) | | | 0.0184 | 0.0001 * | 0.2612 r |
| ctrl/U778/Y (INVX1_RVT) | | | 0.0268 | 0.0224 | 0.2836 f |
| ctrl/add_x_7_n75 (net) | 4 | 3.4106 | | 0.0000 | 0.2836 f |
| ctrl/add_x_7_U73/A1 (NAND2X0_RVT) | | | 0.0268 | 0.0000 * | 0.2837 f |
| ctrl/add_x_7_U73/Y (NAND2X0_RVT) | | | 0.0358 | 0.0351 | 0.3188 r |
| … | | | | | |
| ctrl/lt_x_13_n2 (net) | 1 | 1.2469 | | 0.0000 | 0.8804 r |
| ctrl/lt_x_13_U1/A2 (NOR2X2_RVT) | | | 0.0327 | 0.0000 * | 0.8804 r |
| ctrl/lt_x_13_U1/Y (NOR2X2_RVT) | | | 0.0178 | 0.0509 | 0.9313 f |
| ctrl/T192 (net) | 1 | 2.6165 | | 0.0000 | 0.9313 f |
| ctrl/U1253/A1 (NAND2X0_RVT) | | | 0.0178 | 0.0001 * | 0.9313 f |
| ctrl/U1253/Y (NAND2X0_RVT) | | | 0.0607 | 0.0433 | 0.9747 r |
| ctrl/n599 (net) | 3 | 3.0593 | | 0.0000 | 0.9747 r |
| ctrl/U1324/A1 (NAND2X0_RVT) | | | 0.0607 | 0.0000 * | 0.9747 r |
| ctrl/U1324/Y (NAND2X0_RVT) | | | 0.0355 | 0.0199 | 0.9946 f |
| ctrl/n591 (net) | 1 | 0.5392 | | 0.0000 | |
| ctrl/U1325/A1 (AND2X1_RVT) | | | 0.0355 | | |
| ctrl/U1325/Y (AND2X1_RVT) | | | 0.0191 | | |
| ctrl/N402 (net) | 2 | 1.4986 | | 0.0000 | |
| ctrl/U733/A2 (OR2X1_RVT) | | | 0.0191 | 0.0000 * | |
| ctrl/U733/Y (OR2X1_RVT) | | | | | 1.0597 f |
| ctrl/N400 (net) | 1 | 0.4812 | | | 1.0597 f |
| ctrl/clk_gate_mem_s_reg/EN (SNPS_CLOCK_GAT…) | | | | | 1.0597 f |
| ctrl/clk_gate_mem_s_reg/EN (net) | | 0.4812 | | | 1.0597 f |
| ctrl/clk_gate_mem_s_reg/latch/EN (CGLPPRX2_RVT) | | | 0.0164 | | |
| data arrival time | | | | | 1.0597 |
| clock clk (rise edge) | | | | 1.2500 | 1.2500 |
| clock network delay (ideal) | | | | 0.0000 | |
| clock uncertainty | | | | -0.0400 | |
| ctrl/clk_gate_mem_s_reg/latch/CLK (CGLPPRX2_RVT) | | | | 0.0000 | |
| clock gating setup time | | | | -0.0583 | |
| data required time | | | | | 1.1517 |
| data required time | | | | | 1.1517 |
| data arrival time | | | | | -1.0597 |
| slack (MET) | | | | | 0.0920 |

## AREA

- Breakdown of area per nameable unit
- Summary at top
- Combinational vs. noncombinational
- Units typically µm<sup>2</sup>

| Summary item | Value |
|--------------|-------|
| Number of ports | 2117 |
| Number of nets | 2012 |
| Number of cells | 417 |
| Number of combinational cells | 415 |
| Number of sequential cells | 0 |
| Number of macros | 0 |
| Number of buf/inv | 219 |
| Number of references | 14 |
| Combinational area | 45291.002973 |
| Buf/Inv area | 10445.572885 |
| Noncombinational area | 20570.924212 |
| Net Interconnect area | undefined (Wire load has zero net area) |
| Total cell area | 65861.927185 |
| Total area | undefined |

Hierarchical area distribution (the Design column is cut off in the screenshot):

| Hierarchical cell | Global cell area: Absolute Total | Global cell area: Percent Total | Local cell area: Combinational | Local cell area: Noncombinational | Local cell area: Black boxes | Design |
|---|---|---|---|---|---|---|
| Sha3Accel | 65861.9272 | 100.0 | | | | Sha3Ac… |
| ctrl | 19413.2980 | 29.5 | | | | CtrlMo… |
| ctrl/clk_gate_buffer_0_reg | | 0.0 | | 5.8453 | | SNPS_C… |
| | 5.8453 | | | 5.8453 | | SNPS_C… |
| ctrl/clk_gate_buffer_11_reg | 5.8453 | 0.0 | | 5.8453 | 0.0000 | SNPS_C… |
| ctrl/clk_gate_buffer_12_reg | 5.8453 | 0.0 | 0.0000 | 5.8453 | 0.0000 | SNPS_C… |
| … | | | | | | |
| ctrl/clk_gate_buffer_8_reg | 5.8453 | 0.0 | 0.0000 | 5.8453 | 0.0000 | SNPS_C… |
| ctrl/clk_gate_buffer_9_reg | 5.8453 | 0.0 | 0.0000 | 5.8453 | 0.0000 | SNPS_C… |
| ctrl/clk_gate_buffer_count_reg | 5.8453 | 0.0 | 0.0000 | 5.8453 | 0.0000 | SNPS_C… |
| ctrl/clk_gate_hash_addr_reg | 5.8453 | 0.0 | 0.0000 | 5.8453 | 0.0000 | SNPS_C… |
| ctrl/clk_gate_hashed_reg | 5.8453 | 0.0 | 0.0000 | 5.8453 | 0.0000 | SNPS_C… |
| ctrl/clk_gate_mem_s_reg | 5.8453 | 0.0 | 0.0000 | 5.8453 | 0.0000 | SNPS_C… |
| ctrl/clk_gate_msg_addr_reg | 5.8453 | 0.0 | 0.0000 | 5.8453 | 0.0000 | SNPS_C… |
| ctrl/clk_gate_msg_len_reg | 5.8453 | 0.0 | 0.0000 | 5.8453 | 0.0000 | SNPS_C… |
| ctrl/clk_gate_read_reg | 5.8453 | 0.0 | 0.0000 | 5.8453 | 0.0000 | |
| ctrl/clk_gate_rindex_reg | 5.8453 | 0.0 | | 5.8453 | | SNPS_C… |
| ctrl/clk_gate_windex_reg | 5.8453 | 0.0 | 0.0000 | 5.8453 | 0.0000 | SNPS_C… |
| | 5.8453 | 0.0 | 0.0000 | 5.8453 | 0.0000 | SNPS_C… |
| dpath | 45479.0698 | 69.1 | 8780.6754 | 10862.6231 | 0.0000 | DpathM… |
| dpath/ChiModule | 9767.0082 | 14.8 | 9767.0082 | 0.0000 | 0.0000 | ChiMod… |
| dpath/RhoPiModule | 1735.8036 | 2.6 | 1735.8036 | 0.0000 | 0.0000 | RhoPiM… |
| dpath/ThetaModule | 10984.3579 | 16.7 | | | | ThetaM… |
| dpath/clk_gate_state_24_reg | 5.8453 | 0.0 | 0.0000 | | | SNPS_C… |
| dpath/clk_gate_state_3_reg | 5.8453 | | 0.0000 | | | SNPS_C… |
| dpath/iota | 3336.9109 | | | 0.0000 | | IotaMo… |
| Total | | | 45291.0030 | 20570.9242 | 0.0000 | |

## POWER

- Operating point listed at top
- Power per module
  - Dynamic (Switching + Internal)
  - Static (Leakage)
- Be careful with units (defined at top): dynamic power is in uW, leakage power is in pW

```
Library(s) Used:

    saed32rvt_tt1p05v25c (File: /home/ff/cs250/stdcells/synopsys-32nm/typical_rvt/db/cells.db)

Operating Conditions: tt1p05v25c   Library: saed32rvt_tt1p05v25c
Wire Load Model Mode: Inactive.

Global Operating Voltage = 1.05
Power-specific unit information :
    Voltage Units = 1V
    Capacitance Units = 1.000000ff
    Time Units = 1ns
    Dynamic Power Units = 1uW    (derived from V,C,T units)
    Leakage Power Units = 1pW

                               Switch    Int       Leak      Total
Hierarchy                      Power     Power     Power     Power     %
Sha3Accel                      2.61e+03  8.75e+03  9.35e+09  2.07e+04  100.0
  dpath (DpathModule)          1.18e+03  4.27e+03  6.80e+09  1.23e+04
    iota (IotaModule)          3.989     6.275     8.25e+08
    ChiModule (ChiModule)      5.527
    RhoPiModule (RhoPiModule)  4.357
    ThetaModule (ThetaModule)  25.012              1.58e+09  1.62e+03
  ctrl (CtrlModule)            1.33e+03  4.44e+03  2.35e+09  8.12e+03
```

## CLOCK GATING

- Simple summary of the tool's attempt to clock gate everything
- Does a very good job for our accelerator

| Clock Gating Summary | |
|---------------------------------|---------------|
| Number of Clock gating elements | 30 |
| Number of Gated registers | 2897 (96.25%) |
| Number of Ungated registers | 113 (3.75%) |
| Total number of registers | 3010 |

## REFERENCE

- Shows which standard cells were used in each module

Attributes:

- b - black box (unknown)
- bo - allows boundary optimization
- d - dont_touch
- mo - map_only
- h - hierarchical
- n - noncombinational
- r - removable
- s - synthetic operator
- u - contains unmapped logic

Design: Sha3Accel

| Reference | Library | Unit Area | Count | Total Area | Attributes |
|---|---|---|---|---|---|
| AND2X1_RVT | saed32rvt_tt1p05v25c | 2.033152 | 1 | 2.033152 | |
| AND2X2_RVT | saed32rvt_tt1p05v25c | 2.287296 | 1 | 2.287296 | |
| AO22X1_RVT | saed32rvt_tt1p05v25c | 2.541440 | 128 | 325.304321 | |
| CtrlModule | | 19413.298048 | 1 | 19413.298048 | b, h, n |
| DpathModule | | 45479.069757 | 1 | 45479.069757 | b, h, n |
| INVX1_RVT | saed32rvt_tt1p05v25c | 1.270720 | 31 | 39.392320 | |
| NBUFFX2_RVT | saed32rvt_tt1p05v25c | 2.033152 | 113 | 229.746188 | |
| NBUFFX4_RVT | saed32rvt_tt1p05v25c | 2.541440 | 54 | 137.237761 | |
| NBUFFX8_RVT | saed32rvt_tt1p05v25c | 3.812160 | 15 | 57.182400 | |
| NBUFFX16_RVT | saed32rvt_tt1p05v25c | 6.099456 | 6 | 36.596735 | |
| NOR2X0_RVT | saed32rvt_tt1p05v25c | 2.541440 | 1 | 2.541440 | |
| NOR2X2_RVT | saed32rvt_tt1p05v25c | 2.795584 | 1 | 2.795584 | |
| OR2X1_RVT | saed32rvt_tt1p05v25c | 2.033152 | 47 | 95.558149 | |
| OR2X2_RVT | saed32rvt_tt1p05v25c | 2.287296 | 17 | 38.884033 | |
| Total 14 references | | | | 65861.927185 | |

Design: CtrlModule

| Reference | Library | Unit Area | Count | Total Area | Attributes |
|---|---|---|---|---|---|
| AND2X1_RVT | saed32rvt_tt1p05v25c | 2.033152 | 520 | 1057.239094 | |
| AND2X2_RVT | saed32rvt_tt1p05v25c | 2.287296 | 26 | 59.469697 | |
| AND2X4_RVT | saed32rvt_tt1p05v25c | 2.795584 | 5 | 13.977920 | |
| AND3X1_RVT | saed32rvt_tt1p05v25c | 2.287296 | 22 | 50.320513 | |
| AND3X2_RVT | saed32rvt_tt1p05v25c | 2.541440 | 2 | 5.082880 | |
| AND4X1_RVT | saed32rvt_tt1p05v25c | 2.541440 | 13 | 33.038720 | |
| AO21X1_RVT | saed32rvt_tt1p05v25c | 2.541440 | 47 | 119.447680 | |
| AO21X2_RVT | saed32rvt_tt1p05v25c | 2.795584 | 9 | 25.160256 | |
| AO22X1_RVT | saed32rvt_tt1p05v25c | 2.541440 | 1118 | 2841.329931 | |
| AO22X2_RVT | saed32rvt_tt1p05v25c | 2.795584 | 9 | 25.160256 | |
| AO221X1_RVT | saed32rvt_tt1p05v25c | 3.049728 | 5 | 15.248640 | |
| AO222X1_RVT | saed32rvt_tt1p05v25c | 3.303872 | 43 | 142.066501 | |
| AOI21X1_RVT | saed32rvt_tt1p05v25c | 3.049728 | 36 | 109.790205 | |
| AOI22X1_RVT | saed32rvt_tt1p05v25c | 3.049728 | 91 | 277.525240 | |
| DFFX1_RVT | saed32rvt_tt1p05v25c | 6.607744 | 985 | 6508.628054 | n |
| DFFX2_RVT | saed32rvt_tt1p05v25c | 7.116032 | 425 | 3024.313653 | n |
| HADDX1_RVT | saed32rvt_tt1p05v25c | 3.303872 | 26 | 85.900675 | r |
| INVX1_RVT | saed32rvt_tt1p05v25c | 1.270720 | 399 | 507.017282 | |
| … | | | | | |
| SNPS_CLOCK_GATE_HIGH_CtrlModule_24 | | 5.845312 | 1 | 5.845312 | b, h |
| SNPS_CLOCK_GATE_HIGH_CtrlModule_25 | | 5.845312 | 1 | 5.845312 | b, h |
| SNPS_CLOCK_GATE_HI… | | 5.845312 | 1 | 5.845312 | b, h |
| SNPS_CLOCK_GATE_HI… | | 5.845312 | 1 | 5.845312 | b, h |
| XNOR2X1_RVT | | 4.320448 | | 311.072250 | |
| | | 4.320448 | | 324.033594 | |
| Total 79 references | | | | 19413.298048 | |

## RESOURCES

- Synopsys provides implementations of many basic blocks (DesignWare)
- Shows which modules use these components, their parameters, and how the tools decided to optimize them (area, speed, etc.)

Resource Report for this hierarchy in file /scratch/cs250-fa14/lab-templates/lab2/build/vlsi/generated-src/Sha3Accel.v

| Cell | Module | Parameters | Contained Operations |
|---|---|---|---|
| lte_x_2 | DW_cmp | width=64 | lte_1170 |
| ash_3 | DW_leftsh | A_width=4, SH_width=2 | sll_1194 |
| sub_x_4 | DW01_dec | width=2 | sub_1197 |
| eq_x_16 | DW_cmp | width=64 | eq_1339 |
| ash_17 | DW_leftsh | A_width=17, SH_width=5 | sll_1345 |
| sub_x_18 | DW01_dec | width=5 | sub_1347 |
| ash_20 | DW_leftsh | A_width=17, SH_width=5 | sll_1353 |
| lt_x_21 | DW_cmp | width=5 | lt_1361 |
| ash_22 | DW_leftsh | A_width=17, SH_width=5 | sll_1366 |
| lte_x_29 | DW_cmp | width=5 | lte_1514 |
| add_x_8 | DW01_inc | width=5 | add_1263 add_1264 add_1355 |

Datapath block DP_OP_467_127_5122 (only partly legible in the screenshot): Unsigned, 32, 12; Unsigned, 8; expression I1 + I2.

| Cell | Module | Implementation |
|---|---|---|
| lte_x_2 | DW_cmp | apparch (area) |
| ash_3 | DW_leftsh | astr (area) |
| lt_x_11 | DW_cmp | pparch (area, speed) |
| ash_20 | DW_leftsh | astr (area) |
| lt_x_21 | DW_cmp | apparch (area) |
| | DW_leftsh | astr (area) |
| add_x_50 | DW01_inc | apparch (area) |
| add_x_56 | DW01_add | pparch (area, speed) |
| add_x_7 | DW01_add | pparch (area, speed) |
| | DW01_inc | apparch (area) |
| DP_OP_467_127_5122 | | str (area, speed) |

## LAB 2 QUESTIONS?
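
If you want to double-check how the summary numbers in these reports relate to each other, the arithmetic can be redone by hand. The following is a minimal Python sketch, not part of the course flow: the function names `slack` and `percent` are made up for illustration, and every number is copied from the example reports on the slides (timing path to `ctrl/clk_gate_mem_s_reg/latch`, clock gating summary, area summary, and the top-level `Sha3Accel` power row).

```python
# Sanity checks against the example reports; all values are copied from the slides.
# The helper names below are hypothetical, chosen only for this sketch.

def slack(period, uncertainty, setup, arrival):
    """Setup slack in ns: data required time minus data arrival time."""
    required = period - uncertainty - setup
    return round(required - arrival, 4)

def percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to two places."""
    return round(100.0 * part / total, 2)

# Timing report: 1.25 ns clock, 0.04 ns uncertainty, 0.0583 ns clock gating setup.
timing_slack = slack(period=1.2500, uncertainty=0.0400, setup=0.0583, arrival=1.0597)

# Clock gating report: 2897 gated and 113 ungated out of 3010 registers.
gated_pct = percent(2897, 3010)
ungated_pct = percent(113, 3010)

# Area report: total cell area = combinational + noncombinational.
cell_area = round(45291.002973 + 20570.924212, 6)

# Power report: switching and internal power are in uW, leakage is in pW,
# so leakage must be scaled by 1e-6 before the three are summed.
total_power_uw = 2.61e3 + 8.75e3 + 9.35e9 * 1e-6

print("slack (ns):", timing_slack)
print("gated / ungated (%):", gated_pct, ungated_pct)
print("total cell area (um^2):", cell_area)
print("total power (uW):", total_power_uw)
```

The power line is the one most worth reading twice: skipping the pW to uW conversion would make leakage look about a million times larger than the dynamic power.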