# ECE 6745 Complex Digital ASIC Design Topic 9: CMOS Combinational Logic

## School of Electrical and Computer Engineering Cornell University

revision: 2025-02-13-11-19

- 1. RC Modeling
- 2. Delay
  - 2.1. RC Delay of Inverter
  - 2.2. RC Delay of 2-Input NAND Gate
  - 2.3. Equal Rise/Fall Times
  - 2.4. Equal Drive Strength
  - 2.5. Larger Gates
  - 2.6. Larger Loads
  - 2.7. Comparison of Inverter, NAND, NOR Gates
  - 2.8. Logical Effort: Single Stage
  - 2.9. Logical Effort: Multiple Stages
- 3. Energy
- 4. Area

Copyright © 2025 Christopher Batten. All rights reserved. This handout was prepared by Prof. Christopher Batten at Cornell University for ECE 6745 Complex Digital ASIC Design. Download and use of this handout is permitted for individual educational non-commercial purposes only. Redistribution either in part or in whole via both commercial or non-commercial means requires written permission.

## 1. RC Modeling



- $C_{sb}$ capacitors do not actually switch, so ignore
- Lump  $C_{dbp} + C_{dbn}$  since both tied to constant nodes
- Lump  $C_{gsp} + C_{gsn}$  since both tied to constant nodes
- Assume PMOS mobility is 2× worse than NMOS mobility

- Let *C* be the gate capacitance of minimum sized NMOS
- Let *R* be the effective resistance of a minimum sized NMOS
- Let *k* be width of a transistor relative to minimum sized NMOS



#### 2-Input NAND Gate



Draw and label the parasitic capacitances.

## 2. Delay

- We will initially use RC modeling to estimate delay
- We will then use RC modeling to derive logical effort (LE)
- LE is a fast way to estimate delay for simple static CMOS circuits
- Often need to use a mix of RC modeling and LE

## 2.1. RC Delay of Inverter


• Let  $t_{pd}$  be the propagation delay, time until  $V_{out} = V_{dd}/2$ 

$$V_{out} = V_{dd} e^{-t/\tau}$$

$$\frac{V_{dd}}{2} = V_{dd} e^{-t/\tau}$$

$$\frac{1}{V_{dd}} \frac{V_{dd}}{2} = e^{-t/\tau}$$

$$\ln\left(\frac{1}{2}\right) = \frac{-t}{\tau}$$

$$-\tau \ln\left(\frac{1}{2}\right) = t$$

$$t = \tau \ln(2)$$

- So  $t_{pd} = \ln(2) \cdot RC_1$
- Let  $R' = \ln(2) \cdot R$ , so  $t_{pd} = R'C_1$
- For inverter on previous page,  $t_{pd} = 2R'C$
- We usually just assume the effective resistance $R$ is already scaled by ln(2)
- So propagation delay of inverter on previous page:

$$t_{pd} = 2RC$$
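As a quick numerical check of the derivation above, the following sketch evaluates $t = \tau \ln(2)$ and confirms that the exponential discharge has reached $V_{dd}/2$ at that time. The resistance and capacitance values are hypothetical and chosen only for illustration.

```python
import math

def propagation_delay(r_ohms, c_farads):
    """Time for V(t) = Vdd * exp(-t / tau) to fall to Vdd / 2."""
    tau = r_ohms * c_farads
    return math.log(2) * tau

# Hypothetical values, not taken from the handout.
R = 10e3    # effective resistance in ohms
C1 = 1e-15  # load capacitance in farads

t_pd = propagation_delay(R, C1)
v_ratio = math.exp(-t_pd / (R * C1))  # V_out / V_dd evaluated at t = t_pd

print(f"t_pd = {t_pd * 1e12:.3f} ps, V_out/V_dd = {v_ratio:.3f}")
```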

## 2.2. RC Delay of 2-Input NAND Gate



- Requires complicated 2nd order model
- We can use a simple approximation

$$\tau = \tau_1 + \tau_2 = RC_1 + (R+R)C_2$$

$$= RC + (2R)(3C)$$

$$= RC + 6RC = 7RC \quad (3.5 \times \text{slower than inverter})$$

- Best when one  $\tau$  much larger than the other  $\tau$
- Even if  $\tau_1 = \tau_2$ , error is < 15%

#### Generalized Elmore Delay

$$t_{pd} = \sum_{i}^{\text{all nodes}} R_{ij} C_i$$



Assume all resistances are R and all capacitances are C

- Delay of path from x to y is impacted by branch to z
- Delay of path from x to z is impacted by branch to y
- For path x to y, lump  $C_2 + C_3$  and use shared resistance  $R_0 + R_1$
- For path x to z, lump  $C_1$  and use shared resistance  $R_0 + R_1$
- This extra term estimates impact of delay due to "branch"

$$T_{pd,xy} = R_0C_0 + (R_0 + R_1 + R_2)C_1 + (R_0 + R_1)(C_2 + C_3)$$

$$= RC + 3RC + 4RC = 8RC$$

$$T_{pd,xz} = R_0C_0 + (R_0 + R_1 + R_3)C_2 + (R_0 + R_1 + R_3 + R_4)C_3 + (R_0 + R_1)C_1$$

$$= RC + 3RC + 4RC + 2RC = 10RC$$
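The two results above can be reproduced with a small Elmore delay routine. This is a sketch only: the tree topology is an assumption reconstructed from the terms of the two equations (a junction with no capacitance after $R_1$, one branch to y through $R_2$, and one branch to z through $R_3$ and $R_4$), and the node names are hypothetical.

```python
R, C = 1.0, 1.0  # every resistance is R and every capacitance is C

# node -> (parent, resistance to parent, capacitance at node)
tree = {
    "n0": ("x",  R, C),    # R0, C0
    "j":  ("n0", R, 0.0),  # R1, junction assumed to carry no capacitance
    "y":  ("j",  R, C),    # R2, C1
    "n2": ("j",  R, C),    # R3, C2
    "z":  ("n2", R, C),    # R4, C3
}

def path_to_root(node):
    """Nodes whose upstream resistor lies between the root x and `node`."""
    path = set()
    while node in tree:
        path.add(node)
        node = tree[node][0]
    return path

def elmore_delay(sink):
    """Sum over nodes of C_i times the resistance shared with the sink path."""
    sink_path = path_to_root(sink)
    total = 0.0
    for node, (_, _, cap) in tree.items():
        shared = sink_path & path_to_root(node)
        total += cap * sum(tree[n][1] for n in shared)
    return total

print(elmore_delay("y"), elmore_delay("z"))  # in units of RC
```

The off-path capacitances only see the shared resistance $R_0 + R_1$, which is why the branch adds $4RC$ to the x-to-y delay and $2RC$ to the x-to-z delay.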

## Use Elmore Delay to Estimate Rise/Fall Times for 2-Input NAND Gate



- $t_{pd,1 \to 0}$: $A = 1$, $B = 0 \to 1$
- $t_{pd,1 \to 0}$: $A = 0 \to 1$, $B = 1$
- $t_{pd,0 \to 1}$: $A = 1 \to 0$, $B = 1 \to 0$
- $t_{pd,0 \to 1}$: $A = 1$, $B = 1 \to 0$
- $t_{pd,0 \to 1}$: $A = 1 \to 0$, $B = 1$

## 2.3. Equal Rise/Fall Times



- For equal rise/fall times, the effective resistance of pullup must equal effective resistance of pulldown
- If we assume PMOS mobility  $2\times$  worse than NMOS, then PMOS must be  $2\times$  size of NMOS in an inverter for equal rise/fall times

## 2.4. Equal Drive Strength

• Size transistors so worst case effective resistance is equal in both the pullup and pulldown networks.



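|                               | $t_{pd,1 \to 0}$ worst | $t_{pd,1 \to 0}$ best | $t_{pd,0 \to 1}$ worst | $t_{pd,0 \to 1}$ best |
|-------------------------------|------------------------|-----------------------|------------------------|-----------------------|
| inverter                      |                        |                       |                        |                       |
| 2-input NAND w/o internal cap |                        |                       |                        |                       |
| 2-input NAND w/ internal cap  |                        |                       |                        |                       |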

- Is this a fair comparison? No, we are not normalizing anything across these gates. We need to either normalize:
  - Input gate cap (i.e., load on previous gate)
  - Drive strength (i.e., effective resistance)

• All three gates with equal rise/fall times and equal drive strengths



## 2.5. Larger Gates



• This is the parasitic delay, independent of size (*k*)

## 2.6. Larger Loads



$$t_{pd,1 \rightarrow 0} =$$

## 2.7. Comparison of Inverter, NAND, NOR Gates

• Complete a fair comparison assuming equal rise/fall times, equal drive strength, only parasitic delay

|              |                  | $t_{pd,1 \to 0}$ worst | $t_{pd,1 \to 0}$ best | $t_{pd,0 \to 1}$ worst | $t_{pd,0 \to 1}$ best |
|--------------|------------------|------------------------|-----------------------|------------------------|-----------------------|
| inverter     |                  |                        |                       |                        |                       |
| 2-input NAND | w/o internal cap |                        |                       |                        |                       |
| 2-input NAND | w/ internal cap  |                        |                       |                        |                       |
| 2-input NOR  | w/o internal cap |                        |                       |                        |                       |
| 2-input NOR  | w/ internal cap  |                        |                       |                        |                       |





### Use Elmore Delay to Estimate Rise/Fall Times for 2-Input NOR Gate



- $t_{pd,1 \to 0}$: $A = 0$, $B = 0 \to 1$
- $t_{pd,1 \to 0}$: $A = 0 \to 1$, $B = 0$
- $t_{pd,0 \to 1}$: $A = 0$, $B = 1 \to 0$
- $t_{pd,0 \to 1}$: $A = 1 \to 0$, $B = 0$

### Use RC Modeling to Estimate Delay of 2-Input NAND and NOR Gates

- Ignore internal capacitance
- Assume worst case delay
- Assume an output load of 15C

## 2.8. Logical Effort: Single Stage

- Logical effort (LE) is just an abstraction over RC modeling
- Logical effort (LE) is a linear delay model
- Useful for building intuition for static CMOS modeling
- Keep in mind often need to use a mix of RC modeling and LE



$$C_{in} = \alpha C_t$$

$$R_i = R_{ui} = R_{di} = R_t / \alpha$$

$$C_{pi} = \alpha C_{pt}$$

• We know the propagation delay of the gate instance is:

$$t_{pd} = R_i(C_{out} + C_{pi})$$

• Let's rewrite this in terms of the template


$$t_{pd} = R_i(C_{out} + C_{pi})$$

$$= R_iC_{out} + R_iC_{pi}$$

$$= \frac{R_t}{\alpha}C_{out} + \frac{R_t}{\alpha}C_{pi}$$

$$= \frac{R_t}{\alpha}C_{out} + \frac{R_t}{\alpha}(\alpha C_{pt})$$

$$= \frac{R_t}{\alpha}\left(\frac{C_{in}}{C_{in}}\right)C_{out} + \frac{R_t}{\alpha}(\alpha C_{pt})$$

$$= \frac{R_t}{\alpha}\alpha C_t\left(\frac{C_{out}}{C_{in}}\right) + \frac{R_t}{\alpha}(\alpha C_{pt})$$

$$= R_tC_t\left(\frac{C_{out}}{C_{in}}\right) + R_tC_{pt}$$

- We don't want to deal with absolute delay
- Let's rewrite our propagation delay equation to be a "relative" delay
- Relative to the delay of a single unloaded minimal inverter
- Let's start by defining some new parameters

- $\tau = R_{inv}C_{inv}$: "relative delay units"
- $g = R_tC_t / (R_{inv}C_{inv})$: logical effort
- $h = C_{out}/C_{in}$: electrical effort
- $p = R_tC_{pt} / (R_{inv}C_{inv})$: parasitic delay

• Let's rewrite our propagation delay equation in terms of  $\tau$ 


$$t_{pd} = d_{abs} = R_t C_t \left(\frac{C_{out}}{C_{in}}\right) + R_t C_{pt}$$

$$= \left(\frac{R_{inv} C_{inv}}{R_{inv} C_{inv}}\right) R_t C_t \left(\frac{C_{out}}{C_{in}}\right) + \left(\frac{R_{inv} C_{inv}}{R_{inv} C_{inv}}\right) R_t C_{pt}$$

$$= R_{inv} C_{inv} \left(\frac{R_t C_t}{R_{inv} C_{inv}}\right) \left(\frac{C_{out}}{C_{in}}\right) + R_{inv} C_{inv} \left(\frac{R_t C_{pt}}{R_{inv} C_{inv}}\right)$$

$$= \tau g h + \tau p$$

$$= \tau (g h + p)$$

$$d_{abs} = \tau(gh + p)$$

• Let *d* be the delay in units of  $\tau$  (i.e., d = gh + p)

#### Templates for Inverter, NAND, NOR Gates



## Use LE to Estimate Delay of 2-Input NAND and NOR Gates

• Assume an output load of 15C
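One way to set up this estimate is sketched below using $d = gh + p$. The values of $g$, $p$, and the template input capacitances (3C, 4C, 5C) are assumptions: they are the usual values for X1 templates with a 2:1 PMOS/NMOS mobility ratio, and they match the $\alpha \times g \times 3C$ relation used for the standard cell tables later in this section.

```python
from fractions import Fraction as Fr

# gate -> (logical effort g, parasitic delay p, input cap in units of C)
# Assumed template values, see the note above.
TEMPLATES = {
    "INV":   (Fr(1),    1, 3),
    "NAND2": (Fr(4, 3), 2, 4),
    "NOR2":  (Fr(5, 3), 2, 5),
}

def le_delay(gate, c_out):
    """Delay in units of tau: d = g * h + p with h = C_out / C_in."""
    g, p, c_in = TEMPLATES[gate]
    h = Fr(c_out, c_in)
    return g * h + p

for gate in TEMPLATES:
    print(gate, float(le_delay(gate, 15)))  # output load of 15C
```

Multiplying by $\tau$ converts these relative delays back into absolute delays.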

Let's list the many approximations we have made

## 2.9. Logical Effort: Multiple Stages

• Path delay is the sum of the delay of each stage

$$D = \sum d_i = \sum (g_i h_i + p_i)$$



• Calculate path delay assuming canonical sized gates

| $g_i$           | 1    | 5/3  | 4/3 | 1     |
|-----------------|------|------|-----|-------|
| $h_i$           | 5/3  | 4/5  | 3/4 | 40/3  |
| $g_i \cdot h_i$ | 5/3  | 4/3  | 1   | 40/3  |
| $p_i$           | 1    | 2    | 2   | 1     |
| $d_i$           | 2.67 | 3.33 | 3   | 14.33 |
| D               |      |      |     | 23.33 |

• Calculate path delay assuming final gate is X16

| $g_i$           | 1    | 5/3  | 4/3  | 1     |
|-----------------|------|------|------|-------|
| $h_i$           | 5/3  | 4/5  | 48/4 | 40/48 |
| $g_i \cdot h_i$ | 5/3  | 4/3  | 16   | 40/48 |
| $p_i$           | 1    | 2    | 2    | 1     |
| $d_i$           | 2.67 | 3.33 | 18   | 1.83  |
| D               |      |      |      | 25.83 |
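The two tables above can be checked with a short script. This sketch takes the per-stage input capacitances (3C, 5C, 4C, 3C, and 48C for the X16 final gate) and the 40C load directly from the $h_i$ rows; the function name and the data layout are illustrative choices.

```python
from fractions import Fraction as Fr

def path_delay(stages, c_load):
    """Sum of g_i * h_i + p_i, where h_i = C_out,i / C_in,i.

    stages: list of (g, p, c_in) in path order, capacitances in units of C.
    """
    total = Fr(0)
    for i, (g, p, c_in) in enumerate(stages):
        # Each stage drives the next stage's input cap; the last drives the load.
        c_out = stages[i + 1][2] if i + 1 < len(stages) else c_load
        total += g * Fr(c_out, c_in) + p
    return total

canonical = [(Fr(1), 1, 3), (Fr(5, 3), 2, 5), (Fr(4, 3), 2, 4), (Fr(1), 1, 3)]
final_x16 = [(Fr(1), 1, 3), (Fr(5, 3), 2, 5), (Fr(4, 3), 2, 4), (Fr(1), 1, 48)]

print(float(path_delay(canonical, 40)))
print(float(path_delay(final_x16, 40)))
```

Upsizing only the final gate makes its own stage fast but loads the third stage so heavily that the total path delay gets worse.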

#### Q1: How should we size gates to minimize total delay?

- Independent variables are  $h_i$  (i.e., internal gate sizing)
- We want to choose *h<sub>i</sub>* to minimize *D*
- Take the partial derivative of D with respect to  $h_i$ , set to zero, and solve for optimum  $h_i$

$$D = (g_1h_1 + p_1) + (g_2h_2 + p_2)$$

 Note that h<sub>1</sub> and h<sub>2</sub> are constrained since C<sub>1</sub> and C<sub>3</sub> are given and input cap of gate 2 is output cap for gate 1

$$h_1 = \frac{C_2}{C_1}$$
  $h_2 = \frac{C_3}{C_2}$   $h_1 h_2 = \frac{C_2}{C_1} \frac{C_3}{C_2} = \frac{C_3}{C_1}$ 

- Let  $H = h_1 h_2 = C_3 / C_1$ , H is a constant since  $C_1$  and  $C_3$  are given
- Let's rework *D* to get it in terms of just one variable

$$D = (g_1h_1 + p_1) + (g_2h_2 + p_2)$$

$$D = g_1h_1 + g_2h_2 + (p_1 + p_2)$$

$$= g_1h_1 + g_2\frac{H}{h_1} + (p_1 + p_2)$$

$$= g_1h_1 + g_2Hh_1^{-1} + (p_1 + p_2)$$

• Take partial derivative with respect to the only variable  $h_1$ 

$$D = g_1 h_1 + g_2 H h_1^{-1} + (p_1 + p_2)$$
$$\frac{\partial D}{\partial h_1} = g_1 - g_2 H h_1^{-2} + 0$$
$$= g_1 - \frac{g_2 H}{h_1^2}$$

• Set partial derivative to zero and solve for *h*<sub>1</sub>

$$\frac{\partial D}{\partial h_1} = g_1 - \frac{g_2 H}{h_1^2} = 0$$

$$g_1 = \frac{g_2 H}{h_1^2}$$

$$g_1 h_1^2 = g_2 H$$

$$h_1^2 = \frac{g_2}{g_1} H$$

$$h_1 = \sqrt{\frac{g_2}{g_1} H}$$
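The closed-form result can be sanity checked by brute force. The sketch below scans $h_1$ over a grid and compares the minimizer with $\sqrt{(g_2/g_1)H}$; the two-stage path (an inverter driving a NOR2 with $H = 12$) is a hypothetical example.

```python
import math

def two_stage_delay(h1, g1, g2, H, p1, p2):
    """D = g1*h1 + g2*(H/h1) + (p1 + p2), using h2 = H / h1."""
    return g1 * h1 + g2 * H / h1 + (p1 + p2)

# Hypothetical path parameters.
g1, p1 = 1.0, 1.0
g2, p2 = 5.0 / 3.0, 2.0
H = 12.0

h1_closed = math.sqrt(g2 / g1 * H)

# Grid scan from 0.5 to about 20.5 in steps of 0.001.
candidates = [0.5 + 0.001 * i for i in range(20000)]
h1_scan = min(candidates, key=lambda h1: two_stage_delay(h1, g1, g2, H, p1, p2))

f1 = g1 * h1_closed        # stage effort of stage 1 at the optimum
f2 = g2 * H / h1_closed    # stage effort of stage 2 at the optimum
print(f"closed form h1 = {h1_closed:.3f}, scan h1 = {h1_scan:.3f}")
print(f"f1 = {f1:.3f}, f2 = {f2:.3f}")
```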

- Can use similar approach to find optimal  $h_i$  for more than 2 stages
- However, there is actually a much more interesting result!

$$g_1h_1^2 = g_2H$$

$$g_1h_1^2 = g_2h_1h_2$$

$$g_1h_1 = g_2h_2$$

$$f_1 = f_2$$

- Delay is minimized when stage effort  $(f_i)$  is the same in both stages!
- Let  $\hat{f}$  be the optimal stage effort (i.e.,  $\hat{f} = f_1 = f_2$ )
- We can use a trick to quickly calculate  $\hat{f}$

$$\widehat{f} = \sqrt{\widehat{f}^2} = \sqrt{\widehat{f} \, \widehat{f}} = \sqrt{f_1 \, f_2}$$
$$= \sqrt{(g_1 h_1)(g_2 h_2)}$$
$$= \sqrt{(g_1 g_2)(h_1 h_2)}$$

- Let  $G = g_1g_2$ , this is the path logical effort
- Let  $H = h_1 h_2 = C_{out} / C_{in}$ , this is the path electrical effort
- Let F = GH, this is the path effort

$$\widehat{f} = \sqrt{(g_1g_2)(h_1h_2)}$$
$$= \sqrt{GH}$$
$$= \sqrt{F}$$

- We can calculate  $\hat{f}$  without finding the optimal size of each gate!
- Minimal delay with optimal sizing can be quickly calculated using:

$$\widehat{D} = 2\widehat{f} + (p_1 + p_2)$$

• This generalizes to paths with any number of stages

- $G = \prod g_i$: path logical effort
- $H = \prod h_i = \frac{C_{out}}{C_{in}}$: path electrical effort
- $F = GH$: path effort
- $\widehat{f} = F^{1/N}$: optimal stage effort
- $P = \sum p_i$: path parasitic delay
- $\widehat{D} = N\widehat{f} + P$: min delay with opt sizing

### Method for optimal sizing

- 1. Calculate path effort (F = GH)
- 2. Calculate effort for each stage ( $\hat{f} = F^{1/N}$ )
- 3. Estimate minimum delay with optimal sizing  $(\hat{D} = N\hat{f} + P)$
- 4. Starting with last stage, work backwards sizing each gate

$$\hat{f} = gh = g\frac{C_{out}}{C_{in}} \qquad C_{in} = \frac{g}{\hat{f}}C_{out}$$
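The four steps above can be sketched as a short routine. The example path is an assumption taken from the tables earlier in this section ($g = 1, 5/3, 4/3, 1$ and $p = 1, 2, 2, 1$, with a 3C input capacitance and a 40C load); the function name is hypothetical.

```python
def size_path(stages, c_in, c_load):
    """stages: list of (g, p) in path order; capacitances in units of C.

    Returns (f_hat, d_hat, caps) where caps[i] is the input cap of stage i.
    """
    n = len(stages)
    G = 1.0
    for g, _ in stages:
        G *= g
    F = G * (c_load / c_in)                           # step 1: F = GH
    f_hat = F ** (1.0 / n)                            # step 2: stage effort
    d_hat = n * f_hat + sum(p for _, p in stages)     # step 3: min delay
    caps = []
    c_out = c_load
    for g, _ in reversed(stages):                     # step 4: work backwards
        c_stage = g * c_out / f_hat                   # C_in = (g / f_hat) * C_out
        caps.append(c_stage)
        c_out = c_stage
    caps.reverse()
    return f_hat, d_hat, caps

stages = [(1.0, 1.0), (5 / 3, 2.0), (4 / 3, 2.0), (1.0, 1.0)]
f_hat, d_hat, caps = size_path(stages, c_in=3.0, c_load=40.0)
print(f"f_hat = {f_hat:.3f}, D_hat = {d_hat:.2f}")
print([round(c, 2) for c in caps])
```

Working backwards must land on the given input capacitance of the first stage, which is a useful check. Rounding $\widehat{f}$ before sizing shifts the intermediate capacitances slightly.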

## Revisit earlier example



#### Optimal sizing with standard cells

- This assumes we can size gates arbitrarily using full custom design
- What about if we are using a standard cell library?
- Assume we have a standard cell library with the following cells
  - INVX1, INVX2, INVX4, INVX8
  - NANDX1, NANDX2, NANDX4
  - NORX1, NORX2, NORX4



- Assume we have determined optimal sizing in C<sub>in</sub>
- How do we figure out which standard cell to use?

• Given optimum  $C_{in}$  from before, what is  $\alpha$ ?

| $C_{in}$ | $g$ | $C_{in}/(g \times 3C)$    | = α    | gate   |
|----------|-----|---------------------------|--------|--------|
| 17.17C   | 1   | $17.17C/(1 \times 3C)$    | = 5.72 | INVX4  |
| 9.83C    | 4/3 | $9.83C/((4/3) \times 3C)$ | = 2.45 | NANDX2 |
| 7.03C    | 5/3 | $7.03C/((5/3) \times 3C)$ | = 1.41 | NORX1  |
| 3.02C    | 1   | 3.02C/(1 × 3C)            | = 1.00 | INVX1  |

- Recalculate actual delay given these gates
- First calculate actual C<sub>in</sub> for each standard cell gate

| gate   | α | $g$ | $\alpha \times g \times 3C = C_{in}$ |
|--------|---|-----|--------------------------------------|
| INVX4  | 4 | 1   | $4 \times 1 \times 3C = 12C$         |
| NANDX2 | 2 | 4/3 | $2 \times 4/3 \times 3C = 8C$        |
| NORX1  | 1 | 5/3 | $1 \times 5/3 \times 3C = 5C$        |
| INVX1  | 1 | 1   | $1 \times 1 \times 3C = 3C$          |

• Now use path delay equation

$$D = \sum gh + \sum p$$

$$= (1 \times \frac{40}{12}) + (\frac{4}{3} \times \frac{12}{8}) + (\frac{5}{3} \times \frac{8}{5}) + (1 \times \frac{5}{3}) + (1 + 2 + 2 + 1)$$

$$= 3.33 + 2 + 2.67 + 1.67 + 6 = 9.67 + 6 = 15.67$$

• Compare with optimal delay which is 15.32, off by 2.3%
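The cell selection and delay recalculation above can be sketched as follows. The rule used to pick a cell (the library size closest to $\alpha$ in absolute terms) is an assumption that happens to reproduce the choices in the table; the handout does not state the rule explicitly.

```python
C_INV = 3.0  # input cap of the X1 inverter in units of C
LIBRARY = {"INV": [1, 2, 4, 8], "NAND": [1, 2, 4], "NOR": [1, 2, 4]}
GATES = {"INV": (1.0, 1.0), "NAND": (4 / 3, 2.0), "NOR": (5 / 3, 2.0)}  # (g, p)

def pick_cell(kind, c_in_opt):
    """Return (size, actual input cap) for the closest available cell size."""
    g, _ = GATES[kind]
    alpha = c_in_opt / (g * C_INV)
    size = min(LIBRARY[kind], key=lambda s: abs(s - alpha))  # assumed rule
    return size, size * g * C_INV

# Optimal input caps from the table above, listed from first to last stage.
path = [("INV", 3.02), ("NOR", 7.03), ("NAND", 9.83), ("INV", 17.17)]
chosen = [(kind,) + pick_cell(kind, c) for kind, c in path]

# Each stage drives the next stage's actual input cap; the last drives 40C.
c_ins = [c for _, _, c in chosen] + [40.0]
D = sum(GATES[kind][0] * c_ins[i + 1] / c_ins[i] + GATES[kind][1]
        for i, (kind, _, _) in enumerate(chosen))

print([(kind, size) for kind, size, _ in chosen])
print(f"D = {D:.2f}")
```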

#### What about branching?

• Consider the following simple example



- So in this example F = 2GH
- The factor of two is called the branching effort
- Key Idea: some drive current is directed off path we are analyzing
- Similar to Elmore delay for trees

- $b = \frac{C_{\mathrm{onpath}} + C_{\mathrm{offpath}}}{C_{\mathrm{onpath}}}$: stage branching effort
- $B = \prod b_i$: path branching effort

• So our new path effort equation is now:

$$F = \prod f_i = GBH$$

- Note that path effort depends on circuit topology and loading of entire path, but not size of transistors in network
- Note that path effort does not change if we add or remove inverters!

### Q2: How should we change topology to minimize delay?

- Assume we want to implement an eight input AND gate
- Calculate min delay assuming optimal sizing for three topologies
- First assume H = 1, then assume H = 12



| Topology | $NF^{1/N}$ (H = 1) | P (H = 1) | $\widehat{D}$ (H = 1) | $NF^{1/N}$ (H = 12) | P (H = 12) | $\widehat{D}$ (H = 12) |
|----------|--------------------|-----------|-----------------------|---------------------|------------|------------------------|
| NAND8    |                    |           |                       |                     |            |                        |
| NAND4    |                    |           |                       |                     |            |                        |
| NAND2    |                    |           |                       |                     |            |                        |
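A sketch of how the table entries could be computed is shown below. The three topologies are assumptions (NAND8 + INV, NAND4 + NOR2, and NAND2 + NOR2 + NAND2 + INV), as are the per-gate formulas $g = (n+2)/3$ for an $n$-input NAND and $g = (2n+1)/3$ for an $n$-input NOR with $p = n$, which hold for templates sized with a 2:1 mobility ratio.

```python
def nand(n):
    """(g, p) for an n-input NAND under the assumed template sizing."""
    return ((n + 2) / 3, n)

def nor(n):
    """(g, p) for an n-input NOR under the assumed template sizing."""
    return ((2 * n + 1) / 3, n)

INV = (1.0, 1)

# Assumed topologies for an eight-input AND.
TOPOLOGIES = {
    "NAND8": [nand(8), INV],
    "NAND4": [nand(4), nor(2)],
    "NAND2": [nand(2), nor(2), nand(2), INV],
}

def min_delay(stages, H):
    """Return (N * F**(1/N), P, D_hat) with F = G * H and no branching."""
    N = len(stages)
    G = 1.0
    for g, _ in stages:
        G *= g
    P = sum(p for _, p in stages)
    effort = N * (G * H) ** (1.0 / N)
    return effort, P, effort + P

for H in (1, 12):
    for name, stages in TOPOLOGIES.items():
        effort, P, D = min_delay(stages, H)
        print(f"H={H:2d} {name}: NF^(1/N)={effort:.2f} P={P} D={D:.2f}")
```

Under these assumptions the best topology changes with the load: fewer stages win when $H$ is small, and more stages win when $H$ is large.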

#### Determine optimal number of stages for chain of inverters



$$\widehat{D} = NF^{1/N} + NP_{inv}$$

$$\frac{\partial \widehat{D}}{\partial N} = F^{1/N} - F^{1/N} \ln(F^{1/N}) + P_{inv} = 0$$

• If  $P_{inv} = 0$ 

$$\frac{\partial \widehat{D}}{\partial N} = F^{1/N} - F^{1/N} \ln(F^{1/N}) = 0$$

$$\ln(F^{1/N}) = 1$$

$$F^{1/N} = e$$

$$\widehat{f} = e$$

- So if we assume  $P_{inv} = 0$ , then the optimal number of stages results in a stage effort of e (i.e., 2.718) for every stage
- Since g = 1 for an inverter, this means h = 2.718 for every stage

• If  $P_{inv} = 1$ , then we need to solve this nonlinear equation:

$$F^{1/N} - F^{1/N} \ln(F^{1/N}) + 1 = 0$$

• Let  $\rho = F^{1/\hat{N}}$  where  $\hat{N}$  is optimal number of stages

$$1 + \rho(1 - \ln(\rho)) = 0$$

- We can solve this numerically to find that  $\rho \approx 3.59$
- So if we assume  $P_{inv} = 1$ , then the optimal number of stages results in a stage effort of 3.59 for every stage
- Since g = 1 for an inverter, this means h = 3.59 for every stage
- We can roughly approximate 3.59 to be 4
- Let's solve for  $\widehat{N}$  as a function of F

$$F^{1/\widehat{N}} = 4$$

$$\log(F^{1/\widehat{N}}) = \log(4)$$

$$\frac{1}{\widehat{N}}\log(F) = \log(4)$$

$$\widehat{N} = \frac{\log(F)}{\log(4)} = \log_4(F)$$
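The numerical solution mentioned above can be reproduced with a simple bisection. This is a sketch: the bracket $[e, 10]$ is an assumption that works for $P_{inv}$ of 0 or 1 because the left-hand side is non-negative at $\rho = e$ and decreases for $\rho > 1$.

```python
import math

def residual(rho, p_inv):
    """Left-hand side of p_inv + rho * (1 - ln(rho)) = 0."""
    return p_inv + rho * (1.0 - math.log(rho))

def best_stage_effort(p_inv, lo=math.e, hi=10.0, iters=60):
    """Bisection on the residual, which decreases monotonically for rho > 1."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid, p_inv) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def best_num_stages(F, rho=4.0):
    """N_hat = log(F) / log(rho); rho = 4 is the rounded stage effort."""
    return math.log(F) / math.log(rho)

print(f"P_inv = 0: rho = {best_stage_effort(0.0):.3f}")
print(f"P_inv = 1: rho = {best_stage_effort(1.0):.3f}")
print(f"F = 64: N_hat = {best_num_stages(64.0):.2f}")
```

In practice $\widehat{N}$ must be rounded to an integer, and the parity of $N$ may be constrained by the required logic polarity.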

This is actually a pretty good estimate even for a path of gates which are not inverters!

Logical effort can help give us intuition on how to size gates and choose a topology to minimize delay, but it has many limitations.

To deal with more complicated scenarios, we can also write the delay equations for each gate in the system and minimize the latest arrival time.

Let's write our linear delay equation as a function of $\alpha$

$$d = gh + p$$

$$g = \frac{R_t C_t}{R_{inv} C_{inv}}$$

$$C_{in} = \alpha C_t \qquad C_t = \frac{C_{in}}{\alpha}$$

$$\alpha = \frac{C_{in}}{g C_{inv}}$$

$$g = \frac{C_{in}}{\alpha C_{inv}}$$

Now write the delay equation for each stage

$$t_0 = d_0$$

$$t_1 = \max(t_0, d_{IN}) + d_1$$

$$t_2 = \max(t_0, t_1) + d_2$$

$$t_3 = t_2 + d_3$$

$$t_3 = \max(t_0, t_1) + d_2 + d_3$$

$$t_3 = \max(d_0, \max(d_0, d_{IN}) + d_1) + d_2 + d_3$$

Minimize $t_3$ subject to the above constraints.

Actually in synthesis we really want to minimize area (or energy) subject to a constraint on $t_3$.

Minimize the sum of the $\alpha_i$ (proxy for area) subject to the clock period constraint:

$$t_{clk} > \max(d_0, \max(d_0, d_{IN}) + d_1) + d_2 + d_3$$
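The arrival time equations above can be evaluated directly, which is how a sizing tool would check a candidate solution against the clock period. This is a minimal sketch: the delay values are hypothetical, and `d_in` stands for the input arrival term that appears inside the inner max.

```python
def latest_arrival(d0, d1, d2, d3, d_in):
    """Arrival time t3, following the stage-by-stage equations."""
    t0 = d0
    t1 = max(t0, d_in) + d1
    t2 = max(t0, t1) + d2
    return t2 + d3

def meets_clock(t_clk, **delays):
    """Clock period constraint: t_clk must exceed the latest arrival time."""
    return t_clk > latest_arrival(**delays)

# Hypothetical stage delays in units of tau.
delays = dict(d0=2.0, d1=3.0, d2=2.5, d3=1.5, d_in=4.0)

t3 = latest_arrival(**delays)
# Fully expanded form of the same expression, for comparison.
expanded = max(2.0, max(2.0, 4.0) + 3.0) + 2.5 + 1.5

print(t3, expanded, meets_clock(12.0, **delays))
```

In an optimizer each $d_i$ would itself be a function of the gate sizes, so changing one size changes the delays of both that gate and the gate driving it.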

## 3. Energy

- Energy is a measure of work
- Power is the rate at which work is done



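| Quantity                  | Definition                                                                           | Unit              | Relation                                                                   |
|---------------------------|--------------------------------------------------------------------------------------|-------------------|----------------------------------------------------------------------------|
| Electric Potential Energy | Capacity for doing work which arises from position of a charge in an electric field | Joules            |                                                                            |
| Electric Potential        | Electric potential energy of a position per unit charge                              | Volts, 1V = 1J/C  | $\Delta V = \Delta E / Q$                                                  |
| Current                   | Rate at which charge flows past position                                             | Amps, 1A = 1C/s   | $I = Q / \Delta t$                                                         |
| Power                     | Rate at which electric energy is supplied or consumed                                | Watts, 1W = 1J/s  | $P = \Delta E / \Delta t = \frac{\Delta V \cdot Q}{Q/I} = VI$              |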



#### **Energy Stored on a Capacitor**

$$E_{C} = \int_{0}^{\infty} P(t)dt = \int_{0}^{\infty} V(t)I(t)dt$$

$$= \int_{0}^{\infty} V(t)\frac{dQ}{dt}dt = \int_{0}^{\infty} V(t)\frac{C\,dV}{dt}dt$$

$$= C\int_{0}^{V_{DD}} V\,dV = \frac{1}{2}CV_{DD}^{2}$$

- So on  $1 \to 0$  input transition,  $\frac{1}{2}CV_{DD}^2$  is stored on capacitor
- This energy is released on  $0 \rightarrow 1$  input transition

#### **Energy Delivered From Power Supply**

$$E_{\text{supply}} = \int_0^\infty P(t)dt = \int_0^\infty V_{DD}I(t)dt$$

$$= V_{DD} \int_0^\infty \frac{dQ}{dt}dt = V_{DD} \int_0^\infty \frac{C\,dV}{dt}dt$$

$$= CV_{DD} \int_0^{V_{DD}} dV = CV_{DD}^2$$

- $0 \rightarrow 1$  output transition
  - $CV_{DD}^2$ energy is delivered from power supply
  - half this energy dissipated as heat in PMOS
  - half this energy is stored on the capacitor
- $1 \rightarrow 0$  output transition
  - no energy is delivered from power supply
  - remaining energy on capacitor dissipated as heat in NMOS

- On average, each bit transition requires  $\frac{1}{2}CV_{DD}^2$
- Let  $\alpha$  be the activity factor, the probability of a bit transition per cycle

$$E_{\text{node}} = \alpha \frac{1}{2} C V_{DD}^2$$

#### **Power Consumption**

$$P_{\text{tot}} = P_{\text{switching}} + P_{\text{static}}$$
$$= \alpha f \frac{1}{2} C V_{DD}^{2} + V_{DD} I_{\text{off}}$$

- Sometimes engineers will assume  $\alpha$  is the probability of just a  $0 \to 1$  output transition instead of the probability of any transition
  - $\alpha$  = probability of any transition
  - $\alpha'$  = probability of a  $0 \rightarrow 1$  transition only
- If you use  $\alpha'$  then do not include the factor of 1/2
- Note that book uses  $\alpha$  but it is really  $\alpha'$  in our notation!

### **Comparing Energy**

• Calculate the total switched cap in worst case



 To determine parasitic cap need to understand how gate cap is distributed across transistors

#### inverter

#### 8-input NAND gate

#### 2-input NOR gate

#### 4-input NAND gate

$$E_{\text{node}} = \alpha \frac{1}{2} C V_{DD}^2$$

- Assume  $\alpha = 0.1$  and  $V_{DD} = 1$ V for both
- Only difference is amount of switched cap
- For 8-input NAND topology

$$C_{\text{tot}} = C_{\text{tot},g} + C_{\text{tot},p} = 88.8 + (5.6 + 24.96) = 119.36C$$

• For 4-input NAND topology

$$C_{\text{tot}} = C_{\text{tot},g} + C_{\text{tot},p} = 101 + (11.16 + 2 \times 20.58) = 153.32C$$

- So second topology requires  $\approx$ 30% more energy in the *worst* case
- Worst case is when all capacitance is switched
- This ignores the energy for switching the output load
- Let's assume C = 0.5 fF (see extra notes)
- Assume clock frequency is 500 MHz

$$E = \alpha \frac{1}{2}CV_{DD}^{2} = 0.1 \times \frac{1}{2} \times 120C \times \frac{0.5fF}{C} \times (1V)^{2} = 3fJ$$

$$P = \alpha f \frac{1}{2}CV_{DD}^{2} = (0.5 \times 10^{9})(3 \times 10^{-15}) = 1.5\mu\text{W}$$
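The energy and power arithmetic above is easy to script. The sketch below uses the stated assumptions ($\alpha = 0.1$, $V_{DD} = 1$ V, $C = 0.5$ fF, 500 MHz) and the rounded switched capacitance of 120C; the function and constant names are illustrative.

```python
ALPHA = 0.1        # activity factor (probability of any transition)
V_DD = 1.0         # supply voltage in volts
C_UNIT = 0.5e-15   # one unit of C in farads
F_CLK = 500e6      # clock frequency in hertz

def switching_energy(c_units):
    """Average energy per cycle: alpha * (1/2) * C * V_DD^2, in joules."""
    return ALPHA * 0.5 * c_units * C_UNIT * V_DD ** 2

def switching_power(c_units):
    """Average switching power: f * energy per cycle, in watts."""
    return F_CLK * switching_energy(c_units)

energy = switching_energy(120)
power = switching_power(120)
ratio = 153.32 / 119.36  # 4-input NAND topology vs 8-input NAND topology

print(f"E = {energy * 1e15:.2f} fJ, P = {power * 1e6:.2f} uW")
print(f"switched cap ratio = {ratio:.2f}")
```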

#### **Activity Factors**

- Previous example used fixed  $\alpha = 0.1$  for all nodes
- Can improve accuracy by:
  - Propagate activity factor of inputs to internal nodes
  - Use RTL to calculate activity of inputs, then propagate
  - Use gate-level simulation to find activity of each node





- $P_i$ = probability node is one on cycle $i$
- $\overline{P}_i = 1 - P_i$ = probability node is zero on cycle $i$
- $\alpha = P_{i-1}\overline{P}_i + \overline{P}_{i-1}P_i$
- $\alpha' = \overline{P}_{i-1}P_i$

- Assuming inputs have uncorrelated random data
- Each of these is equally likely:  $0 \rightarrow 0$ ,  $0 \rightarrow 1$ ,  $1 \rightarrow 0$ ,  $1 \rightarrow 1$

$$\alpha = P_{i-1}\overline{P}_i + \overline{P}_{i-1}P_i = 0.5$$

$$\alpha' = \overline{P}_{i-1}P_i = 0.25$$

$$\alpha' = \frac{1}{2}\alpha$$

#### **Output Activity Factor of NAND2**

- Calculate output activity factor of a NAND2 gate
- Assume inputs are uncorrelated random data
- Output of NAND2 is zero if both inputs one, otherwise output is one

$$\alpha'_{out} = \overline{P}_{out,i-1} P_{out,i}$$

$$= (P_A P_B) (1 - P_A P_B)$$

$$= (0.5 \times 0.5) (1 - 0.5 \times 0.5)$$

$$= (0.25) (1 - 0.25)$$

$$= 0.1875$$
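The same calculation extends to any NAND fan-in by enumerating the input combinations. This sketch assumes uncorrelated inputs that are each one with probability `p_one` (a hypothetical parameter name) and that consecutive cycles are independent.

```python
from itertools import product

def nand_alpha_prime(n, p_one=0.5):
    """P(output is 0 on cycle i-1) * P(output is 1 on cycle i) for an n-input NAND."""
    p_out_zero = 0.0
    for bits in product([0, 1], repeat=n):
        prob = 1.0
        for b in bits:
            prob *= p_one if b else 1.0 - p_one
        if all(bits):  # NAND output is zero only when every input is one
            p_out_zero += prob
    return p_out_zero * (1.0 - p_out_zero)

print(nand_alpha_prime(2))
print(nand_alpha_prime(8))
```

As fan-in grows, the output is almost always one, so the output activity factor drops well below that of the inputs.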

#### **Output Activity Factor of NAND8**



## 4. Area

- Sum the transistor widths across all transistors in design
- Use standard cell footprints