Design of CMOS Adaptive Supply Serial Links

Jaeha Kim

Computer Systems Laboratory
Stanford University
High-Speed CMOS Links

- Essential in bridging the gap between on-chip computation and chip-to-chip communication
What Makes a Link High-Speed

- Terminated channel
- High-bandwidth transceiver (TX, RX)
- Precise timing control
Design Trends

- Speed has been the primary objective
  - Especially if the channel is costly

- Power has also become critical
  - Hundreds of links on a single chip
  - E.g.: multiprocessors, network routers, etc.
  - Power sets the maximum number of links on a chip

- Moving from parallel links towards serial links
  - Pin counts, skews between channels, etc.
My Research Goals

- Build low-power links for high integration
  - Serial links with per-pin clock recovery

- The key is adaptive power-supply regulation
  - Reduce power by lowering the supply voltage

- Exploit parallelism to improve speed at low voltage

- Find a balance between speed and power
Outline

- Introduction
- High-level approaches for low power
  - Adaptive power-supply regulation
  - Parallelism in links
- Enabling circuit techniques
  - Parallelized transceiver operating at low voltages
  - Per-pin multiphase clock generation and recovery
- Measurement results
- Conclusions
Adaptive Power-Supply Regulation

- Originally proposed for digital systems, e.g. μPs
- Allows trading speed for power

\[ f \sim V \]

\[ P \sim V^2 f \]
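
Combining the two relations above shows how steeply power falls with frequency. This is a sketch under the idealized assumption that the supply tracks frequency exactly linearly, ignoring threshold-voltage effects; the symbol \( E_{\text{op}} \) (energy per operation) is introduced here for illustration:

\[ P \sim V^2 f, \quad V \sim f \;\Rightarrow\; P \sim f^3, \qquad E_{\text{op}} = \frac{P}{f} \sim V^2 \sim f^2 \]

Under this idealization, halving the clock frequency cuts power by roughly 8× and energy per operation by roughly 4×.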

Generating the “Adaptive Supply”

- Find the minimum voltage that meets the target speed
- Need a reference circuit that indicates the maximum performance \( f \) at supply \( V \)
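
The search for the minimum supply can be sketched in software as a bisection over a reference-circuit model. This is only an illustration: the alpha-power-law model in `max_frequency`, the constants `alpha` and `k`, and the search bounds are assumptions, not values from the slides (only the 0.55 V threshold appears in the prototype data). The actual regulator closes this loop in hardware.

```python
def max_frequency(v, vth=0.55, alpha=1.3, k=2.0e9):
    """Hypothetical reference-circuit model: max frequency (Hz) at supply v (V).

    Uses an alpha-power-law form, f ~ (v - vth)**alpha / v. The constants
    alpha and k are illustrative assumptions.
    """
    if v <= vth:
        return 0.0
    return k * (v - vth) ** alpha / v


def min_supply(f_target, v_lo=0.6, v_hi=2.5, tol=1e-4):
    """Bisect for the lowest supply whose max frequency still meets f_target."""
    if max_frequency(v_hi) < f_target:
        raise ValueError("target frequency not reachable at the highest supply")
    while v_hi - v_lo > tol:
        v_mid = 0.5 * (v_lo + v_hi)
        if max_frequency(v_mid) >= f_target:
            v_hi = v_mid  # v_mid is fast enough: try a lower supply
        else:
            v_lo = v_mid  # v_mid is too slow: raise the supply
    return v_hi


if __name__ == "__main__":
    for f_target in (0.25e9, 0.5e9, 1.0e9):
        print(f"{f_target / 1e9:.2f} GHz -> {min_supply(f_target):.3f} V")
```

The bisection relies on the model being monotonic in the supply voltage, which holds for this form when `alpha >= 1`.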
Generating the “Adaptive Supply” (2)

- Min. clock period for links ≈ 6 to 8 FO4 inverter delays
- Use buck converter for high efficiency (>90%)
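
As a rough illustration of the clock-period rule above, the sketch below converts an FO4 delay into a maximum clock frequency. The FO4 delay value is a placeholder assumption, not a figure from the slides; substitute the measured delay for the process and supply of interest.

```python
def max_clock_hz(fo4_delay_s, fo4_per_cycle):
    """Maximum clock frequency when one period spans fo4_per_cycle FO4 delays."""
    return 1.0 / (fo4_per_cycle * fo4_delay_s)


# Assumed placeholder FO4 delay; it varies with process, supply, and temperature.
FO4_DELAY_S = 125e-12

if __name__ == "__main__":
    for n in (6, 8):
        f = max_clock_hz(FO4_DELAY_S, n)
        print(f"{n} FO4 per cycle -> {f / 1e9:.2f} GHz")
```

Because the FO4 delay itself grows as the supply drops, the same rule gives a lower maximum clock frequency at lower adaptive-supply voltages.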
Digital Sliding Controller

- Has a faster transient response than a linear controller
- Controller power scales as $V^2 \cdot f_{\text{ref}}$
  - Controller operates at $f_{\text{ref}}$ and at adaptive supply, $V$

![Diagram of Digital Sliding Controller, Buck Converter, and Adaptive Supply, V]
More Benefits from Adaptive Supply

- Adaptive supply compensates for process and temperature variations
  - Reduces the design margin from die-to-die variation to within-die variation

- Adaptive supply tracks the clock frequency
  - Many link parameters have to scale with frequency
  - Typically done by analog circuits with bias generators
  - With adaptive supply, simple digital circuits suffice
How to Get Performance Back

- Power saving comes with performance loss
  - By operating at low supply and low clock frequency

- Parallelism can boost the speed at low voltage
  - Time-interleaved multiplexing
  - Multi-level signaling

![Diagram of time-interleaved multiplexing with clock phases clk[n] and clk[n+1], and of multi-level signaling with levels 00, 01, 10, 11]

Parallelism: Multiplexing

![Diagram of a parallelized transmitter and a parallelized receiver connected by the channel, both driven by multiphase clocks]

\[ \text{bitrate} = M \cdot f_{clk} \]
Multiplexing with Adaptive Supply

- With M:1 multiplexing, \( f = \frac{b}{M} \)
  
  \[
  \text{Power} = M \cdot CV^2 f = M \cdot CV^2 \cdot \left(\frac{b}{M}\right) = CV^2 b
  \]

- With fixed supply, power does not vary with M

- With adaptive supply (\( V \propto f = \frac{b}{M} \)),
  
  \( \Rightarrow \) Power \( = CV^2 b \propto \frac{1}{M^2} \) at a fixed bit rate \( b \)
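
The scaling above can be checked numerically with a minimal sketch. It assumes the idealized relation \( V \propto f \) from the slide (threshold voltage ignored) and normalizes everything to the M = 1 case at the same bit rate; the function name is hypothetical.

```python
def relative_power(m, adaptive=True):
    """Link power with M:1 multiplexing, normalized to M = 1 at the same bit rate.

    Implements Power = M * C * V^2 * f with f = b / M. With adaptive supply,
    V is assumed to scale linearly with f (idealization); otherwise V is fixed.
    """
    f = 1.0 / m                    # clock frequency relative to the bit rate
    v = f if adaptive else 1.0     # supply voltage relative to the M = 1 case
    return m * v ** 2 * f


if __name__ == "__main__":
    for m in (1, 2, 4, 5, 8):
        fixed = relative_power(m, adaptive=False)
        adaptive = relative_power(m, adaptive=True)
        print(f"M={m}: fixed supply {fixed:.3f}, adaptive supply {adaptive:.4f}")
```

In practice the supply cannot fall much below a few times the threshold voltage, so the real saving is smaller than this idealized \( 1/M^2 \) curve suggests.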
Power vs. Timing Accuracy

- Larger M
  - Lower power
  - Less accurate timing
    - static phase offset
    - jitter
- Choice: M=5
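
To make the M = 5 choice concrete, the sketch below derives the clock frequency, bit time, and ideal phase positions from a bit rate using bitrate = M · f_clk. Evenly spaced phases are an assumption of the sketch; the static phase offset and jitter listed above are exactly the deviations from these ideal positions.

```python
def clocking(bit_rate_hz, m=5):
    """Return (clock frequency, bit time, ideal phase edge times) for M phases."""
    f_clk = bit_rate_hz / m
    bit_time = 1.0 / bit_rate_hz
    # Ideal rising-edge times within one clock period: one bit time apart.
    phases = [k * bit_time for k in range(m)]
    return f_clk, bit_time, phases


if __name__ == "__main__":
    f_clk, bit_time, phases = clocking(5.0e9, m=5)
    print(f"f_clk = {f_clk / 1e9:.2f} GHz, bit time = {bit_time * 1e12:.0f} ps")
    print("phase edges (ps):", [round(p * 1e12) for p in phases])
```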
Adaptive Supply Serial Links

![Block diagram of the link: multiphase clock recovery driving a 1:5 demultiplexing receiver, and multiphase clock generation driving a 5:1 multiplexing transmitter, all powered by the adaptive supply, V, from the adaptive power-supply regulator referenced to $f_{\text{ref}}$]
Low-Voltage Parallelized Transceiver

![Block diagram of the link: 1:5 demultiplexing receiver with multiphase clock recovery and 5:1 multiplexing transmitter with multiphase clock generation, powered by the adaptive supply, V, from the adaptive power-supply regulator referenced to $f_{\text{ref}}$]

Output swing scales as \((V-V_{th})^\alpha\)

Output pulse-width narrows as \(V\) drops
Proposed Transmitter

- Predriver output swing is shifted down by $V_{th}$
- Effectively makes zero-$V_{th}$ pMOS driver
Level-Shifting Predriver
Effects of Level-Shifting
Demultiplexing Receiver

![Diagram of the demultiplexing receiver: integrating front-ends followed by comparators, clocked by the multiphase clocks $\Psi[0]$ to $\Psi[4]$, producing $D[0]$ to $D[4]$]

- Integration windows shown: $\Psi[4]_{\text{rise}}$ to $\Psi[0]_{\text{rise}}$, $\Psi[0]_{\text{rise}}$ to $\Psi[1]_{\text{rise}}$, ..., $\Psi[3]_{\text{rise}}$ to $\Psi[4]_{\text{rise}}$
Integrating Front-End [Sidiropoulos:96]

- Sample-and-hold switches M3 and M4 limit the minimum supply
Proposed Integrating Front-End

- Integrate-and-hold without sampling switches
Low-Voltage Comparator
Comparator Sensitivity
Multiphase Clock Generation

- Must minimize static offsets between phases
- Generate multiphase clocks locally at each pin, but watch out for power and area overhead
Adaptive-Supply PLLs

- Adaptive supply maximizes energy-efficiency
- Scales PLL bandwidth with the frequency, $f_{\text{ref}}$

Dual-Loop Architecture

![Diagram of the dual-loop architecture: the adaptive power-supply regulator, with a reference VCO running at $f$ and locked to $f_{ref}$, provides coarse control through the adaptive supply, $V$; local VCOs with fine control perform local multiphase clock recovery for the 1:5 demultiplexing receiver and local multiphase clock generation for the 5:1 multiplexing transmitter]

Dual-Loop Architecture (2)

- Global loop brings the local VCO frequency close to lock

- Narrow local tuning range (±15%) is sufficient to compensate for on-chip mismatches

- Narrow tuning range leads to low VCO gain
  - Small loop capacitor area (2.5pF)
  - Low sensitivity to $V_{ctrl}$ noise
Voltage-Controlled Oscillator (VCO)
Filtering Noise on VCO Supply

- To reduce VCO clock jitter
- Sacrifices power since it scales as $V \cdot f$, not $V^2f$
Fine-Tuning of VCO Frequency

A. Variable Load

B. Supply Offset
Fine-Tuning Range
Clock Recovery in Serial Links

- Optimal receiver timing is recovered from the incoming data stream
Phase Detector

Phase detector made of an identical set of receivers minimizes the timing error.
Bangbang PLL Dynamics [Walker:92]

- Proportional gain >> integral gain
- Behaves as a first-order loop when locked
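
The first-order behavior when locked can be illustrated with a toy simulation. This is a simplified sketch, not the loop from the cited work: it models only the proportional path (consistent with proportional gain dominating integral gain), ignores any frequency offset, and uses hypothetical names and step sizes with phase measured in unit intervals.

```python
def bangbang_track(phase_error, step, n_updates):
    """Simulate a first-order bang-bang loop acting on a static phase error.

    Each update, a binary (early/late) phase detector reports only the sign
    of the error, and the loop corrects the phase by a fixed step.
    """
    history = []
    for _ in range(n_updates):
        early_late = 1.0 if phase_error >= 0.0 else -1.0  # binary detector output
        phase_error -= step * early_late                  # proportional correction
        history.append(phase_error)
    return history


if __name__ == "__main__":
    trace = bangbang_track(phase_error=0.37, step=0.01, n_updates=100)
    print("error after acquisition:", [round(e, 3) for e in trace[-4:]])
```

The error slews toward zero at a constant rate of one step per update and then dithers within one step of zero, which is the characteristic limit cycle of a bang-bang loop.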
Limited Lock Range

![Graph showing phase detector gain vs. initial frequency difference](image)

- **Lock Range**
- **Fine-Tuning Range (±15%)**

Frequency Acquisition Aid

- Sweeps the VCO frequency until the PLL finds a locking point
Prototype Chip

- 0.25 µm CMOS
- 2.5 V supply, 0.55 V $V_{th}$
- Die size: 3.1 × 2.9 mm²
- Data rate: 0.4 to 5.0 Gb/s
- Supply range: 0.9 to 2.5 V
- Power: 5.6 to 375 mW
- BER < $10^{-15}$
- Regulator efficiency: 83 to 94%
Performance and Power

3.1 Gb/s, 113 mW
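
One way to read this operating point is as energy per bit, which folds speed and power into a single figure. The helper below is a hypothetical convenience function; only the 3.1 Gb/s and 113 mW values come from the slide.

```python
def energy_per_bit_pj(power_w, bit_rate_bps):
    """Energy spent per transmitted bit, in picojoules."""
    return power_w / bit_rate_bps * 1e12


if __name__ == "__main__":
    e = energy_per_bit_pj(power_w=113e-3, bit_rate_bps=3.1e9)
    print(f"{e:.1f} pJ/bit at 3.1 Gb/s")
```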
Power Breakdown

- Transmitters (TX) 22%
- Receivers (RX) 14%
- RX Clock Buffers 22%
- RX PLL 6%
- TX PLL 5%
- TX Clock Buffers 30%
Eye Diagram

<table>
<thead>
<tr>
<th>Eye</th>
<th>Width</th>
<th>Height</th>
</tr>
</thead>
<tbody>
<tr>
<td>1st</td>
<td>76.4%</td>
<td>59.0%</td>
</tr>
<tr>
<td>2nd</td>
<td>72.2%</td>
<td>59.5%</td>
</tr>
<tr>
<td>3rd</td>
<td>76.4%</td>
<td>60.0%</td>
</tr>
<tr>
<td>4th</td>
<td>73.6%</td>
<td>60.0%</td>
</tr>
<tr>
<td>5th</td>
<td>66.7%</td>
<td>59.5%</td>
</tr>
</tbody>
</table>

Aggregate Eye Opening:
- Width : 66.7%
- Height : 55.1%
Conclusions

- Serial links can achieve low power via adaptive supply and high speed via parallelism

- Supporting circuit techniques:
  - Digital sliding controller for adaptive supply regulation
  - Low-voltage parallelized transceiver
  - Dual-loop multiphase clock generation and recovery

- Explored trade-offs between speed and power
  - Found that the optimal supply is around $3 \times V_{th}$
Acknowledgements

- Prof. Mark Horowitz
- Prof. Bill Dally, Prof. John Gill, Dr. John Maneatis
- ASCI, National, Vitesse, Nassda, Synopsys, etc.
- Terry, Deborah, Taru, Penny, Pamela, Ann, Kate
- MH group members: Gu, Stefanos, Dean, Jeff, Bennett, Dan, Birdy, Evelina, Elad, Haechang, Ken M/Y/C, Ron, Bill, Ricardo, David H/L, Azita, Vladimir, Sam, Amin, Francois, Kelin, Paul, Alex, Mai, Vicky
- CIS buddies: Ali, Hirad, Mohan, Mar, David, Andrew, Joel, Shwe, Lalit, Jeff, Greg, Won
- Korean friends: Sangeun, Youngjune, Jin, Donghyun, Jaewon, Wonjoon, Wonjong, Sukhwan, and a lot more!
- My parents and my lovely wife, Soo