# RRAM-ECC: Improving Reliability of RRAM-Based Compute In-Memory

Zishen Wan<sup>\*1</sup>, Brian Crafton<sup>\*1</sup>, Sam Spetalnick<sup>1</sup>, Jong-Hyeok Yoon<sup>2</sup>, Arijit Raychowdhury<sup>1</sup>

Georgia Institute of Technology<sup>1</sup> Daegu Gyeongbuk Institute of Science and Technology<sup>2</sup> \*Equal Contributions

✉ zwan63@gatech.edu, bcrafton3@gatech.edu







# Agenda

- 1. Motivation & Background
- 2. RRAM + CIM Measurement
- 3. CIM-SECDED
- 4. Successive Correction
- 5. Summary


#### **Advantages**

- 1. Increase Memory Bandwidth
- 2. Multiply-Accumulate on BL

#### **Features**

- 1. Multiple WLs
- 2. Multiply & Accumulate on BL

## **Implications**

- 1. Compute In-Memory  $\rightarrow$
- 2. Matrix Multiplication  $\rightarrow$
- 3. Deep Learning & AI
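The link between multiple active WLs and multiply-accumulate on the BL can be sketched as below. This is an idealized model, assuming binary inputs and binary cells; `bitline_mac` and the small example array are hypothetical names and values, not taken from the macro.

```python
def bitline_mac(inputs, weights):
    """Ideal bitline accumulation: each column sums input*weight over active word lines.

    inputs  -- list of 0/1 word-line activations (one per row)
    weights -- list of rows, each a list of 0/1 cell values (one per column)
    """
    n_cols = len(weights[0])
    sums = [0] * n_cols
    for x, row in zip(inputs, weights):
        if x:  # only activated word lines contribute to the bitline
            for col, w in enumerate(row):
                sums[col] += w
    return sums


# Hypothetical 4-row, 3-column array.
weights = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
]
inputs = [1, 0, 1, 1]  # activate rows 0, 2 and 3 in the same cycle
print(bitline_mac(inputs, weights))  # [2, 2, 3]
```

Each column returns one dot product per cycle, which is why a single array read amounts to a vector-matrix multiplication.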













# Compute In-Memory

#### Advantages:

- 1. ↑Bandwidth (N×)
- 2. Less communication ($N \rightarrow \log N$)
- 3. "Free" Compute (N×)

#### **Challenges:**

- 1.  $\uparrow$ Bits  $\rightarrow \uparrow$ Noise  $\rightarrow \uparrow$ Error
- 2.  $\uparrow$ Bits  $\rightarrow \downarrow$ Headroom  $\rightarrow \uparrow$ Error



## **Dense Embedded (On-Chip) Memory**



| Memory   | SRAM      | DRAM      | RRAM         |
|----------|-----------|-----------|--------------|
| Latency  | Very Fast | Fast      | Fast         |
| Power    | Low       | Medium    | Low          |
| Volatility | Volatile  | Volatile  | Non-Volatile |
| Density  | Low       | Very High | High         |

## **Challenges for RRAM + CIM**

## RRAM

- HRS: $30\,\mathrm{k\Omega} \rightarrow 0$
- LRS: $3\,\mathrm{k\Omega} \rightarrow 1$

# **Challenges for CIM**

- Accumulate variation
- Reduced sense margin
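A rough sketch of why both challenges grow with the number of active WLs. It assumes independent cell-to-cell variation, ignores HRS leakage, and treats a relative LRS variation of 7.1% as a relative current variation; all function names are hypothetical.

```python
import math


def level_spacing(n_wl, full_scale=1.0):
    """Spacing between adjacent bitline levels when n_wl rows share a fixed range."""
    return full_scale / n_wl


def margin_in_sigmas(n_lrs, sigma_rel):
    """Half a level spacing divided by the accumulated std-dev.

    Units are one nominal LRS cell current; n_lrs cells with independent
    variation accumulate a std-dev of sigma_rel * sqrt(n_lrs).
    """
    if n_lrs == 0:
        return math.inf
    return 0.5 / (sigma_rel * math.sqrt(n_lrs))


for n_wl in (1, 2, 4, 8):
    # Worst case for variation: every activated cell is in LRS.
    print(n_wl, level_spacing(n_wl), round(margin_in_sigmas(n_wl, 0.071), 2))
```

Under these assumptions the margin shrinks as $1/\sqrt{N}$ in sigma units while the absolute spacing shrinks as $1/N$, matching the two bullets above.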

# ... but no ECC!



# Agenda

- 1. Motivation & Background
- 2. RRAM + CIM Measurement
- 3. CIM-SECDED
- 4. Successive Correction
- 5. Summary

## **Die Shot & PCB**



## **Measurements: Variation**

#### Experiment

- 8192 measurements
- Resistance distributions (CDF)

#### **Observations**

- $\uparrow$ Write Voltage  $\rightarrow \downarrow$ Variation
- $\uparrow$ Write Voltage  $\rightarrow \uparrow$ Ratio
- ... and lower endurance
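A minimal sketch of how a resistance CDF and a variation figure ($\sigma/\mu$) could be computed from a set of readings. The sample values are invented for illustration and are not measured data.

```python
import statistics


def empirical_cdf(samples):
    """Return the sorted samples and their cumulative probabilities."""
    xs = sorted(samples)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean (sigma / mu)."""
    return statistics.pstdev(samples) / statistics.mean(samples)


# Hypothetical LRS readings in kilo-ohms (not from the chip).
readings = [2.9, 3.0, 3.1, 3.0]
xs, probs = empirical_cdf(readings)
print(list(zip(xs, probs)))
print(coefficient_of_variation(readings))
```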



## **Measurements: BER**

#### Experiment

- 8192 measurements
- Confusion matrix

#### **Observations**

- $\uparrow$  Variation  $\rightarrow \uparrow$  CIM Error Rate
- $\uparrow$ LRS $\rightarrow \uparrow$ CIM Error Rate
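A sketch of how a confusion matrix and an error rate could be tallied from expected versus measured ADC codes. The per-read error rate defined here is an assumption about the metric, and the example codes are invented.

```python
def confusion_matrix(expected, measured, n_levels):
    """matrix[e][m] counts reads whose expected code was e and measured code was m."""
    matrix = [[0] * n_levels for _ in range(n_levels)]
    for e, m in zip(expected, measured):
        matrix[e][m] += 1
    return matrix


def error_rate(matrix):
    """Fraction of reads that fall off the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


# Hypothetical codes; the two mismatches are both off by one level.
expected = [0, 1, 2, 2, 3, 3]
measured = [0, 1, 2, 3, 3, 2]
cm = confusion_matrix(expected, measured, n_levels=4)
print(cm)
print(error_rate(cm))
```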



# Agenda

- 1. Motivation & Background
- 2. RRAM + CIM Measurement
- 3. CIM-SECDED
- 4. Successive Correction
- 5. Summary

## **Macro Level Implementation**

## **Specifications**

- 256×256 RRAM Array
- 8 WL/cycle, 3b ADC
- Shift + Add logic (VMM)
- ECC (SECDED)
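The shift + add step can be sketched as follows, assuming multi-bit weights are bit-sliced across columns (LSB first) and rows are read in groups of 8 WLs. The bit-slicing layout and helper names are assumptions, and ADC quantization is ignored.

```python
def to_bits(value, n_bits):
    """Split an unsigned integer into n_bits binary digits, LSB first."""
    return [(value >> b) & 1 for b in range(n_bits)]


def shift_add_dot(inputs, weight_bits, wl_per_cycle=8):
    """Dot product built from per-bit column sums.

    inputs      -- list of 0/1 word-line activations
    weight_bits -- one bit list per row (LSB first), one column per bit
    """
    n_bits = len(weight_bits[0])
    total = 0
    for start in range(0, len(inputs), wl_per_cycle):
        rows = range(start, min(start + wl_per_cycle, len(inputs)))
        for b in range(n_bits):
            # Column sum for bit position b over the rows active this cycle.
            col_sum = sum(inputs[r] * weight_bits[r][b] for r in rows)
            total += col_sum << b  # weight the partial sum by 2**b
    return total


weights = [5, 3, 7, 0, 2, 6, 1, 4, 7, 7]
inputs = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
print(shift_add_dot(inputs, [to_bits(w, 3) for w in weights]))  # 22
```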

## <u>ECC</u>

- [32, 8] SECDED code
  - 64 Check bits
- SECDED decoder



A 40nm 64Kb 56.67TOPS/W Read-Disturb-Tolerant Compute-in-Memory/Digital RRAM Macro, Yoon et al., ISSCC 2021

CIM-SECDED: A 40nm 64Kb Compute In-Memory RRAM Macro with ECC Enabling Reliable Operation, Crafton et al., A-SSCC 2021

## 2 Key Observations

- Only ±1 errors from ADC
- Residue arithmetic
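One way to see why these two observations fit together: a Hamming code is linear over GF(2), so the mod-2 residue of a sum of codewords is itself a codeword, and a ±1 error flips exactly one residue bit. The sketch below illustrates this with a small Hamming(7,4) code. It is not the macro's [32, 8] code or the authors' decoder, and it does not recover the sign of the error.

```python
def encode(data):
    """Hamming(7,4): data bits at positions 3, 5, 6, 7; parity at 1, 2, 4."""
    word = [0] * 7
    for pos, bit in zip((3, 5, 6, 7), data):
        word[pos - 1] = bit
    for k in range(3):
        p = 1 << k
        covered = [j for j in range(1, 8) if j & p and j != p]
        word[p - 1] = sum(word[j - 1] for j in covered) % 2
    return word


def syndrome(column_sums):
    """XOR of the 1-based positions whose column sum is odd."""
    syn = 0
    for j, s in enumerate(column_sums, start=1):
        if s % 2:
            syn ^= j
    return syn


# Four stored codewords, all word lines active: column sums are integers 0..4.
rows = [encode(d) for d in ([1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1], [0, 0, 1, 1])]
sums = [sum(col) for col in zip(*rows)]
print(syndrome(sums))  # 0: residues of the sums form a valid codeword

faulty = list(sums)
faulty[4] += 1  # a +1 ADC error on column 5
print(syndrome(faulty))  # 5: the syndrome points at the faulty column
```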











## **CIM-SECDED** Decoder

## Implementation:

- Encoder → Compiler
- Decoder → Digital logic

## Architecture:

- XOR Tree + DED (SECDED)
- Adder Tree + LUT (NEW)
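For reference, the conventional binary path (XOR tree + DED) can be sketched with an extended Hamming(8,4) word. This is textbook SECDED on bit flips, shown under the assumption that the overall parity bit is appended last; the adder tree + LUT path is not reproduced here.

```python
def secded_status(word):
    """Classify an 8-bit word: bits 0..6 are Hamming positions 1..7, bit 7 is overall parity.

    Returns (status, syndrome). A non-zero syndrome with a failed overall
    parity gives the 1-based position of a single flipped bit.
    """
    syn = 0
    for j in range(1, 8):
        if word[j - 1]:
            syn ^= j  # XOR tree over the positions of set bits
    parity_ok = sum(word) % 2 == 0
    if syn == 0 and parity_ok:
        return "no_error", syn
    if not parity_ok:
        return "single_error", syn  # correctable
    return "double_error", syn  # detectable only


clean = [0, 1, 1, 0, 0, 1, 1, 0]
one_flip = [0, 1, 0, 0, 0, 1, 1, 0]   # position 3 flipped
two_flips = [0, 1, 0, 0, 1, 1, 1, 0]  # positions 3 and 5 flipped
for w in (clean, one_flip, two_flips):
    print(secded_status(w))
```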



## **CIM-SECDED** Overhead

- 1 Extra parity bit (32/8 vs 32/7)
- 16.8% Area overhead
- 31.8% Power Overhead



## **CIM-SECDED** Results

- 100× reduction in BER
- 3.9% and 16.3% accuracy improvement



| Dataset  | Network  | WR Voltage         | $\sigma_{LRS}$ / $\mu_{LRS}$ | Accuracy Loss, No ECC | Accuracy Loss, ECC |
|----------|----------|--------------------|------------------------------|-----------------------|--------------------|
| ImageNet | ResNet18 | $V_{\rm BL} = 1.9$ | 3.7%                         | 3.9%                  | 0%                 |
| ImageNet | ResNet18 | $V_{\rm BL} = 1.7$ | 7.1%                         | 16.7%                 | 0.4%               |

# Agenda

- 1. Motivation & Background
- 2. RRAM + CIM Measurement
- 3. CIM-SECDED
- 4. Successive Correction
- 5. Summary

## **Can We Do Better?**

## **Observation:**

- CIM-SECDED:  $10^{-3} \rightarrow 10^{-6}$
- SRAM: $10^{-15}$
- DNNs $\rightarrow 10^{-5}$ [1]

| Memory | BER   |
|--------|-------|
| SRAM   | 1e-15 |
| CIM    | 1e-3  |

## Experiment:

- DEC  $\rightarrow 10^{-9}$
- TEC  $\rightarrow 10^{-12}$

| ECC       | BER                       |
|-----------|---------------------------|
| CIM + SEC | $(10^{-3})^2 = 10^{-6}$   |
| CIM + DEC | $(10^{-3})^3 = 10^{-9}$   |
| CIM + TEC | $(10^{-3})^4 = 10^{-12}$  |
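The scaling in the table follows a rule of thumb: a code that corrects $t$ errors fails only when at least $t+1$ errors coincide, giving roughly $p^{t+1}$ for raw error rate $p$. A minimal sketch, assuming independent errors and ignoring combinatorial prefactors (`residual_ber` is a hypothetical helper):

```python
def residual_ber(raw_ber, correctable_errors):
    """Rule-of-thumb error rate after correcting up to `correctable_errors` errors."""
    return raw_ber ** (correctable_errors + 1)


for name, t in (("No ECC", 0), ("SEC", 1), ("DEC", 2), ("TEC", 3)):
    print(f"{name}: {residual_ber(1e-3, t):.0e}")
```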

[1] Ares: A framework for quantifying the resilience of deep neural networks, DAC 2018, Reagen et al.

#### **Observation:**

- 1. ↓WL → ↓BER
- 2. Can detect 2 errors (SEC**DED**)

#### **Idea:**

- Read 4 WL
- Detect error?
- Read 2 WL

#### **Result:**

- DED → DEC
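The read, detect, re-read flow can be sketched as a small control loop. The slides give only the three steps; splitting the group into two halves and summing their partial results is an assumption, and `read_fn` is a hypothetical stand-in for one array read plus SECDED decode.

```python
def successive_read(rows, read_fn, min_rows=2):
    """Read a group of word lines; on a detected double error, retry with fewer rows.

    read_fn(rows) must return (status, value). Returns (value, number_of_reads).
    """
    status, value = read_fn(rows)
    if status != "double_error" or len(rows) <= min_rows:
        return value, 1
    half = len(rows) // 2
    left_value, left_reads = successive_read(rows[:half], read_fn, min_rows)
    right_value, right_reads = successive_read(rows[half:], read_fn, min_rows)
    # Bitline accumulation is linear, so partial sums add up to the full sum.
    return left_value + right_value, 1 + left_reads + right_reads


# Hypothetical array model: 4-WL reads that include row 2 report a double error.
cell_values = [1, 0, 1, 1, 0, 1, 1, 0]


def fake_read(rows):
    if len(rows) == 4 and 2 in rows:
        return "double_error", None
    return "ok", sum(cell_values[r] for r in rows)


print(successive_read([0, 1, 2, 3], fake_read))  # (3, 3)
print(successive_read([4, 5, 6, 7], fake_read))  # (2, 1)
```

The extra reads are only paid when an uncorrectable error is flagged, so the average cost depends on how often double errors occur.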



[1] Improving Compute In-Memory ECC Reliability with Successive Correction, Crafton et al., DAC 2022.

#### **Observation:**

- SECDED $\rightarrow$ Hamming distance of 4
  - Usable as SECDED (correct 1, detect 2)
  - Usable as TED (detect up to 3)

#### **Result:**

- TED → TEC
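The trade-off between correction and detection for a given minimum distance follows the standard bound $d \ge t + s + 1$ (correct $t$, detect $s \ge t$). A small sketch of that bound for binary Hamming distance; `max_detectable` is a hypothetical helper, and the ±1 symbol errors of CIM are not modeled.

```python
def max_detectable(d_min, corrected):
    """Largest number of errors detectable while still correcting `corrected` errors."""
    if corrected > (d_min - 1) // 2:
        raise ValueError("cannot correct that many errors at this distance")
    return d_min - 1 - corrected


print(max_detectable(4, corrected=1))  # 2: SECDED
print(max_detectable(4, corrected=0))  # 3: TED
```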




Implementation:

- DEC $\rightarrow$ Exactly the same hardware as SECDED
- TEC $\rightarrow$ 0.1% Area ($10\,\mu\mathrm{m}^2$)







# Agenda

- 1. Motivation & Background
- 2. RRAM + CIM Measurement
- 3. CIM-SECDED
- 4. Successive Correction
- 5. Summary

## Summary

- CIM: ↑WL → ↑Noise + ↓Voltage Range → ↑BER
- Detect error → ↓WL → ↓BER
- >16,000× improvement in BER over No ECC
- 636× ↓ BER @ 5.7% ↓ performance over SOTA

| ECC    | BER                       |
|--------|---------------------------|
| No ECC | $10^{-3}$                 |
| SEC    | $(10^{-3})^2 = 10^{-6}$   |
| DEC    | $(10^{-3})^3 = 10^{-9}$   |
| TEC    | $(10^{-3})^4 = 10^{-12}$  |