# Circadian Rhythm: A Candidate for Achieving Everlasting Flash Memories

M. Ceylan Morgul<sup>\*</sup>, Xinfei Guo<sup>†</sup>, and Mircea Stan<sup>\*</sup>

\*Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, USA

{mm4uz, mircea}@virginia.edu

<sup>†</sup>University of Michigan – Shanghai Jiao Tong University Joint Institute,

Shanghai Jiao Tong University, Shanghai, China

xinfei.guo@sjtu.edu.cn

Abstract—Passive (resting) and accelerated passive (thermal annealing) self-healing techniques have been proposed to address the low endurance of flash memory. So far, however, they have been applied at or near the end of a flash device's lifetime. Because these techniques recover only temporary damage, that approach leaves the permanent component of the damage unchecked: if temporary damage is not recovered in time, it accumulates and becomes permanent. In this study, we propose a Circadian Rhythm (CR) recovery technique, an analogue of the rhythm found in nature, that targets the prevention of permanent damage. Our measurement results show that the most frequent rhythm slows the growth of the Byte Error Rate (BER) by around 70 times compared to the least frequent rhythm. It also shows a flatter and more linear error trend, since the CR technique prevents most of the permanent damage. The behavior observed in flash chips opens the opportunity of having everlasting flash memories by implementing Circadian Rhythm in the Flash Translation Layer (FTL).

Index Terms—circadian rhythm, endurance, flash memory, recovery, reliability

## I. INTRODUCTION

Flash memories are widely used in various applications due to their low cost, high performance, and non-volatility, yet they all face an endurance limitation [2]: a flash memory can sustain only a limited number of program-erase (P/E) cycles before errors appear. Device-level reliability improvement techniques exist to heal the worn-out device and/or to reduce the stress of operations. The healing process can be initiated by simply allowing flash memories to rest [3], and it can be accelerated by increasing the temperature (thermal annealing) [4].

In this study, we analyze how different rhythms (frequency and duration) of the applied recoveries affect the reliability of flash devices. The core insight is taken from nature's circadian rhythm, in which almost all animals sleep or rest every day, before they start to show fatal symptoms of permanent mental or physical fatigue [5]. Similarly, in the Circadian Rhythm (CR) of electronics, devices are put into recovery mode early so that temporary damage does not accumulate and turn into unrecoverable damage. CR has demonstrated 100% recovery for Bias Temperature Instability (BTI) of FETs and ElectroMigration (EM) wearout of interconnect [6].

Fig. 1. The block diagram of the Circadian Rhythm recovery approach.

We observed that the CR recovery approach, illustrated in Fig. 1, drastically inhibits the occurrence of permanent damage in flash memories. Details are presented in Section II. Because of its algorithmic nature, the CR approach can be easily integrated into current system-level mitigation techniques, such as over-provisioning and wear-leveling [7]. Moreover, we plan to investigate the generalization of Circadian Rhythm to all electronics.

## II. RECOVERY WITH CIRCADIAN RHYTHM

We run our experiments in a temperature-controlled environment on a 2D planar 21 nm SAMSUNG SLC NAND flash memory (K9F1G08U0E), which we control with an STM32F103ZET6 microcontroller. SLC (single-level cell) flash memories are preferred in applications where reliability is crucial; hybrid SSDs have been proposed [8] to exploit the relatively high endurance of SLC and the high density of multi-level cells (MLC, TLC, QLC, etc.). As illustrated in Fig. 1, after every 20 P/E cycles (PEC) we perform a Read operation, which detects errors in the last programmed data. We set the dwell (Delay1) and retention (Delay2) times to 1.5 and 3 seconds, respectively. Following [1], we kept the temperature at 95 °C during P/E cycling. We compare seven rhythms: 1) [200k-10h] (i.e., 200 thousand PEC followed by 10 hours of recovery), 2) [25k-1h], 3) [25k-2h], 4) [50k-5h], 5) [5k-1h], 6) [500-1h], and 7) [120-15m].
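The rhythm labels above can be read mechanically. The following is a minimal sketch, assuming the `[N-T]` notation always means N P/E cycles followed by a recovery of T minutes (`m`) or hours (`h`); the names `parse_rhythm` and `reads_per_stress_phase` are hypothetical helpers introduced for illustration, not part of the experimental firmware.

```python
import re

# Recovery-duration suffixes used in the rhythm labels, in seconds.
_UNIT_SECONDS = {"m": 60, "h": 3600}


def parse_rhythm(label):
    """Parse a label such as '[200k-10h]' into (pe_cycles, recovery_seconds)."""
    match = re.fullmatch(r"\[?(\d+)(k?)-(\d+)([mh])\]?", label.strip())
    if match is None:
        raise ValueError(f"unrecognized rhythm label: {label!r}")
    count, kilo, duration, unit = match.groups()
    pe_cycles = int(count) * (1000 if kilo else 1)
    recovery_seconds = int(duration) * _UNIT_SECONDS[unit]
    return pe_cycles, recovery_seconds


def reads_per_stress_phase(pe_cycles, read_interval=20):
    """Number of Read checks in one stress phase, with one Read every 20 PEC."""
    return pe_cycles // read_interval


RHYTHMS = ["[200k-10h]", "[25k-1h]", "[25k-2h]", "[50k-5h]",
           "[5k-1h]", "[500-1h]", "[120-15m]"]

# Map each label to its (P/E cycles, recovery seconds) pair.
schedule = {label: parse_rhythm(label) for label in RHYTHMS}
```

For example, `schedule["[120-15m]"]` gives 120 cycles and 900 seconds of recovery, and `reads_per_stress_phase(120)` gives the 6 Read checks that fall inside one such stress phase.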

Fig. 2 compares the rhythms, and Table I presents the critical points of the experiment. Compared with the rhythm of [200k-10h], the other rhythms increase the *lifetime* by 3% to 300%. Employing Circadian Rhythm [120-15m] increases the *lifetime* of the flash by at least 82% without an Error Correction Code (ECC) and 285% with ECC-1, compared to the rhythm of [200k-10h]. Furthermore, the *BER* of Circadian Rhythm [120-15m] stays under the 1-bit-ECC (ECC-1) threshold in almost all read operations throughout 2M *PEC* (only 0.06% of the cases exceed it), which is around a **9x** increase in lifetime compared to the rhythm of [200k-10h], where *BER* starts to exceed ECC-1 at 223k.

This work was supported in part by Semiconductor Research Corporation (SRC) under the Center for Research on Intelligent Storage and Processing-in-memory (CRISP). We want to thank Mohammad Nazmus Sakib for his valuable contributions. The preliminary results of this study are presented in [1] (doi.org/10.1109/IIRW53245.2021.9635624).

Fig. 2. *BER* statistics, showing Inclusive Median Quartile and Mean Line, for different circadian rhythms. Horizontal lines show Error Correction Code (ECC) limits. Each X-axis value covers the range of 50k *PEC* preceding it for $\leq$1M *PEC*, and 100k for >1M *PEC*. Due to the extensive duration of the experiments, we ended them once we had observed meaningful data. Note that only the outlier points of [120-15m] are shown, as green circles.

TABLE I. PEC where errors start occurring and where *BER* reaches ECC-1 and ECC-4, for different circadian rhythms.

| Circadian Rhythm | Beginning of Error Occurrences | ECC-1 | ECC-4  |
|------------------|--------------------------------|-------|--------|
| 200k-10h         | 122k                           | 223k  | 442k   |
| 25k-1h           | 112k                           | 245k  | 531k   |
| 25k-2h           | 146k                           | 230k  | 549.7k |
| 50k-5h           | 131k                           | 264k  | 549.3k |
| 5k-1h            | 135k                           | 294k  | *      |
| 500-1h           | 222k                           | *     | *      |
| 120-15m          | < 427k                         | 858k  | *      |

\*: Experiments ended before such a byte error rate was observed.
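The ECC-1 figures quoted for [120-15m] can be checked against Table I with simple arithmetic. The sketch below assumes that *lifetime* is measured as the PEC count at which *BER* first reaches the ECC-1 limit, and that the 9x figure treats the full 2M *PEC* run as the lifetime of [120-15m]; the helper name `percent_increase` is hypothetical.

```python
def percent_increase(baseline, value):
    """Relative increase of value over baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


# PEC at which BER reaches the ECC-1 limit (Table I).
ECC1_BASELINE_PEC = 223_000   # rhythm [200k-10h]
ECC1_CR_PEC = 858_000         # rhythm [120-15m]

# Length of the [120-15m] run quoted in the text.
TOTAL_RUN_PEC = 2_000_000

ecc1_gain_percent = percent_increase(ECC1_BASELINE_PEC, ECC1_CR_PEC)
lifetime_ratio = TOTAL_RUN_PEC / ECC1_BASELINE_PEC

print(f"ECC-1 lifetime increase: {ecc1_gain_percent:.1f}%")
print(f"Lifetime ratio over 2M PEC: {lifetime_ratio:.2f}x")
```

Under these assumptions the two values round to 285% and 9x, matching the numbers stated in the text.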

Even though the Circadian Rhythm of [25k-1h] has less total recovery time than the rhythm of [200k-10h] over the duration of the experiments, it results in fewer errors and a roughly 2x slower increase in *BER*. Therefore, more frequent and earlier passive recovery cycles are much more efficient in preventing damage. [500-1h] and [120-15m] show similar characteristics; they have the same ratio of recovery and in-use durations. Most notably, when we apply a linear fit to the data, the most frequent rhythm ([120-15m]) slows the growth of *BER* by **70 times** and flattens the trend compared to the least frequent rhythm [200k-10h]. It prevents most of the permanent damage.
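The comparison of recovery budgets can be made concrete by computing the recovery time granted per P/E cycle for each rhythm. This is a rough sketch that assumes in-use time is proportional to the number of P/E cycles, so that recovery seconds per cycle serves as a proxy for the ratio of recovery to in-use duration; the function name is hypothetical.

```python
# (P/E cycles per stress phase, recovery seconds per phase), read off the labels.
RHYTHM_SPECS = {
    "200k-10h": (200_000, 10 * 3600),
    "25k-1h": (25_000, 1 * 3600),
    "500-1h": (500, 1 * 3600),
    "120-15m": (120, 15 * 60),
}


def recovery_seconds_per_cycle(pe_cycles, recovery_seconds):
    """Recovery time granted per P/E cycle for one rhythm."""
    return recovery_seconds / pe_cycles


per_cycle = {
    name: recovery_seconds_per_cycle(*spec)
    for name, spec in RHYTHM_SPECS.items()
}

for name, seconds in per_cycle.items():
    print(f"[{name}]: {seconds:.3f} s of recovery per P/E cycle")
```

With this proxy, [25k-1h] receives less recovery per cycle than [200k-10h], while [500-1h] and [120-15m] land within a few percent of each other.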

## III. DISCUSSIONS AND FUTURE WORKS

In light of these promising results, Circadian Rhythm creates the opportunity of having everlasting flash memories by preventing permanent damage. Yet, it has the drawback of forbidding the use of a block for a specific time, which leads to a decrease in capacity or performance. Nonetheless, applications that call for high reliability will look for a longer lifetime, for instance space applications, where it is hard and costly to replace devices. Also, an application that only works at certain times is a natural candidate for the circadian rhythm. We plan to expand the study by analyzing memory workloads and requirements and by implementing CR in various applications (e.g., IoT edge, autonomous vehicles, data centers, surveillance, etc.), in order to develop a methodology that optimizes the frequency of the circadian rhythm based on their needs. Additionally, since CR has been shown to provide up to 100% recovery for FETs [6], we plan to develop a system that exploits CR as a whole.

Furthermore, we plan to integrate CR into the Flash Translation Layer (FTL), where system-level mitigation techniques such as wear-leveling and over-provisioning are implemented. The most basic way is to increase the capacity (similar to over-provisioning) by the ratio of recovery and in-use durations and to use the blocks in shifts (similar to wear-leveling). For example, in the case of rhythm [120-15m], where 120 *PEC* took roughly 3.5 minutes, we will need only 5x redundant blocks to get a 9x lifetime improvement. This results in a 2x improvement in sustainability.
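The redundancy estimate for [120-15m] can be sketched as follows. This is an illustrative calculation rather than an FTL implementation: it assumes blocks take equal turns in a fixed round-robin order, reads the "5x" as the number of resting blocks per active block, and takes the sustainability figure to be the lifetime gain divided by that redundancy factor. All function names are hypothetical.

```python
import math

IN_USE_MINUTES = 3.5      # approximate time for 120 PEC, from the text
RECOVERY_MINUTES = 15.0   # recovery window of rhythm [120-15m]
LIFETIME_GAIN = 9.0       # lifetime improvement reported for [120-15m]


def resting_blocks_needed(in_use, recovery):
    """Extra blocks required so each block rests at least `recovery` per rotation."""
    return math.ceil(recovery / in_use)


def rest_minutes(total_blocks, in_use):
    """Rest each block gets per rotation when all blocks take equal turns."""
    return (total_blocks - 1) * in_use


def rotation(total_blocks, slots):
    """Index of the active block in each time slot, used in shifts."""
    return [slot % total_blocks for slot in range(slots)]


extra_blocks = resting_blocks_needed(IN_USE_MINUTES, RECOVERY_MINUTES)
gain_per_redundancy = LIFETIME_GAIN / extra_blocks

print(f"Resting blocks per active block: {extra_blocks}")
print(f"Rest per rotation: {rest_minutes(1 + extra_blocks, IN_USE_MINUTES)} min")
print(f"Lifetime gain per unit of redundancy: {gain_per_redundancy:.1f}x")
```

Under these assumptions, five resting blocks give each block 17.5 minutes of rest per rotation, which covers the 15-minute recovery window, and the gain per unit of redundancy comes out at 1.8x, in line with the roughly 2x figure above.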

## REFERENCES

- [1] M. C. Morgul, M. N. Sakib, and M. Stan, "Reliable processing in flash with high temperature," in 2021 IEEE International Integrated Reliability Workshop (IIRW). IEEE, 2021, pp. 1–6.
- [2] C. M. Compagnoni, A. Goda, A. S. Spinelli, P. Feeley, A. L. Lacaita, and A. Visconti, "Reviewing the evolution of the NAND flash technology," *Proceedings of the IEEE*, vol. 105, no. 9, pp. 1609–1633, 2017.
- [3] V. Mohan, T. Siddiqua, S. Gurumurthi, and M. R. Stan, "How I learned to stop worrying and love flash endurance," *HotStorage*, vol. 10, pp. 3–3, 2010.
- [4] H.-T. Lue, P.-Y. Du, C.-P. Chen, W.-C. Chen, C.-C. Hsieh, Y.-H. Hsiao et al., "Radically extending the cycling endurance of flash memory (to >100M cycles) by using built-in thermal annealing to self-heal the stress-induced damage," in 2012 International Electron Devices Meeting. IEEE, 2012, pp. 9–1.
- [5] J. Orzel-Gryglewska, "Consequences of sleep deprivation," *International Journal of Occupational Medicine and Environmental Health*, 2010.
- [6] X. Guo and M. R. Stan, *Circadian Rhythms for Future Resilient Electronic Systems*. Springer, 2020.
- [7] Y. Cai, S. Ghose, E. F. Haratsch, Y. Luo, and O. Mutlu, "Error characterization, mitigation, and recovery in flash-memory-based solid-state drives," *Proceedings of the IEEE*, vol. 105, no. 9, pp. 1666–1704, 2017.
- [8] S. Hong and D. Shin, "NAND flash-based disk cache using SLC/MLC combined flash memory," in 2010 International Workshop on Storage Network Architecture and Parallel I/Os. IEEE, 2010, pp. 21–30.