STeP-CiM: Strain-enabled Ternary Precision Computation-in-Memory based on Non-Volatile 2D Piezoelectric Memory
Abstract
We propose a 2D Piezoelectric FET (PeFET) based compute-enabled non-volatile memory for ternary deep neural networks (DNNs). PeFETs hinge on ferroelectricity for bit storage and piezoelectricity for bit sensing, exhibiting features inherently amenable to computation-in-memory of dot products of weights and inputs in the signed ternary regime. PeFETs consist of a material with ferroelectric and piezoelectric properties coupled with a Transition Metal Dichalcogenide (TMD) channel. We utilize (a) ferroelectricity to store binary bits (0/1) in the form of polarization (-P/+P) and (b) polarization-dependent piezoelectricity to read the stored state by means of a strain-induced bandgap change in the TMD channel. The unique read mechanism of PeFETs enables us to expand the traditional association of +P (-P) with the low (high) resistance state to its dual high (low) resistance state, depending on the read voltage. Specifically, we demonstrate that +P (-P) stored in PeFETs can be dynamically configured in (a) a low (high) resistance state for positive read voltages and (b) the dual high (low) resistance state for negative read voltages, without incurring a read disturb. Such a feature, which we name Polarization Preserved Piezoelectric Effect Reversal with Dual Voltage Polarity (PiER), is unique to PeFETs and has not been shown in previously explored memories. We leverage PiER to propose a Strain-enabled Ternary Precision Computation-in-Memory (STeP-CiM) cell capable of computing the scalar product of the stored weight and the input, both of which are represented with signed ternary precision. Further, using multi-word-line assertion of STeP-CiM cells, we achieve massively parallel computation of dot products of signed ternary inputs and weights. Our array-level analysis shows 91% lower delay and improvements of 15% and 91% in energy for in-memory multiply-and-accumulate operations compared to near-memory design approaches based on 2D FET SRAM and PeFET, respectively. We also analyze the system-level implications of STeP-CiM by deploying it in a ternary DNN accelerator. STeP-CiM exhibits 6.11×-8.91× average improvement in performance and 3.2× average improvement in energy over an SRAM-based near-memory design. We also compare STeP-CiM to a near-memory design based on PeFETs, showing 5.67×-6.13× average performance improvement and 6.07× average energy savings.
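To make the PiER read behavior and the resulting signed ternary multiplication concrete, the following is a minimal behavioral sketch in Python. The mapping of +P/-P and read-voltage polarity to low/high resistance follows the abstract; the encoding of ternary operands onto polarization and read polarity, and the differential interpretation of LR/HR as ±1, are illustrative assumptions of this sketch rather than the exact cell design.

```python
# Behavioral sketch of the PiER read mechanism: +P reads as low resistance (LR)
# under a positive read voltage and as its dual high resistance (HR) under a
# negative read voltage, and vice versa for -P.

def pier_read(polarization: int, read_voltage: int) -> str:
    """Resistance state of a PeFET given stored polarization (+1/-1 for +P/-P)
    and the sign of the applied read voltage (+1/-1)."""
    assert polarization in (+1, -1) and read_voltage in (+1, -1)
    # Flipping either the stored polarization or the read polarity
    # flips the resistance state (the PiER duality).
    return "LR" if polarization == read_voltage else "HR"

def ternary_scalar_product(weight: int, inp: int) -> int:
    """Signed ternary multiply (operands in {-1, 0, +1}) emulated with PiER
    reads; a zero operand draws no current, i.e., yields a zero product."""
    if weight == 0 or inp == 0:
        return 0
    # Assumption of this sketch: the weight is encoded as polarization, the
    # input as read-voltage polarity; LR contributes +1 and HR contributes -1
    # under a differential sensing scheme.
    return +1 if pier_read(weight, inp) == "LR" else -1

# Dot product via multi-word-line assertion: per-cell currents accumulate on
# the bit line, so the per-cell products simply add.
weights = [+1, -1, 0, +1]
inputs  = [+1, +1, -1, -1]
print(sum(ternary_scalar_product(w, x) for w, x in zip(weights, inputs)))  # -> -1
```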
Applied Science Letters, Volume 01, Issue 04. DOI: registration pending.
Introduction
Deep Neural Networks (DNNs) have transformed the field of machine learning and are deployed in many real-world products and services (Lecun et al., 2015). However, their enormous storage and compute demands limit their application in energy-constrained edge devices (Venkataramani et al., 2016). Precision reduction in DNNs has emerged as a popular approach for energy-efficient realization of hardware accelerators for these applications (Courbariaux and Bengio, 2015; Mishra et al., 2017; Choi et al., 2018; Colangelo et al., 2018; Wang et al., 2018). State-of-the-art DNN inference hardware employs 8-bit precision, and recent algorithmic efforts have shown a pathway for aggressive scaling down to binary precision (Choi et al., 2018; Colangelo et al., 2018). However, accuracy suffers significantly at binary precision. Interestingly, ternary precision networks offer a near-optimal design point in the low-precision regime, with a significant accuracy boost over binary DNNs (Li et al., 2016; Zhu et al., 2016) and large energy savings with mild accuracy loss compared to higher-precision DNNs (Mishra et al., 2017; Wang et al., 2018). Due to these features, ternary precision networks have garnered interest for hardware realization (Jain et al., 2020; Thirumala et al., 2020). Ternary DNNs can be implemented on classical accelerator architectures (e.g., Tensor Processing Units and Graphics Processing Units) by employing specialized processing elements and on-chip scratchpads to improve energy efficiency, but such designs remain limited by the memory bottleneck. In this regard, computation-in-memory (CiM) brings a new opportunity to greatly enhance the efficiency of DNN accelerators by reducing power-hungry data transfers between memory and processors.
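For readers unfamiliar with ternary quantization, the sketch below shows how full-precision weights are mapped to signed ternary values in the spirit of Ternary Weight Networks (Li et al., 2016). The 0.7·mean(|W|) threshold follows the heuristic reported in that work; the function name and the per-tensor scale are assumptions of this sketch, and exact choices vary across papers.

```python
import numpy as np

def ternarize(w: np.ndarray):
    """Map full-precision weights to {-1, 0, +1} times a per-tensor scale."""
    delta = 0.7 * np.mean(np.abs(w))                   # threshold for zeroing small weights
    t = np.where(np.abs(w) > delta, np.sign(w), 0.0)   # signed ternary values
    mask = t != 0
    alpha = np.mean(np.abs(w[mask])) if mask.any() else 0.0  # per-tensor scale
    return t.astype(np.int8), alpha

w = np.array([0.8, -0.05, -0.6, 0.1, 0.4])
t, alpha = ternarize(w)
print(t, alpha)  # -> [ 1  0 -1  0  1] with alpha = 0.6
```

Because the quantized weights take only three values, each multiply in a dot product degenerates into a sign selection or a skip, which is precisely the operation the STeP-CiM cell performs in place.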