Integration of nanoscale memristor synapses in neuromorphic computing architectures

2013 Nanotechnology 24 384010
(http://iopscience.iop.org/0957-4484/24/38/384010)


Giacomo Indiveri1, Bernabé Linares-Barranco2, Robert Legenstein3, George Deligeorgis4 and Themistoklis Prodromakis5,6

1 Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
2 Instituto de Microelectrónica de Sevilla, IMSE-CNM, CSIC and University of Sevilla, Sevilla, Spain
3 Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria
4 CNRS-LAAS and Université de Toulouse, 7 avenue du colonel Roche, F-31400 Toulouse, France
5 Center for Bio-Inspired Technology, Department of Electrical and Electronic Engineering, Imperial College London, UK
6 Nano Research Group, School of Electronics and Computer Science, University of Southampton, UK

E-mail: giacomo@ethz.ch

Received 4 January 2013, in final form 6 March 2013 Published 2 September 2013 Online at stacks.iop.org/Nano/24/384010

Abstract

Conventional neuro-computing architectures and artificial neural networks have often been developed with no or loose connections to neuroscience. As a consequence, they have largely ignored key features of biological neural processing systems, such as their extremely low power consumption or their ability to carry out robust and efficient computation using massively parallel arrays of limited-precision, highly variable, and unreliable components. Recent developments in nanotechnologies are making available extremely compact and low-power, but also variable and unreliable, solid-state devices that can potentially extend the offerings of existing CMOS technologies. In particular, memristors are regarded as a promising solution for modeling key features of biological synapses due to their nanoscale dimensions, their capacity to store multiple bits of information per element and the low energy required to write distinct states. In this paper, we first review the neuro- and neuromorphic computing approaches that can best exploit the properties of memristors and nanoscale devices, and then propose a novel hybrid memristor-CMOS neuromorphic circuit which represents a radical departure from conventional neuro-computing approaches, as it uses memristors to directly emulate the biophysics and temporal dynamics of real synapses. We point out the differences between the use of memristors in conventional neuro-computing architectures and the hybrid memristor-CMOS circuit proposed, and argue that this circuit represents an ideal building block for implementing brain-inspired probabilistic computing paradigms that are robust to variability and fault tolerant by design.

(Some figures may appear in colour only in the online journal)

1. Introduction

The idea of linking the type of information processing that takes place in the brain with theories of computation and computer science (something commonly referred to as neuro-computing) dates back to the origins of computer science itself [1, 2]. Neuro-computing has been very popular in the past [3, 4], eventually leading to the development of abstract artificial neural networks implemented on digital computers, useful for solving a wide variety of practical problems [5–9]. However, the field of neuromorphic engineering is a much younger one [10]. This field has been mainly concerned
with hardware implementations of neural processing and sensory-motor systems built using very large scale integration (VLSI) electronic circuits that exploit the physics of silicon to reproduce directly the biophysical processes that underlie neural computation in real neural systems. Originally, the term ‘neuromorphic’ (coined by Carver Mead in 1990 [11]) was used to describe systems comprising analog integrated circuits, fabricated using standard complementary metal oxide semiconductor (CMOS) processes. In recent times, however, the use of this term has been extended to refer to hybrid analog/digital electronic systems, built using different types of technologies.

Indeed, both artificial neural networks and neuromorphic computing architectures are now receiving renewed attention thanks to progress in information and communication technologies (ICTs) and to the advent of new promising nanotechnologies. Some present-day neuro-computing approaches attempt to model the fine details of neural computation using standard technologies. For example, the Blue Brain project, launched in 2005, makes use of a 126 kW Blue Gene/P IBM supercomputer to run software that simulates with great biological accuracy the operations of neurons and synapses of a rat neocortical column [12]. Similarly, the BrainScaleS EU-FET FP7 project aims to develop a custom neural supercomputer by integrating standard CMOS analog and digital VLSI circuits on full silicon wafers to implement about 262 thousand integrate-and-fire (I&F) neurons and 67 million synapses [13]. Although configurable, the neuron and synapse models are hardwired in the silicon wafers, and the hardware operates about 10,000 times faster than real biology, with each wafer consuming about 1 kW of power, excluding all external components. Another large-scale neuro-computing project based on conventional technology is the SpiNNaker project [14]. SpiNNaker is a distributed computer that interconnects multiple conventional integer-precision multi-core ARM chips via a custom communication framework. Each SpiNNaker package contains a chip with 18 ARM9 Central Processing Units (CPUs) on it, and a memory chip of 128 MB synchronous dynamic random access memory (SDRAM). Each CPU can simulate different neuron and synapse models. If endowed with simple synapse models, a single SpiNNaker ARM core can simulate the activity of about 1000 neurons in real time. More complex synapse models (e.g. with learning mechanisms) would use up more resources and decrease the number of neurons that could be simulated in real time.
The latest SpiNNaker board contains 47 of these packages, and the aim is to assemble 1200 of these boards. A full SpiNNaker system of this size would consume about 90 kW. The implementation of custom large-scale spiking neural network hardware simulation engines is being investigated also by industrial research groups. For example, the IBM group led by D S Modha recently proposed a digital ‘neurosynaptic core’ chip integrated using a standard 45 nm silicon on insulator (SOI) process [15]. The chip comprises 256 digital I&F neurons, with $1024 \times 256$ binary valued synapses, configured via a static random access memory (SRAM) cross-bar array, and uses an asynchronous event-driven design to route spikes from neurons to synapses. The goal is to eventually integrate many of these cores onto a single chip, and to assemble many multi-core chips together, to simulate networks of simplified spiking neurons with human-brain dimensions (i.e. approximately $10^{10}$ neurons and $10^{14}$ synapses) in real time. In the meantime, IBM simulated 2.084 billion neurosynaptic cores containing $53 \times 10^{10}$ neurons and $1.37 \times 10^{14}$ synapses in software on the Lawrence Livermore National Lab Sequoia supercomputer (96 Blue Gene/Q racks), running $1542\times$ slower than real time [16], and dissipating 7.9 MW. A diametrically opposite approach is represented by the Neurogrid system [17]. This system comprises an array of sixteen $12 \times 14$ mm$^2$ chips, each integrating mixed analog neuromorphic neuron and synapse circuits with digital asynchronous event routing logic. The chips are assembled on a $16.5 \times 19$ cm$^2$ Printed Circuit Board (PCB), and the whole system can model over one million neurons connected by billions of synapses in real time, using only about 3 W of power [18].
As opposed to the neuro-computing approaches that are mainly concerned with fast and large simulations of spiking neural networks, the Neurogrid has been designed following the original neuromorphic approach, exploiting the characteristics of CMOS VLSI technology to directly emulate the biophysics and the connectivity of cortical circuits. In particular, the Neurogrid network topology is structured by the data and results obtained from neuro-anatomical studies of the mammalian cortex. While offering less flexibility in terms of connectivity patterns and types of synapse/neuron models that can be implemented, the Neurogrid is much more compact and dissipates orders of magnitude less power than the other neuro-computing approaches described above. All these approaches have in common the goal of attempting to simulate large numbers of neurons, or as in the case of Neurogrid, to physically emulate them with fine detail.

Irrespective of the approach followed, nanoscale synapse technologies and devices have the potential to greatly improve circuit integration densities and to substantially reduce power dissipation in these systems. Indeed, recent trends in nanoelectronics have been investigating emerging low-power nanoscale devices for extending standard CMOS technologies beyond the current state of the art [19]. In particular, resistive random access memory (ReRAM) is regarded as a promising technology for establishing next-generation non-volatile memory cells [20], due to their infinitesimal dimensions, their capacity to store multiple bits of information per element and the minuscule energy required to write distinct states. The factors driving this growth are the devices’ simple two-terminal structure, their infinitesimal dimensions (the state of the art is down to $10 \times 10$ nm$^2$ [21]) and their ultra-low power consumption (<50 fJ/bit), which so far are unmatched by conventional VLSI circuits.

Various proposals have already been made for leveraging basic nanoscale ReRAM attributes in reconfigurable architectures [22], neuro-computing [23] and even artificial synapses [24–28]. However, the greatest potential of these nanoscale devices lies in the wide range of interesting physical properties they possess. Neuromorphic systems can harness the interesting physics being discovered in these

Figure 1. (a) Cross section of a chemical synapse, illustrating the discharge of neurotransmitters within a synaptic cleft originating from a pre-synaptic neuron. (b) Schematic representation of solid-state memristors where ionic species can be displaced within a device’s insulating medium, transcribing distinct resistive states, by application of electrical stimuli on the top or bottom electrodes of the device.

In this paper we first describe how nanoscale synaptic devices can be integrated into neuro-computing architectures to build large-scale neural networks, and then propose a new hybrid memristor-CMOS neuromorphic circuit that emulates the behavior of real synapses, including their temporal dynamics aspects, for exploring and understanding the principles of neural computation and eventually building brain-inspired computing systems.

2. Solid-state memristors

ReRAM cells are nowadays classified as memory resistors [29], or memristors for short, which were first conceived in 1971 by Leon Chua [30], with the first biomimetic applications presented at the same time. The functional signature of memristors is a pinched hysteresis loop in the current–voltage ($I$–$V$) domain when excited by a bipolar periodic stimulus [31]. Such hysteresis is typically observed in all kinds of devices and materials that support a discharge phenomenon possessing a certain inertia, which causes the value of a physical property to lag behind changes in the mechanism causing it, and it is common to both large-scale [32] and nanoscale dissipative devices [33].
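The pinched hysteresis signature can be illustrated numerically. The following is a minimal sketch that assumes a linear ion-drift memristor model, which is one common textbook idealization and not necessarily the behavior of the devices measured in this paper. The function name `simulate_memristor` and all parameter values (`r_on`, `r_off`, `k`, `x0`) are illustrative assumptions.

```python
import math

def simulate_memristor(amplitude=1.0, freq=1.0, r_on=100.0, r_off=16e3,
                       k=1e4, x0=0.1, steps=20000):
    """Forward-Euler simulation of one period of a sinusoidally driven,
    idealized (linear ion-drift) memristor.

    x in [0, 1] is a normalized internal state (assumption), with
    R(x) = r_on * x + r_off * (1 - x) and dx/dt = k * i(t).
    Returns a list of (t, v, i, r) samples.
    """
    dt = 1.0 / (freq * steps)
    x = x0
    trace = []
    for n in range(steps + 1):
        t = n * dt
        v = amplitude * math.sin(2.0 * math.pi * freq * t)
        r = r_on * x + r_off * (1.0 - x)
        i = v / r                      # the current is zero whenever v is zero
        trace.append((t, v, i, r))
        # State update, clipped so the device stays within its physical bounds
        x = min(1.0, max(0.0, x + k * i * dt))
    return trace

if __name__ == "__main__":
    trace = simulate_memristor()
    rising, falling = trace[2500], trace[7500]   # same voltage, two branches
    print("v (rising)  = %.3f V, i = %.3e A" % (rising[1], rising[2]))
    print("v (falling) = %.3f V, i = %.3e A" % (falling[1], falling[2]))
```

Because the current is always `v / r`, the loop passes through the origin (it is pinched), while the state-dependent resistance makes the rising and falling branches carry different currents at the same voltage.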

2.1. Emerging nanodevices as synapse mimetics

The analogy between memristors and chemical synapses is made on the basis that synaptic dynamics depend upon the discharge of neurotransmitters within a synaptic cleft (see figure 1(a)), in a similar fashion to the way ‘ionic species’ can be displaced within any inorganic barrier (see figure 1(b)). For TiO$_2$-based memristors, models [33, 34] hypothesized that solid-state devices comprise a mixture of TiO$_2$ phases, a stoichiometric and a reduced one (TiO$_{2-x}$), that can facilitate distinct resistive states via controlling the displacement of oxygen vacancies and thus the extent of the two phases. More recently, however, it was demonstrated that substantial resistive switching is only viable through the formation and annihilation of continuous conductive percolation channels [35] that extend across the whole active region of a device, shorting the top and bottom electrodes, no matter what the underlying physical mechanism is.

An example of $I$–$V$ characteristics of TiO$_2$-based memristors is depicted in figure 2(a). In this example, consecutive positive voltage sweeps cause any of the cross-bar type devices [36] shown in the inset of figure 2(a) to switch from a high-resistive state (HRS) to low-resistive states (LRSs). When the polarity of the voltage sweeps is inverted, however, the opposite trend occurs, i.e. the device toggles from LRS to HRS consecutively (as indicated by the corresponding arrows). These measured results are consistent with analogous ones reported by other research groups [37–39] and demonstrate the devices’ capacity for storing a multitude of resistive states per unit cell, with the programming depending on the biasing history. This is further demonstrated in figure 2(b), by applying individual pulses of $-3$ V in amplitude and 1 $\mu$s long for programming a single memristor at distinct non-volatile resistive states. In this scenario, the solid-state memristor emulates the behavior of a depressing synapse [40, 41]; the inverse, i.e. short-term potentiation, is also achievable by inverting the polarity of the employed pulsing scheme.

The development of nanoscale dynamic computation elements may notably benefit the establishment of neuromorphic architectures. This technology adds substantially to computation functionality, due to the rate dependency of the underlying physical switching mechanisms. At the same time it can facilitate unprecedented complexity due to the capacity of storing and processing spiking events locally. Moreover, exploiting the nanoscale dimensions and architecture simplicity of solid-state memristor implementations could substantially augment the number of cells per unit area, effectively enhancing the systems’ redundancy for tolerating issues that could stem from device mismatch and low yields [42].

Figure 2. Characterization of a TiO$_2$-based solid-state memristor. (a) $I$–$V$ characteristics for consecutive voltage sweeping. Positive (negative) biasing renders an increase (decrease) in the device’s conductance. Inset of (a) depicts a $25 \times 25$ array of cross-bar type memristors comprising TiO$_2$ active areas of $1 \times 1 \, \mu m^2$. These cells can be programmed at distinct resistive states as shown in (b) by employing $-3 \, V$ and $1 \, \mu s$ wide pulses, while evaluation of the device’s states is performed at $0.9 \, V$.

Figure 3. SEM micrograph of a large nanosized memristor array. Top inset shows a zoom-in of the top left corner where the individual devices are distinguished. Bottom left inset shows an atomic force microscope (AFM) image of a small part of the array. Individual devices are addressed by placing a conductive AFM tip on the top electrode.

2.2. Memristor scaling

Resistive memory scaling has been intensively investigated for the realization of nanosized ReRAM [43]. In principle memristors may be scaled aggressively well below conventional RAM cells due to their simplicity: fabrication-wise, memristors typically rely on a metal–insulator–metal (MIM) structure. The memristor action occurs in the insulating material. Scaling down the thickness of such a material will reduce both the required set voltage and the read voltage used during operation. In this context, thicknesses of a few nanometers and operating voltages below $1 \, V$ have been demonstrated [44], with a switching energy of a few fJ [45]. Furthermore, reducing the active device area by down-scaling the electrodes leads to current scaling, as well as increased device density. Both of these effects are favorable for high complexity circuits.

Currently, even though single memristor devices as small as $10 \, nm \times 10 \, nm$ have been demonstrated [21], cross-bar arrays are the most commonly used architecture [46, 36] to organize large numbers of individually addressable memristive synapses in a reduced space. In figure 3 we show a large array of nanoscale memristors that we fabricated using electron beam lithography. This array consists of a continuous Pt bottom electrode and an active layer deposited by sputtering. Subsequently, several arrays of nano-memristors with a size ranging from 20 to 50 nm were defined using E-beam lithography on PMMA and lift-off of the top platinum electrode. The array shown here comprises $256 \times 256$ devices with a periodicity of 200 nm. To access each individual device a conductive atomic force microscope (AFM) tip was used. Such a structure has been used to study the variability of the fabricated devices. Using E-beam lithography for both the top and bottom electrodes, a fully interconnected cross-bar structure with similar size and pitch may be fabricated.
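The geometry quoted above implies a device density that is easy to work out. The sketch below is only back-of-the-envelope arithmetic on the stated $256 \times 256$ array and 200 nm periodicity. The helper name `array_geometry` is hypothetical, and the array extent is approximated as the number of devices times the pitch.

```python
def array_geometry(n_rows=256, n_cols=256, pitch_nm=200.0):
    """Approximate footprint and areal density of a regular device array."""
    pitch_um = pitch_nm * 1e-3
    return {
        "devices": n_rows * n_cols,
        "width_um": n_cols * pitch_um,          # approximate array width
        "height_um": n_rows * pitch_um,         # approximate array height
        "devices_per_um2": 1.0 / pitch_um ** 2,
        "devices_per_cm2": 1e8 / pitch_um ** 2,  # 1 cm^2 = 1e8 um^2
    }

if __name__ == "__main__":
    for key, value in array_geometry().items():
        print(key, value)
```

Under these assumptions the array holds 65 536 devices in a region of roughly 51 µm on a side, i.e. about 25 devices per square micrometer.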

3. Memristor-based neuro-computing architectures

Memristive devices have been proposed as analogs of biological synapses. Indeed, memristors could implement very compact but abstract models of synapses, for example representing a binary ‘potentiated’ or ‘depressed’ state, or storing an analog ‘synaptic weight’ value [47]. In this framework, they could be integrated in large and dense cross-bar arrays [48] to connect large numbers of silicon neurons [49], and used in a way to implement spike-based learning mechanisms that change their local conductance.

In [25, 24] the authors proposed a scheme where neurons can drive memristive synapses to implement a spike-timing dependent plasticity (STDP) [50] learning scheme by generating single pairs of pre- and post-synaptic spikes in a fully asynchronous manner, without any need for global or local synchronization, thus solving some of the problems that existed with previously proposed learning schemes [51, 28]. The main idea is the following: when no spike is generated, each neuron maintains a constant reference voltage at both its input and output terminals. During spike generation, each neuron forces a pre-shaped voltage waveform at both its input and output terminals, as shown in figure 4(a), to update the synaptic weight value stored in the memristor state. Since memristors change their resistance when the voltages at their terminals exceed some defined thresholds, it is possible to obtain arbitrary STDP weight update functions, including biologically plausible ones, such as the one shown in figure 4(b) [50]. Moreover, by properly shaping the spike waveforms of both pre- and post-synaptic spikes it is possible to change the form of the STDP learning function, or even to make it evolve in time as learning progresses [52, 25].

Fully interconnected or partially interconnected synaptic cross-bar arrays, as illustrated in figure 4(c), could facilitate hierarchical learning neural network architectures. Since there is no need for global synchronization, this approach could be extended to multi-chip architectures that transmit spikes across chip boundaries using fully asynchronous timing. For example, a common asynchronous communication protocol that has been used in neuromorphic systems is based on the address event representation (AER) [53, 54]. In this representation, each spiking neuron is assigned an address, and when the neuron fires an address event is put on a digital bus, at the time that the spike is emitted. In this way time represents itself, and information is encoded in real time, in the inter-spike intervals. By further exploiting hybrid CMOS/memristor chip fabrication techniques [55], this approach could be easily scaled up to arbitrarily large networks (e.g., see figure 4(d)). Following this approach each neuron processor would be placed in a 2D grid, fully or partially interconnected through memristors. Each neuron would perform incoming spike aggregation, provide the desired pre- and post-synaptic (programmable) spike waveforms, and communicate incoming and outgoing spikes through AER communication circuitry. Using state-of-the-art CMOS technology, it is quite realistic to provide in the order of a million such neurons per chip with about $10^4$ synapses per neuron. For example, by using present day 40 nm CMOS technology it is quite realistic to fit a neuron within a 10 $\mu$m $\times$ 10 $\mu$m area. This way, a chip of about 1 cm$^2$ could host of the order of one million neurons. At the same time, for the nanowire fabric deposited on top of CMOS structures, present day technology can easily provide nanowires of 100 nm pitch [21].
This would allow about $10^4$ synapses to be integrated on top of the area occupied by each CMOS neuron. Similarly, at the PCB level, it is possible to envisage that a 100-chip PCB could host about $10^8$ neurons, and 40 of these PCBs would emulate 4 billion neurons. In these large-scale systems the bottleneck is largely given by the spike or event communication limits. To cope with these limits such chips would inter-communicate through nearest neighbors, exploiting 2D-grid network-on-chip (NoC) and network-on-board (NoB) principles. For example, in [56] the authors proposed a very efficient multi-chip inter-communication scheme that distributes event traffic over a 2D mesh network locally within each board through inter-chip high speed buses. Reconfigurability and flexibility would be ensured by defining the system architecture and topology through in-chip routing tables. Additionally, by arranging the neurons within each chip in a local 2D mesh with in-chip inter-layer event communication, it is possible to keep most of the event traffic inside the chips. At the board level, the 2D mesh scheme would allow for a total inter-chip traffic in the order of $E_v = 4N_{ch} \times E_{pp}$, where $N_{ch} = 100$ is the number of chips per board, $E_{pp}$ is the maximum event bandwidth per inter-chip bus (which we may assume to be around 100 Meps, i.e. mega events per second), and 4 reflects the fact that each chip is connected to its four nearest neighbors [56]. With these numbers, the maximum traffic per board would be in the order of $E_v \approx 4 \times 10^{10}$ eps, which is about 400 eps per neuron just for inter-chip event exchange. In practice, inter-board traffic could be much sparser, if the system is partitioned efficiently. Such numbers are quite realistic for present day CMOS technology, and the approach is scalable.
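The integration and traffic estimates above follow from a few multiplications, which the sketch below makes explicit. It uses only the figures quoted in the text (neuron footprint, nanowire pitch, chips per board, number of boards, bus bandwidth). The function name `scaling_budget` and its keyword names are hypothetical.

```python
def scaling_budget(neuron_side_um=10.0, chip_area_cm2=1.0,
                   nanowire_pitch_nm=100.0, chips_per_board=100,
                   boards=40, bus_bandwidth_eps=100e6, links_per_chip=4):
    """Order-of-magnitude budget for a hybrid CMOS/memristor system."""
    # Neurons per chip: chip area divided by the neuron footprint
    neurons_per_chip = chip_area_cm2 * 1e8 / neuron_side_um ** 2
    # Cross-bar on top of one neuron: wires per side, squared
    wires_per_side = neuron_side_um * 1e3 / nanowire_pitch_nm
    synapses_per_neuron = wires_per_side ** 2
    neurons_per_board = neurons_per_chip * chips_per_board
    # E_v = 4 * N_ch * E_pp for a 2D mesh with four neighbors per chip
    board_traffic_eps = links_per_chip * chips_per_board * bus_bandwidth_eps
    return {
        "neurons_per_chip": neurons_per_chip,
        "synapses_per_neuron": synapses_per_neuron,
        "neurons_per_board": neurons_per_board,
        "neurons_total": neurons_per_board * boards,
        "board_traffic_eps": board_traffic_eps,
        "eps_per_neuron": board_traffic_eps / neurons_per_board,
    }

if __name__ == "__main__":
    for key, value in scaling_budget().items():
        print("%-20s %.3g" % (key, value))
```

Changing a single assumption, for instance the nanowire pitch or the bus bandwidth, shows directly how the synapse count per neuron and the available event rate per neuron would scale.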
Regarding power consumption of the communication overhead, we can use as reference some recent developments for event-based fully bit-serial inter-chip transmission schemes over differential microstrips [57, 56], where consumption is proportional to communication event rate. Each link would consume in the order of 40 mA at 10 Meps rate (this includes driver and receiver pad circuits [57] as well as serializers and deserializers [58]). If each neuron fires at an average rate of 1 Hz, and if each chip has 1 million neurons, the current consumption of the communication overhead would be about 4 mA per chip. If voltage supply is in the 1–2 V range, this translates into 4–8 mW per chip. For a 100 chip PCB the inter-chip communication overhead power consumption would thus be about 400–800 mW, for 1 Hz average neuron firing rate.
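The communication power estimate can be reproduced in a few lines, assuming, as the text states, that link consumption scales linearly with event rate. The function name `comm_overhead` and its parameter names are hypothetical. The numerical values are the ones quoted above.

```python
def comm_overhead(neurons_per_chip=1e6, mean_rate_hz=1.0,
                  link_current_a=40e-3, link_rate_eps=10e6,
                  supply_v=1.0, chips_per_board=100):
    """Communication overhead, assuming current proportional to event rate."""
    chip_event_rate = neurons_per_chip * mean_rate_hz        # events per second
    chip_current_a = link_current_a * chip_event_rate / link_rate_eps
    chip_power_w = chip_current_a * supply_v
    return chip_current_a, chip_power_w, chip_power_w * chips_per_board

if __name__ == "__main__":
    for vdd in (1.0, 2.0):
        i_chip, p_chip, p_board = comm_overhead(supply_v=vdd)
        print("Vdd = %.0f V: %.1f mA/chip, %.1f mW/chip, %.0f mW/board"
              % (vdd, i_chip * 1e3, p_chip * 1e3, p_board * 1e3))
```

The linear model also makes it clear that the overhead grows in proportion to the mean firing rate: a 10 Hz average rate would multiply each figure by ten.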

4. Neuromorphic and hybrid memristor-CMOS synapse circuits

We have shown how memristive devices and nanotechnologies can be exploited to dramatically increase integration density and implement large-scale abstract neural networks. However, to faithfully reproduce the function of real synapses, including their temporal dynamic properties, passive memristive devices would need to be interfaced to biophysically realistic CMOS circuits that follow the neuromorphic approach, as described in [10, 11]. On one hand, building physical implementations of circuits and materials that directly emulate the biophysics of real synapses and reproduce their detailed real-time dynamics is important for basic research in neuroscience. On the other, this neuromorphic approach can pave the way for creating an alternative non-von Neumann computing technology, based on massively parallel arrays of slow, unreliable, and highly variable, but also compact and extremely low-power solid-state components for building neuromorphic systems that can process sensory signals and interact with the user and the environment in real time, and possibly carry out computation using the same principles used by the brain. Within this context of massively parallel artificial neural processing elements, memory and computation are co-localized. Typically the amount of memory available per ‘computing node’ (synapse in our case) is limited and it is not possible to transfer and store partial results of a computation in large memory banks outside the processing array. Therefore, in order to efficiently process real-world biologically relevant sensory signals these types of neuromorphic systems must use circuits that have biologically plausible time constants (i.e., of the order of tens of milliseconds). In this way, in addition to being well matched to the signals they process, these systems will also be inherently synchronized with the real-world events they process and will be able to interact with the environment they operate in. But these types of time constants require very large capacitance and resistance values. For example, in order to obtain an equivalent RC time constant of 10 ms with a resistor even as large as 10 MΩ, it would be necessary to use a capacitor of 1 nF. In standard CMOS VLSI technology a synapse circuit with this RC element would require a prohibitively large area, and the advantages of large-scale integration would vanish. One elegant solution to this problem is to use current-mode design techniques [59] and log-domain subthreshold circuits [60, 61]. When metal-oxide semiconductor field effect transistors (MOSFETs) are operated in the subthreshold domain, the main mechanism of carrier transport is that of diffusion [60], the same physical process that governs the flow of ions through proteic channels across neuron membranes. As a consequence, MOSFETs have an exponential relationship between gate-to-source voltage and drain current, and produce currents that range from femto- to nano-Ampères. In this domain it is possible to implement active VLSI analog filter circuits that have biologically realistic time constants and that employ relatively small capacitors.
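Two quantities in this argument lend themselves to a quick numerical check: the capacitance needed by a passive RC element, $C = \tau / R$, and the exponential subthreshold current–voltage relationship. The sketch below assumes the simplified saturation form $I = I_0 \exp(\kappa V_{gs}/U_T)$. The function names and the values of `i0` and `kappa` are illustrative assumptions, not parameters taken from the paper.

```python
import math

def capacitor_for_rc(tau_s, r_ohm):
    """Capacitance (in farads) that gives time constant tau_s with r_ohm."""
    return tau_s / r_ohm

def subthreshold_current(v_gs, i0=1e-15, kappa=0.7, u_t=0.025):
    """Simplified subthreshold drain current in saturation (assumed model)."""
    return i0 * math.exp(kappa * v_gs / u_t)

if __name__ == "__main__":
    c = capacitor_for_rc(10e-3, 10e6)
    print("C for a 10 ms RC with 10 MOhm: %.2e F" % c)
    # Gate-voltage step that multiplies the current by ten in this model
    decade_step = 0.025 * math.log(10.0) / 0.7
    print("Gate voltage per decade of current: %.1f mV" % (decade_step * 1e3))
```

In this simplified model a change of a few tens of millivolts in gate voltage moves the current by an order of magnitude, which is what allows femto- to nano-ampere currents to be set with modest bias voltages.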

4.1. A CMOS neuromorphic synapse

An example of a compact circuit that can produce both linear dynamics with biologically plausible time constants as well as non-linear short-term plasticity effects analogous to those observed in real neurons and synapses is the differential pair integrator (DPI) circuit [62] shown in figure 5(a). It can be shown [63] that by exploiting the translinear principle [64] across the loop of gate-to-source voltages highlighted in the figure, the circuit produces an output current $I_{syn}$ with impulse response of the form:

$$\tau \frac{d}{dt} I_{syn} + I_{syn} = \frac{I_w I_{th}}{I_t}, \quad (1)$$

where $\tau \triangleq C U_T/(\kappa I_t)$ is the circuit time constant, $\kappa$ the subthreshold slope factor [60], and $U_T = kT/q$ represents the thermal voltage. The currents $I_w$ and $I_{th}$ represent a local synaptic weight term and a global synaptic scaling gain term, respectively, useful for implementing spike-based and homeostatic plasticity mechanisms [65, 66]. Therefore, by setting, for example, $I_t = 5$ pA, and assuming that $U_T = 25$ mV at room temperature, the capacitance required to implement a time constant of 10 ms would be approximately $C = 1$ pF. This can be implemented in a compact layout and allows the integration of large numbers of silicon synapses with realistic dynamics on a small VLSI chip. The same circuit of figure 5(a) can be used to implement elaborate models of spiking neurons, such as the ‘Adaptive Exponential’ (AdExp) I&F model [67, 49]. Small (minimum size, of about 10 mm$^2$) prototype VLSI chips comprising of the order of thousands of neurons and synapses based on the DPI circuit have already been fabricated using a conservative 350 nm CMOS technology [68]. The data of figure 5(b) show the average response of a DPI synapse circuit measured from one such chip [68]. The data represent the average excitatory post synaptic potential (EPSP) produced by 124 neurons in response to a single spike sent to the DPI synapses of each neuron. The shaded areas, representing the standard deviation, highlight the extent of variability present in these types of networks, due to device mismatch. The main role of the DPI circuit of figure 5(a) is to implement synaptic dynamics. Short-term plasticity, STDP learning, and homeostatic adaptation mechanisms can be, and have been, implemented by interfacing additional CMOS circuits to control the DPI $V_{w}$ bias voltage, or the $I_{th}$ bias current [62, 69, 70].
Long-term storage of the $V_{w}$ weights however requires additional power-consuming and area-expensive circuit solutions, such as floating gate circuits, or local analog to digital converter (ADC) and SRAM cells.
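The first-order dynamics of equation (1) can be explored with a short numerical sketch. It integrates $\tau \, dI_{syn}/dt + I_{syn} = I_w I_{th}/I_t$ with forward Euler, treating the input spike as a brief pulse during which $I_w$ is non-zero. The value $\kappa = 0.7$, the pulse width and the current values are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def dpi_capacitance(tau_s, i_tau_a, kappa=0.7, u_t=0.025):
    """Capacitance implied by tau = C * U_T / (kappa * I_t)."""
    return tau_s * kappa * i_tau_a / u_t

def dpi_response(i_w=20e-12, i_th=5e-12, i_tau=5e-12, tau_s=10e-3,
                 pulse_s=1e-3, t_end_s=50e-3, dt=1e-5):
    """Forward-Euler response of the linear DPI model to one input pulse.

    Returns a list of (t, i_syn) samples. The drive I_w * I_th / I_t is
    applied only while the input pulse is active.
    """
    n_steps = int(round(t_end_s / dt))
    pulse_steps = int(round(pulse_s / dt))
    i_syn = 0.0
    out = []
    for k in range(n_steps + 1):
        out.append((k * dt, i_syn))
        drive = i_w * i_th / i_tau if k < pulse_steps else 0.0
        i_syn += dt * (drive - i_syn) / tau_s
    return out

if __name__ == "__main__":
    print("C = %.2e F" % dpi_capacitance(10e-3, 5e-12))
    trace = dpi_response()
    peak_t, peak_i = max(trace, key=lambda s: s[1])
    print("peak %.2e A at t = %.1f ms" % (peak_i, peak_t * 1e3))
```

With these assumed values the required capacitance comes out at the picofarad scale, consistent with the estimate in the text, and the current decays exponentially with the 10 ms time constant once the pulse ends.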

4.2. A new hybrid memristor-CMOS neuromorphic synapse

Nanoelectronic technologies offer a promising alternative solution for compact and low-power long-term storage of
synaptic weights. The hybrid memristor-CMOS neuromorphic synapse circuit we propose here, shown in figure 6(a), exploits these features to obtain at the same time dense integration of low-power long-term synaptic weight storage elements, and to emulate detailed synaptic biophysics for implementing relevant computational properties of neural systems.

The circuit depicted in figure 6(a) represents a possible implementation of a dense array of N synapses with independent weights but with the same, shared, temporal dynamics. Depending on their size, each memristor in figure 6(a) could represent a full synaptic contact, or an individual ion channel in the synaptic cleft (see also figure 1(a)). If the currently accepted model of filament formation in memristive devices is true, then downscaled memristors should approach single filament bi-stable operation. While this is a severe limitation for classical neural network applications in which memristors are required to store analog synaptic weight values with some precision, it would actually provide a very compact physical medium for emulating the stochastic nature of the opening and closing of ion channels in biological synapses.
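One way to picture the ion-channel analogy is to treat each bi-stable memristor as a channel that is either open (LRS) or closed (HRS). The sketch below is a toy model under a strong assumption that is not made in the text: each device is open independently with probability `p_on`. The conductance values and function names are also illustrative placeholders.

```python
import random

def sample_conductance(n_devices, p_on, g_on=1e-4, g_off=1e-7, rng=random):
    """Total conductance of n_devices parallel bi-stable devices.

    Each device is independently in the low-resistive state with
    probability p_on (assumption), contributing g_on; otherwise g_off.
    """
    n_open = sum(1 for _ in range(n_devices) if rng.random() < p_on)
    return n_open * g_on + (n_devices - n_open) * g_off

def expected_conductance(n_devices, p_on, g_on=1e-4, g_off=1e-7):
    """Mean of the distribution sampled by sample_conductance."""
    return n_devices * (p_on * g_on + (1.0 - p_on) * g_off)

if __name__ == "__main__":
    rng = random.Random(0)
    samples = [sample_conductance(100, 0.3, rng=rng) for _ in range(2000)]
    mean = sum(samples) / len(samples)
    print("sample mean %.3e S, expected %.3e S"
          % (mean, expected_conductance(100, 0.3)))
```

In this picture the analog synaptic weight is carried by the number of devices in the LRS, so individual devices need not hold precise analog values.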

The shared temporal dynamics are implemented by the DPI circuit in the top part of figure 6(a). Indeed, if this circuit is operated in its linear regime, it is possible to time multiplex the contributions from all spiking inputs, thus requiring one single integrating element and saving precious silicon real estate. The $V_w$ bias voltage of this circuit is a global parameter that sets the maximum possible current that can be produced by each memristor upon the arrival of an input spike, while the memristor conductance modulates the current being produced by the synapse very much like conductance changes in real synapses affect the excitatory post synaptic currents (EPSCs) they produce. Larger memristor conductances, which represent a larger number of open proteic channels in real synapses, correspond to larger synaptic weights.
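The time-multiplexing argument rests on superposition: for a linear filter, filtering the sum of the weighted inputs gives the same result as summing individually filtered inputs. The sketch below checks this numerically for a first-order low-pass filter, using relative conductances as weights. The spike times, weights and function names are arbitrary illustrative choices.

```python
def lowpass(signal, tau_s, dt):
    """Forward-Euler first-order low-pass filter: tau * dy/dt + y = x."""
    y = 0.0
    out = []
    for x in signal:
        y += dt * (x - y) / tau_s
        out.append(y)
    return out

def spike_train(n_steps, spike_steps):
    """Binary sequence with a 1 at each index listed in spike_steps."""
    active = set(spike_steps)
    return [1.0 if k in active else 0.0 for k in range(n_steps)]

def compare_shared_vs_separate(n_steps=5000, dt=1e-5, tau_s=10e-3):
    weights = [1.0, 0.5, 0.25]                 # relative memristor conductances
    trains = [spike_train(n_steps, [100, 1500]),
              spike_train(n_steps, [700]),
              spike_train(n_steps, [100, 2400, 3000])]
    # One shared integrator fed by the summed, weighted inputs
    summed = [sum(w * s[k] for w, s in zip(weights, trains))
              for k in range(n_steps)]
    shared = lowpass(summed, tau_s, dt)
    # One integrator per synapse, outputs summed afterwards
    filtered = [lowpass([w * x for x in s], tau_s, dt)
                for w, s in zip(weights, trains)]
    separate = [sum(f[k] for f in filtered) for k in range(n_steps)]
    return max(abs(a - b) for a, b in zip(shared, separate))

if __name__ == "__main__":
    print("max difference:", compare_shared_vs_separate())
```

The two outputs agree to within floating-point rounding, which is why a single integrating element can serve all the memristive branches as long as the circuit stays in its linear regime.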

Figure 6(b) shows the results of SPICE simulations of the circuit in figure 6(a), for a 180 nm CMOS process. The $I_{thr}$ and $I_c$ current sources were implemented with p-type MOSFETs, biased to produce 2 pA and 10 pA respectively, and the $V_w$ voltage bias was set to 700 mV. The data were obtained by simulating the response of one input memristive branch to a single input spike, while sweeping the memristor impedance from 1 to 7 kΩ. In these simulations we set the memristor in its LRS, and assumed we could modulate the value of the resistance to obtain four distinct analog states analogous to the ones measured experimentally in figure 2(b). The circuit also supports the operation of the memristor as a binary device, working in either the HRS or the LRS. This bi-stable mode of using the memristor would encode only an ‘on’ or ‘off’ synaptic state, but it would be more reliable and it is compatible with biologically plausible learning mechanisms, such as those proposed in [71], and implemented in [69]. Figure 6(a) shows only the circuit elements required for a ‘read’ operation, i.e., an operation that stimulates the synapse to generate an EPSC with an amplitude set by the conductance of the memristor. Additional circuit elements would be required to change the value of the memristor’s conductance, e.g., via learning protocols. However, the complex circuitry controlling the learning mechanisms would be implemented at the Input/Output (I/O) periphery of the synaptic array, for example with pulse-shaping circuits and architectures analogous to the ones described in section 3, or with circuits that check the state of the neuron and its recent spiking history, such as those proposed in [61]. Only a few additional compact elements would then be required in each synapse to implement the weight update mechanisms.

5. Brain-inspired probabilistic computation

While memristors offer a compact and attractive solution for long-term storage of synaptic state, as done for example in figure 6, they are affected by a high degree of variability (e.g., much higher than the one measured for CMOS synapses in figure 5(b)). In addition, as memristors are scaled down, unreliable and stochastic behavior becomes unavoidable. The variability, stochasticity, and general reliability issues that are starting to represent serious limiting factors for advanced computing technologies do not seem to affect biological computing systems. Indeed, the brain is a highly stochastic system that operates using noisy and unreliable nanoscale elements. Rather than attempting to minimize the effect of variability in nano-technologies, one alternative strategy, compatible with the neuromorphic approach, is to embrace variability and stochasticity and exploit these ‘features’ to carry out robust brain-inspired probabilistic computation [99, 100].

The fact that the brain can efficiently cope with a high degree of variability is evident at many levels. At the macroscopic level, trial-to-trial variability is present for example in the arm trajectories of reaching movement tasks. It is interesting to note that the variability of the end position of the reaching movement is reduced if the task requires hitting or touching a target with high accuracy [72]. Variability is also evident at the level of cortical neurons, which show significant trial-to-trial variability in their responses to identical stimuli, and at the level of chemical synapses, where there is a high degree of stochasticity in the transmission of neurotransmitter molecules [73] from the pre-synaptic terminal to the post-synaptic one. The release probability of cortical synapses ranges from less than 1% to 100% [74]. This indicates that stochastic synaptic release may not merely be an unpleasant constraint of the molecular machinery but may rather be an important computational feature of cortical synapses.

What is the computational benefit of using hardware affected by variability and stochasticity in biological and artificial computing systems? Recent advances in cognitive science demonstrated that human behavior can be described much better in the framework of probabilistic inference than in the framework of traditional ‘hard’ logic inference [75], and encouraged the view that neuronal networks might directly implement a process of probabilistic inference [76]. In parallel to this paradigm shift, research in machine learning has revealed that probabilistic inference is often much more appropriate for solving real-world problems than hard logic [77]. The reason for this is that reasoning can seldom be based on full and exact knowledge in real-world situations. For example, the sensory data that a robot receives is often noisy and incomplete, such that the current state of the environment can only partially be described. Probabilistic reasoning is a powerful tool to deal with such uncertain situations. Of course, exact probabilistic inference is still computationally intractable in general, but a number of approximation schemes have been developed that work well in practice.

In probabilistic inference, the idea is to infer a set of unobserved variables (e.g., motor outputs, classification results, etc.) given a set of observed variables (evidence, e.g., sensory inputs), using known or learned probabilistic relationships among them. Specifically, if the distribution \( P(\bar{x}) \) describes the probabilistic relationships between the random variables \( x_1, \ldots, x_n \), and if \( x_1, \ldots, x_k \) of this distribution are observed, then one can infer a set of variables of interest \( x_{k+1}, \ldots, x_{k+l} \) by determining the posterior probability \( P(x_{k+1}, \ldots, x_{k+l}|x_1, \ldots, x_k) \). One of the most popular techniques used to perform inference is belief propagation [77]. While this message passing algorithm can be implemented by networks of spiking neurons [78], a more promising alternative approach, also well suited to model brain-inspired computation, is to use sampling techniques [79]. Probably the most important family of sampling techniques in this context is Markov-Chain Monte Carlo (MCMC) sampling. Since MCMC sampling techniques operate in a stochastic manner, stochastic computational elements are an essential feature. Recent studies have shown that probabilistic inference through MCMC sampling can be implemented by networks of stochastically spiking neurons [79, 80]. Therefore, MCMC sampling is a computational paradigm well suited for emulating probabilistic inference in the brain using neuromorphic circuits and nanoelectronic synapses.
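As an illustration of inference by sampling, the following minimal sketch estimates a posterior \( P(x_3 = 1 | x_1 = 1) \) with Gibbs sampling, one simple member of the MCMC family, and compares it with the exact value obtained by enumeration. It is not the spiking-network scheme of [79, 80]: the three-variable model, its couplings `W` and biases `b`, and the number of sweeps are hypothetical choices. Each unobserved binary unit switches on with a probability given by a logistic function of its local input, so the stochasticity of the units is what drives the computation.

```python
import itertools
import math
import random

# Hypothetical model: P(x) proportional to exp(sum_i b_i x_i + sum_{i<j} W_ij x_i x_j)
W = {(0, 1): 1.5, (0, 2): -1.0, (1, 2): 0.5}
b = [-0.5, 0.2, 0.1]

def log_weight(x):
    """Unnormalized log-probability of the binary state x."""
    s = sum(b[i] * x[i] for i in range(3))
    s += sum(w * x[i] * x[j] for (i, j), w in W.items())
    return s

def exact_posterior(target, evidence):
    """P(x_target = 1 | evidence) by brute-force enumeration."""
    num = den = 0.0
    for x in itertools.product([0, 1], repeat=3):
        if any(x[i] != v for i, v in evidence.items()):
            continue
        p = math.exp(log_weight(x))
        den += p
        num += p * x[target]
    return num / den

def gibbs_posterior(target, evidence, n_sweeps=20000, burn_in=1000, seed=1):
    """Estimate the same posterior with stochastic binary units."""
    rng = random.Random(seed)
    x = [evidence.get(i, rng.randint(0, 1)) for i in range(3)]
    free = [i for i in range(3) if i not in evidence]  # observed units stay clamped
    count = 0
    for sweep in range(n_sweeps + burn_in):
        for i in free:
            x[i] = 1
            on = log_weight(x)
            x[i] = 0
            off = log_weight(x)
            p_on = 1.0 / (1.0 + math.exp(-(on - off)))  # logistic switching probability
            x[i] = 1 if rng.random() < p_on else 0
        if sweep >= burn_in:
            count += x[target]
    return count / n_sweeps

exact = exact_posterior(2, {0: 1})    # indices are zero-based: x_3 given x_1 = 1
approx = gibbs_posterior(2, {0: 1})
```

Enumeration is feasible here only because the model has three variables; its cost grows exponentially with the number of variables, which is where sampling becomes attractive.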

Within this context, it is important to see if and how the distribution \( P(\bar{x}) \) can be learned from observations, i.e., how the artificial neural system can build its own model of the world based on its sensory input and then perform probabilistic inference on this model. For a relatively simple model [81], it has been shown that this can be accomplished by a local spike-driven learning rule that resembles the STDP mechanisms measured in cortical networks [50]. Analogous learning mechanisms have been demonstrated both experimentally in neuromorphic CMOS devices [69], and theoretically, with circuit models of memristive synapses [25].

With regard to learning, the variability and stochasticity ‘features’ described above can provide an additional benefit: for many learning tasks, humans and animals have to explore many different actions in order to be able to learn appropriate responses in a given situation. In these so-called reinforcement learning setups, noise and variability naturally provide the required exploration mechanisms. A number of recent studies have shown how stochastic neuronal behavior could be utilized by cortical circuits in order to learn complex tasks [82–84]. For example, reservoir computing (RC, also known under the terms Liquid State Machines and Echo State Networks) is a powerful general principle for computation and learning with complex dynamical systems such as recurrent networks of analog and spiking neurons [85, 86] or optoelectronic devices [87]. The main idea behind RC is to use a heterogeneous dynamical system (called the reservoir) as a non-linear fading memory where information about previous inputs can be extracted from the current state of the system. This reservoir can be quite arbitrary in terms of implementation and parameter setting as long as it operates in a suitable dynamic regime [88]. Readout elements are trained to extract task-relevant information from the reservoir. In this way, arbitrary fading memory filters or even arbitrary dynamical systems (in the case when the readout elements provide feedback to the dynamical system) can be learned. One long-standing disadvantage of traditional RC was that readouts had to be trained in a supervised manner. In other words, a teacher signal was needed to specify the desired output of the readouts at each time point. In many real-world applications, such a teacher signal is not available. For example, if the task for a robot controller is to produce some motor trajectory in order to produce a desired hand movement, the exact motor commands that perform this movement are in general not known. What can be evaluated, however, is the quality of the movement. Recently, it has been demonstrated that noisy readouts can be trained with a much less informative reward signal, which just indicates whether some measure of performance of the system has recently increased [84]. Of course, such reward-based learning can in general be much slower than the pure supervised approach (see, e.g., [89]). The actual slowdown, however, depends on the task at hand, and it is interesting that for a set of relevant tasks, reward-based learning works surprisingly fast [84].
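The idea of training a noisy readout from a scalar reward can be sketched with a generic reward-modulated node-perturbation rule. This is an assumption-laden illustration, not the rule of [84]: the readout is linear, the random feature vectors `x` merely stand in for reservoir states, the reward is taken as the difference between the current performance and its running average, and all constants (`eta`, `sigma`, the teacher weights `w_target`) are hypothetical. The teacher weights are used only to score performance; the learner never sees the desired output directly.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def train_noisy_readout(steps=5000, eta=0.01, sigma=0.3, seed=0):
    rng = random.Random(seed)
    # Hypothetical teacher weights, used only to evaluate performance.
    w_target = [0.8, -0.5, 0.3, -0.9, 0.6]
    w = [0.0] * len(w_target)
    test_set = [[rng.uniform(-1, 1) for _ in w] for _ in range(200)]

    def mse(weights):
        return sum((dot(weights, x) - dot(w_target, x)) ** 2
                   for x in test_set) / len(test_set)

    mse_before = mse(w)
    perf_avg = 0.0                                   # running average of performance
    for _ in range(steps):
        x = [rng.uniform(-1, 1) for _ in w]          # stand-in for a reservoir state
        noise = rng.gauss(0.0, sigma)                # exploration noise of the readout
        y = dot(w, x) + noise
        perf = -(y - dot(w_target, x)) ** 2          # scalar measure of performance
        reward = perf - perf_avg                     # did performance recently improve?
        # Correlate the reward with the noise that produced it.
        scale = eta * reward * noise / sigma ** 2
        w = [wi + scale * xi for wi, xi in zip(w, x)]
        perf_avg += 0.05 * (perf - perf_avg)
    return mse_before, mse(w)

mse_before, mse_after = train_noisy_readout()
```

On average the update follows the gradient of the squared error, because the reward is correlated with the noise sample that caused it. The noise of the readout therefore serves as the exploration mechanism, at the price of slower and noisier convergence than supervised training.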

Since the functionality of a reservoir depends on its general dynamical behavior and not on the precise implementation of its components, RC is an attractive computational paradigm for circuits comprised of nanoscale elements affected by variability, such as the one proposed in section 4.2. In fact, if the reservoir is composed of a large number of simple interacting dynamic elements (the typical scenario), then heterogeneity of these elements is an essential requirement for ideal performance. Parameter heterogeneity is also beneficial in so-called ensemble learning techniques [90]. It is well known that the combination of models with heterogeneous predictions for the same data-set tends to improve overall prediction performance [91]. Hence, heterogeneity of computational elements can be a real benefit for learning. Examples of ensemble methods are random forests [92], bagging [93], and boosting [94].
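A minimal sketch of why combining heterogeneous predictors helps is given below. It uses plain model averaging rather than random forests, bagging or boosting, and every quantity in it is hypothetical: a linear ground truth, and a population of models whose gain and offset deviate from it by random mismatch, loosely analogous to device-to-device variability. For squared error, the averaged prediction can never be worse than the average individual model, since the squared error is a convex function of the prediction.

```python
import random

def ensemble_demo(n_models=20, n_points=200, mismatch=0.3, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n_points)]
    ys = [2.0 * x + 0.5 for x in xs]            # hypothetical ground truth
    # Each model has its own randomly mismatched gain and offset.
    models = [(2.0 * (1.0 + rng.gauss(0.0, mismatch)),
               0.5 + rng.gauss(0.0, mismatch))
              for _ in range(n_models)]
    preds = [[gain * x + offset for x in xs] for gain, offset in models]

    def mse(p):
        return sum((a - b) ** 2 for a, b in zip(p, ys)) / n_points

    mean_individual_mse = sum(mse(p) for p in preds) / n_models
    # Ensemble prediction: average of all model outputs at each point.
    averaged = [sum(col) / n_models for col in zip(*preds)]
    return mean_individual_mse, mse(averaged)

mean_individual_mse, ensemble_mse = ensemble_demo()
```

The gain from averaging grows with the diversity of the individual errors: if all models made identical errors, the two returned values would coincide.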

6. Discussion and conclusions

Memristors, and in particular nanoscale solid-state implementations, represent a promising technology, bearing benefits for emerging memory storage as well as for revisiting conventional analog circuits [95]. Given their low-power and small-scale characteristics, researchers are also considering their application in large-scale neural networks for neuro-computing applications. However, the fabrication of large-scale nanoscale cross-bar arrays involves several issues that are still open: the realization of nanosized electrodes requires nanopatterning [96] techniques, such as electron beam lithography (EBL) or nano-imprint lithography (NIL) [97]. Nanosized electrodes have a reduced cross section, which results in increased resistance. As electrode resistance scales with length, this can rapidly become a critical issue for fully interconnected nanoscale cross-bar structures. Furthermore, down-scaling the electrode size to reduce the device active area requires simultaneous down-scaling of the thickness of the metallizations due to fabrication concerns. This in turn further increases the resistance of the electrodes, much like the interconnects in modern CMOS circuitry. These factors introduce a large offset in the write voltages required to change the state of ReRAM cells, and this offset depends on the position of the cell in the array. This problem is especially critical in neuro-computing architectures where these cells represent synapses, as the offsets directly affect the weight update and learning mechanisms.

Integrating memristors as synapse elements in large-scale neuro-computing architectures also makes process variability in memristor dimensions significant [98], as it introduces a considerable amount of variability in the synapse properties. In addition to their large variability, another important issue relating to these types of synapses, which is still ignored in the vast majority of neuro-computing studies, is the effect of limited resolution in memristive states. In particular, the trade-off between desired synaptic weight resolution and memristor size is not known, nor is it known to what extent the multi-step synaptic weight model holds true for aggressively down-scaled memristor sizes.

These scaling, integration, and variability issues are serious limiting factors for the use of memristors in conventional neuro-computing architectures. Nonetheless, biological neural systems are an existence proof that it is possible to implement robust computation using nanoscale unreliable components and non-von Neumann computing architectures. In order to best exploit these emerging nanoscale technologies for building compact, low-power, and robust artificial neural processing systems it is important to understand the (probabilistic) neural and cortical principles of computation and to develop at the same time, following a co-design approach, the neuromorphic hardware computing substrates that support them. In this paper we elaborated on this neuromorphic approach, presenting an example of a neuromorphic circuit and of a hybrid nanoelectronic-CMOS architecture that directly emulate the properties of real synapses to reproduce biophysically realistic response properties, thus providing the necessary technology for implementing massively parallel models of brain-inspired computation that are, by design, probabilistic, robust to variability, and fault tolerant.

Acknowledgment

This work was supported by the European CHIST-ERA program, via the ‘Plasticity in NEUral Memristive Architectures’ (PNEUMA) project.

References

[26] Jo S H, Chang T, Ebbing E, Bhadviya B B, Mazumder P and
Lu W 2010 Nanoscale memristor device as synapse in
neuromorphic systems Nano Lett. 10 1297–301
[27] Choi H, Jung H, Lee J, Yoon J, Park J, Seong D-J, Lee W,
Hasan M, Jung G-Y and Hwang H 2009 An electrically
modifiable synapse array of CMOS/RRAM crossbars
Nano Lett. 9 3086–90
Nanoelectronic programmable synapses based on phase
change materials for brain-inspired computing Nano Lett.
12 2179–86
[29] Chua L 2011 Resistance switching memories are memristors
Appl. Phys. A 102 765–83
Trans. Circuit Theory 18 907–19
elements with memory: memristors, memcapacitors, and
meminductors Proc. IEEE 97 1717–24
[32] Prodromakis T, Toumazou C and Chua L 2012 Two centuries
of memristors Nature Mater. 11 478–81
[33] Strukov D B, Snider G S, Stewart D R and
Williams R S 2009 The missing memristor found Nature
458 1154–7
[34] Prodromakis T, Peh B P, Papavassiliou C and Toumazou C
2011 A versatile memristor model with nonlinear dopant
[35] Shihong M W, Prodromakis T, Salaoru I and Toumazou C
2012 Modelling of current percolation channels in
emerging resistive switching elements
arXiv:2746v1 [cond-mat.mes-hall]
Hussain T, Srinivasa N and Lu W 2011 A functional
hybrid memristor crossbar-array/CMOS system for data
storage and neuromorphic applications Nano Lett.
12 389–95
Multilevel resistive switching with ionic and metallic
filaments Appl. Phys. Lett. 94 233106
[38] Yoon K J, Lee M H, Kim G H, Song S J, Seok J Y, Han S,
Yoon J H, Kim K M and Hwang C S 2012 Memristive
tri-stable resistive switching at ruptured conducting
filaments of a Pt/TiO2/Pt cell Nanotechnology 23 185202
power multiple-state rram devices Sci. Rep. 2 744
[40] O’Donovan M J and Rinzel J 1997 Synaptic depression: a
dynamic regulator of synaptic communication with varied
functional roles Trends Neurosci. 20 431–3
depression and the temporal response characteristics of V1
cells J. Neurosci. 18 4785–99
Biomimetic model of the outer plexiform layer by
incorporating memristive devices Phys. Rev. E 85 041918
Lin Y Y, Huang R, Zou Q T and Wu J G 2012 Linear
scaling of reset current down to 22-nm node for a novel
RRAM IEEE Electron Device Lett. 33 89–91
[44] Deleruyelle D, Putero M, Ouled-Khachroum T, Bouquet M,
Coulet M-V, Boddaert X, Calmes N and Muller C 2013
Ge2Sb2Te5 layer used as solid electrolyte in conductive-
bridge memory devices fabricated on flexible substrate
Solid-State Electron. 79 159–65
Ultra-low switching power RRAM using hopping
conduction mechanism Meeting Abstracts MA2012-02 pp 2574–4
[46] Lewis D L and Lee H-H S 2009 Architectural evaluation of
3D stacked RRAM caches 3DIC 2009: IEEE Int. Conf. on
3D System Integration (Sept. 2009) pp 1–4


[56] Indiveri G and Chicca E 2011 A VLSI neuromorphic device for implementing spike-based neural networks Neural Nets WIRN11—Proc. 21st Italian Workshop on Neural Nets pp 305–16


[58] Bartolozzi C and Indiveri G 2009 Global scaling of synaptic efficacy: homeostasis in silicon synapses Neurocomputing 72 726–31


[71] Legenstein R, Chase S M, Schwartz A B and Maass W 2010 A reward-modulated Hebbian learning rule can explain experimentally observed network reorganization in a brain control task J. Neurosci. 30 8400–10
[94] Schapire R E 1990 The strength of weak learnability Mach. Learn. 5 197–227