# VLSI single neuron detailed parameters estimation

## Introduction

Complex VLSI hardware subthreshold implementations of spiking neurons with bio-phisically realistic dynamics can be conveniently modeled by non-linear dynamical systems of equations, which are essentially systems of coupled ordinary dierential equations (ODEs).

A straightforward way of estimating parameters from membrane potential time-profile measures is to make model outputs resemble the experimental data as closely as possible through residual error minimization. In the next section we show how we used Metropolis-Hastling sampling with simulated annealing for parameters estimation. We will also provide some c code examples for a twin experiment (available at http://capocaccia.ethz.ch/capo/attachment/wiki/2012/tuning12/).

### Fitting models to data

In this section we give a short introduction on the method used. We use Markov Chain Monte Carlo in order to achieve parameters estimation and evaluate the quality of the fit in respect a certain model. This can be useful for circuit design and will help us in understanding what parameters are "important" in our model and the quality of the model in respect to some circuit implementation of it.

In general what we can measure from our VLSI neuromorphic hardware is a time series of data ( tipically the membrane potential). It consists of some measurements Di(t) at discrete time points ti. We also have a proposed model of the circuit behaviour which produce a time series yi(tj) where represents a full set of parameters. This model in general can be a polynomial function or could be represented by as a set of ODEs. The goal is to nd a set of parameters that best t the data Di(t). One way of estimating parameters that has been widely used in many different fields is by using Markov Chain Monte Carlo. The first things to do is to dene an error function, one popular choice is the sum of

the square errors:

χ2=∑i(Di(t)-yi(t|θ))22σ2

σ is the estimate of the error of the data, often we can use the experimental uncertainly or in our case we could use the design parameters specifications.The answer will not depend much on the choice of σ.

If we assume that our data are spread in a gaussian fashion the likelihood function can be written as:

P(D|θ)=e-χ2

Therefore the likelihood is maximal when the error is minimal. Our problem is now defined as a maximization of the likelihood function. A very effective method for maximizing the Likelihood function and therefore minimizing the error is by using the Metropolis algorithm.

The pseudo code for it is the following:

- We guess all the parameters values and we calculate χ2
- Next we guess another set of parameters set and we calculate some other χp
- At this point we compute the likelihood ratio

ratio =P(D|θc)P(D|θp)=e-χp2+χc2

- ratio is a positive real number
- At this point Metropolis ask you to pick a random number r between (0,1) and if \emph{ratio} is bigger than $r$ you replace the current parameters set with the proposed set and then repeat from step 2.

An important notice here is that at each step of the process you store each parameters set. In some cases the values of the current set won't change but we will still keep the value.

Now thanks to Bayesian inference we could state that our best estimation for the set of parameters is the mean values of those posterior distributions and as error we could get the variance of the distributions.

### Case of study: Optimization with Metropolis-Hastings sampling and simulated annealing

This technique is based on sampling the parameter space, by simply assigning random values to all the parameters satisfying the initial constraints and sequentially estimate certain error function.

**Metropolis-Hastings algorithm**

Suppose that $\pi(\theta)$ is the density of interest. Suppose further that we have some $k(\theta, \Phi)$ proposal distribution which is easy to simulate from, but does not (necessarily) have $\pi(\theta)$ as its stationary density. Metropolis-Hastings algorithm is the following (pseudocode):

- Initialise the iteration counter to $j = 1$, and initialise the chain to $\theta(1)$
- Generate a proposed value of $\Phi$ by using the kernel $k(\theta^{j-1},\Phi)$
- Evaluate the acceptance probability $A(\theta(j-1), \Phi)$ of the proposed step:

\begin{equation}

A( \theta , \Phi ) = min \left( 1 , \frac{ \pi(\Phi) k(\Phi , \theta) }{ \pi(\theta) k(\theta , \Phi) } \right)

\end{equation}

\begin{itemize}

\item{ Go to $\theta(j) = \Phi$ with probability $A(\theta(j-1), \Phi)$ , or stay in $\theta(j) = \theta(j-1)$ otherwise.}

\end{itemize}

At each stage, a new value is generated from the proposal distribution (Note that the prior distribution can be flat - in such case is called \emph{non informative prior} - ). This generated value can be accepted, in which case the chain moves, or rejected, in this case the chain stays where it is. Whether or not the move is accepted or rejected depends on an acceptance probability $A$ which

itself depends on the relationship between the density of interest and the proposal distribution.

Note that the density of interest $\pi$ only enters into the acceptance probability as a ratio, and so

the method can be used when the density of interest is only known up to a scaling constant.

In the context of Bayesian inference one possible choice for the

proposal density is the prior density. The acceptance probability then becomes the ratio in between the likelihood of the candidate points and the current value $x$.

\begin{equation}

A( \theta , \Phi ) = min \left( 1 , \frac{ L(\Phi , x) }{ L(\theta , x) } \right)

\end{equation}

example

In our case we do an example of application of Metropolis-Hastlings algorithm with FitzhHugh-Nagumo neuron model which is described by the following equations:

\begin{eqnarray}

\frac{d{X}}{d{t}} = X-\left( \frac{X^{3}}{3} \right) - Y + I_{ext} \\

\frac{d{Y}}{d{t}} = c*\left( X+ a - b*Y \right)

\end{eqnarray}

In the Fig.1 I show the integration of those equations with selected $a=0.7, b=0.8, c=0.08, I_{ext} = 0.5$.

**Figure 1 - FitzhHugh-Nagumo neuron model ODE integration**

We start with Di(ti) measurements of the system in this case just 100 points of a time series Y (t). The model which describes our data is assumed to be (eq. 3; 4). The unknow parameters are i (a,b,c). Which are also subject to the following boundaries: 0<a<1 , 0<b<1, 0.001<c<0.2. As error function we can choose the sum of square errors:

\begin{equation}

\chi^2 = \sum_{i}\frac{\left( D_{i}(t_{i}) - y_{i}(t_{i} | \theta) \right)^{2} }{2 \sigma^{2}}

\end{equation}

in which $\sigma$ is some estimations of the error in the data (my guess is $\sigma=2.0$). And anyway the fit will not depend too much on $\sigma$. As a likelihood function we choose to use a Gaussian function $e^{-\chi^{2}}$. Therefore the likelihood is maximal when the error is minimal. So the parameters estimation can be cast into the problem of maximizing the likelihood, in other words: "what are the parameters that maximize my likelihood?".

We used the simple Metropolis algorithm to do this maximization. Basically we generate one set of parameter and we calculate $\chi_{current}^{2}$ for that particular set, then we generate another set of parameters which will give us the $\chi_{proposed}^{2}$.

So after this we compute the likelihood ratio (eq. $9$).

\begin{equation}

\frac{P(D|\theta_{proposed})}{P(D|\theta_{current})} = exp(-\chi_{proposed}^{2}+\chi^{2}_{current})

\end{equation}

Now the metropolis algorithm tell us to pick a number $n$ between $0$ and $1$ and if the likelihood ratio is bigger than $N$ we accept the new proposed set of parameters. Then we repeat.

When we do this we save every set parameters (i.e for each steps, in some cases the set will not change but we will still keep it). We let this run for a long number of iterations. There's also a star-up time during which we don't record parameters.

Thanks to Bayesian inference now we can get the best estimation of my parameters as the pick of the distribution obtained by recording all sets of parameters. Because the histogram of the parameters set is the posterior density. we can get the mode, median or mean as my best estimation. The error would be the standard deviation or the variance in the interval between 10/90 $\sigma$.

The following plots are done after $50000$ iterations, the \emph{burn-in} time was set to $1000$ iterations.

**Figure 2- Parameters Estimation. **

**Figure 3 - Parameter A **

**Figure 4 - Parameter C **

### VLSI Parameter Estimation using Particle Swarm Optimization

One problem we commonly face
when trying to map canonical microcircuits on our vlsi hardware is that
since we want to match the biological properties

of neuons we exploit
a special region of transistor operation which is called
"subthreshold". In this region however signals are very small and
therefore suscetible to noise and process fabrication variation. Another
problem is that chips biases, that are the voltages and currents that
we use to set the parameters on the chip, do not easily map abstract
model parameters types and values. For example the sinaptic efficacy in
our chip is a complex functions of multiple biases.

**Figure 5 - Canonical macrocircuits and VLSI neuromorphic chip ifmem**

We want to consider the precise spike timing of silicon neurons in order to be able to reproduce on hardware complex and richer dynamics in with precise spike timing is required, for example for during bursting behaviours. For this porpoise we introduce a coincidence factor gamma which incorporates the similarity in the firing rate and the overlap between spikes. The coincidence factor gamma was first introduced by kistler in the 97 and can be explained using the following simulation from [C. Rossant et at 2010].

**Figure 6 - Simulation of IF neurons with same mean firing rate frequency but different gamma factors.**

The red ball represent a vlsi neurons and the blue ball represents its mathematical description. If we inject a know input current to our vlsi circuit we will get a sequence of ndata spikes and if we solve the math for our model with the same amount of current we will also get a time series of spikes (nmodel). In Fig 6 [from C.Rossant 2010] we can see a simulation of two leaky integrate and fire neurons, in the plot they show the membrane potentialon the y axes and the time on the x axis.

Gamma is close to zero in the first plot and gamma is almost one in the second one, even if the mean firing rate frequency is always the same.

so now that we have all the ingredient the problem i am trying to solve is the following, given a set of parameters theta, i want to maximaze the

gamma function

in literature there are a couple of examples in which they use particle swarm optimization in order to maximize the gamma function and automatically fitting spiking neuron models to electrophysiological recordings or one could think about maximum likelihood.

Particle swarm optimization method can be used to solve the maximization problem of the Gamma Factor and it can deal very well with non linear problems with a large number of parameters.

The algorithm and its concept of PSO were introduced by James Kennedy and Russel Ebhard in the 95 the basic concept of the algorithm is to create a swarm of particles and exploit their swarm intelligence in order to find an optimal solution to a task.

The particle are completely defined by their position and velocity in the problem space. (every particle is a ful set of parameters) The fitness (gamma function) of each particle represents the quality of its position. The velocity it is influenced by its own best position found so far and the best solution found so far by its neighbours. Another important concept is that there each particle velocity that makes them explore unkwown problem space regions.

This property combined with a good initial distribution of the swarm enable an extensive exploration of the problem space and give a very high chance of finding the optimal best solution. In figure 7 we show only two of the 13th dimensions of the problem space, for shake of simplicity, one dimension is the threshold of the neuron vth and the other dimension is represented by a process parameters k.

**Figure 7 - Particle Swarm Evolution. **These plots show the evolution of the position of the particles in respect to 2 parameters. As you can see after 150 iterations the particle are clustered around a minima in the parameter space (the best parameter estimation).

So the result of the simulation is to get the best position ever generated by a particle in the swarm. this position represent my full set of parameters.

Now we can use those full set of parameters in order to fit the single neuron spike time when I will excite it with know input current.

**Figure 8 - Single neuron behaviour reproduced by the parameter estimated.** In Blue a measure of the membrane potential of a VLSI neuron of Ifmem chip, in green a simulated neuron with parameter estimated via PSO.