Backpropagation Algorithm

Now this is the concept my entire computational neuroscience study was essentially based on.

Backpropagation is short for “backward propagation of errors.” 
Put simply, after each forward pass the network performs a backward pass, adjusting the weights and biases in order to minimise the cost function. 

The cost function returns the error value between the actual and predicted outputs, so backpropagation essentially aims to reduce this error as much as possible. 
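
For example, with a mean squared error cost (just one common choice; the post does not fix a particular cost function), the error between predicted and actual outputs could be computed like this:

```python
import numpy as np

# Hypothetical predicted and actual outputs for four training examples.
predicted = np.array([0.2, 0.8, 0.5, 0.9])
actual = np.array([0.0, 1.0, 1.0, 1.0])

# Mean squared error between the predictions and the targets.
cost = np.mean((predicted - actual) ** 2)
print(cost)  # the smaller this value, the better the predictions
```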

Backpropagation computes the gradients of the cost with respect to the weights and biases, which gradient descent then uses to update them; this makes it a key step in training the neural network. 
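
As a rough sketch of how the pieces fit together (using a single linear neuron and a mean squared error cost, both assumptions for illustration rather than the 4-layer network discussed below), here is a tiny training loop in which the backward pass computes the gradients and gradient descent applies them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 inputs per example, targets generated from a known linear rule.
X = rng.normal(size=(100, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w

# Parameters of a single linear neuron.
w = np.zeros(4)
b = 0.0
lr = 0.1

for step in range(200):
    # Forward pass: prediction and cost (mean squared error).
    pred = X @ w + b
    err = pred - y
    cost = np.mean(err ** 2)

    # Backward pass: gradients of the cost w.r.t. w and b.
    grad_w = 2 * X.T @ err / len(y)
    grad_b = 2 * np.mean(err)

    # Gradient descent update.
    w -= lr * grad_w
    b -= lr * grad_b

print(cost, w.round(2))  # cost shrinks and w approaches true_w
```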

Understanding the neural network model

For this purpose, let’s consider a 4-layer neural network that consists of 4 neurons in the input layer, 4 neurons in the hidden layers and 1 neuron in the output layer, as illustrated in the image below.

Simple 4-layer neural network illustration (source)

Input layer

The neurons, colored in purple, represent the input data. These can be as simple as scalars or more complex like vectors or multidimensional matrices.

a¹ = x = [x_1, x_2, x_3, x_4]ᵀ

The first set of activations, a¹, is equal to the input values. 

Hidden layers

The final values at the hidden neurons, colored in green, are computed using z^l, the weighted inputs in layer l, and a^l, the activations in layer l. For layers 2 and 3 the equations are:

  • l = 2
z² = W²x + b²
a² = f(z²)
  • l = 3
z³ = W³a² + b³
a³ = f(z³)

W² and W³ are the weights in layers 2 and 3, while b² and b³ are the biases in those layers.

Activations a² and a³ are computed using an activation function f. Usually, the activation function f is non-linear, which allows the network to learn complex patterns in data.
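
As a concrete illustration, here is one layer’s forward computation in NumPy. The sigmoid is just one possible choice of f, since the post does not fix a particular activation function:

```python
import numpy as np

def sigmoid(z):
    # One possible non-linear activation function f.
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(W, b, a_prev):
    # Weighted input z^l and activation a^l for a single layer.
    z = W @ a_prev + b
    return z, sigmoid(z)

# Example with made-up numbers: 4 inputs feeding a layer of 2 neurons.
x = np.array([[1.0], [2.0], [3.0], [4.0]])
W = np.full((2, 4), 0.1)
b = np.zeros((2, 1))
z, a = layer_forward(W, b, x)
print(z.shape, a.shape)  # (2, 1) (2, 1)
```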

Let’s choose layer 2 and its parameters as an example. The same operations can be applied to any layer in the network.

  • W² is a weight matrix of shape (n, m), where n is the number of output neurons (neurons in the next layer) and m is the number of input neurons (neurons in the previous layer). For us, n = 2 and m = 4.
W² = [[W_11, W_12, W_13, W_14],
      [W_21, W_22, W_23, W_24]]

The first number in any weight’s subscript matches the index of the neuron in the next layer (in our case, the Hidden_1 layer) and the second number matches the index of the neuron in the previous layer (in our case, the Input layer).

  • x is the input vector of shape (m, 1) where m is the number of input neurons. For us, m = 4.
x = [x_1, x_2, x_3, x_4]ᵀ
  • b² is a bias vector of shape (n, 1), where n is the number of neurons in the current layer. For us, n = 2.
b² = [b_1, b_2]ᵀ

Following the equation for z², we can use the above definitions of W², x and b² to write z² out explicitly:

z² = W²x + b²
   = [W_11·x_1 + W_12·x_2 + W_13·x_3 + W_14·x_4 + b_1,
      W_21·x_1 + W_22·x_2 + W_23·x_3 + W_24·x_4 + b_2]ᵀ

Now carefully observe the neural network illustration from above.

Input and Hidden_1 layers (source)

You will see that z² can be expressed using (z_1)² and (z_2)², where each (z_i)² is the sum of every input x_j multiplied by the corresponding weight W_ij, plus the bias b_i.

This leads to the same equation for z² and proves that the matrix representations for z², a², z³ and a³ are correct.
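
Here is a small NumPy check of that claim, using made-up numbers for W², x and b² with the shapes given above:

```python
import numpy as np

rng = np.random.default_rng(1)
W2 = rng.normal(size=(2, 4))  # weight matrix, shape (n, m) = (2, 4)
x = rng.normal(size=(4, 1))   # input vector, shape (m, 1) = (4, 1)
b2 = rng.normal(size=(2, 1))  # bias vector, shape (n, 1) = (2, 1)

# Matrix form: z² = W²x + b².
z2 = W2 @ x + b2

# Component form: each (z_i)² is the sum over j of W_ij * x_j, plus b_i.
z2_manual = np.array([
    [sum(W2[0, j] * x[j, 0] for j in range(4)) + b2[0, 0]],
    [sum(W2[1, j] * x[j, 0] for j in range(4)) + b2[1, 0]],
])

print(np.allclose(z2, z2_manual))  # True: both forms agree
```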

Output layer

The final part of a neural network is the output layer, which produces the predicted value. In our simple example, it is represented as a single neuron, colored in blue and evaluated as follows:

s = f(z⁴), where z⁴ = W⁴a³ + b⁴

We use the matrix representation to keep the equations compact; the same techniques as above can be used to expand them and follow the underlying logic. 
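
Putting everything together, here is a sketch of the full forward pass in NumPy. The input size (4) and the first hidden layer’s size (2) follow the shapes above; the second hidden layer’s size (2 here), the sigmoid activation and the made-up parameter values are assumptions purely for illustration:

```python
import numpy as np

def sigmoid(z):
    # Assumed non-linear activation function f.
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)

# Random (made-up) parameters: 4 inputs -> 2 hidden -> 2 hidden -> 1 output.
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))
W3, b3 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
W4, b4 = rng.normal(size=(1, 2)), rng.normal(size=(1, 1))

x = np.array([[0.5], [-1.0], [2.0], [0.0]])  # input vector of shape (4, 1)

a1 = x                               # input activations
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)  # hidden layer 1
z3 = W3 @ a2 + b3; a3 = sigmoid(z3)  # hidden layer 2
z4 = W4 @ a3 + b4; s = sigmoid(z4)   # output neuron: predicted value s
print(s.item())
```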

Please leave any questions or comments below, and I would love to answer them.
