Now this is the concept my entire computational neuroscience study was essentially based on.
Backpropagation is short for “backward propagation of errors.”
Basically, backpropagation means that after each forward pass, the network performs a backward pass that works out how much each weight and bias contributed to the error, and then adjusts them in order to minimise the cost function.
The cost function returns the error between the actual and predicted outputs, so backpropagation essentially aims to reduce that error as much as possible.
Backpropagation computes the gradients that gradient descent uses to update the parameters, which makes it an important process in training the neural net.
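To make the forward pass, backward pass and update loop concrete, here is a minimal sketch on a made-up one-neuron model. The model (prediction = w * x + b), the squared-error cost, the learning rate and all numbers are assumptions chosen for illustration; they are not the network discussed in the rest of this post.

```python
# Hypothetical one-neuron model, used only to illustrate the loop:
# prediction = w * x + b, cost = (prediction - y) ** 2

def forward(w, b, x):
    """Forward pass: compute the prediction."""
    return w * x + b

def cost(prediction, y):
    """Squared error between the predicted and actual output."""
    return (prediction - y) ** 2

def backward(w, b, x, y):
    """Backward pass: gradients of the cost w.r.t. w and b via the chain rule."""
    prediction = forward(w, b, x)
    d_cost_d_pred = 2.0 * (prediction - y)
    grad_w = d_cost_d_pred * x   # d(prediction)/dw = x
    grad_b = d_cost_d_pred       # d(prediction)/db = 1
    return grad_w, grad_b

def train(w, b, x, y, learning_rate=0.1, steps=20):
    """Repeat forward pass, backward pass and a gradient descent update."""
    history = []
    for _ in range(steps):
        history.append(cost(forward(w, b, x), y))
        grad_w, grad_b = backward(w, b, x, y)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, history

w, b, history = train(w=0.5, b=0.0, x=1.5, y=2.0)
print(history[0], history[-1])  # the cost recorded at the first and last step
```

The backward pass only produces the gradients; the two subtraction lines in `train` are the gradient descent step that actually changes the parameters.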
Understanding the neural network model
For this purpose, let’s consider a 4-layer neural network consisting of 4 neurons in the input layer, 4 neurons in the hidden layers (2 in each of the two hidden layers) and 1 neuron in the output layer, as illustrated in the image below.
The neurons, colored in purple, represent the input data. These can be as simple as scalars or more complex like vectors or multidimensional matrices.
The first set of activations (a¹) is equal to the input values x.
The final values at the hidden neurons, colored in green, are computed using z^l, the weighted inputs in layer l, and a^l, the activations in layer l. For layers 2 and 3 the equations are:
- l = 2: z² = W¹x + b¹ and a² = f(z²)
- l = 3: z³ = W²a² + b² and a³ = f(z³)
W¹ and W² are the weights feeding layers 2 and 3, while b¹ and b² are the biases added in those layers.
Activations a² and a³ are computed using an activation function f. Usually the activation function f is non-linear, which allows the network to learn complex patterns in data.
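As a rough sketch, the two hidden layers can be computed like this in plain Python. Here W1 and b1 feed layer 2 and W2 and b2 feed layer 3, matching the W¹ and b¹ used for z² in this post. The choice of tanh for f, the helper names and every number are assumptions made for the example.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def layer(W, b, a_prev, f=math.tanh):
    """Return the weighted inputs z and activations a of one layer."""
    z = [wx_i + b_i for wx_i, b_i in zip(matvec(W, a_prev), b)]
    a = [f(z_i) for z_i in z]
    return z, a

# Illustrative values only: 4 inputs, 2 neurons in each hidden layer.
x = [1.0, 0.5, -0.5, 2.0]
W1 = [[0.1, 0.2, 0.3, 0.4],
      [-0.4, -0.3, -0.2, -0.1]]
b1 = [0.0, 0.1]
W2 = [[0.5, -0.5],
      [0.25, 0.75]]
b2 = [0.1, -0.1]

z2, a2 = layer(W1, b1, x)    # layer 2: weighted inputs, then activation
z3, a3 = layer(W2, b2, a2)   # layer 3 takes a2 as its input
print(z2, a2)
print(z3, a3)
```

Swapping tanh for the identity function would make both layers collapse into a single linear map, which is why a non-linear f matters.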
Let’s choose layer 2 and its parameters as an example. The same operations can be applied to any layer in the network.
- W¹ is a weight matrix of shape (n, m) where n is the number of output neurons (neurons in the next layer) and m is the number of input neurons (neurons in the previous layer). For us, n = 2 and m = 4.
The first number in any weight’s subscript matches the index of the neuron in the next layer (in our case this is the Hidden_2 layer) and the second number matches the index of the neuron in the previous layer (in our case this is the Input layer).
- x is the input vector of shape (m, 1) where m is the number of input neurons. For us, m = 4.
- b¹ is a bias vector of shape (n, 1) where n is the number of neurons in the current layer. For us, n = 2.
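The shapes listed above can be checked with a small sketch. It stores matrices as nested lists, so a column vector of shape (m, 1) is a list of m one-element rows. The helpers `shape` and `matmul` and all values are made up for the example.

```python
def shape(matrix):
    """Return (rows, columns) of a nested-list matrix."""
    return (len(matrix), len(matrix[0]))

def matmul(A, B):
    """Matrix product; the inner dimensions must agree."""
    assert shape(A)[1] == shape(B)[0], "inner dimensions must match"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

n, m = 2, 4  # n neurons in layer 2, m input neurons

# Illustrative values only.
W1 = [[0.1 * (i + 1) * (j + 1) for j in range(m)] for i in range(n)]  # shape (n, m)
x = [[1.0], [2.0], [3.0], [4.0]]                                      # shape (m, 1)
b1 = [[0.5], [-0.5]]                                                  # shape (n, 1)

W1x = matmul(W1, x)                              # (n, m) times (m, 1) gives (n, 1)
z2 = [[W1x[i][0] + b1[i][0]] for i in range(n)]  # adding b1 keeps shape (n, 1)

print(shape(W1), shape(x), shape(b1), shape(z2))
```

The product only works because the number of columns of W1 equals the number of rows of x, which is the reason the weight matrix is laid out as (n, m) rather than (m, n).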
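Following the equation for z², we can use the above definitions of W¹, x and b¹ to derive the “Equation for z²”:
(z_1)² = (W_11)¹x_1 + (W_12)¹x_2 + (W_13)¹x_3 + (W_14)¹x_4 + (b_1)¹
(z_2)² = (W_21)¹x_1 + (W_22)¹x_2 + (W_23)¹x_3 + (W_24)¹x_4 + (b_2)¹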
Now carefully observe the neural network illustration from above.
You will see that z² can be expressed using (z_1)² and (z_2)², where (z_1)² and (z_2)² are each the sum of the products of every input x_i with the corresponding weight (W_ij)¹, plus the bias (b_i)¹.
This leads to the same “Equation for z²” and proves that the matrix representations for z², a², z³ and a³ are correct.
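If you want to convince yourself numerically, the sketch below writes out the two per-neuron sums by hand and compares them with the generic matrix form. The function names and the numbers are assumptions for illustration only.

```python
def z2_per_neuron(W1, x, b1):
    """Write out the sum for each of the two neurons in layer 2 by hand."""
    z_1 = W1[0][0] * x[0] + W1[0][1] * x[1] + W1[0][2] * x[2] + W1[0][3] * x[3] + b1[0]
    z_2 = W1[1][0] * x[0] + W1[1][1] * x[1] + W1[1][2] * x[2] + W1[1][3] * x[3] + b1[1]
    return [z_1, z_2]

def z2_matrix_form(W1, x, b1):
    """Generic form: one row of W1 times x, plus the bias, per neuron."""
    return [sum(w * x_j for w, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W1, b1)]

# Illustrative values only.
W1 = [[0.2, -0.1, 0.4, 0.3],
      [0.5, 0.1, -0.3, 0.2]]
x = [1.0, 2.0, 3.0, 4.0]
b1 = [0.1, -0.2]

per_neuron = z2_per_neuron(W1, x, b1)
matrix_form = z2_matrix_form(W1, x, b1)
print(per_neuron, matrix_form)
```

Both functions perform the same multiplications and additions, so the two results agree; the matrix form is just a more compact way to write them.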
The final part of a neural network is the output layer, which produces the predicted value. In our simple example, it is presented as a single neuron, colored in blue, and evaluated as follows:
We use the matrix representation to simplify the equation. One can use the above techniques to understand the underlying logic.
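Putting the pieces together, a full forward pass through this network might look like the sketch below. It assumes the output neuron is a plain weighted sum of the last hidden activations, s = W3·a3, with no bias and no activation. That choice, the tanh activation, the names and all numbers are assumptions made for the example.

```python
import math

def affine(W, b, v):
    """Compute W v + b for a nested-list matrix W and list vectors b, v."""
    return [sum(w * v_j for w, v_j in zip(row, v)) + b_i
            for row, b_i in zip(W, b)]

def forward(x, W1, b1, W2, b2, W3, f=math.tanh):
    """Forward pass: input -> hidden layer 2 -> hidden layer 3 -> output s."""
    a2 = [f(z) for z in affine(W1, b1, x)]
    a3 = [f(z) for z in affine(W2, b2, a2)]
    # Assumed output layer: linear combination of a3, no bias, no activation.
    s = affine(W3, [0.0], a3)[0]
    return s, a3

# Illustrative values only.
x = [1.0, 0.5, -0.5, 2.0]
W1 = [[0.1, 0.2, 0.3, 0.4],
      [-0.4, -0.3, -0.2, -0.1]]
b1 = [0.0, 0.1]
W2 = [[0.5, -0.5],
      [0.25, 0.75]]
b2 = [0.1, -0.1]
W3 = [[1.0, -1.0]]

s, a3 = forward(x, W1, b1, W2, b2, W3)
print(s)
```

The predicted value s is what the cost function compares against the actual output before the backward pass starts.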
Please leave any questions or comments below, and I would love to answer them.