Activation functions are mathematical equations that determine the output of a neural network. They are used to calculate whether a neuron will be activated or not.
Role of the Activation Function in a Neural Network Model:
If activation functions are not applied, the output signal is simply a linear function, which is a polynomial of degree one. While linear equations are easy to solve, they are limited in complexity and have less power to learn complex functional mappings from data. Moreover, a composition of linear functions is itself linear, so stacking layers adds nothing. Thus, without activation functions, a neural network would be a linear regression model with limited abilities.
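To make the point concrete, here is a minimal numpy sketch (an illustration of the argument above, with arbitrary layer sizes assumed for the example) showing that two stacked linear layers collapse into a single linear map:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked "layers" with no activation function: y = W2 @ (W1 @ x)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x)

# The same mapping as a single linear layer with weights W = W2 @ W1
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layers, one_layer))  # True: extra depth adds no expressive power
```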
Types of Activation Functions:
1-Binary Step Function:
A binary step function is a threshold-based activation function. If the input value is above a certain threshold, the neuron is activated and sends exactly the same signal to the next layer; if it is below, the neuron is deactivated. The binary step function can be used as an activation function while creating a binary classifier.
The problem with a step function is that it does not allow multi-value outputs — for example, it cannot support classifying the inputs into one of several categories.
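A minimal sketch of a binary step activation in Python (the function name and the threshold of 0 are assumptions for illustration):

```python
import numpy as np

def binary_step(x, threshold=0.0):
    """Binary step activation: output 1 if the input is above the threshold, else 0."""
    return np.where(x > threshold, 1, 0)

print(binary_step(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))  # [0 0 0 1 1]
```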
2-Linear Activation Function:
A linear activation function takes the form: A = cx
It takes the inputs, multiplies them by the weights for each neuron, and produces an output signal proportional to the input. In one sense, a linear function is better than a step function because it allows multiple outputs, not just yes and no.
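A minimal sketch of the linear activation A = cx (the constant c is a free parameter; the value below is just an example):

```python
import numpy as np

def linear_activation(x, c=1.0):
    """Linear activation A = c * x: the output is proportional to the input."""
    return c * x

print(linear_activation(np.array([-1.0, 0.0, 2.0]), c=0.5))  # [-0.5  0.   1. ]
```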
3-Sigmoid Function:
The sigmoid function is a non-linear activation function used primarily in feedforward neural networks. It takes the form: A = 1 / (1 + e^-x). It is a differentiable real function, defined for all real input values, with a positive derivative everywhere and a specific degree of smoothness. The sigmoid function appears in the output layer of deep learning models and is used for predicting probability-based outputs.
The advantages of the sigmoid function are that it has a smooth gradient, preventing "jumps" in output values; that its output values are bound between 0 and 1, normalizing the output of each neuron; and that it gives clear predictions: for X above 2 or below -2, it tends to bring the Y value (the prediction) to the edge of the curve, very close to 1 or 0.
The disadvantages of the sigmoid function are the vanishing gradient (for very high or very low values of X, there is almost no change to the prediction, which can result in the network refusing to learn further or being too slow to reach an accurate prediction), outputs that are not zero-centered, and the fact that it is computationally expensive.
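A minimal numpy sketch of the sigmoid and its derivative, illustrating both the (0, 1) bound and the vanishing gradient at the extremes:

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the sigmoid; largest near x = 0, vanishingly small for |x| >> 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(x))             # values very close to 0 or 1 at the extremes
print(sigmoid_derivative(x))  # near-zero gradients at the extremes (vanishing gradient)
```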
4-Tanh (Hyperbolic Tangent):
Tanh is similar to the logistic sigmoid but better. It takes the form: A = (e^x - e^-x) / (e^x + e^-x), and its range is (-1, 1). Like the sigmoid, tanh is sigmoidal (S-shaped).
The advantage is that negative inputs are mapped strongly negative and zero inputs are mapped near zero in the tanh graph, so the outputs are zero-centered.
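A minimal sketch using numpy's built-in tanh:

```python
import numpy as np

def tanh(x):
    """Tanh activation: squashes input into (-1, 1); zero-centered, unlike sigmoid."""
    return np.tanh(x)

x = np.array([-3.0, -0.1, 0.0, 0.1, 3.0])
print(tanh(x))  # strongly negative for negative inputs, near zero for inputs near zero
```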
5-ReLU (Rectified Linear Unit):
ReLU is the most used activation function in the world right now, since it is used in almost all convolutional neural networks and deep learning models. It takes the form: A = max(0, x). It is computationally efficient, allowing the network to converge very quickly, and it is non-linear: although it looks like a linear function, ReLU has a derivative function and allows for backpropagation. However, when inputs approach zero or are negative, the gradient of the function becomes zero; the network cannot perform backpropagation through those units and cannot learn.
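A minimal sketch of ReLU and its gradient, showing the zero gradient for non-positive inputs described above:

```python
import numpy as np

def relu(x):
    """ReLU activation: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def relu_derivative(x):
    """Gradient of ReLU: 1 for positive inputs, 0 for negative inputs (and here, at 0)."""
    return np.where(x > 0, 1.0, 0.0)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))             # [0.  0.  0.  0.5 2. ]
print(relu_derivative(x))  # zero gradient for non-positive inputs: those units stop learning
```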
6-Softmax Function:
The softmax function is often described as a combination of multiple sigmoids. We know that the sigmoid returns values between 0 and 1, which can be treated as probabilities of a data point belonging to a particular class; thus the sigmoid is widely used for binary classification problems.
The softmax function can be used for multiclass classification problems. It returns the probability of a data point belonging to each individual class.
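A minimal numpy sketch of softmax for a single data point (subtracting the maximum score is a standard numerical-stability trick, not something the text above mentions):

```python
import numpy as np

def softmax(z):
    """Softmax: exponentiates scores and normalizes them into probabilities summing to 1."""
    shifted = z - np.max(z)   # subtract the max score for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

scores = np.array([2.0, 1.0, 0.1])  # raw scores for three classes
probs = softmax(scores)
print(probs)        # approximately [0.659 0.242 0.099]: one probability per class
print(probs.sum())  # 1.0
```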