**Overview**

The idea of the neural network algorithm arose from an attempt to mimic the brain and its remarkable ability to learn. Although the idea is relatively old (it gained popularity in the 1980s and 1990s), neural networks are now considered the state of the art in many applications.

Neural network algorithms are based on the hypothesis that the brain uses a single learning algorithm for all of its functions, i.e., any area of the brain could learn to see or hear if it received the appropriate stimulus.

**Representation**

In the brain, each neuron receives nerve impulses through its dendrites, performs some "calculation" in the cell body and transmits the response as another nerve impulse along the axon. A neural network algorithm copies this system, as shown in Figures 1 and 2 below.

**Figures 1 and 2 - representation of a neuron (left) and a neural network unit (right).**

In this type of algorithm, several "neurons" are interconnected to form a network. This network consists of three types of layers, known as the input layer, the output layer and the hidden layers, as shown in Figure 3 below.

**Figure 3 - representation of a neural network.**

The input layer receives the data and the output layer outputs the response. The hidden layers are responsible for intermediate calculations that help the network find the final answer. In more complex networks, one can use multiple hidden layers between the input and the output layers. The number of neurons in each layer depends on the amount of input data and the type of problem being solved. For example, if the algorithm is designed to determine whether or not a patient has a specific disease, the input layer has as many neurons as there are features in the input data and the output layer has only one neuron.

**Example: Forward propagation**

In an application in healthcare, one may wish to determine whether a tumor is malignant or benign based on its characteristics. In this case, the neural network only needs a single neuron in the output layer that will output the value 1 if the tumor is malignant and the value 0 if it is benign. An example of a neural network that can be used in this situation is shown in Figure 4 below.

**Figure 4 - example of neural network.**

In this case, the data contains three characteristics of the tumor, which are the inputs of the algorithm, plus an extra neuron called the bias unit that always outputs the value 1. This extra neuron can be added to every layer, with the exception of the output layer, and it helps the network fit the data better.
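As a minimal sketch of the bias unit, the snippet below places the constant value 1 in front of the values of a layer. The helper name `add_bias` and the feature values are hypothetical and chosen only for illustration.

```python
def add_bias(values):
    """Return a new list with the bias value 1.0 placed in front of the layer values."""
    return [1.0] + list(values)


# Hypothetical tumor characteristics (illustrative numbers, not real data).
x = [0.5, 1.2, -0.3]
layer_input = add_bias(x)
print(layer_input)  # [1.0, 0.5, 1.2, -0.3]
```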

The idea of this algorithm is to find the answer, i.e., the value of $h_{\Theta}(x)$, by performing a forward propagation from the input layer to the output layer. For this, it is necessary to define some variables $\Theta^{(l)}$, also known as weight matrices, where l refers to the layer being treated. The values stored in these matrices are obtained during the training of the algorithm and represent the weight that each neuron has in the value of the neurons of the next layer.

Since this example is a classification problem with only two possible output values (0 or 1), it is possible to use the sigmoid function, $g(z) = 1 / (1 + e^{-z})$, as the activation function. This function, whose graph is shown below, approaches the value 0 when its input is a very negative number and approaches the value 1 when its input is a very positive number.

**Figure 5 - Sigmoid function graph.**
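The behavior of the sigmoid function can be checked numerically. The sketch below assumes the standard logistic form 1 / (1 + e^(-z)); the two branches are only there to avoid overflow for inputs of large magnitude.

```python
import math


def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


print(sigmoid(0.0))    # 0.5
print(sigmoid(-10.0))  # a value close to 0
print(sigmoid(10.0))   # a value close to 1
```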

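The propagation process starts in the input layer, where each neuron receives the value of one characteristic of the tumor, i.e., the neurons assume the values $x_{1}$, $x_{2}$ and $x_{3}$, where $x$ is the vector that contains all the data about the tumor. The second step is to calculate the values of the neurons in the hidden layer using the values in the matrix $\Theta^{(1)}$:

$$a_{1}^{(2)} = g(\Theta_{10}^{(1)} x_{0} + \Theta_{11}^{(1)} x_{1} + \Theta_{12}^{(1)} x_{2} + \Theta_{13}^{(1)} x_{3})$$

$$a_{2}^{(2)} = g(\Theta_{20}^{(1)} x_{0} + \Theta_{21}^{(1)} x_{1} + \Theta_{22}^{(1)} x_{2} + \Theta_{23}^{(1)} x_{3})$$

$$a_{3}^{(2)} = g(\Theta_{30}^{(1)} x_{0} + \Theta_{31}^{(1)} x_{1} + \Theta_{32}^{(1)} x_{2} + \Theta_{33}^{(1)} x_{3})$$

where $a_{1}^{(2)}$, $a_{2}^{(2)}$ and $a_{3}^{(2)}$ represent the values of the neurons in the hidden layer, $g$ is the sigmoid function and $x_{0} = 1$ is the bias unit of the input layer.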

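Lastly, it is necessary to calculate the output value, i.e., $h_{\Theta}(x)$:

$$h_{\Theta}(x) = a_{1}^{(3)} = g(\Theta_{10}^{(2)} a_{0}^{(2)} + \Theta_{11}^{(2)} a_{1}^{(2)} + \Theta_{12}^{(2)} a_{2}^{(2)} + \Theta_{13}^{(2)} a_{3}^{(2)})$$

where $g$ is the sigmoid function and $a_{0}^{(2)} = 1$ is the bias unit of the hidden layer.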

Therefore, the value of $h_{\Theta}(x)$ will lie between 0 and 1, and it is rounded to 0 or 1 to give the final answer, depending on the tumor characteristics specified in the variables $x_{1}$, $x_{2}$ and $x_{3}$.
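The forward propagation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weight values and tumor characteristics are hypothetical (in practice the weights come from training), the function names are made up for this example, and the 0.5 threshold used to turn the output into 0 or 1 is a common convention rather than something fixed by the text.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def layer_forward(theta, values):
    """Prepend the bias unit, then apply the sigmoid to each weighted sum."""
    with_bias = [1.0] + list(values)
    return [sigmoid(sum(w * v for w, v in zip(row, with_bias))) for row in theta]


def forward(theta1, theta2, x):
    """Return the hidden layer values and the single output value."""
    a2 = layer_forward(theta1, x)   # hidden layer: three neurons
    a3 = layer_forward(theta2, a2)  # output layer: one neuron
    return a2, a3[0]


# Hypothetical weights: one row per neuron of the next layer,
# one column per neuron of the current layer (bias unit first).
theta1 = [[0.1, 0.4, -0.2, 0.3],
          [-0.3, 0.2, 0.5, -0.1],
          [0.2, -0.5, 0.1, 0.4]]
theta2 = [[-0.2, 0.6, -0.4, 0.3]]

x = [0.5, 1.2, -0.3]  # hypothetical tumor characteristics
a2, h = forward(theta1, theta2, x)
prediction = 1 if h >= 0.5 else 0  # assumed threshold: 1 = malignant, 0 = benign
print(a2, h, prediction)
```

Each row of `theta1` holds the weights of one hidden neuron, so the list comprehension in `layer_forward` computes all neurons of a layer in one pass.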

**Algorithm training**

The goal of training an algorithm is to find the values of the matrices $\Theta^{(l)}$ that cause it to output the correct values. For this, it is necessary to collect many complete examples of tumors, i.e., not only their characteristics but also whether they are malignant or benign.

One efficient way to calculate the weight matrices is to use a function called fmincg. This function takes as input the following data:

- Cost function, $J(\Theta)$;
- Derivative of the cost function;
- Input data (training set).

The cost function varies according to each problem and its goal is to calculate the error made by the algorithm. Many different cost functions have already been defined by mathematicians and can be used in different situations.
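As one concrete possibility, the sketch below implements the logistic (cross-entropy) cost, which is a common choice for problems with 0/1 outputs. Choosing this particular cost is an assumption made for illustration, since the text does not fix one; the function name and the sample predictions are hypothetical.

```python
import math


def logistic_cost(predictions, labels):
    """Average cross-entropy between predicted values in (0, 1) and 0/1 labels."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for h, y in zip(predictions, labels):
        h = min(max(h, eps), 1.0 - eps)
        total += -y * math.log(h) - (1 - y) * math.log(1.0 - h)
    return total / len(labels)


# Hypothetical network outputs for two tumors labeled 1 (malignant) and 0 (benign).
good = logistic_cost([0.9, 0.1], [1, 0])  # outputs close to the labels
bad = logistic_cost([0.1, 0.9], [1, 0])   # outputs far from the labels
print(good, bad)
```

The cost is small when the outputs are close to the labels and grows quickly as they move away, which is the kind of error measure the training step needs.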

In order to calculate the derivative of the cost function, one should use a process called backpropagation, whose goal is to calculate the error of each neuron for each example of the training set, i.e., the difference between the value the neuron actually takes and the correct value it should take. This error is represented by the lowercase Greek letter delta, $\delta$.

First, it is necessary to initialize the matrices $\Theta^{(l)}$ with random values, where l ranges from 1 to L - 1 and L is the total number of layers in the network. In this example, one should use two matrices: $\Theta^{(1)}$ and $\Theta^{(2)}$.
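A minimal sketch of the random initialization for the network of this example follows. The range `epsilon = 0.12` and the helper name `random_weights` are assumptions made for illustration; any small symmetric range would serve. Random values are used so that the neurons of a layer do not all start with identical weights and then receive identical updates.

```python
import random


def random_weights(rows, cols, epsilon=0.12, rng=random):
    """Return a rows x cols matrix with entries drawn uniformly from [-epsilon, epsilon]."""
    return [[rng.uniform(-epsilon, epsilon) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# 3 hidden neurons, each fed by 3 inputs plus the bias unit.
theta1 = random_weights(3, 4, rng=rng)
# 1 output neuron fed by 3 hidden neurons plus the bias unit.
theta2 = random_weights(1, 4, rng=rng)
print(theta1)
print(theta2)
```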

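Once the matrices are initialized, for each example of the training set one should perform the following steps:

1. Forward propagation

Calculate the values of all the neurons of the network for the example, from the input layer to the output layer.

2. Neuron errors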

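For the output layer, the error corresponds to the difference between the value calculated by the network and the real value:

$$\delta^{(3)} = h_{\Theta}(x) - y$$

where $h_{\Theta}(x)$ is the value calculated by the network and $y$ is the real value (0 or 1) of the example.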

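For the hidden layer, the error corresponds to a propagation of the output error. In this example, where the activation function is the sigmoid function, the errors can be calculated by the formula below:

$$\delta^{(2)} = (\Theta^{(2)})^{T} \delta^{(3)} \odot a^{(2)} \odot (1 - a^{(2)})$$

where $\delta^{(3)}$ is the error of the output layer, $a^{(2)}$ is the vector of values of the hidden layer, $\odot$ is the element-wise product, and the component that corresponds to the bias unit is discarded.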

3. Derivative of the cost function

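For each example in the training set, it is possible to prove that the derivative of the cost function with respect to $\Theta^{(l)}$ is equal to $\delta^{(l+1)} (a^{(l)})^{T}$, where $a^{(l)}$ is the vector of values of the neurons in layer l (including the bias unit) and $\delta^{(l+1)}$ is the vector of errors of the neurons in the next layer.

Taking into consideration that the training set has many examples, it can be shown that the total derivative of the cost function is equal to the average of $\delta^{(l+1)} (a^{(l)})^{T}$ over all the examples.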
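The three steps above can be sketched as follows for the network of this example (three inputs, three hidden neurons, one output). The sketch assumes the sigmoid activation and the logistic cost, for which the output error is the calculated value minus the real value; the weights, the training examples and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(theta1, theta2, x):
    """Step 1: return the input and hidden values (bias unit first) and the output."""
    a1 = [1.0] + list(x)
    a2 = [1.0] + [sigmoid(sum(w * v for w, v in zip(row, a1))) for row in theta1]
    h = sigmoid(sum(w * v for w, v in zip(theta2[0], a2)))
    return a1, a2, h


def gradients(theta1, theta2, examples):
    """Average, over all examples, the per-example derivatives of the cost."""
    m = len(examples)
    grad1 = [[0.0] * len(row) for row in theta1]
    grad2 = [[0.0] * len(theta2[0])]
    for x, y in examples:
        a1, a2, h = forward(theta1, theta2, x)
        # Step 2: neuron errors. Output error first, then the hidden errors;
        # index 0 of a2 is the bias unit, so its error is skipped.
        delta3 = h - y
        delta2 = [theta2[0][j] * delta3 * a2[j] * (1.0 - a2[j])
                  for j in range(1, len(a2))]
        # Step 3: accumulate error times activation, divided by m to average.
        for j in range(len(a2)):
            grad2[0][j] += delta3 * a2[j] / m
        for i in range(len(delta2)):
            for j in range(len(a1)):
                grad1[i][j] += delta2[i] * a1[j] / m
    return grad1, grad2


# Hypothetical weights and training examples: (characteristics, label).
theta1 = [[0.1, 0.4, -0.2, 0.3],
          [-0.3, 0.2, 0.5, -0.1],
          [0.2, -0.5, 0.1, 0.4]]
theta2 = [[-0.2, 0.6, -0.4, 0.3]]
examples = [([0.9, 0.8, 0.7], 1), ([0.1, 0.2, 0.1], 0)]

grad1, grad2 = gradients(theta1, theta2, examples)
print(grad1)
print(grad2)
```

A useful sanity check for any implementation of this kind is to compare a few entries of the result with a finite-difference estimate of the derivative of the cost.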

Since all the inputs required by the function fmincg are now available, the next step is to apply the function and obtain the values of $\Theta^{(1)}$ and $\Theta^{(2)}$.
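To show how the cost function, its derivative and the training set fit together, the sketch below trains the network of this example end to end. It does not reproduce fmincg: plain gradient descent is swapped in as a simpler optimizer. The logistic cost, the learning rate, the number of iterations and the tiny training set are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(theta1, theta2, x):
    a1 = [1.0] + list(x)
    a2 = [1.0] + [sigmoid(sum(w * v for w, v in zip(row, a1))) for row in theta1]
    h = sigmoid(sum(w * v for w, v in zip(theta2[0], a2)))
    return a1, a2, h


def cost(theta1, theta2, examples):
    """Average logistic cost over the training set (assumed cost function)."""
    total = 0.0
    for x, y in examples:
        h = min(max(forward(theta1, theta2, x)[2], 1e-12), 1.0 - 1e-12)
        total += -y * math.log(h) - (1 - y) * math.log(1.0 - h)
    return total / len(examples)


def gradients(theta1, theta2, examples):
    """Backpropagation: average derivative of the cost for each weight."""
    m = len(examples)
    grad1 = [[0.0] * len(row) for row in theta1]
    grad2 = [[0.0] * len(theta2[0])]
    for x, y in examples:
        a1, a2, h = forward(theta1, theta2, x)
        delta3 = h - y
        delta2 = [theta2[0][j] * delta3 * a2[j] * (1.0 - a2[j])
                  for j in range(1, len(a2))]
        for j in range(len(a2)):
            grad2[0][j] += delta3 * a2[j] / m
        for i in range(len(delta2)):
            for j in range(len(a1)):
                grad1[i][j] += delta2[i] * a1[j] / m
    return grad1, grad2


def train(theta1, theta2, examples, learning_rate=0.5, iterations=300):
    """Gradient descent: repeatedly step each weight against its derivative."""
    for _ in range(iterations):
        grad1, grad2 = gradients(theta1, theta2, examples)
        theta1 = [[w - learning_rate * g for w, g in zip(wr, gr)]
                  for wr, gr in zip(theta1, grad1)]
        theta2 = [[w - learning_rate * g for w, g in zip(wr, gr)]
                  for wr, gr in zip(theta2, grad2)]
    return theta1, theta2


# Hypothetical training set: (tumor characteristics, 1 = malignant / 0 = benign).
examples = [([0.9, 0.8, 0.7], 1), ([0.8, 0.9, 0.9], 1),
            ([0.1, 0.2, 0.1], 0), ([0.2, 0.1, 0.3], 0)]

rng = random.Random(0)
theta1 = [[rng.uniform(-0.12, 0.12) for _ in range(4)] for _ in range(3)]
theta2 = [[rng.uniform(-0.12, 0.12) for _ in range(4)]]

initial_cost = cost(theta1, theta2, examples)
theta1, theta2 = train(theta1, theta2, examples)
final_cost = cost(theta1, theta2, examples)
print(initial_cost, final_cost)
```

A more sophisticated optimizer would typically need fewer iterations, but it consumes the same ingredients: a way to evaluate the cost and a way to evaluate its derivative.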

**Conclusion**

Although neural networks are a more complex and harder to understand type of algorithm, they are very useful and versatile. By changing only the input data, the cost function and the activation function, it is possible to solve many different problems by finding the appropriate values of $\Theta^{(l)}$.