The Simplest Neural Network
Step 1 of building a deep learning neural network from scratch
Goal and Purpose
Linear regression is the simplest and arguably most well-known form of machine learning.
Linear regression models, as in statistics, are concerned with minimizing error and making the most accurate predictions possible. A neural network can accomplish this by iteratively updating its internal parameters (weights and biases) via a gradient descent algorithm. The YouTube educator 3Blue1Brown (3B1B) has a great visualization of gradient descent in the context of machine learning models.
In this notebook, I will develop a simple single-layer model to “learn” how to predict the outcome of a linear function of the form y=mx+b. This exercise may seem trivial or unnecessary (“I can produce a list of solutions for a linear function and plot it with a few lines of code, why do I need a neural network to try and learn how to do the same thing?”), but it will act as a backbone for building more complex neural networks for much more complicated functions.
We will start with the simplest possible neural network, one input neuron and one output neuron with which we will learn how to create linear functions. Then, we will expand upon this simple model by allowing for multidimensional inputs (and thereby allowing for multidimensional outputs as well). Finally, we will complete our foray into Deep Learning by adding “hidden layers” to our neural network. The result will be a modular Deep Learning model that can be easily applied to a diverse set of problems.
The Network of Neurons
A neural network can be thought of as a function: input, transformation, and output. Therefore, in its simplest representation, a neural network takes a single input, produces an output, and, by comparing that output with the known result, updates its internals to better approach the correct output value.
Here, x is some input number. The input is transformed via our neural network function, which has parameters W and b (weight and bias). These parameters are subject to change based on how erroneous the network’s output y-hat is compared to the actual value we’d expect from input x.
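Written out, and assuming the network has the same linear form as the target function y=mx+b, the transformation described above would be:

```latex
\hat{y} = W x + b
```

Here W plays the role of the slope m, and b is the intercept.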
The simple neural network has the following steps:
- Initialize training input x_train and output y_train. The output here is the expected correct answer.
- Initialize network parameters W and b. Here the weight array must correspond to the number of inputs. Since we only feed in one input at a time for now, the weight and bias arrays will have shape (1,1). The weight is initialized to a small random number.
- Define our cost function. The “cost” can be thought of as the error between the expected output and our network’s output. “Cost” and “loss” are closely related and often used interchangeably; one common convention reserves “loss” for the error on a single example and “cost” for the average over many examples. We’ll showcase this later; for now, each error calculation is simply referred to as the cost.
- Calculate the components of the gradient of the cost function. In this case: ∂C/∂W and ∂C/∂b.
- Update the network parameters by subtracting a scaled amount of each gradient component. This is gradient descent.
- Repeat this process any number of times, called epochs. Return the parameters W and b.
- Use the model’s updated parameters on test data to determine how accurate the trained model is.
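The steps above can be sketched in a few lines of NumPy. This is a minimal illustration and not necessarily the notebook's own code: the squared-error cost C = (y_hat - y)^2 is an assumption (the text does not name a specific cost), which gives ∂C/∂W = 2(y_hat - y)x and ∂C/∂b = 2(y_hat - y). The training values, learning rate, epoch count, and seed are likewise illustrative choices.

```python
import numpy as np

def train(x_train, y_train, learning_rate=0.02, epochs=500, seed=0):
    """Fit y = W*x + b by per-example gradient descent (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Parameters have shape (1, 1): one input, one output.
    W = rng.normal(scale=0.01, size=(1, 1))  # small random weight
    b = np.zeros((1, 1))                     # bias starts at zero (assumption)

    for _ in range(epochs):
        for x, y in zip(x_train, y_train):
            y_hat = W * x + b                # forward pass
            error = y_hat - y                # signed prediction error
            # Gradient of the assumed cost C = (y_hat - y)^2
            dC_dW = 2.0 * error * x
            dC_db = 2.0 * error
            # Gradient descent: step against the gradient
            W -= learning_rate * dC_dW
            b -= learning_rate * dC_db
    return W, b

# Hypothetical training data drawn from the true line y = 5x + 3
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = 5.0 * x_train + 3.0

W, b = train(x_train, y_train)
print(f"Learned equation: y = {W[0, 0]:.2f}x + {b[0, 0]:.2f}")
```

With more epochs than the five used in the run below, this sketch should land much closer to the true parameters, at the price of more computation.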
True equation:
y = 5.0x + 3.0
Our learned equation is:
y = 5.03x + 2.53
Testing for x = 5.0:
Model result, actual: 27.69, 28.0
Model error: 1.11%
Wonderful! By just feeding our neural network the same number over and over again (5 times in total), we were able to train the network to predict the output for a new input, x = 5.0, to within ~1% error.
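The reported model error is consistent with a plain percent error relative to the actual value. The helper below is a hypothetical name introduced for illustration, and the formula is an assumption that happens to match the numbers printed above:

```python
def percent_error(predicted, actual):
    """Absolute error as a percentage of the actual value."""
    return abs(predicted - actual) / abs(actual) * 100.0

# Values taken from the test output above
error = percent_error(27.69, 28.0)
print(f"Model error: {error:.2f}%")
```

Since this error is relative to the actual value, the same learned parameters can give a different percentage at other inputs.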
In neural network models, we refer to the weights and biases of the model as the model’s parameters. These are elements that change throughout the model’s learning process but that we do not necessarily have direct control over. The number of learning iterations (epochs) and the learning rate (which scales how much parameters W and b are changed at each update) are examples of hyperparameters. We typically DO have direct control over hyperparameters. Below, we can see the effect of a decreased learning rate and an increased number of epochs.
Typically, we’d expect more epochs to lead to less model error, since they give the model more iterations to improve its parameters. But this can also have a detrimental effect as the model becomes “over-trained”. The learning rate can be mis-tuned as well. If the learning rate is too large, the model will bounce around its cost/loss minimum, or even move away from it, without ever “descending the gradient”. If the learning rate is too small, the model will learn very accurately but very slowly, requiring many more epochs to match the results of larger learning rates. As can be seen in the figure below, more epochs can sometimes cause a model’s error to rise, while smaller learning rates take more epochs to improve.
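To make the learning-rate trade-off concrete, here is a small hedged experiment. It is a sketch under its own assumptions rather than the code behind the figure: it uses a mean squared-error cost averaged over five hypothetical training points, starts from W = b = 0, and the function name final_cost is introduced purely for illustration.

```python
import numpy as np

def final_cost(learning_rate, epochs):
    """Run batch gradient descent on y = 5x + 3 and return the final mean cost."""
    x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y_train = 5.0 * x_train + 3.0
    W, b = 0.0, 0.0  # start from zero so every run is directly comparable
    for _ in range(epochs):
        error = W * x_train + b - y_train
        # Gradient of the mean squared error, averaged over the batch
        W -= learning_rate * np.mean(2.0 * error * x_train)
        b -= learning_rate * np.mean(2.0 * error)
    return float(np.mean((W * x_train + b - y_train) ** 2))

# Too small, reasonable, and too large learning rates for this data
for lr in (0.001, 0.1, 0.2):
    print(f"learning rate {lr}: cost after 50 epochs = {final_cost(lr, 50):.3g}")
```

For this particular data, the smallest rate is still far from the minimum after 50 epochs, the middle rate gets close, and the largest rate overshoots on every step so the cost grows instead of shrinking.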
Next Steps
In order for a machine learning model to become a “deep” model, layers need to be added. These layers (a.k.a. hidden layers) are additional transformations, typically a linear map followed by a nonlinear activation function, that have the potential to unlock more information for a model to learn from. The nonlinearity matters: a stack of purely linear layers collapses into a single linear transformation. The addition of hidden layers also forces us to contend with backpropagation, an algorithm that calculates the gradients used to update the parameter values throughout a multi-layer neural network. This is the goal for next time: implement a neural network with a single hidden layer and perform gradient descent using backpropagation. See you next time!