{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Concise Implementation of Linear Regression\n",
"\n",
"The surge of deep learning has inspired the development \n",
"of a variety of mature software frameworks,\n",
"that automate much of the repetitive work \n",
"of implementing deep learning models.\n",
"In the previous section we relied only \n",
"on NDarray for data storage and linear algebra\n",
"and the auto-differentiation capabilities in the `autograd` package.\n",
"In practice, because many of the more abstract operations, e.g.\n",
"data iterators, loss functions, model architectures, and optimizers,\n",
"are so common, deep learning libraries will give us \n",
"library functions for these as well. \n",
"\n",
"In this section, we will introduce Gluon, MXNet's high-level interface\n",
"for implementing neural networks and show how we can implement \n",
"the linear regression model from the previous section much more concisely.\n",
"\n",
"## Generating Data Sets\n",
"\n",
"To start, we will generate the same data set as that used in the previous section."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "2"
}
},
"outputs": [],
"source": [
"from mxnet import autograd, nd\n",
"\n",
"num_inputs = 2\n",
"num_examples = 1000\n",
"true_w = nd.array([2, -3.4])\n",
"true_b = 4.2\n",
"features = nd.random.normal(scale=1, shape=(num_examples, num_inputs))\n",
"labels = nd.dot(features, true_w) + true_b\n",
"labels += nd.random.normal(scale=0.01, shape=labels.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading Data\n",
"\n",
"Rather than rolling our own iterator, \n",
"we can call upon Gluon's `data` module to read data. \n",
"Since `data` is often used as a variable name, \n",
"we will replace it with the pseudonym `gdata` \n",
"(adding the first letter of Gluon),\n",
"too differentiate the imported `data` module\n",
"from a variable we might define. \n",
"The first step will be to instantiate an `ArrayDataset`,\n",
"which takes in one or more NDArrays as arguments.\n",
"Here, we pass in `features` and `labels` as arguments.\n",
"Next, we will use the ArrayDataset to instantiate a DataLoader,\n",
"which also requires that we specify a `batch_size` \n",
"and specify a Boolean value `shuffle` indicating whether or not \n",
"we want the `DataLoader` to shuffle the data \n",
"on each epoch (pass through the dataset)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "3"
}
},
"outputs": [],
"source": [
"from mxnet.gluon import data as gdata\n",
"\n",
"batch_size = 10\n",
"# Combine the features and labels of the training data\n",
"dataset = gdata.ArrayDataset(features, labels)\n",
"# Randomly reading mini-batches\n",
"data_iter = gdata.DataLoader(dataset, batch_size, shuffle=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can use `data_iter` in much the same way as we called the `data_iter` function in the previous section. To verify that it's working, we can read and print the first mini-batch of instances."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "5"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"[[ 0.8258905 1.0249989 ]\n",
" [-1.0929538 -0.1200345 ]\n",
" [-0.6205473 0.7588377 ]\n",
" [ 1.2163423 0.48574853]\n",
" [-1.3504591 0.417716 ]\n",
" [-0.3972993 1.4890484 ]\n",
" [-0.6613617 -0.1685903 ]\n",
" [-1.6575857 1.4283854 ]\n",
" [-0.4813231 0.5334126 ]\n",
" [ 1.6491317 -1.2902275 ]]\n",
" \n",
"[ 2.3662217 2.409014 0.3863677 4.9856253 0.08698601 -1.6439147\n",
" 3.4513123 -3.9651418 1.4169512 11.878293 ]\n",
"\n"
]
}
],
"source": [
"for X, y in data_iter:\n",
" print(X, y)\n",
" break"
]
},
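{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the roles of `batch_size` and `shuffle` concrete, here is a minimal pure-Python sketch of a shuffling mini-batch iterator. \n",
"The function `toy_data_loader` and its `seed` argument are hypothetical names introduced only for this illustration; \n",
"this is not how Gluon's `DataLoader` is implemented, just a sketch of the behavior described above. \n",
"With 25 examples and a batch size of 10, one pass yields batches of 10, 10 and 5 examples, and every example appears exactly once."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"def toy_data_loader(examples, batch_size, shuffle=True, seed=None):\n",
"    # Toy stand-in for a data loader (hypothetical helper, not Gluon's API)\n",
"    indices = list(range(len(examples)))\n",
"    if shuffle:\n",
"        # Visit the examples in a random order on this pass\n",
"        random.Random(seed).shuffle(indices)\n",
"    for start in range(0, len(indices), batch_size):\n",
"        # The last batch may be smaller than batch_size\n",
"        yield [examples[i] for i in indices[start:start + batch_size]]\n",
"\n",
"toy_examples = list(range(25))\n",
"toy_batches = list(toy_data_loader(toy_examples, batch_size=10, seed=0))\n",
"print([len(batch) for batch in toy_batches])"
]
},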
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define the Model\n",
"\n",
"When we implemented linear regression from scratch in the previous section, we had to define the model parameters and explicitly write out the calculation to produce output using basic linear algebra opertions. You should know how to do this. But once your models get more complex, even qualitatively simple changes to the model might result in many low-level changes.\n",
"\n",
"For standard operations, we can use Gluon's predefined layers, which allow us to focus especially on the layers used to construct the model rather than having to focus on the implementation.\n",
"\n",
"To define a linear model, we first import the `nn` module,\n",
"which defines a large number of neural network layers\n",
"(note that \"nn\" is an abbreviation for neural networks). \n",
"We will first define a model variable `net`, which is a `Sequential` instance. In Gluon, a `Sequential` instance can be regarded as a container \n",
"that concatenates the various layers in sequence. \n",
"When input data is given, each layer in the container will be calculated in order, and the output of one layer will be the input of the next layer.\n",
"In this example, since our model consists of only one layer,\n",
"we do not really need `Sequential`.\n",
"But since nearly all of our future models will involve multiple layers, \n",
"let's get into the habit early."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "5"
}
},
"outputs": [],
"source": [
"from mxnet.gluon import nn\n",
"net = nn.Sequential()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Recall the architecture of a single layer network. \n",
"The layer is fully connected since it connects all inputs \n",
"with all outputs by means of a matrix-vector multiplication. \n",
"In Gluon, the fully-connected layer is defined in the `Dense` class. \n",
"Since we only want to generate a single scalar output, \n",
"we set that number to $1$.\n",
"\n",
"![Linear regression is a single-layer neural network. ](../img/singleneuron.svg)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "6"
}
},
"outputs": [],
"source": [
"net.add(nn.Dense(1))"
]
},
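{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough mental model of what we have built so far, the sketch below chains layers in order, \n",
"the way a `Sequential` container is described above, and computes a single weighted sum plus a bias, \n",
"as a fully-connected layer with one output would. \n",
"`ToySequential` and `ToyDense` are hypothetical classes written for this illustration, with hand-picked weights; \n",
"they are not Gluon's implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class ToyDense:\n",
"    # One output unit: a weighted sum of the inputs plus a bias (hypothetical class)\n",
"    def __init__(self, weights, bias):\n",
"        self.weights = weights\n",
"        self.bias = bias\n",
"\n",
"    def __call__(self, x):\n",
"        return sum(w * x_i for w, x_i in zip(self.weights, x)) + self.bias\n",
"\n",
"\n",
"class ToySequential:\n",
"    # Apply the stored layers in order, feeding each output to the next layer\n",
"    def __init__(self):\n",
"        self.layers = []\n",
"\n",
"    def add(self, layer):\n",
"        self.layers.append(layer)\n",
"\n",
"    def __call__(self, x):\n",
"        for layer in self.layers:\n",
"            x = layer(x)\n",
"        return x\n",
"\n",
"\n",
"toy_net = ToySequential()\n",
"toy_net.add(ToyDense(weights=[2.0, -3.4], bias=4.2))\n",
"# Computes 2.0 * 1.0 - 3.4 * 1.0 + 4.2\n",
"toy_output = toy_net([1.0, 1.0])\n",
"print(toy_output)"
]
},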
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is worth noting that, for convenience, \n",
"Gluon does not require us to specify \n",
"the input shape for each layer. \n",
"So here, we don't need to tell Gluon \n",
"how many inputs go into this linear layer.\n",
"When we first try to pass data through our model,\n",
"e.g., when we exedcute `net(X)` later, \n",
"Gluon will automatically infer the number of inputs to each layer. \n",
"We will describe how this works in more detail \n",
"in the chapter \"Deep Learning Computation\". \n",
"\n",
"\n",
"\n",
"## Initialize Model Parameters\n",
"\n",
"Before using `net`, we need to initialize the model parameters, \n",
"such as the weights and biases in the linear regression model. \n",
"We will import the `initializer` module from MXNet. \n",
"This module provides various methods for model parameter initialization. \n",
"Gluon makes `init` available as a shortcut (abbreviation) \n",
"to access the `initializer` package. \n",
"By calling `init.Normal(sigma=0.01)`, we specify that each *weight* parameter\n",
"should be randomly sampled from a normal distribution \n",
"with mean 0 and standard deviation 0.01. \n",
"The *bias* parameter will be initialized to zero by default."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "7"
}
},
"outputs": [],
"source": [
"from mxnet import init\n",
"net.initialize(init.Normal(sigma=0.01))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code above looks straightforward but in reality \n",
"something quite strange is happening here. \n",
"We are initializing parameters for a network\n",
"even though we haven't yet told Gluon how many dimensions the input will have. \n",
"It might be 2 as in our example or it might be 2,000, \n",
"so we couldn't just preallocate enough space to make it work.\n",
"\n",
"Gluon let's us get away with this because behind the scenes,\n",
"the initialization is deferred until the first time \n",
"that we attempt to pass data through our network. \n",
"Just be careful to remember that since the parameters \n",
"have not been initialized yet we cannot yet manipulate them in any way.\n",
"\n",
"\n",
"## Define the Loss Function\n",
"\n",
"In Gluon, the `loss` module defines various loss functions. \n",
"We will replace the imported module `loss` with the pseudonym `gloss`, \n",
"and directly use its implementation of squared loss (`L2Loss`)."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "8"
}
},
"outputs": [],
"source": [
"from mxnet.gluon import loss as gloss\n",
"loss = gloss.L2Loss() # The squared loss is also known as the L2 norm loss"
]
},
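{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition about what the squared loss computes, the sketch below evaluates half the squared difference \n",
"between each prediction and its label, giving one loss value per example. \n",
"The helper `toy_l2_loss` is a hypothetical name used only here, and the factor of $1/2$ reflects our understanding of `L2Loss`; \n",
"consult the MXNet documentation for the exact definition."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def toy_l2_loss(predictions, labels):\n",
"    # Half the squared error, returned as one value per example (hypothetical helper)\n",
"    return [0.5 * (p - y) ** 2 for p, y in zip(predictions, labels)]\n",
"\n",
"toy_losses = toy_l2_loss([2.5, 0.0, 4.0], [3.0, 1.0, 4.0])\n",
"print(toy_losses)  # [0.125, 0.5, 0.0]"
]
},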
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define the Optimization Algorithm\n",
"\n",
"Not surpisingly, we aren't the first people \n",
"to implement mini-batch stochastic gradient descent,\n",
"and thus `Gluon` supports SGD alongside a number of \n",
"variations on this algorithm through its `Trainer` class. \n",
"When we instantiate the `Trainer`, we'll specify the parameters to optimize over (obtainable from our net via `net.collect_params()`),\n",
"the optimization algortihm we wish to use (`sgd`),\n",
"and a dictionary of hyper-parameters required by our optimization algorithm.\n",
"SGD just requires that we set the value `learning_rate`, \n",
"(here we set it to 0.03)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "9"
}
},
"outputs": [],
"source": [
"from mxnet import gluon\n",
"trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.03})"
]
},
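{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what a single update might look like, here is a minimal sketch of one mini-batch SGD step: \n",
"each parameter moves against its gradient, scaled by the learning rate and divided by the batch size. \n",
"The function `toy_sgd_step` and the numbers passed to it are made up for illustration, \n",
"and we assume the gradients have been summed over the mini-batch; `Trainer` handles these details internally."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def toy_sgd_step(params, summed_grads, learning_rate, batch_size):\n",
"    # Move each parameter against its gradient, averaged over the mini-batch\n",
"    # (hypothetical helper; assumes gradients were summed over the batch)\n",
"    return [p - learning_rate * g / batch_size\n",
"            for p, g in zip(params, summed_grads)]\n",
"\n",
"toy_params = toy_sgd_step([1.0, -2.0], summed_grads=[10.0, -20.0],\n",
"                          learning_rate=0.03, batch_size=10)\n",
"# Roughly 0.97 and -1.94, up to floating-point rounding\n",
"print(toy_params)"
]
},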
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training\n",
"\n",
"You might have noticed that expressing our model through Gluon\n",
"requires comparatively few lines of code. \n",
"We didn't have to individually allocate parameters, \n",
"define our loss function, or implement stochastic gradient descent. \n",
"Once we start working with much more complex models,\n",
"the benefits of relying on Gluon's abstractions will grow considerably. \n",
"But once we have all the basic pieces in place, \n",
"the training loop itself is strikingly similar \n",
"to what we did when implementing everything from scratch.\n",
"\n",
"To refresh your memory: for some number of epochs, \n",
"we'll make a complete pass over the dataset (train_data), \n",
"grabbing one mini-batch of inputs and corresponding ground-truth labels at a time. For each batch, we'll go through the following ritual:\n",
"\n",
"* Generate predictions by calling `net(X)` and calculate the loss `l` (the forward pass).\n",
"* Calculate gradients by calling `l.backward()` (the backward pass).\n",
"* Update the model parameters by invoking our SGD optimizer (note that `trainer` already knows which parameters to optimize over, so we just need to pass in the batch size.\n",
"\n",
"For good measure, we compute the loss after each epoch and print it to monitor progress."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "10"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 1, loss: 0.040614\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 2, loss: 0.000155\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 3, loss: 0.000051\n"
]
}
],
"source": [
"num_epochs = 3\n",
"for epoch in range(1, num_epochs + 1):\n",
" for X, y in data_iter:\n",
" with autograd.record():\n",
" l = loss(net(X), y)\n",
" l.backward()\n",
" trainer.step(batch_size)\n",
" l = loss(net(features), labels)\n",
" print('epoch %d, loss: %f' % (epoch, l.mean().asnumpy()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model parameters we have learned and the actual model parameters are compared as below. We get the layer we need from the `net` and access its weight (`weight`) and bias (`bias`). The parameters we have learned and the actual parameters are very close."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "12"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Error in estimating w \n",
"[[ 6.2358379e-04 -4.6014786e-05]]\n",
"\n",
"Error in estimating b \n",
"[0.00088549]\n",
"\n"
]
}
],
"source": [
"w = net[0].weight.data()\n",
"print('Error in estimating w', true_w.reshape(w.shape) - w)\n",
"b = net[0].bias.data()\n",
"print('Error in estimating b', true_b - b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"* Using Gluon, we can implement the model more succinctly.\n",
"* In Gluon, the module `data` provides tools for data processing, the module `nn` defines a large number of neural network layers, and the module `loss` defines various loss functions.\n",
"* MXNet's module `initializer` provides various methods for model parameter initialization.\n",
"* Dimensionality and storage are automagically inferred (but caution if you want to access parameters before they've been initialized).\n",
"\n",
"\n",
"## Exercises\n",
"\n",
"1. If we replace `l = loss(output, y)` with `l = loss(output, y).mean()`, we need to change `trainer.step(batch_size)` to `trainer.step(1)` accordingly. Why?\n",
"1. Review the MXNet documentation to see what loss functions and initialization methods are provided in the modules `gluon.loss` and `init`. Replace the loss by Huber's loss.\n",
"1. How do you access the gradient of `dense.weight`?\n",
"\n",
"## Scan the QR Code to [Discuss](https://discuss.mxnet.io/t/2333)\n",
"\n",
"![](../img/qr_linear-regression-gluon.svg)"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}