# 3.3. Concise Implementation of Linear Regression

With the development of deep learning frameworks, it has become increasingly easy to develop deep learning applications. In practice, we can usually implement the same model much more concisely than we did in the previous section. In this section, we will introduce how to use the Gluon interface provided by MXNet.

## 3.3.1. Generating Data Sets

We will generate the same data set as that used in the previous section.

In [1]:

from mxnet import autograd, nd

num_inputs = 2
num_examples = 1000
true_w = nd.array([2, -3.4])
true_b = 4.2
features = nd.random.normal(scale=1, shape=(num_examples, num_inputs))
labels = nd.dot(features, true_w) + true_b
labels += nd.random.normal(scale=0.01, shape=labels.shape)


## 3.3.2. Reading Data

Gluon provides the data module to read data. Since data is often used as a variable name, we will refer to the imported module by the pseudonym gdata (adding the first letter of Gluon). In each iteration, we will randomly read a mini-batch containing 10 data instances.

In [2]:

from mxnet.gluon import data as gdata

batch_size = 10
# Combine the features and labels of the training data
dataset = gdata.ArrayDataset(features, labels)
# Randomly read mini-batches of size batch_size
data_iter = gdata.DataLoader(dataset, batch_size, shuffle=True)


The use of data_iter here is the same as in the previous section. Now, we can read and print the first mini-batch of instances.

In [3]:

for X, y in data_iter:
    print(X, y)
    break


[[ 1.9782461  -2.0905404 ]
 [ 1.7928383  -0.09992757]
 [-0.8624608  -0.4348689 ]
 [-0.27312678  0.45424792]
 [-0.46069986  0.8082894 ]
 [-0.8634535  -0.97035617]
 [ 0.01487677 -0.12173855]
 [-1.2860336  -1.6586353 ]
 [ 0.52979773 -2.436552  ]
 [ 0.44809967  0.00624379]]
<NDArray 10x2 @cpu(0)>
[15.254026    8.135473    3.9542496   2.1196961   0.51967335  5.750222
  4.648376    7.265286   13.54595     5.088348  ]
<NDArray 10 @cpu(0)>


## 3.3.3. Define the Model

When we implemented the linear regression model from scratch in the previous section, we needed to define the model parameters and use them to describe, step by step, how the model is evaluated. This can become complicated as we build complex models. Gluon provides a large number of predefined layers, which allow us to focus on the layers used to construct the model rather than on their implementation.

To define a linear model, first import the module nn. nn is an abbreviation for neural networks. As the name implies, this module defines a large number of neural network layers. We will first define a model variable net, which is a Sequential instance. In Gluon, a Sequential instance can be regarded as a container that concatenates the various layers in sequence. When constructing the model, we will add the layers in their order of occurrence in the container. When input data is given, each layer in the container will be calculated in order, and the output of one layer will be the input of the next layer.

In [4]:

from mxnet.gluon import nn
net = nn.Sequential()


Recall the architecture of a single-layer network. The layer is fully connected since it connects all inputs with all outputs by means of a matrix-vector multiplication. In Gluon, the fully connected layer is referred to as a Dense instance. Since we only want to generate a single scalar output, we set that number to $1$.
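
Concretely, with input $\mathbf{x} \in \mathbb{R}^2$, this single Dense layer computes the same affine transformation as the model of the previous section,

$$\hat{y} = \mathbf{w}^\top \mathbf{x} + b,$$

where the weight $\mathbf{w}$ and the bias $b$ are the parameters that Gluon creates and learns for us.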

In [5]:

net.add(nn.Dense(1))


It is worth noting that, in Gluon, we do not need to specify the input shape for each layer, such as the number of linear regression inputs. When the model sees the data, for example, when net(X) is executed later, the model will automatically infer the number of inputs in each layer. We will describe this mechanism in detail in the chapter “Deep Learning Computation”. Gluon introduces this design to make model development more convenient.
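
Before moving on, we can peek at the not-yet-inferred input dimension. This is a quick check, not part of the original example; it relies on MXNet's convention of using a 0 in the shape to mark a dimension that is still unknown:

print(net[0].weight.shape)  # (1, 0): the input dimension is still unknown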

## 3.3.4. Initialize Model Parameters

Before using net, we need to initialize the model parameters, such as the weights and biases in the linear regression model. We will import the initializer module from MXNet. This module provides various methods for model parameter initialization. The init here is the abbreviation of initializer. By init.Normal(sigma=0.01), we specify that each weight parameter element is to be randomly sampled at initialization from a normal distribution with a mean of 0 and a standard deviation of 0.01. The bias parameter will be initialized to zero by default.

In [6]:

from mxnet import init
net.initialize(init.Normal(sigma=0.01))


The code above looks pretty straightforward, but in reality something quite strange is happening here. We are initializing parameters for a network where we haven’t yet told Gluon how many dimensions the input will have. It might be 2, as in our example, or it might be 2,000, so we couldn’t just preallocate enough space to make it work. What happens behind the scenes is that the initialization is deferred until the first time that data is sent through the network. In doing so, we prime all settings (and the user doesn’t even need to worry about it). The only caveat is that, since the parameters have not been initialized yet, we cannot manipulate them yet.
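
We can observe this deferred behavior directly. The following sketch is not part of the original example; we catch a generic Exception because the exact error type raised may vary across MXNet versions:

try:
    net[0].weight.data()  # parameters are not materialized yet
except Exception as e:
    print('before the first forward pass:', type(e).__name__)
net(features[:1])  # the first forward pass triggers the actual initialization
print('after the first forward pass:', net[0].weight.data().shape)  # (1, 2)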

## 3.3.5. Define the Loss Function

In Gluon, the module loss defines various loss functions. We will replace the imported module loss with the pseudonym gloss, and directly use the squared loss it provides as a loss function for the model.

In [7]:

from mxnet.gluon import loss as gloss
loss = gloss.L2Loss()  # The squared loss is also known as the L2 norm loss
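
As a quick sanity check (a sketch, not part of the original example): Gluon's L2Loss computes half the squared difference for each example, $\frac{1}{2}(\hat{y} - y)^2$, so a prediction of 2 against a label of 3 should yield a loss of 0.5.

y_hat = nd.array([2.0])
y_true = nd.array([3.0])
print(loss(y_hat, y_true))  # expect [0.5], since 0.5 * (2 - 3)^2 = 0.5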


## 3.3.6. Define the Optimization Algorithm

Again, we do not need to implement mini-batch stochastic gradient descent. After importing Gluon, we now create a Trainer instance and specify mini-batch stochastic gradient descent ('sgd') as the optimization algorithm, with a learning rate of 0.03. This optimization algorithm will be used to iterate over all the parameters contained in the net instance’s nested layers, which were added through the add function. These parameters can be obtained by the collect_params function.

In [8]:

from mxnet import gluon
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.03})
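
To see exactly what the trainer will update, we can print the collected parameters. This is a quick check rather than part of the original example; the generated parameter names (such as dense0_weight) are chosen by Gluon and may vary:

print(net.collect_params())  # the ParameterDict of weights and biases managed by trainer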


## 3.3.7. Training

You might have noticed that it was a bit more concise to express our model in Gluon. For example, we didn’t have to allocate parameters individually, implement our loss function, or code up stochastic gradient descent by hand. The benefits of relying on Gluon’s abstractions will grow substantially once we start working with much more complex models. But once we have all the basic pieces in place, the training loop itself is quite similar to what we would do if implementing everything from scratch.

To refresh your memory: for some number of epochs, we’ll make a complete pass over the dataset (data_iter), grabbing one mini-batch of inputs and the corresponding ground-truth labels at a time. Then, for each batch, we’ll go through the following ritual:

• Generate predictions net(X) and the loss l by executing a forward pass through the network (inside an autograd.record() scope, so that the computation can be differentiated).
• Calculate gradients by making a backward pass through the network via l.backward().
• Update the model parameters by invoking our SGD optimizer. Note that we need not tell trainer.step which parameters to update, only the amount of data in the batch, since we already specified the parameters when constructing trainer; because l.backward() sums the per-example gradients, passing batch_size rescales the update to an average over the mini-batch.

For good measure, we compute the loss on the full dataset after each epoch and print it to monitor progress.

In [9]:

num_epochs = 3
for epoch in range(1, num_epochs + 1):
    for X, y in data_iter:
        with autograd.record():  # record the computation graph for the backward pass
            l = loss(net(X), y)
        l.backward()
        trainer.step(batch_size)
    l = loss(net(features), labels)
    print('epoch %d, loss: %f' % (epoch, l.mean().asnumpy()))

epoch 1, loss: 0.040886
epoch 2, loss: 0.000161
epoch 3, loss: 0.000050


Below, we compare the model parameters learned by training with the actual parameters that generated the data. To access the parameters, we first get the layer we need from net and then access its weight (weight) and bias (bias). The parameters we have learned and the actual parameters are very close.

In [10]:

w = net[0].weight.data()
print('Error in estimating w', true_w.reshape(w.shape) - w)
b = net[0].bias.data()
print('Error in estimating b', true_b - b)

Error in estimating w
[[ 0.00027549 -0.00047135]]
<NDArray 1x2 @cpu(0)>
Error in estimating b
[0.00035286]
<NDArray 1 @cpu(0)>


## 3.3.8. Summary

• Using Gluon, we can implement the model more succinctly.
• In Gluon, the module data provides tools for data processing, the module nn defines a large number of neural network layers, and the module loss defines various loss functions.
• MXNet’s module initializer provides various methods for model parameter initialization.
• Dimensionality and storage are automagically inferred (but take care not to access parameters before they have been initialized).

## 3.3.9. Exercises

1. If we replace l = loss(net(X), y) with l = loss(net(X), y).mean(), we need to change trainer.step(batch_size) to trainer.step(1) accordingly. Why?
2. Review the MXNet documentation to see what loss functions and initialization methods are provided in the modules gluon.loss and init. Replace the loss by Huber’s loss.
3. How do you access the gradient of dense.weight?