.. code:: python

    from mxnet import autograd, gluon, np, npx
    from d2l import mxnet as d2l

    npx.set_np()

    true_w = np.array([2, -3.4])
    true_b = 4.2
    features, labels = d2l.synthetic_data(true_w, true_b, 1000)

.. code:: python

    import numpy as np
    import torch
    from torch.utils import data
    from d2l import torch as d2l

    true_w = torch.tensor([2, -3.4])
    true_b = 4.2
    features, labels = d2l.synthetic_data(true_w, true_b, 1000)

.. code:: python

    import numpy as np
    import tensorflow as tf
    from d2l import tensorflow as d2l

    true_w = tf.constant([2, -3.4])
    true_b = 4.2
    features, labels = d2l.synthetic_data(true_w, true_b, 1000)

.. code:: python

    def load_array(data_arrays, batch_size, is_train=True):  #@save
        """Construct a Gluon data iterator."""
        dataset = gluon.data.ArrayDataset(*data_arrays)
        return gluon.data.DataLoader(dataset, batch_size, shuffle=is_train)

    batch_size = 10
    data_iter = load_array((features, labels), batch_size)

.. code:: python

    def load_array(data_arrays, batch_size, is_train=True):  #@save
        """Construct a PyTorch data iterator."""
        dataset = data.TensorDataset(*data_arrays)
        return data.DataLoader(dataset, batch_size, shuffle=is_train)

    batch_size = 10
    data_iter = load_array((features, labels), batch_size)

.. code:: python

    def load_array(data_arrays, batch_size, is_train=True):  #@save
        """Construct a TensorFlow data iterator."""
        dataset = tf.data.Dataset.from_tensor_slices(data_arrays)
        if is_train:
            dataset = dataset.shuffle(buffer_size=1000)
        dataset = dataset.batch(batch_size)
        return dataset

    batch_size = 10
    data_iter = load_array((features, labels), batch_size)

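Framework aside, all three ``load_array`` variants implement the same
idea: shuffle the examples (when training) and produce fixed-size
minibatches. A minimal NumPy sketch of that behavior (the generator
name ``minibatches`` is ours, not part of any framework API):

.. code:: python

    import numpy as np

    def minibatches(features, labels, batch_size, is_train=True):
        """Yield (X, y) minibatches, shuffling the examples when training."""
        num_examples = len(features)
        indices = np.arange(num_examples)
        if is_train:
            np.random.shuffle(indices)  # read the examples in random order
        for i in range(0, num_examples, batch_size):
            batch = indices[i:i + batch_size]
            yield features[batch], labels[batch]

    features = np.random.normal(size=(1000, 2))
    labels = np.random.normal(size=(1000, 1))
    X, y = next(minibatches(features, labels, 10))
    print(X.shape, y.shape)  # (10, 2) (10, 1)
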
.. code:: python

    next(iter(data_iter))

.. parsed-literal::
    :class: output

    [array([[-0.36800992, 0.22461908],
            [ 0.67585623, 0.2711922 ],
            [ 1.5418535 , -1.7995142 ],
            [-0.05390022, 0.14647691],
            [-1.9153563 , 0.91815436],
            [ 1.0210714 , -2.3413255 ],
            [ 0.14528377, 0.65812886],
            [ 1.9371428 , -0.64002866],
            [-0.01646438, 0.26466823],
            [-0.24186634, -0.06476639]]),
     array([[ 2.6932518],
            [ 4.63964  ],
            [13.404824 ],
            [ 3.6137338],
            [-2.7596586],
            [14.190978 ],
            [ 2.249524 ],
            [10.239091 ],
            [ 3.2804892],
            [ 3.9296224]])]

.. code:: python

    next(iter(data_iter))

.. parsed-literal::
    :class: output

    [tensor([[-1.6610, -0.5289],
             [-0.3918, -0.0276],
             [ 0.2443, 0.6757],
             [-0.4833, -0.8065],
             [-0.0752, 1.2373],
             [-0.8484, -0.6179],
             [-0.1420, 2.3129],
             [ 1.7178, -1.4824],
             [ 0.1321, 0.3060],
             [-1.3929, 1.5900]]),
     tensor([[ 2.6880],
             [ 3.5115],
             [ 2.4161],
             [ 5.9977],
             [-0.1496],
             [ 4.6066],
             [-3.9684],
             [12.6807],
             [ 3.4272],
             [-3.9710]])]

.. code:: python

    next(iter(data_iter))

In Gluon, the fully-connected layer is defined in the ``Dense`` class.
Since we only want to generate a single scalar output, we set that
number to 1.
It is worth noting that, for convenience, Gluon does not require us to
specify the input shape for each layer. So here, we do not need to tell
Gluon how many inputs go into this linear layer. When we first try to
pass data through our model, e.g., when we execute ``net(X)`` later,
Gluon will automatically infer the number of inputs to each layer. We
will describe how this works in more detail later.

.. code:: python

    # `nn` is an abbreviation for neural networks
    from mxnet.gluon import nn

    net = nn.Sequential()
    net.add(nn.Dense(1))

In PyTorch, the fully-connected layer is defined in the ``Linear``
class. Note that we passed two arguments into ``nn.Linear``. The first
one specifies the input feature dimension, which is 2, and the second
one is the output feature dimension, which is a single scalar and
therefore 1.

.. code:: python

    # `nn` is an abbreviation for neural networks
    from torch import nn

    net = nn.Sequential(nn.Linear(2, 1))

In Keras, the fully-connected layer is defined in the ``Dense`` class.
Since we only want to generate a single scalar output, we set that
number to 1.
It is worth noting that, for convenience, Keras does not require us to
specify the input shape for each layer. So here, we do not need to tell
Keras how many inputs go into this linear layer. When we first try to
pass data through our model, e.g., when we execute ``net(X)`` later,
Keras will automatically infer the number of inputs to each layer. We
will describe how this works in more detail later.

.. code:: python

    # `keras` is the high-level API for TensorFlow
    net = tf.keras.Sequential()
    net.add(tf.keras.layers.Dense(1))

We will import the ``initializer`` module from MXNet. This module
provides various methods for model parameter initialization. Gluon makes
``init`` available as a shortcut (abbreviation) to access the
``initializer`` package. We only specify how to initialize the weight by
calling ``init.Normal(sigma=0.01)``. Bias parameters are initialized to
zero by default.

.. code:: python

    from mxnet import init

    net.initialize(init.Normal(sigma=0.01))

The code above may look straightforward, but you should note that
something strange is happening here. We are initializing parameters for
a network even though Gluon does not yet know how many dimensions the
input will have! It might be 2, as in our example, or it might be 2000.
Gluon lets us get away with this because, behind the scenes, the
initialization is actually *deferred*. The real initialization will take
place only when we attempt to pass data through the network for the
first time. Just be careful to remember that since the parameters have
not been initialized yet, we cannot access or manipulate them.
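The idea behind deferred initialization can be sketched in a few lines
of plain Python. This toy ``LazyDense`` class is ours and says nothing
about Gluon's actual internals; it only illustrates why a framework can
wait until the first forward pass to create parameters:

.. code:: python

    import numpy as np

    class LazyDense:
        """Toy layer that defers weight creation to the first forward pass."""
        def __init__(self, units, sigma=0.01):
            self.units, self.sigma = units, sigma
            self.weight = None  # input dimension unknown, so no weights yet

        def __call__(self, X):
            if self.weight is None:
                # The input dimension is now visible, so we can initialize
                self.weight = np.random.normal(
                    0, self.sigma, (X.shape[1], self.units))
                self.bias = np.zeros(self.units)
            return X @ self.weight + self.bias

    layer = LazyDense(1)
    # layer.weight is still None here: the parameters do not exist yet
    out = layer(np.ones((10, 2)))  # first forward pass triggers initialization
    print(layer.weight.shape)  # (2, 1)
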
Since we specified the input and output dimensions when constructing
``nn.Linear``, we can access the parameters directly to set their
initial values. We first locate the layer by ``net[0]``, which is the
first layer in the network, and then use the ``weight.data`` and
``bias.data`` attributes to access the parameters. Next we use the
in-place methods ``normal_`` and ``fill_`` to overwrite parameter
values.

.. code:: python

    net[0].weight.data.normal_(0, 0.01)
    net[0].bias.data.fill_(0)

.. parsed-literal::
    :class: output

    tensor([0.])

The ``initializers`` module in TensorFlow provides various methods for
model parameter initialization. The easiest way to specify the
initialization method in Keras is to pass ``kernel_initializer`` when
creating the layer. Here we recreate ``net``.

.. code:: python

    initializer = tf.initializers.RandomNormal(stddev=0.01)
    net = tf.keras.Sequential()
    net.add(tf.keras.layers.Dense(1, kernel_initializer=initializer))

The code above may look straightforward, but you should note that
something strange is happening here. We are initializing parameters for
a network even though Keras does not yet know how many dimensions the
input will have! It might be 2, as in our example, or it might be 2000.
Keras lets us get away with this because, behind the scenes, the
initialization is actually *deferred*. The real initialization will take
place only when we attempt to pass data through the network for the
first time. Just be careful to remember that since the parameters have
not been initialized yet, we cannot access or manipulate them.
In Gluon, the ``loss`` module defines various loss functions. In this
example, we will use the Gluon implementation of squared loss
(``L2Loss``).

.. code:: python

    loss = gluon.loss.L2Loss()

The ``MSELoss`` class computes the mean squared error, also known as
the squared :math:`L_2` norm. By default it returns the average loss
over examples.

.. code:: python

    loss = nn.MSELoss()

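As a quick sanity check on what "mean squared error" computes, the same
quantity worked out by hand in NumPy:

.. code:: python

    import numpy as np

    y_hat = np.array([2.0, 4.0])
    y = np.array([1.0, 2.0])
    mse = np.mean((y_hat - y) ** 2)  # ((2 - 1)**2 + (4 - 2)**2) / 2
    print(mse)  # 2.5
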
The ``MeanSquaredError`` class computes the mean squared error, also
known as the squared :math:`L_2` norm. By default it returns the
average loss over examples.

.. code:: python

    loss = tf.keras.losses.MeanSquaredError()

Minibatch stochastic gradient descent is a standard tool for optimizing
neural networks and thus Gluon supports it alongside a number of
variations on this algorithm through its ``Trainer`` class. When we
instantiate ``Trainer``, we will specify the parameters to optimize over
(obtainable from our model ``net`` via ``net.collect_params()``), the
optimization algorithm we wish to use (``sgd``), and a dictionary of
hyperparameters required by our optimization algorithm. Minibatch
stochastic gradient descent just requires that we set the value
``learning_rate``, which is set to 0.03 here.

.. code:: python

    from mxnet import gluon

    trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.03})

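Whatever the framework, each minibatch stochastic gradient descent step
applies the same update, ``w <- w - lr * grad``. In NumPy, assuming a
gradient ``grad`` has already been computed by backpropagation:

.. code:: python

    import numpy as np

    lr = 0.03
    w = np.array([1.0, -1.0])
    grad = np.array([0.5, -0.5])  # stand-in for a backpropagated gradient
    w -= lr * grad  # the update rule that every SGD variant builds on
    print(w)  # [ 0.985 -0.985]
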
Minibatch stochastic gradient descent is a standard tool for optimizing
neural networks and thus PyTorch supports it alongside a number of
variations on this algorithm in the ``optim`` module. When we
instantiate an ``SGD`` instance, we specify the parameters to optimize
over (obtainable from our net via ``net.parameters()``), along with the
hyperparameters required by our optimization algorithm. Minibatch
stochastic gradient descent just requires that we set the value ``lr``,
which is set to 0.03 here.

.. code:: python

    trainer = torch.optim.SGD(net.parameters(), lr=0.03)

Minibatch stochastic gradient descent is a standard tool for optimizing
neural networks and thus Keras supports it alongside a number of
variations on this algorithm in the ``optimizers`` module. Minibatch
stochastic gradient descent just requires that we set the value
``learning_rate``, which is set to 0.03 here.

.. code:: python

    trainer = tf.keras.optimizers.SGD(learning_rate=0.03)

.. code:: python

    num_epochs = 3
    for epoch in range(num_epochs):
        for X, y in data_iter:
            with autograd.record():
                l = loss(net(X), y)
            l.backward()
            trainer.step(batch_size)
        l = loss(net(features), labels)
        print(f'epoch {epoch + 1}, loss {l.mean().asnumpy():f}')

.. parsed-literal::
    :class: output

    epoch 1, loss 0.024966
    epoch 2, loss 0.000091
    epoch 3, loss 0.000051

.. code:: python

    num_epochs = 3
    for epoch in range(num_epochs):
        for X, y in data_iter:
            l = loss(net(X), y)
            trainer.zero_grad()
            l.backward()
            trainer.step()
        l = loss(net(features), labels)
        print(f'epoch {epoch + 1}, loss {l:f}')

.. parsed-literal::
    :class: output

    epoch 1, loss 0.000332
    epoch 2, loss 0.000105
    epoch 3, loss 0.000104

.. code:: python

    num_epochs = 3
    for epoch in range(num_epochs):
        for X, y in data_iter:
            with tf.GradientTape() as tape:
                l = loss(net(X, training=True), y)
            grads = tape.gradient(l, net.trainable_variables)
            trainer.apply_gradients(zip(grads, net.trainable_variables))
        l = loss(net(features), labels)
        print(f'epoch {epoch + 1}, loss {l:f}')

.. parsed-literal::
    :class: output

    epoch 1, loss 0.000215
    epoch 2, loss 0.000105
    epoch 3, loss 0.000106

.. code:: python

    w = net[0].weight.data()
    print(f'error in estimating w: {true_w - w.reshape(true_w.shape)}')
    b = net[0].bias.data()
    print(f'error in estimating b: {true_b - b}')

.. parsed-literal::
    :class: output

    error in estimating w: [0.00043285 0.00016856]
    error in estimating b: [0.0004735]

.. code:: python

    w = net[0].weight.data
    print('error in estimating w:', true_w - w.reshape(true_w.shape))
    b = net[0].bias.data
    print('error in estimating b:', true_b - b)

.. parsed-literal::
    :class: output

    error in estimating w: tensor([0.0004, 0.0007])
    error in estimating b: tensor([0.0004])

.. code:: python

    w = net.get_weights()[0]
    print('error in estimating w', true_w - tf.reshape(w, true_w.shape))
    b = net.get_weights()[1]
    print('error in estimating b', true_b - b)

.. parsed-literal::
    :class: output

    error in estimating w tf.Tensor([-0.00059867 0.00098014], shape=(2,), dtype=float32)
    error in estimating b [-0.00029993]

- Using Gluon, we can implement models much more concisely.
- In Gluon, the ``data`` module provides tools for data processing, the
  ``nn`` module defines a large number of neural network layers, and
  the ``loss`` module defines many common loss functions.
- MXNet’s module ``initializer`` provides various methods for model
  parameter initialization.
- Dimensionality and storage are automatically inferred, but be careful
  not to attempt to access parameters before they have been
  initialized.

- Using PyTorch’s high-level APIs, we can implement models much more
  concisely.
- In PyTorch, the ``data`` module provides tools for data processing,
  and the ``nn`` module defines a large number of neural network layers
  and common loss functions.
- We can initialize the parameters by replacing their values with
  methods ending with ``_``.

- Using TensorFlow’s high-level APIs, we can implement models much more
  concisely.
- In TensorFlow, the ``data`` module provides tools for data
  processing, and the ``keras`` module defines a large number of neural
  network layers and common loss functions.
- TensorFlow’s module ``initializers`` provides various methods for
  model parameter initialization.
- Dimensionality and storage are automatically inferred (but be careful
  not to attempt to access parameters before they have been
  initialized).

1. If we replace ``l = loss(output, y)`` with
   ``l = loss(output, y).mean()``, we need to change
   ``trainer.step(batch_size)`` to ``trainer.step(1)`` for the code to
   behave identically. Why?
2. Review the MXNet documentation to see what loss functions and
   initialization methods are provided in the modules ``gluon.loss``
   and ``init``. Replace the loss by Huber’s loss.
3. How do you access the gradient of ``dense.weight``?

`Discussions `__

1. If we replace ``nn.MSELoss(reduction='sum')`` with ``nn.MSELoss()``,
   how can we change the learning rate for the code to behave
   identically? Why?
2. Review the PyTorch documentation to see what loss functions and
   initialization methods are provided. Replace the loss by Huber’s
   loss.
3. How do you access the gradient of ``net[0].weight``?

`Discussions `__

1. Review the TensorFlow documentation to see what loss functions and
   initialization methods are provided. Replace the loss by Huber’s
   loss.

`Discussions `__