{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Concise Implementation of Recurrent Neural Networks\n",
"\n",
"The previous section showed how recurrent neural networks work under the hood, but implementing them from scratch is neither convenient nor fast. This section shows how to implement the same language model more efficiently using the functions provided by Gluon. As before, we begin by reading the 'Time Machine' corpus."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "1"
}
},
"outputs": [],
"source": [
"import sys\n",
"sys.path.insert(0, '..')\n",
"\n",
"import d2l\n",
"import math\n",
"from mxnet import autograd, gluon, init, nd\n",
"from mxnet.gluon import loss as gloss, nn, rnn\n",
"import time\n",
"\n",
"corpus_indices, vocab = d2l.load_data_time_machine()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Defining the Model\n",
"\n",
"Gluon's `rnn` module provides a recurrent neural network implementation (alongside many other sequence models). We construct the recurrent neural network layer `rnn_layer` with a single hidden layer and 256 hidden units, and initialize the weights."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "26"
}
},
"outputs": [],
"source": [
"num_hiddens = 256\n",
"rnn_layer = rnn.RNN(num_hiddens)\n",
"rnn_layer.initialize()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Initializing the state is straightforward. We invoke the member function `rnn_layer.begin_state(batch_size)`, which returns an initial state for each element in the minibatch. That is, it returns a list whose element has shape (number of hidden layers, batch size, number of hidden units). The number of hidden layers defaults to 1. We have not yet discussed what it means to have multiple layers; this will happen [later](deep-rnn.md). For now, suffice it to say that with multiple layers the output of one RNN is used as the input for the next RNN."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "37"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(1, 2, 256)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"batch_size = 2\n",
"state = rnn_layer.begin_state(batch_size=batch_size)\n",
"state[0].shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unlike the recurrent neural network implemented in the previous section, `rnn_layer` expects input of shape (number of time steps, batch size, number of inputs). For a language model, the number of inputs is the length of the one-hot vector, i.e. the dictionary size. As an `rnn.RNN` instance in Gluon, `rnn_layer` returns both the output and the hidden state after forward computation. The output is the sequence of hidden states that the RNN computes over the time steps, and it serves as input to subsequent output layers. Note that the output does not involve any conversion to characters or any other post-processing, because the RNN itself has no notion of what to do with the vectors it generates. Its shape is (number of time steps, batch size, number of hidden units).\n",
"\n",
"The hidden state returned by the `rnn.RNN` instance is the state of the hidden layer at the last time step. It can be used to initialize the hidden state for the next sequence. When the network has multiple hidden layers, the hidden state of each layer is recorded in this variable. For recurrent neural networks such as [Long Short Term Memory](lstm.md) (LSTM) networks, this variable also contains other state information. We will introduce LSTM and deep RNNs later in this chapter."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "38"
}
},
"outputs": [
{
"data": {
"text/plain": [
"((35, 2, 256), 1, (1, 2, 256))"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"num_steps = 35\n",
"X = nd.random.uniform(shape=(num_steps, batch_size, len(vocab)))\n",
"Y, state_new = rnn_layer(X, state)\n",
"Y.shape, len(state_new), state_new[0].shape"
]
},
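{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make these shapes concrete, here is a minimal NumPy sketch of what a single-layer RNN computes. It is not Gluon's implementation: it assumes a `tanh` activation as in the previous section, a made-up input size of 43, and the hypothetical helper name `rnn_layer_forward` with hypothetical weight names.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def rnn_layer_forward(X, H0, W_xh, W_hh, b_h):\n",
"    # Hypothetical helper, not part of Gluon or d2l.\n",
"    # X: (num_steps, batch_size, num_inputs); H0: (batch_size, num_hiddens)\n",
"    H, outputs = H0, []\n",
"    for X_t in X:  # iterate over time steps\n",
"        H = np.tanh(X_t @ W_xh + H @ W_hh + b_h)  # assumed tanh activation\n",
"        outputs.append(H)\n",
"    # Stack the hidden states over time and add a layer axis to the last one\n",
"    return np.stack(outputs), H[None]\n",
"\n",
"rng = np.random.default_rng(0)\n",
"# num_inputs = 43 is an assumed dictionary size, used only for illustration\n",
"num_steps, batch_size, num_inputs, num_hiddens = 35, 2, 43, 256\n",
"X = rng.uniform(size=(num_steps, batch_size, num_inputs))\n",
"W_xh = rng.normal(scale=0.01, size=(num_inputs, num_hiddens))\n",
"W_hh = rng.normal(scale=0.01, size=(num_hiddens, num_hiddens))\n",
"b_h = np.zeros(num_hiddens)\n",
"H0 = np.zeros((batch_size, num_hiddens))\n",
"Y, state_new = rnn_layer_forward(X, H0, W_xh, W_hh, b_h)\n",
"print(Y.shape, state_new.shape)  # (35, 2, 256) (1, 2, 256)\n",
"```\n",
"\n",
"In this sketch the last slice of the output, `Y[-1]`, equals the returned state, which is why the state can seed the computation for the next minibatch."
]
},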
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we define an `RNNModel` block by subclassing the `Block` class to obtain a complete recurrent neural network. It first converts the input data to one-hot vectors and feeds them into `rnn_layer`. The output of `rnn_layer` is then passed to a fully connected layer to obtain the final output. For convenience we set the number of outputs to match the dictionary size `len(vocab)`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "39"
}
},
"outputs": [],
"source": [
"# This class has been saved in the d2l package for future use\n",
"class RNNModel(nn.Block):\n",
"    def __init__(self, rnn_layer, vocab_size, **kwargs):\n",
"        super(RNNModel, self).__init__(**kwargs)\n",
"        self.rnn = rnn_layer\n",
"        self.vocab_size = vocab_size\n",
"        self.dense = nn.Dense(vocab_size)\n",
"\n",
"    def forward(self, inputs, state):\n",
"        # Get the one-hot vector representation by transposing the input to\n",
"        # (num_steps, batch_size)\n",
"        X = nd.one_hot(inputs.T, self.vocab_size)\n",
"        Y, state = self.rnn(X, state)\n",
"        # The fully connected layer will first change the shape of Y to\n",
"        # (num_steps * batch_size, num_hiddens)\n",
"        # Its output shape is (num_steps * batch_size, vocab_size)\n",
"        output = self.dense(Y.reshape((-1, Y.shape[-1])))\n",
"        return output, state\n",
"\n",
"    def begin_state(self, *args, **kwargs):\n",
"        return self.rnn.begin_state(*args, **kwargs)"
]
},
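{
"cell_type": "markdown",
"metadata": {},
"source": [
"The shape bookkeeping in `forward` can be traced with a small NumPy sketch. The sizes and the `labels` array below are made up for illustration, and a zero array stands in for the RNN output. The point is that flattening a time-major tensor yields rows ordered by time step first, so the labels must be transposed before flattening to line up with them.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Toy sizes, chosen only for illustration\n",
"batch_size, num_steps, vocab_size, num_hiddens = 2, 3, 5, 4\n",
"inputs = np.arange(batch_size * num_steps).reshape(batch_size, num_steps)\n",
"inputs = inputs % vocab_size\n",
"labels = (inputs + 1) % vocab_size  # hypothetical next-token targets\n",
"\n",
"# (batch_size, num_steps) -> (num_steps, batch_size, vocab_size)\n",
"X = np.eye(vocab_size)[inputs.T]\n",
"# Stand-in for the RNN output, shape (num_steps, batch_size, num_hiddens)\n",
"Y = np.zeros((num_steps, batch_size, num_hiddens))\n",
"# Rows are ordered by time step first, then by sequence in the batch\n",
"flat = Y.reshape(-1, Y.shape[-1])\n",
"# Transposing the labels before flattening gives the same ordering\n",
"y = labels.T.reshape(-1)\n",
"\n",
"print(X.shape, flat.shape, y.shape)  # (3, 2, 5) (6, 4) (6,)\n",
"```\n",
"\n",
"Row `k` of `flat` belongs to time step `k // batch_size` of sequence `k % batch_size`, and `y[k]` is the label for exactly that position."
]
},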
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model Training\n",
"\n",
"As before we need a prediction function. The implementation here differs from the previous one in the function interfaces for forward computation and hidden state initialization. The main difference is that the decoding into characters is now clearly separated from the hidden variable model."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "41"
}
},
"outputs": [],
"source": [
"# This function is saved in the d2l package for future use\n",
"def predict_rnn_gluon(prefix, num_chars, model, vocab, ctx):\n",
"    # Use the model's member function to initialize the hidden state.\n",
"    state = model.begin_state(batch_size=1, ctx=ctx)\n",
"    output = [vocab[prefix[0]]]\n",
"    for t in range(num_chars + len(prefix) - 1):\n",
"        X = nd.array([output[-1]], ctx=ctx).reshape((1, 1))\n",
"        # Forward computation does not require incoming model parameters\n",
"        (Y, state) = model(X, state)\n",
"        if t < len(prefix) - 1:\n",
"            output.append(vocab[prefix[t + 1]])\n",
"        else:\n",
"            output.append(int(Y.argmax(axis=1).asscalar()))\n",
"    return ''.join([vocab.idx_to_token[i] for i in output])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's make a prediction with a model that has random weights."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "42"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'travelleroqqoci.i_!'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ctx = d2l.try_gpu()\n",
"model = RNNModel(rnn_layer, len(vocab))\n",
"model.initialize(force_reinit=True, ctx=ctx)\n",
"predict_rnn_gluon('traveller', 10, model, vocab, ctx)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As expected, the untrained model produces nonsense. Next, we implement the training function. We first implement a wrapper function to clip the gradients of a Gluon model."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# This function is saved in the d2l package for future use\n",
"def grad_clipping_gluon(model, theta, ctx):\n",
"    params = [p.data() for p in model.collect_params().values()]\n",
"    d2l.grad_clipping(params, theta, ctx)"
]
},
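{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, clipping by the global gradient norm can be sketched in plain NumPy as follows. This is an illustration under the assumption that `d2l.grad_clipping` rescales all gradients jointly whenever their combined norm exceeds `theta`; the helper name `clip_by_global_norm` is hypothetical.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def clip_by_global_norm(grads, theta):\n",
"    # Hypothetical helper: norm of all gradients, viewed as one long vector\n",
"    norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))\n",
"    if norm > theta:\n",
"        for g in grads:\n",
"            g *= theta / norm  # rescale in place so the new norm is theta\n",
"    return norm\n",
"\n",
"grads = [np.array([3.0, 0.0]), np.array([[0.0, 4.0]])]\n",
"norm_before = clip_by_global_norm(grads, theta=1.0)\n",
"print(norm_before)  # 5.0\n",
"```\n",
"\n",
"After the call the direction of the combined gradient is unchanged, but its norm is at most `theta`."
]
},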
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The training algorithm is the same as in the previous section, except that for simplicity we use only sequential partitioning."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "18"
}
},
"outputs": [],
"source": [
"# This function is saved in the d2l package for future use\n",
"def train_and_predict_rnn_gluon(model, num_hiddens, corpus_indices, vocab, \n",
" ctx, num_epochs, num_steps, lr, \n",
" clipping_theta, batch_size, prefixes):\n",
" loss = gloss.SoftmaxCrossEntropyLoss()\n",
" model.initialize(ctx=ctx, force_reinit=True, init=init.Normal(0.01))\n",
" trainer = gluon.Trainer(model.collect_params(), 'sgd',\n",
" {'learning_rate': lr, 'momentum': 0, 'wd': 0})\n",
" start = time.time()\n",
" for epoch in range(num_epochs):\n",
" l_sum, n = 0.0, 0\n",
" data_iter = d2l.data_iter_consecutive(\n",
" corpus_indices, batch_size, num_steps, ctx)\n",
" state = model.begin_state(batch_size=batch_size, ctx=ctx)\n",
" for X, Y in data_iter:\n",
" for s in state:\n",
" s.detach()\n",
" with autograd.record():\n",
" (output, state) = model(X, state)\n",
" y = Y.T.reshape((-1,))\n",
" l = loss(output, y).mean()\n",
" l.backward()\n",
" # Clip the gradient\n",
" grad_clipping_gluon(model, clipping_theta, ctx)\n",
" # Since the error has already taken the mean, the gradient does\n",
" # not need to be averaged\n",
" trainer.step(1)\n",
" l_sum += l.asscalar() * y.size\n",
" n += y.size\n",
"\n",
" if (epoch + 1) % 50 == 0:\n",
" print('epoch %d, perplexity %f, time %.2f sec' % (\n",
" epoch + 1, math.exp(l_sum / n), time.time() - start))\n",
" start = time.time()\n",
" if (epoch + 1) % 100 == 0:\n",
" for prefix in prefixes:\n",
" print(' -', predict_rnn_gluon(prefix, 50, model, vocab, ctx))"
]
},
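{
"cell_type": "markdown",
"metadata": {},
"source": [
"The training loop reports perplexity as `math.exp(l_sum / n)`, i.e. the exponential of the average cross-entropy per character. A small self-contained sketch of that arithmetic, using made-up probabilities and a hypothetical helper name `perplexity`, might look like this:\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def perplexity(probs_of_targets):\n",
"    # probs_of_targets: probability the model assigned to each correct\n",
"    # next character (illustrative values, not real model output)\n",
"    nll = [-math.log(p) for p in probs_of_targets]\n",
"    return math.exp(sum(nll) / len(nll))\n",
"\n",
"perfect = perplexity([1.0, 1.0, 1.0])  # always certain and correct\n",
"uniform4 = perplexity([0.25] * 8)      # uniform guess among 4 characters\n",
"print(perfect, uniform4)\n",
"```\n",
"\n",
"A perfect model has perplexity 1, while guessing uniformly among 4 characters gives a perplexity of about 4, so the value can be read as the effective number of choices the model is undecided between."
]
},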
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's train the model using the same hyperparameters as in the previous section. The primary difference is that we now use built-in functions, which are considerably faster than code written explicitly in Python."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "19"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 50, perplexity 8.864252, time 3.18 sec\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 100, perplexity 4.676866, time 3.12 sec\n",
" - traveller the time travel a ceat is the three dimension a m\n",
" - time traveller the time travel a ceat is the three dimension a m\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 150, perplexity 2.521372, time 3.37 sec\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 200, perplexity 1.747235, time 3.46 sec\n",
" - traveller than a soall cand that uleaged his not wasted sim\n",
" - time traveller. all the psychologist looked and sume for any of \n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 250, perplexity 1.460230, time 3.65 sec\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 300, perplexity 1.329449, time 3.45 sec\n",
" - traveller smiled roald at recollatly travel become absuciou\n",
" - time traveller smiled roa dometting different? and has a cabuete\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 350, perplexity 1.246987, time 3.50 sec\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 400, perplexity 1.225052, time 3.27 sec\n",
" - travellerist. 'lbtr, and men always heve of the thene said \n",
" - time traveller came at in one or twe the will extlained.' 's mov\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 450, perplexity 1.179656, time 3.36 sec\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 500, perplexity 1.176273, time 3.35 sec\n",
" - traveller smiled. 'are you sure for any his preface the tim\n",
" - time traveller smiled round at uur_doun, and the inequalion of t\n"
]
}
],
"source": [
"num_epochs, batch_size, lr, clipping_theta = 500, 32, 1, 1\n",
"pred_period, pred_len, prefixes = 50, 50, ['traveller', 'time traveller']\n",
"train_and_predict_rnn_gluon(model, num_hiddens, corpus_indices, vocab, ctx,\n",
" num_epochs, num_steps, lr, clipping_theta,\n",
" batch_size, prefixes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model achieves comparable perplexity in less time, because the built-in implementation is better optimized.\n",
"\n",
"## Summary\n",
"\n",
"* Gluon's `rnn` module provides an implementation of the recurrent neural network layer.\n",
"* Gluon's `rnn.RNN` instance returns the output and hidden state after forward computation. This forward computation does not involve output layer computation.\n",
"* As before, the compute graph needs to be detached from previous steps for reasons of efficiency.\n",
"\n",
"## Exercises\n",
"\n",
"1. Compare the implementation with the previous section.\n",
"    * Why does Gluon's implementation run faster?\n",
"    * If you observe a significant difference beyond speed, try to find the reason.\n",
"1. Can you make the model overfit?\n",
"    * Increase the number of hidden units.\n",
"    * Increase the number of iterations.\n",
"    * What happens if you adjust the clipping parameter?\n",
"1. Implement the autoregressive model from the introduction to the current chapter using an RNN.\n",
"1. Modify `predict_rnn_gluon` so that it samples the next character rather than picking the most likely one.\n",
" * What happens?\n",
" * Bias the model towards more likely outputs, e.g. by sampling from $q(w_t|w_{t-1}, \\ldots w_1) \\propto p^\\alpha(w_t|w_{t-1}, \\ldots w_1)$ for $\\alpha > 1$.\n",
"1. What happens if you increase the number of hidden layers in the RNN model? Can you make the model work?\n",
"1. How well can you compress the text using this model?\n",
" * How many bits do you need?\n",
" * Why doesn't everyone use this model for text compression? Hint - what about the compressor itself?\n",
"\n",
"## Scan the QR Code to [Discuss](https://discuss.mxnet.io/t/2365)\n",
"\n",
"![](../img/qr_rnn-gluon.svg)"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}