{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Convolutions for Images\n",
"\n",
"Now that we understand how convolutional layers work in theory, \n",
"we are ready to see how this works in practice. \n",
"Since we have motivated convolutional neural networks\n",
"by their applicability to image data,\n",
"we will stick with image data in our examples,\n",
"and begin by revisiting the convolutional layer\n",
"that we introduced in the previous section. \n",
"We note that strictly speaking, *convolutional* layers are a slight misnomer,\n",
"since the operations are typically expressed as cross correlations.\n",
"\n",
"\n",
"## The Cross-Correlation Operator\n",
"\n",
"In a convolutional layer, an input array \n",
"and a correlation kernel array are combined\n",
"to produce an output array through a cross-correlation operation. \n",
"Let's see how this works for two dimensions. \n",
"In our example, the input is a two-dimensional array \n",
"with a height of 3 and width of 3. \n",
"We mark the shape of the array as $3 \\times 3$ or (3, 3). \n",
"The height and width of the kernel array are both 2. \n",
"Common names for this array in the deep learning research community\n",
"include *kernel* and *filter*.\n",
"The shape of the kernel window (also known as the convolution window) \n",
"is given precisely by the height and width of the kernel \n",
"(here it is $2 \\times 2$).\n",
"\n",
"![Two-dimensional cross-correlation operation. The shaded portions are the first output element and the input and kernel array elements used in its computation: $0\\times0+1\\times1+3\\times2+4\\times3=19$. ](../img/correlation.svg)\n",
"\n",
"In the two-dimensional cross-correlation operation, \n",
"we begin with the convolution window positioned \n",
"at the top-left corner of the input array\n",
"and slide it across the input array, \n",
"both from left to right and top to bottom. \n",
"When the convolution window slides to a certain position, \n",
"the input subarray contained in that window \n",
"and the kernel array are multiplied (element-wise) \n",
"and the resulting array is summed up\n",
"yielding a single scalar value.\n",
"This result if precisely the value of the output array\n",
"at the corresponding location.\n",
"Here, the output array has a height of 2 and width of 2\n",
"and the four elements are derived from \n",
"the two-dimensional cross-correlation operation:\n",
"\n",
"$$\n",
"0\\times0+1\\times1+3\\times2+4\\times3=19,\\\\\n",
"1\\times0+2\\times1+4\\times2+5\\times3=25,\\\\\n",
"3\\times0+4\\times1+6\\times2+7\\times3=37,\\\\\n",
"4\\times0+5\\times1+7\\times2+8\\times3=43.\n",
"$$\n",
"\n",
"Note that along each axi, the output is slightly *smaller* than the input. \n",
"Because the kernel has a width greater than one,\n",
"and we can only computer the cross-correlation\n",
"for locations where the kernel fits wholly within the image, \n",
"the output size is given by the input size $H \\times W$ \n",
"minus the size of the convolutional kernel $h \\times w$ \n",
"via $(H-h+1) \\times (W-w+1)$. \n",
"This is the case since we need enough space \n",
"to 'shift' the convolutional kernel across the image \n",
"(later we will see how to keep the size unchanged \n",
"by padding the image with zeros around its boundary \n",
"such that there's enough space to shift the kernel). \n",
"Next, we implement the above process in the `corr2d` function. \n",
"It accepts the input array `X` with the kernel array `K` \n",
"and outputs the array `Y`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from mxnet import autograd, nd\n",
"from mxnet.gluon import nn\n",
"\n",
"# This function has been saved in the d2l package for future use\n",
"def corr2d(X, K):\n",
" h, w = K.shape\n",
" Y = nd.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))\n",
" for i in range(Y.shape[0]):\n",
" for j in range(Y.shape[1]):\n",
" Y[i, j] = (X[i: i + h, j: j + w] * K).sum()\n",
" return Y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can construct the input array `X` and the kernel array `K` \n",
"from the figure above \n",
"to validate the output of the above implementations \n",
"of the two-dimensional cross-correlation operation."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[19. 25.]\n",
" [37. 43.]]\n",
""
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = nd.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])\n",
"K = nd.array([[0, 1], [2, 3]])\n",
"corr2d(X, K)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Convolutional Layers\n",
"\n",
"A convolutional layer cross-correlates the input and kernels \n",
"and adds a scalar bias to produce an output. \n",
"The parameters of the convolutional layer\n",
"are precisely the values that constitute the kernel and the scalar bias. \n",
"When training the models based on convolutional layers,\n",
"we typically initialize the kernels randomly, \n",
"just as we would with a fully-connected layer.\n",
"\n",
"We are now ready to implement a two-dimensional convolutional layer \n",
"based on the `corr2d` function defined above. \n",
"In the `__init__` constructor function, \n",
"we declare `weight` and `bias` as the two model parameters. \n",
"The forward computation function `forward` \n",
"calls the `corr2d` function and adds the bias. \n",
"As with $h \\times w$ cross-correlation \n",
"we also refer to convolutional layers \n",
"as $h \\times w$ convolutions."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "70"
}
},
"outputs": [],
"source": [
"class Conv2D(nn.Block):\n",
" def __init__(self, kernel_size, **kwargs):\n",
" super(Conv2D, self).__init__(**kwargs)\n",
" self.weight = self.params.get('weight', shape=kernel_size)\n",
" self.bias = self.params.get('bias', shape=(1,))\n",
"\n",
" def forward(self, x):\n",
" return corr2d(x, self.weight.data()) + self.bias.data()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Object Edge Detection in Images\n",
"\n",
"Let's look at a simple application of a convolutional layer: \n",
"detecting the edge of an object in an image \n",
"by finding the location of the pixel change. \n",
"First, we construct an 'image' of $6\\times 8$ pixels. \n",
"The middle four columns are black (0) and the rest are white (1)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "66"
}
},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[1. 1. 0. 0. 0. 0. 1. 1.]\n",
" [1. 1. 0. 0. 0. 0. 1. 1.]\n",
" [1. 1. 0. 0. 0. 0. 1. 1.]\n",
" [1. 1. 0. 0. 0. 0. 1. 1.]\n",
" [1. 1. 0. 0. 0. 0. 1. 1.]\n",
" [1. 1. 0. 0. 0. 0. 1. 1.]]\n",
""
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = nd.ones((6, 8))\n",
"X[:, 2:6] = 0\n",
"X"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we construct a kernel `K` with a height of 1 and width of 2. \n",
"When we perform the cross-correlation operation with the input, \n",
"if the horizontally adjacent elements are the same, \n",
"the output is 0. Otherwise, the output is non-zero."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "67"
}
},
"outputs": [],
"source": [
"K = nd.array([[1, -1]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Enter `X` and our designed kernel `K` \n",
"to perform the cross-correlation operations. \n",
"As you can see, we will detect 1 for the edge from white to black \n",
"and -1 for the edge from black to white. \n",
"The rest of the outputs are 0."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "69"
}
},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[ 0. 1. 0. 0. 0. -1. 0.]\n",
" [ 0. 1. 0. 0. 0. -1. 0.]\n",
" [ 0. 1. 0. 0. 0. -1. 0.]\n",
" [ 0. 1. 0. 0. 0. -1. 0.]\n",
" [ 0. 1. 0. 0. 0. -1. 0.]\n",
" [ 0. 1. 0. 0. 0. -1. 0.]]\n",
""
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Y = corr2d(X, K)\n",
"Y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's apply the kernel to the transposed image. \n",
"As expected, it vanishes. The kernel `K` only detects vertical edges."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[0. 0. 0. 0. 0.]\n",
" [0. 0. 0. 0. 0.]\n",
" [0. 0. 0. 0. 0.]\n",
" [0. 0. 0. 0. 0.]\n",
" [0. 0. 0. 0. 0.]\n",
" [0. 0. 0. 0. 0.]\n",
" [0. 0. 0. 0. 0.]\n",
" [0. 0. 0. 0. 0.]]\n",
""
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corr2d(X.T, K)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Learning a Kernel\n",
"\n",
"Designing an edge detector by finite differences `[1, -1]` is neat \n",
"if we know this is precisely what we are looking for. \n",
"However, as we look at larger kernels, \n",
"and consider successive layers of convolutions, \n",
"it might be impossible to specify \n",
"precisely what each filter should be doing manually. \n",
"\n",
"Now let's see whether we can learn the kernel that generated `Y` from `X` \n",
"by looking at the (input, output) pairs only. \n",
"We first construct a convolutional layer \n",
"and initialize its kernel as a random array. \n",
"Next, in each iteration, we will use the squared error \n",
"to compare `Y` and the output of the convolutional layer, \n",
"then calculate the gradient to update the weight. \n",
"For the sake of simplicity, in this convolutional layer,\n",
"we will ignores the bias.\n",
"\n",
"We previously constructed the `Conv2D` class. \n",
"However, since we used single-element assignments, \n",
"Gluon has some trouble finding the gradient. \n",
"Instead, we use the built-in `Conv2D` class provided by Gluon below."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"n": "83"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"batch 2, loss 4.949\n",
"batch 4, loss 0.831\n",
"batch 6, loss 0.140\n",
"batch 8, loss 0.024\n",
"batch 10, loss 0.004\n"
]
}
],
"source": [
"# Construct a convolutional layer with 1 output channel \n",
"# (channels will be introduced in the following section) \n",
"# and a kernel array shape of (1, 2)\n",
"\n",
"conv2d = nn.Conv2D(1, kernel_size=(1, 2))\n",
"conv2d.initialize()\n",
"\n",
"# The two-dimensional convolutional layer uses four-dimensional input and\n",
"# output in the format of (example channel, height, width), where the batch\n",
"# size (number of examples in the batch) and the number of channels are both 1\n",
"\n",
"X = X.reshape((1, 1, 6, 8))\n",
"Y = Y.reshape((1, 1, 6, 7))\n",
"\n",
"for i in range(10):\n",
" with autograd.record():\n",
" Y_hat = conv2d(X)\n",
" l = (Y_hat - Y) ** 2\n",
" l.backward()\n",
" # For the sake of simplicity, we ignore the bias here\n",
" conv2d.weight.data()[:] -= 3e-2 * conv2d.weight.grad()\n",
" if (i + 1) % 2 == 0:\n",
" print('batch %d, loss %.3f' % (i + 1, l.sum().asscalar()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, the error has dropped to a small value after 10 iterations. Now we will take a look at the kernel array we learned."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[[ 0.9895 -0.9873705]]\n",
""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conv2d.weight.data().reshape((1, 2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Indeed, the learned kernel array is remarkably close \n",
"to the kernel array `K` we defined earlier.\n",
"\n",
"## Cross-correlation and Convolution\n",
"\n",
"Recall the observation from the previous section \n",
"that cross-correlation and convolution are equivalent.\n",
"In the figure above it is easy to see this correspondence. \n",
"Simply flip the kernel from the bottom left to the top right. \n",
"In this case the indexing in the sum is reverted, \n",
"yet the same result can be obtained. \n",
"In keeping with standard terminology with deep learning literature,\n",
"we will continue to refer to the cross-correlation operation \n",
"as a convolution even though, strictly-speaking, it is slightly different.\n",
"\n",
"## Summary\n",
"\n",
"* The core computation of a two-dimensional convolutional layer is a two-dimensional cross-correlation operation. In its simplest form, this performs a cross-correlation operation on the two-dimensional input data and the kernel, and then adds a bias.\n",
"* We can design a kernel to detect edges in images.\n",
"* We can learn the kernel through data.\n",
"\n",
"## Exercises\n",
"\n",
"1. Construct an image `X` with diagonal edges.\n",
" * What happens if you apply the kernel `K` to it?\n",
" * What happens if you transpose `X`?\n",
" * What happens if you transpose `K`?\n",
"1. When you try to automatically find the gradient for the `Conv2D` class we created, what kind of error message do you see?\n",
"1. How do you represent a cross-correlation operation as a matrix multiplication by changing the input and kernel arrays?\n",
"1. Design some kernels manually.\n",
" * What is the form of a kernel for the second derivative?\n",
" * What is the kernel for the Laplace operator?\n",
" * What is the kernel for an integral?\n",
" * What is the minimum size of a kernel to obtain a derivative of degree $d$?\n",
"\n",
"## Scan the QR Code to [Discuss](https://discuss.mxnet.io/t/2349)\n",
"\n",
"![](../img/qr_conv-layer.svg)"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}