Neural Style Transfer
=====================

If you are a photography enthusiast, you are probably familiar with filters. A filter can change the color style of a photo so that landscapes look sharper or skin tones in portraits look lighter. However, a single filter usually changes only one aspect of a photo. To give a photo the style you want, you would probably need to try many different filter combinations, a process as involved as tuning the hyperparameters of a model.

In this section, we will leverage the layerwise representations of a CNN to automatically apply the style of one image to another image, i.e., *style transfer* :cite:`Gatys.Ecker.Bethge.2016`. This task needs two input images: the *content image* and the *style image*. We will use a neural network to modify the content image so that its style approaches that of the style image. For example, the content image in :numref:`fig_style_transfer` is a landscape photo taken by us in Mount Rainier National Park near Seattle, while the style image is an oil painting of autumn oak trees. The synthesized output image applies the oil brush strokes of the style image, which leads to more vivid colors, while preserving the main shapes of the objects in the content image.

.. _fig_style_transfer:

.. figure:: ../img/style-transfer.svg

   Given content and style images, style transfer outputs a synthesized image.

Method
------

:numref:`fig_style_transfer_model` illustrates the CNN-based style transfer method with a simplified example. First, we initialize the synthesized image, for example, as the content image. This synthesized image is the only variable that is updated during style transfer; in other words, it plays the role of the model parameters to be updated during training. Then we choose a pretrained CNN to extract image features and freeze its model parameters during training. This deep CNN uses multiple layers to extract hierarchical features of images.
We can choose the outputs of some of these layers as content features or style features. Take :numref:`fig_style_transfer_model` as an example. The pretrained neural network here has three convolutional layers, where the second layer outputs the content features, and the first and third layers output the style features.

.. _fig_style_transfer_model:

.. figure:: ../img/neural-style.svg

   CNN-based style transfer process. Solid lines show the direction of forward propagation and dotted lines show backward propagation.

Next, we calculate the loss function of style transfer through forward propagation (direction of the solid arrows), and update the model parameters (the synthesized image to be output) through backpropagation (direction of the dotted arrows). The loss function commonly used in style transfer consists of three parts: (i) *content loss* makes the synthesized image and the content image close in content features; (ii) *style loss* makes the synthesized image and the style image close in style features; and (iii) *total variation loss* helps to reduce the noise in the synthesized image. Finally, when training is over, we output the model parameters of style transfer, i.e., the final synthesized image.

In the following, we will explain the technical details of style transfer via a concrete experiment.

Reading the Content and Style Images
------------------------------------

First, we read the content and style images. From the coordinate axes of the plots, we can tell that these images have different sizes.


.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   # PyTorch version: load and display the content image
   %matplotlib inline
   import torch
   import torchvision
   from torch import nn
   from d2l import torch as d2l

   d2l.set_figsize()
   content_img = d2l.Image.open('../img/rainier.jpg')
   d2l.plt.imshow(content_img);

.. figure:: output_neural-style_5de8ca_3_0.svg

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   # Load and display the style image
   style_img = d2l.Image.open('../img/autumn-oak.jpg')
   d2l.plt.imshow(style_img);

.. figure:: output_neural-style_5de8ca_4_0.svg


.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   # MXNet version: load and display the content image
   %matplotlib inline
   from mxnet import autograd, gluon, image, init, np, npx
   from mxnet.gluon import nn
   from d2l import mxnet as d2l

   npx.set_np()

   d2l.set_figsize()
   content_img = image.imread('../img/rainier.jpg')
   d2l.plt.imshow(content_img.asnumpy());

.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
   :class: output

   [22:41:40] ../src/storage/storage.cc:196: Using Pooled (Naive) StorageManager for CPU

.. figure:: output_neural-style_5de8ca_7_1.svg

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

   # Load and display the style image
   style_img = image.imread('../img/autumn-oak.jpg')
   d2l.plt.imshow(style_img.asnumpy());

.. figure:: output_neural-style_5de8ca_8_0.svg


Preprocessing and Postprocessing
--------------------------------

Below, we define two functions for preprocessing and postprocessing images. The ``preprocess`` function standardizes each of the three RGB channels of the input image and transforms the result into the CNN input format. The ``postprocess`` function restores the pixel values of the output image to their original values before standardization. Since the image printing function requires each pixel to have a floating-point value between 0 and 1, we clip the result: any value smaller than 0 is replaced with 0, and any value greater than 1 is replaced with 1.
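
The arithmetic behind these two steps can be checked on a toy array. The following is a minimal NumPy sketch, not the ``preprocess`` and ``postprocess`` functions of this section. The names ``standardize`` and ``restore`` are hypothetical, the channel statistics are the commonly used ImageNet means and standard deviations (an assumption, since this section does not state them), and the ``(batch, channel, height, width)`` layout is assumed to be the CNN input format.

```python
import numpy as np

# Assumed ImageNet channel statistics in RGB order (illustrative values only).
RGB_MEAN = np.array([0.485, 0.456, 0.406])
RGB_STD = np.array([0.229, 0.224, 0.225])


def standardize(img):
    """Map an (H, W, 3) image in [0, 1] to a standardized (1, 3, H, W) array."""
    x = (img - RGB_MEAN) / RGB_STD       # per-channel standardization
    return x.transpose(2, 0, 1)[None]    # HWC -> CHW, then add a batch axis


def restore(x):
    """Invert `standardize` and clip to the displayable range [0, 1]."""
    img = x[0].transpose(1, 2, 0)        # drop the batch axis, CHW -> HWC
    img = img * RGB_STD + RGB_MEAN       # undo the standardization
    return np.clip(img, 0.0, 1.0)        # values outside [0, 1] become 0 or 1


rng = np.random.default_rng(0)
img = rng.random((4, 5, 3))              # toy image with values in [0, 1)
x = standardize(img)
print(x.shape)
print(np.allclose(restore(x), img))
```

On an unmodified image the round trip recovers the input, so the clipping is a no-op. It matters only after optimization has pushed some pixel values outside the valid range.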