Personalized Ranking for Recommender Systems
============================================

In the former sections, only explicit feedback was considered and models
were trained and tested on observed ratings. There are two demerits of
such methods: First, most feedback is not explicit but implicit in
real-world scenarios, and explicit feedback can be more expensive to
collect. Second, non-observed user-item pairs which may be predictive
for users’ interests are totally ignored, making these methods
unsuitable for cases where ratings are not missing at random but because
of users’ preferences. Non-observed user-item pairs are a mixture of
real negative feedback (users are not interested in the items) and
missing values (the user might interact with the items in the future).
We simply ignore the non-observed pairs in matrix factorization and
AutoRec. Clearly, these models are incapable of distinguishing between
observed and non-observed pairs and are usually not suitable for
personalized ranking tasks.

To this end, a class of recommendation models targeting at generating
ranked recommendation lists from implicit feedback have gained
popularity. In general, personalized ranking models can be optimized
with pointwise, pairwise or listwise approaches. Pointwise approaches
considers a single interaction at a time and train a classifier or a
regressor to predict individual preferences. Matrix factorization and
AutoRec are optimized with pointwise objectives. Pairwise approaches
consider a pair of items for each user and aim to approximate the
optimal ordering for that pair. Usually, pairwise approaches are more
suitable for the ranking task because predicting relative order is
reminiscent to the nature of ranking. Listwise approaches approximate
the ordering of the entire list of items, for example, direct optimizing
the ranking measures such as Normalized Discounted Cumulative Gain
(`NDCG <https://en.wikipedia.org/wiki/Discounted_cumulative_gain>`__).
However, listwise approaches are more complex and compute-intensive than
pointwise or pairwise approaches. In this section, we will introduce two
pairwise objectives/losses, Bayesian Personalized Ranking loss and Hinge
loss, and their respective implementations.

Bayesian Personalized Ranking Loss and its Implementation
---------------------------------------------------------

Bayesian personalized ranking (BPR)
:cite:`Rendle.Freudenthaler.Gantner.ea.2009` is a pairwise
personalized ranking loss that is derived from the maximum posterior
estimator. It has been widely used in many existing recommendation
models. The training data of BPR consists of both positive and negative
pairs (missing values). It assumes that the user prefers the positive
item over all other non-observed items.

In formal, the training data is constructed by tuples in the form of
:math:`(u, i, j)`, which represents that the user :math:`u` prefers the
item :math:`i` over the item :math:`j`. The Bayesian formulation of BPR
which aims to maximize the posterior probability is given below:

.. math::


   p(\Theta \mid >_u )  \propto  p(>_u \mid \Theta) p(\Theta)

Where :math:`\Theta` represents the parameters of an arbitrary
recommendation model, :math:`>_u` represents the desired personalized
total ranking of all items for user :math:`u`. We can formulate the
maximum posterior estimator to derive the generic optimization criterion
for the personalized ranking task.

.. math::


   \begin{aligned}
   \textrm{BPR-OPT} : &= \ln p(\Theta \mid >_u) \\
            & \propto \ln p(>_u \mid \Theta) p(\Theta) \\
            &= \ln \prod_{(u, i, j \in D)} \sigma(\hat{y}_{ui} - \hat{y}_{uj}) p(\Theta) \\
            &= \sum_{(u, i, j \in D)} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \ln p(\Theta) \\
            &= \sum_{(u, i, j \in D)} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}) - \lambda_\Theta \|\Theta \|^2
   \end{aligned}

where
:math:`D \stackrel{\textrm{def}}{=} \{(u, i, j) \mid i \in I^+_u \wedge j \in I \backslash I^+_u \}`
is the training set, with :math:`I^+_u` denoting the items the user
:math:`u` liked, :math:`I` denoting all items, and
:math:`I \backslash I^+_u` indicating all other items excluding items
the user liked. :math:`\hat{y}_{ui}` and :math:`\hat{y}_{uj}` are the
predicted scores of the user :math:`u` to item :math:`i` and :math:`j`,
respectively. The prior :math:`p(\Theta)` is a normal distribution with
zero mean and variance-covariance matrix :math:`\Sigma_\Theta`. Here, we
let :math:`\Sigma_\Theta = \lambda_\Theta I`.

|Illustration of Bayesian Personalized Ranking| We will implement the
base class ``mxnet.gluon.loss.Loss`` and override the ``forward`` method
to construct the Bayesian personalized ranking loss. We begin by
importing the Loss class and the np module.

.. |Illustration of Bayesian Personalized Ranking| image:: ../img/rec-ranking.svg

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    from mxnet import gluon, np, npx
    
    npx.set_np()

The implementation of BPR loss is as follows.

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    #@save
    class BPRLoss(gluon.loss.Loss):
        def __init__(self, weight=None, batch_axis=0, **kwargs):
            super(BPRLoss, self).__init__(weight=None, batch_axis=0, **kwargs)
    
        def forward(self, positive, negative):
            distances = positive - negative
            loss = - np.sum(np.log(npx.sigmoid(distances)), 0, keepdims=True)
            return loss

Hinge Loss and its Implementation
---------------------------------

The Hinge loss for ranking has different form to the `hinge
loss <https://mxnet.incubator.apache.org/api/python/gluon/loss.html#mxnet.gluon.loss.HingeLoss>`__
provided within the gluon library that is often used in classifiers such
as SVMs. The loss used for ranking in recommender systems has the
following form.

.. math::


    \sum_{(u, i, j \in D)} \max( m - \hat{y}_{ui} + \hat{y}_{uj}, 0)

where :math:`m` is the safety margin size. It aims to push negative
items away from positive items. Similar to BPR, it aims to optimize for
relevant distance between positive and negative samples instead of
absolute outputs, making it well suited to recommender systems.

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    #@save
    class HingeLossbRec(gluon.loss.Loss):
        def __init__(self, weight=None, batch_axis=0, **kwargs):
            super(HingeLossbRec, self).__init__(weight=None, batch_axis=0,
                                                **kwargs)
    
        def forward(self, positive, negative, margin=1):
            distances = positive - negative
            loss = np.sum(np.maximum(- distances + margin, 0))
            return loss

These two losses are interchangeable for personalized ranking in
recommendation.

Summary
-------

-  There are three types of ranking losses available for the
   personalized ranking task in recommender systems, namely, pointwise,
   pairwise and listwise methods.
-  The two pairwise loses, Bayesian personalized ranking loss and hinge
   loss, can be used interchangeably.

Exercises
---------

-  Are there any variants of BPR and hinge loss available?
-  Can you find any recommendation models that use BPR or hinge loss?

`Discussions <https://discuss.d2l.ai/t/402>`__