# 2.3. Scalars, Vectors, Matrices, and Tensors

Now that you can store and manipulate data, let us briefly review the subset of basic linear algebra that you will need to understand and implement most of the models covered in this book. Below, we introduce the basic mathematical objects in linear algebra, expressing each both through mathematical notation and the corresponding implementation in code.

## 2.3.1. Scalars

If you have never studied linear algebra or machine learning, then your
past experience with math probably consisted of thinking about one number
at a time. And, if you ever balanced a checkbook or even paid for dinner
at a restaurant, then you already know how to do basic things like adding
and multiplying pairs of numbers. For example, the temperature in Palo
Alto is \(52\) degrees Fahrenheit. Formally, we call values
consisting of just one numerical quantity *scalars*. If you wanted to
convert this value to Celsius (the metric system’s more sensible
temperature scale), you would evaluate the expression
\(c = \frac{5}{9}(f - 32)\), setting \(f\) to \(52\). In
this equation, each of the terms (\(5\), \(9\), and \(32\)) is a
scalar value. The placeholders \(c\) and \(f\) are called
*variables* and they represent unknown scalar values.
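
As a quick check of the conversion above, the following sketch evaluates \(c = \frac{5}{9}(f - 32)\) in plain Python. The variable names `f` and `c` simply mirror the symbols in the formula; nothing beyond built-in floating-point arithmetic is assumed.

```python
# Convert a Fahrenheit reading to Celsius using c = 5/9 * (f - 32).
f = 52.0                # temperature in degrees Fahrenheit (value from the text)
c = 5 / 9 * (f - 32)    # every quantity in this expression is a scalar
print(round(c, 2))      # 11.11
```

So \(52\) degrees Fahrenheit corresponds to roughly \(11.1\) degrees Celsius.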

In this book, we adopt the mathematical notation where scalar variables
are denoted by ordinary lower-cased letters (e.g., \(x\), \(y\),
and \(z\)). We denote the space of all (continuous) *real-valued*
scalars by \(\mathbb{R}\). For expedience, we will punt on rigorous
definitions of what precisely *space* is, but just remember for now that
the expression \(x \in \mathbb{R}\) is a formal way to say that
\(x\) is a real-valued scalar. The symbol \(\in\) can be
pronounced “in” and simply denotes membership in a set. Analogously, we
could write \(x,y \in \{0,1\}\) to state that \(x\) and
\(y\) are numbers whose value can only be \(0\) or \(1\).

In MXNet code, a scalar is represented by an `ndarray` with just one
element. In the next snippet, we instantiate two scalars and perform
some familiar arithmetic operations with them, namely addition,
multiplication, division, and exponentiation.

```
from mxnet import np, npx
npx.set_np()
x = np.array(3.0)
y = np.array(2.0)
x + y, x * y, x / y, x ** y
```

```
(array(5.), array(6.), array(1.5), array(9.))
```

## 2.3.2. Vectors¶

You can think of a vector as simply a list of scalar values. We call
these values the *elements* (*entries* or *components*) of the vector.
When our vectors represent examples from our dataset, their values hold
some real-world significance. For example, if we were training a model
to predict the risk that a loan defaults, we might associate each
applicant with a vector whose components correspond to their income,
length of employment, number of previous defaults, and other factors. If
we were studying the risk of heart attacks that hospital patients
potentially face, we might represent each patient by a vector whose
components capture their most recent vital signs, cholesterol levels,
minutes of exercise per day, etc. In math notation, we will usually
denote vectors as bold-faced, lower-cased letters (e.g.,
\(\mathbf{x}\), \(\mathbf{y}\), and \(\mathbf{z}\)).

In MXNet, we work with vectors via \(1\)-dimensional `ndarray`s.
In general, `ndarray`s can have arbitrary lengths, subject to the
memory limits of your machine.

```
x = np.arange(4)
x
```

```
array([0., 1., 2., 3.])
```

We can refer to any element of a vector by using a subscript. For example, we can refer to the \(i^\mathrm{th}\) element of \(\mathbf{x}\) by \(x_i\). Note that the element \(x_i\) is a scalar, so we do not bold-face the font when referring to it. Extensive literature considers column vectors to be the default orientation of vectors, and so does this book. In math, a vector \(\mathbf{x}\) can be written as

\[\mathbf{x} = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix}, \tag{2.3.1}\]

where \(x_1, \ldots, x_n\) are elements of the vector. In code, we
access any element by indexing into the `ndarray`.

```
x[3]
```

```
array(3.)
```

### 2.3.2.1. Length, Dimensionality, and Shape

Let us revisit some concepts from Section 2.1. A vector is
just an array of numbers. And just as every array has a length, so does
every vector. In math notation, if we want to say that a vector
\(\mathbf{x}\) consists of \(n\) real-valued scalars, we can
express this as \(\mathbf{x} \in \mathbb{R}^n\). The length of a
vector is commonly called the *dimension* of the vector.

As with an ordinary Python array, we can access the length of an
`ndarray` by calling Python’s built-in `len()` function.

```
len(x)
```

```
4
```

When an `ndarray` represents a vector (with precisely one axis), we
can also access its length via the `.shape` attribute. The shape is a
tuple that lists the length (dimensionality) along each axis of the
`ndarray`. For `ndarray`s with just one axis, the shape has just
one element.

```
x.shape
```

```
(4,)
```

Note that the word “dimension” tends to get overloaded in these contexts
and this tends to confuse people. To clarify, we use the dimensionality
of a *vector* or an *axis* to refer to its length, i.e., the number of
elements of a vector or an axis. However, we use the dimensionality of
an `ndarray` to refer to the number of axes that an `ndarray` has.
In this sense, the dimensionality of any given axis of an `ndarray` is
the length of that axis.
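
To make the two meanings of “dimension” concrete, the sketch below contrasts the number of axes with the length along each axis. It assumes plain NumPy as a stand-in for `mxnet.np`, whose interface it mirrors; the two-axis array `M` is a hypothetical example added only for contrast.

```python
import numpy as np  # assumption: plain NumPy standing in for mxnet.np

x = np.arange(4)                 # one axis holding 4 elements
M = np.arange(20).reshape(5, 4)  # hypothetical array with two axes

# Number of axes (the dimensionality of the array itself).
print(x.ndim, M.ndim)            # 1 2

# Length along each axis (the dimensionality of each axis).
print(x.shape, M.shape)          # (4,) (5, 4)
print(M.shape[1])                # 4
```

Here `ndim` counts axes, while each entry of `shape` gives the length of one axis.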

## 2.3.3. Matrices

Just as vectors generalize scalars from order \(0\) to order
\(1\), matrices generalize vectors from order \(1\) to order
\(2\). Matrices, which we will typically denote with bold-faced,
capital letters (e.g., \(\mathbf{X}\), \(\mathbf{Y}\), and
\(\mathbf{Z}\)), are represented in code as `ndarray`s with
\(2\) axes.

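In math notation, we use \(\mathbf{A} \in \mathbb{R}^{m \times n}\) to express that the matrix \(\mathbf{A}\) consists of \(m\) rows and \(n\) columns of real-valued scalars. Visually, we can illustrate any matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) as a table, where each element \(a_{ij}\) belongs to the \(i^{\mathrm{th}}\) row and \(j^{\mathrm{th}}\) column:

\[\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}. \tag{2.3.2}\]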

For any \(\mathbf{A} \in \mathbb{R}^{m \times n}\), the shape of
\(\mathbf{A}\) is (\(m\), \(n\)) or \(m \times n\).
Specifically, when a matrix has the same number of rows and columns, its
shape becomes a square; thus, it is called a *square matrix*.

We can create an \(m \times n\) matrix in MXNet by specifying a
shape with two components \(m\) and \(n\) when calling any of
our favorite functions for instantiating an `ndarray`.

```
A = np.arange(20).reshape(5, 4)
A
```

```
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.],
       [16., 17., 18., 19.]])
```

We can access the scalar element \(a_{ij}\) of a matrix \(\mathbf{A}\) in (2.3.2) by specifying the indices for the row (\(i\)) and column (\(j\)), such as \([\mathbf{A}]_{ij}\). When the scalar elements of a matrix \(\mathbf{A}\), such as in (2.3.2), are not given, we may simply use the lower-case letter of the matrix \(\mathbf{A}\) with the index subscript, \(a_{ij}\), to refer to \([\mathbf{A}]_{ij}\). To keep notation simple, commas are inserted to separate indices only when necessary, such as \(a_{2,3j}\) and \([\mathbf{A}]_{2i-1,3}\).
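
The indexing notation above carries over to code with one caveat: the math notation counts rows and columns from \(1\), whereas array indices start at \(0\). The following sketch, which assumes plain NumPy in place of `mxnet.np`, looks up \(a_{23}\) in the \(5 \times 4\) matrix built with `arange`; the names `i`, `j`, and `a_ij` are illustrative only.

```python
import numpy as np  # assumption: plain NumPy standing in for mxnet.np

A = np.arange(20).reshape(5, 4)

i, j = 2, 3               # 1-based math indices: row 2, column 3
a_ij = A[i - 1, j - 1]    # shift by one because code indices start at 0
print(a_ij)               # 6
```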

Sometimes, we want to flip the axes. When we exchange a matrix’s rows
and columns, the result is called the *transpose* of the matrix.
Formally, we signify a matrix \(\mathbf{A}\)’s transpose by
\(\mathbf{A}^\top\) and if \(\mathbf{B} = \mathbf{A}^\top\),
then \(b_{ij} = a_{ji}\) for any \(i\) and \(j\). Thus, the
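transpose of \(\mathbf{A}\) in (2.3.2) is an
\(n \times m\) matrix:

\[\mathbf{A}^\top = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix}. \tag{2.3.3}\]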

In code, we access a matrix’s transpose via the `T` attribute.

```
A.T
```

```
array([[ 0.,  4.,  8., 12., 16.],
       [ 1.,  5.,  9., 13., 17.],
       [ 2.,  6., 10., 14., 18.],
       [ 3.,  7., 11., 15., 19.]])
```
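
The defining property \(b_{ij} = a_{ji}\) can be checked entry by entry. The sketch below assumes plain NumPy in place of `mxnet.np` and loops over every index pair of the \(5 \times 4\) matrix; the loop is written for clarity rather than speed.

```python
import numpy as np  # assumption: plain NumPy standing in for mxnet.np

A = np.arange(20).reshape(5, 4)   # m = 5 rows, n = 4 columns
B = A.T                           # the transpose has n = 4 rows, m = 5 columns
print(B.shape)                    # (4, 5)

# Every entry of B at (i, j) equals the entry of A at (j, i).
entries_match = all(
    B[i, j] == A[j, i]
    for i in range(B.shape[0])
    for j in range(B.shape[1])
)
print(entries_match)              # True
```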

As a special type of square matrix, a *symmetric matrix*
\(\mathbf{A}\) is equal to its transpose:
\(\mathbf{A} = \mathbf{A}^\top\). Here we define a symmetric matrix `B`
and compare it with its transpose.

```
B = np.array([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
B
```

```
array([[1., 2., 3.],
       [2., 0., 4.],
       [3., 4., 5.]])
```

```
B == B.T
```

```
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
```
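
The comparison above returns one result per entry. When a single yes-or-no answer is more convenient, the entrywise results can be reduced with `all()`. The sketch below assumes plain NumPy in place of `mxnet.np`, and `is_symmetric` is a hypothetical helper name rather than a library function.

```python
import numpy as np  # assumption: plain NumPy standing in for mxnet.np

def is_symmetric(M):
    """Hypothetical helper: True if M is square and equal to its transpose."""
    is_square = M.ndim == 2 and M.shape[0] == M.shape[1]
    return is_square and bool((M == M.T).all())

B = np.array([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
A = np.arange(20).reshape(5, 4)

print(is_symmetric(B))   # True
print(is_symmetric(A))   # False, since A is not even square
```

Checking the shape first avoids comparing arrays of mismatched shapes when the input is not square.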

Matrices are useful data structures: they allow us to organize data that
have different modalities of variation. For example, rows in our matrix
might correspond to different houses (data points), while columns might
correspond to different attributes. This should sound familiar if you
have ever used spreadsheet software or have read Section 2.2.
Thus, although the default orientation of a single vector is a column
vector, in a matrix that represents a tabular dataset, it is more
conventional to treat each data point as a row vector in the matrix.
And, as we will see in later chapters, this convention will enable
common deep learning practices. For example, along the outermost axis of
an `ndarray`, we can access or enumerate minibatches of data points,
or just data points if no minibatch exists.
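
As a rough illustration of the row-per-data-point convention, the sketch below walks along the outermost axis of a small array, first one data point at a time and then in minibatches. It assumes plain NumPy in place of `mxnet.np`; the array `data` and the value of `batch_size` are made up for the example.

```python
import numpy as np  # assumption: plain NumPy standing in for mxnet.np

# Hypothetical toy dataset: 5 data points (rows), 4 attributes each (columns).
data = np.arange(20).reshape(5, 4)

# Iterating over the array walks along its outermost axis, one row at a time.
row_shapes = [row.shape for row in data]
print(row_shapes[0])                 # (4,)

# Slicing the outermost axis yields minibatches; batch_size is illustrative.
batch_size = 2
num_points = data.shape[0]
batches = [data[k:k + batch_size] for k in range(0, num_points, batch_size)]
print([b.shape for b in batches])    # [(2, 4), (2, 4), (1, 4)]
```

The last minibatch is smaller because \(5\) data points do not divide evenly into batches of \(2\).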

## 2.3.4. Tensors

Just as vectors generalize scalars, and matrices generalize vectors, we
can build data structures with even more axes. Tensors give us a generic
way of describing `ndarray`s with an arbitrary number of axes.
Vectors, for example, are first-order tensors, and matrices are
second-order tensors. Tensors are denoted with capital letters of a
special font face (e.g., \(\mathsf{X}\), \(\mathsf{Y}\), and
\(\mathsf{Z}\)) and their indexing mechanism (e.g., \(x_{ijk}\)
and \([\mathsf{X}]_{1, 2i-1,3}\)) is similar to that of matrices.

Tensors will become more important when we start working with images,
which arrive as `ndarray`s with 3 axes corresponding to the height,
width, and a *channel* axis for stacking the color channels (red, green,
and blue). For now, we will skip over higher-order tensors and focus on
the basics.

```
X = np.arange(24).reshape(2, 3, 4)
X
```

```
array([[[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]],

       [[12., 13., 14., 15.],
        [16., 17., 18., 19.],
        [20., 21., 22., 23.]]])
```
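
Indexing a tensor works like indexing a matrix, with one index per axis. The sketch below assumes plain NumPy in place of `mxnet.np`. The `image` array is a hypothetical all-zero placeholder laid out as height, width, channel, following the description above; real image loaders may order the axes differently.

```python
import numpy as np  # assumption: plain NumPy standing in for mxnet.np

X = np.arange(24).reshape(2, 3, 4)
print(X.ndim)         # 3
print(X[1, 2, 3])     # 23, the last element along every axis
print(X[0].shape)     # (3, 4), fixing the first index leaves a matrix

# Hypothetical image: height 32, width 48, and 3 color channels.
image = np.zeros((32, 48, 3))
red = image[:, :, 0]  # select the first channel across all pixels
print(red.shape)      # (32, 48)
```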

## 2.3.5. Summary

Scalars, vectors, matrices, and tensors are basic mathematical objects in linear algebra.

Vectors generalize scalars, and matrices generalize vectors.

In the `ndarray` representation, scalars, vectors, matrices, and tensors have 0, 1, 2, and an arbitrary number of axes, respectively.

## 2.3.6. Exercises

Prove that the transpose of a matrix \(\mathbf{A}\)’s transpose is \(\mathbf{A}\): \((\mathbf{A}^\top)^\top = \mathbf{A}\).

Given two matrices \(\mathbf{A}\) and \(\mathbf{B}\), show that the sum of transposes is equal to the transpose of a sum: \(\mathbf{A}^\top + \mathbf{B}^\top = (\mathbf{A} + \mathbf{B})^\top\).

Given any square matrix \(\mathbf{A}\), is \(\mathbf{A} + \mathbf{A}^\top\) always symmetric? Why?

We defined the tensor `X` of shape (\(2\), \(3\), \(4\)) in this section. What is the output of `len(X)`?

For a tensor `X` of arbitrary shape, does `len(X)` always correspond to the length of a certain axis of `X`? What is that axis?