
Functional Analysis I
Term 1, 2010–2011
Vassili Gelfreich


Contents

1 Vector spaces                                                 1
  1.1 Definition                                                1
  1.2 Examples of vector spaces                                 2
  1.3 Hamel bases                                               4
2 Normed spaces                                                 8
  2.1 Norms                                                     8
  2.2 Four famous inequalities                                  9
  2.3 Examples of norms on a space of functions                10
  2.4 Equivalence of norms                                     11
  2.5 Linear Isometries                                        12
3 Convergence in a normed space                                14
  3.1 Definition and examples                                  14
  3.2 Topology on a normed space                               16
  3.3 Closed sets                                              17
  3.4 Compactness                                              18
4 Banach spaces                                                20
  4.1 Completeness: Definition and examples                    20
  4.2 The completion of a normed space                         22
  4.3 Weierstrass Approximation Theorem                        25
5 …                                                            28
  5.1 Lebesgue measure                                         28
  5.2 Lebesgue integral                                        30
  5.3 … (R)                                                    32
  5.4 … spaces                                                 34
6 …                                                            36
  6.1 …                                                        36
  6.2 Natural norms                                            37
  6.3 Parallelogram law and polarisation identity              38
  6.4 Hilbert spaces: Definition and examples                  40
7 Orthonormal bases in Hilbert spaces                          41
  7.1 Orthonormal sets                                         41
  7.2 Gram-Schmidt orthonormalisation                          42
  7.3 …                                                        44
  7.4 …                                                        44
  7.5 Orthonormal basis in a Hilbert space                     46
  7.6 Separable Hilbert spaces                                 47
8 Closest points and approximations                            50
  8.1 Closest points in convex subsets                         50
  8.2 Orthogonal complements                                   51
  8.3 …                                                        53
9 Linear maps between Banach spaces                            56
  9.1 Continuous linear maps                                   56
  9.2 Examples                                                 58
  9.3 Kernel and range                                         59
10 …                                                           61
  10.1 …                                                       61
  10.2 Riesz representation theorem                            61
11 Linear operators on Hilbert spaces                          63
  11.1 Complexification                                        63
  11.2 Adjoint operators                                       64
  11.3 Self-adjoint operators                                  66
12 Introduction to Spectral Theory                             69
  12.1 …                                                       69
  12.2 Invertible operators                                    70
  12.3 Resolvent and spectrum                                  71
13 …                                                           74
  13.1 Definition, properties and examples                     74
  13.2 Spectral theory for compact self-adjoint operators      75
…                                                              79

Preface
These notes follow the lectures on Functional Analysis given in the Autumn of 2010.
If you find a mistake or misprint please inform the author by sending an e-mail to
v.gelfreich@warwick.ac.uk. The author thanks James Robinson for his set of
notes and selection of exercises which significantly facilitated the preparation of the
lectures.
1 Vector spaces

1.1 Definition
A vector space V over a field K is a set equipped with two binary operations called
vector addition and multiplication by scalars. Elements of V are called vectors and
elements of K are called scalars. The sum of two vectors x, y ∈ V is denoted x + y, the
product of a scalar α ∈ K and vector x ∈ V is denoted αx.
It is possible to consider vector spaces over an arbitrary field K, but we will consider the fields R and C only. So we will always assume that K denotes either R or C and refer to V as a real or complex vector space respectively.
In a vector space, addition and multiplication have to satisfy the following set of
axioms: Let x, y, z be arbitrary vectors in V , and α, β be arbitrary scalars in K, then
• Associativity of addition: x + (y + z) = (x + y) + z.
• Commutativity of addition: x + y = y + x.
• There exists an element 0 ∈ V , called the zero vector, such that x + 0 = x for all x ∈ V .
• For all x ∈ V , there exists an element y ∈ V , called the additive inverse of x, such that x + y = 0. The additive inverse is denoted −x.
• “Associativity”¹ of multiplication: α(βx) = (αβ)x.
• Distributivity: α(x + y) = αx + αy and (α + β)x = αx + βx.
• There is an element 1 ∈ K such that 1x = x for all x ∈ V . This element is called the multiplicative identity in K.
¹The purist would not use the word “associativity” for this property as it includes two different operations: αβ is a product of two scalars and βx involves a vector and a scalar.

It is convenient to define two additional operations, subtraction of two vectors and division by a (non-zero) scalar:
x − y = x + (−y) ,    x/α = (1/α)x .
1.2 Examples of vector spaces
1. R^n is a real vector space.
2. C^n is a complex vector space.
3. C^n is a real vector space.
4. The set of all polynomials P is a vector space:
   P = { ∑_{k=0}^{n} α_k x^k : α_k ∈ K, n ∈ N } .
5. The set ℓ^∞(K) of all bounded sequences is a vector space:
   ℓ^∞(K) = { (x_1, x_2, . . .) : x_k ∈ K for all k ∈ N, sup_{k∈N} |x_k| < ∞ } .
   For two sequences x, y ∈ ℓ^∞(K), we define x + y by
   x + y = (x_1 + y_1, x_2 + y_2, . . .) .
   For α ∈ K, we set
   αx = (αx_1, αx_2, . . .) .
   We will always use these definitions of addition and multiplication by scalars for sequences.
   In order to show that ℓ^∞(K) is a vector space it is necessary to check that
   • the binary operations are consistently defined, i.e. to check that αx ∈ ℓ^∞(K) and x + y ∈ ℓ^∞(K) for any x, y ∈ ℓ^∞(K) and any α ∈ K;
   • the axioms of a vector space are satisfied.
6. Let 1 ≤ p < ∞. The set ℓ^p(K) of all p-th power summable sequences is a vector space:
   ℓ^p(K) = { (x_1, x_2, . . .) : x_k ∈ K, ∑_{k=1}^{∞} |x_k|^p < ∞ } .

The definition of the multiplication by scalars and vector addition is the same as in the previous example. Let us check that the sum x + y ∈ ℓ^p(K) for any x, y ∈ ℓ^p(K). Indeed,
∑_{k=1}^{∞} |x_k + y_k|^p ≤ ∑_{k=1}^{∞} (|x_k| + |y_k|)^p ≤ ∑_{k=1}^{∞} (2 max{ |x_k|, |y_k| })^p
≤ ∑_{k=1}^{∞} 2^p (|x_k|^p + |y_k|^p) = 2^p ∑_{k=1}^{∞} |x_k|^p + 2^p ∑_{k=1}^{∞} |y_k|^p < ∞ .
7. The space C[0, 1] of all real-valued continuous functions on the closed interval [0, 1] is a vector space. The addition and multiplication by scalars are defined naturally: for f , g ∈ C[0, 1] and α ∈ R we denote by f + g the function whose values are given by
   ( f + g)(t) = f (t) + g(t) ,   t ∈ [0, 1] ,
   and α f is the function whose values are
   (α f )(t) = α f (t) ,   t ∈ [0, 1] .
   We will always use similar definitions for spaces of functions to be considered later.
8. The set L̃^1(0, 1) of all real-valued continuous functions f on the open interval (0, 1) for which
   ∫_0^1 | f (t)| dt < ∞
   is a vector space.
   If f ∈ C[0, 1] then f ∈ L̃^1(0, 1). Indeed, since [0, 1] is compact, f is bounded (and attains its lower and upper bounds). Then
   ∫_0^1 | f (t)| dt ≤ max_{t∈[0,1]} | f (t)| < ∞ ,
   i.e. f ∈ L̃^1(0, 1).
   We note that L̃^1(0, 1) contains some functions which do not belong to C[0, 1]. For example, f (t) = t^{−1/2} is continuous on (0, 1) but unbounded near 0, so it has no continuous extension to [0, 1]; nevertheless
   ∫_0^1 | f (t)| dt = ∫_0^1 t^{−1/2} dt = [ 2t^{1/2} ]_0^1 = 2 < ∞ ,
   so f ∈ L̃^1(0, 1).
   We conclude that C[0, 1] is a strict subset of L̃^1(0, 1).
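The termwise estimate in example 6, ∑|x_k + y_k|^p ≤ 2^p (∑|x_k|^p + ∑|y_k|^p), can be sanity-checked numerically on truncated sequences. This is only an illustration, not a proof; the sample sequences below are arbitrary choices, not taken from the notes.

```python
# Check sum |x_k + y_k|^p <= 2^p * (sum |x_k|^p + sum |y_k|^p)
# on finite truncations of two sample sequences.

def p_sum(seq, p):
    """Return the sum of |s|^p over the terms of seq."""
    return sum(abs(s) ** p for s in seq)

x = [(-1) ** k / (k + 1) for k in range(1000)]   # x_k = (-1)^k / (k+1)
y = [1.0 / (k + 1) ** 2 for k in range(1000)]    # y_k = 1 / (k+1)^2

for p in (1, 1.5, 2, 3):
    lhs = p_sum([a + b for a, b in zip(x, y)], p)
    rhs = 2 ** p * (p_sum(x, p) + p_sum(y, p))
    assert lhs <= rhs
```

The bound is far from sharp (Minkowski's inequality below gives a much better constant), but it is all that is needed to show that ℓ^p(K) is closed under addition.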

1.3 Hamel bases
Definition 1.1 The linear span of a subset E of a vector space V is the collection of all finite linear combinations of elements of E:
Span(E) = { x ∈ V : x = ∑_{j=1}^{n} α_j e_j , n ∈ N, α_j ∈ K, e_j ∈ E } .
We say that E spans V if V = Span(E), i.e. every element of V can be written as a finite linear combination of elements of E.
Definition 1.2 A set E is linearly independent if any finite collection of elements of E is linearly independent:
∑_{j=1}^{n} α_j e_j = 0   =⇒   α_1 = α_2 = · · · = α_n = 0
for any choice of n ∈ N, e_j ∈ E and α_j ∈ K.
Definition 1.3 A Hamel basis E for V is a linearly independent subset of V which
spans V .
Examples:
1. Any basis in R^n is a Hamel basis.
2. The set E = { 1, x, x^2, . . . } is a Hamel basis in the space of all polynomials.
Lemma 1.4 If E is a Hamel basis for a vector space V then any element x ∈ V can be
uniquely written in the form
x = ∑_{j=1}^{n} α_j e_j
where n ∈ N, α_j ∈ K, and e_j ∈ E.
Exercise: Prove the lemma.
Definition 1.5 We say that a set is finite if it consists of a finite number of elements.
Theorem 1.6 If V has a finite Hamel basis then every Hamel basis for V has the same
number of elements.

Proof.
Let E = { e_1, . . . , e_n } be a finite Hamel basis in V . Suppose there is a Hamel basis E′ = { e′_1, . . . , e′_m } which has more elements than E (if m < n, swap E and E′).
Since Span(E′) = V we can write e_1 as a linear combination of elements from E′:
e_1 = ∑_{k=1}^{m} α_k e′_k .
Since e_1 ≠ 0 there is k_1 such that α_{k_1} ≠ 0, so we can write
e′_{k_1} = α_{k_1}^{−1} e_1 − ∑_{1≤k≤m, k≠k_1} α_{k_1}^{−1} α_k e′_k .
Let S_1 = { e_1 } and S′_1 = { e′_{k_1} }. The set E′_1 = (E′ \ S′_1) ∪ S_1 is linearly independent and Span(E′_1) = Span(E′) = V (check these two claims).
We can repeat the procedure inductively. Let S_j = { e_1, . . . , e_j }. Suppose for some j, 1 ≤ j ≤ n − 1, there is a set S′_j = { e′_{k_1}, . . . , e′_{k_j} } such that the set E′_j = (E′ \ S′_j) ∪ S_j is linearly independent and Span(E′_j) = V . Then there are α_k, β_k ∈ K such that
e_{j+1} = ∑_{e′_k ∈ E′\S′_j} α_k e′_k + ∑_{e_k ∈ S_j} β_k e_k .
Since S_{j+1} is linearly independent, there is k_{j+1} such that α_{k_{j+1}} ≠ 0. Let S′_{j+1} = S′_j ∪ { e′_{k_{j+1}} }. Then E′_{j+1} is linearly independent and spans V (by the same arguments as in the case j = 1).
After n inductive steps we get that E′_n = (E′ \ S′_n) ∪ E is linearly independent, because S_n = E. This is impossible: since m > n the set E′ \ S′_n is not empty, and each of its elements is a finite linear combination of elements of E (as E spans V ), so E′_n cannot be linearly independent. This contradiction implies that m = n.
Definition 1.7 If V has a finite basis E then the dimension of V (denoted dim V ) is the number of elements in E. If V has no finite basis then we say that V is infinite-dimensional.
Example: In R^n any basis consists of n vectors. Therefore dim R^n = n.
Let V and W be two vector spaces over K.
Definition 1.8 A map L : V → W is called linear if for any x, y ∈ V and any α ∈ K
L(x + αy) = L(x) + αL(y) .
Definition 1.9 If a linear map L : V → W is a bijection, then L is called a linear
isomorphism. We say that V and W are linearly isomorphic if there is a bijective
linear map L
: V → W .

Proposition 1.10 Any n-dimensional vector space over K is linearly isomorphic to K^n.
Proof:
Let E = { e_j : 1 ≤ j ≤ n } be a basis in V , then every element x ∈ V is represented uniquely in the form
x = ∑_{j=1}^{n} α_j e_j .
The map L : x ↦ (α_1, . . . , α_n) is a linear bijection V → K^n. Therefore V is linearly isomorphic to K^n.
In order to show that a vector space is infinite-dimensional it is sufficient to find an infinite linearly independent subset. Let’s consider the following examples:
1. ℓ^p(K) is infinite-dimensional (1 ≤ p ≤ ∞).
Proof.
The set
E = { (1, 0, 0, 0, . . .), (0, 1, 0, 0, . . .), (0, 0, 1, 0, . . .), . . . }
is linearly independent and not finite. Therefore dim ℓ^p(K) = ∞.
Remark: This linearly independent set E is not a Hamel basis. Indeed, the sequence x = (x_1, x_2, x_3, . . .) with x_k = e^{−k} belongs to ℓ^p(K) for any p ≥ 1 but cannot be represented as a sum of finitely many elements of the set E.
2. C[0, 1] is infinite-dimensional.
Proof:
The set E = { x^k : k ∈ N } is an infinite linearly independent subset of C[0, 1]. Indeed, suppose
p(x) = ∑_{k=1}^{n} α_k x^k = 0   for all x ∈ [0, 1].
Differentiating the equality n times we get p^{(n)}(x) = n! α_n = 0, which implies α_n = 0. Repeating the argument with n replaced by n − 1, n − 2, . . . , we see that p(x) ≡ 0 implies α_k = 0 for all k.
Note that the functions
f_α(x) = x(α − x) for 0 ≤ x ≤ α ,   f_α(x) = 0 for α ≤ x ≤ 1 ,
with α ∈ (0, 1) form an uncountable linearly independent subset of C[0, 1].
The linearly independent sets provided in the last two examples are not Hamel bases. This is not a coincidence: ℓ^p(K) and C[0, 1] (as well as many other functional spaces) do not have a countable Hamel basis.²
²Why?

Theorem 1.11 Every vector space has a Hamel basis.
The proof of this theorem is based on Zorn’s Lemma.
We note that in many interesting vector spaces (called normed spaces), a very large number of elements should be included into a Hamel basis in order to enable representation of every element in the form of a finite sum. Then the basis is too large to be useful for the study of the original vector space. A natural idea would be to allow infinite sums in the definition of a basis. In order to use infinite sums we need to define convergence, which cannot be done using the axioms of vector spaces only. An additional structure on the vector space should be defined.

2 Normed spaces

2.1 Norms
Definition 2.1 A norm on a vector space V is a map ‖ · ‖ : V → R such that for any x, y ∈ V and any α ∈ K:
1. ‖x‖ ≥ 0, and ‖x‖ = 0 ⇔ x = 0 (positive definiteness);
2. ‖αx‖ = |α| ‖x‖ (positive homogeneity);
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality).
The pair (V, ‖ · ‖) is called a normed space.
In other words, a normed space is a vector space equipped with a norm.
Examples:
1. R^n with each of the following norms is a normed space:
(a) ‖x‖ = ( ∑_{k=1}^{n} |x_k|^2 )^{1/2} ;
(b) ‖x‖_p = ( ∑_{k=1}^{n} |x_k|^p )^{1/p} , 1 ≤ p < ∞ ;
(c) ‖x‖_∞ = max_{1≤k≤n} |x_k| .
2. ℓ^p(K) is a vector space with the following norm (1 ≤ p < ∞):
‖x‖_{ℓ^p} = ( ∑_{k=1}^{∞} |x_k|^p )^{1/p} .
3. ℓ^∞(K) is a vector space with the following norm:
‖x‖_{ℓ^∞} = sup_{k∈N} |x_k| .
We will often use ‖x‖_p to denote the norm of a vector x ∈ ℓ^p.
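The norms (a), (b), (c) above are straightforward to compute for a concrete vector; the sketch below does so for an arbitrary sample vector in R^4 (an illustration only, not part of the notes).

```python
# Compute the Euclidean norm (a), the p-norm (b) and the max norm (c)
# for a sample vector in R^4.

def norm_p(x, p):
    """The p-norm (b); p = 2 gives the Euclidean norm (a)."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def norm_inf(x):
    """The max norm (c)."""
    return max(abs(t) for t in x)

x = [3.0, -4.0, 0.0, 0.0]
assert norm_p(x, 2) == 5.0    # Euclidean norm of (3, -4, 0, 0)
assert norm_p(x, 1) == 7.0
assert norm_inf(x) == 4.0
# For a fixed vector the p-norms decrease as p grows:
assert norm_inf(x) <= norm_p(x, 2) <= norm_p(x, 1)
```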
In order to prove the triangle inequality for the ℓ^p norm, we will state and prove several inequalities.

2.2 Four famous inequalities
Lemma 2.2 (Young’s inequality) If a, b > 0, 1 < p, q < ∞, 1/p + 1/q = 1, then
ab ≤ a^p/p + b^q/q .
Proof:
Consider the function f (t) = t^p/p − t + 1/q defined for t ≥ 0. Since f ′(t) = t^{p−1} − 1 vanishes at t = 1 only, and f ″(t) = (p − 1) t^{p−2} ≥ 0, the point t = 1 is a global minimum for f . Consequently, f (t) ≥ f (1) = 0 for all t ≥ 0. Now substitute t = a b^{−q/p}:
f (a b^{−q/p}) = a^p b^{−q}/p − a b^{−q/p} + 1/q ≥ 0 .
Multiplying the inequality by b^q yields Young’s inequality.
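Young's inequality can be sanity-checked on a grid of sample values; the grid below is an arbitrary choice, and the check is only an illustration, not a proof.

```python
# Check ab <= a^p/p + b^q/q for conjugate exponents 1/p + 1/q = 1.
import itertools

for p in (1.5, 2.0, 3.0):
    q = p / (p - 1)                      # the conjugate exponent of p
    assert abs(1 / p + 1 / q - 1) < 1e-12
    for a, b in itertools.product([0.1, 0.5, 1.0, 2.0, 10.0], repeat=2):
        # small tolerance guards against floating-point rounding
        assert a * b <= a ** p / p + b ** q / q + 1e-12
```

Equality holds exactly when a^p = b^q, e.g. a = b = 1 gives 1 = 1/p + 1/q.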
Lemma 2.3 (Hölder’s inequality) If 1 ≤ p, q ≤ ∞, 1/p + 1/q = 1, x ∈ ℓ^p(K), y ∈ ℓ^q(K), then
∑_{j=1}^{∞} |x_j y_j| ≤ ‖x‖_{ℓ^p} ‖y‖_{ℓ^q} .
Proof.
If 1 < p, q < ∞, we use Young’s inequality to get that for any n ∈ N
∑_{j=1}^{n} (|x_j| / ‖x‖_{ℓ^p}) (|y_j| / ‖y‖_{ℓ^q}) ≤ ∑_{j=1}^{n} ( (1/p) |x_j|^p / ‖x‖_{ℓ^p}^p + (1/q) |y_j|^q / ‖y‖_{ℓ^q}^q ) ≤ 1/p + 1/q = 1 .
Therefore for any n ∈ N
∑_{j=1}^{n} |x_j y_j| ≤ ‖x‖_{ℓ^p} ‖y‖_{ℓ^q} .
Since the partial sums are monotonically increasing and bounded above, the series converges and Hölder’s inequality follows by taking the limit as n → ∞.
If p = 1 and q = ∞:
∑_{j=1}^{n} |x_j y_j| ≤ max_{1≤j≤n} |y_j| ∑_{j=1}^{n} |x_j| ≤ ‖x‖_{ℓ^1} ‖y‖_{ℓ^∞} .
Therefore the series converges and Hölder’s inequality follows by taking the limit as n → ∞.
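Since Hölder's inequality holds in particular for finite sequences, it can be checked numerically on truncations; the sample sequences below are arbitrary choices for illustration.

```python
# Check sum |x_j y_j| <= ||x||_p * ||y||_q for conjugate p, q on truncations.

def lp_norm(seq, p):
    return sum(abs(s) ** p for s in seq) ** (1.0 / p)

x = [1.0 / (j + 1) for j in range(500)]            # x_j = 1/(j+1)
y = [(-1) ** j / (j + 1) ** 2 for j in range(500)]  # y_j = (-1)^j/(j+1)^2

for p in (1.5, 2.0, 4.0):
    q = p / (p - 1)
    lhs = sum(abs(a * b) for a, b in zip(x, y))
    rhs = lp_norm(x, p) * lp_norm(y, q)
    assert lhs <= rhs + 1e-12
```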
Lemma 2.4 (Cauchy-Schwarz inequality) If x, y ∈ ℓ^2(K) then
∑_{j=1}^{∞} |x_j y_j| ≤ ( ∑_{j=1}^{∞} |x_j|^2 )^{1/2} ( ∑_{j=1}^{∞} |y_j|^2 )^{1/2} .

Proof:
This inequality coincides with Hölder’s inequality with p = q = 2.
Now we state and prove the triangle inequality for the ℓ^p norm.
Lemma 2.5 (Minkowski’s inequality) If x, y ∈ ℓ^p(K) for 1 ≤ p ≤ ∞ then x + y ∈ ℓ^p(K) and
‖x + y‖_{ℓ^p} ≤ ‖x‖_{ℓ^p} + ‖y‖_{ℓ^p} .
Proof:
If 1 < p < ∞, define q from the equation 1/p + 1/q = 1. Then using Hölder’s inequality (finite sequences belong to ℓ^p with any p) we get³
∑_{j=1}^{n} |x_j + y_j|^p = ∑_{j=1}^{n} |x_j + y_j|^{p−1} |x_j + y_j|
≤ ∑_{j=1}^{n} |x_j + y_j|^{p−1} |x_j| + ∑_{j=1}^{n} |x_j + y_j|^{p−1} |y_j|
≤ ( ∑_{j=1}^{n} |x_j + y_j|^{(p−1)q} )^{1/q} ( ∑_{j=1}^{n} |x_j|^p )^{1/p}   (Hölder’s inequality)
+ ( ∑_{j=1}^{n} |x_j + y_j|^{(p−1)q} )^{1/q} ( ∑_{j=1}^{n} |y_j|^p )^{1/p} .
Dividing the inequality by ( ∑_{j=1}^{n} |x_j + y_j|^p )^{1/q} and using that (p − 1)q = p and 1 − 1/q = 1/p, we get for all n
( ∑_{j=1}^{n} |x_j + y_j|^p )^{1/p} ≤ ( ∑_{j=1}^{n} |x_j|^p )^{1/p} + ( ∑_{j=1}^{n} |y_j|^p )^{1/p} .
The series on the right-hand side converge to ‖x‖_{ℓ^p} + ‖y‖_{ℓ^p}. Consequently the series on the left-hand side also converges. Therefore x + y ∈ ℓ^p(K), and Minkowski’s inequality follows by taking the limit as n → ∞.
Exercise: Prove Minkowski’s inequality for p = 1 and p = ∞.
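The triangle inequality for the ℓ^p norm can likewise be checked on truncated sample sequences (arbitrary choices, for illustration only).

```python
# Check ||x + y||_p <= ||x||_p + ||y||_p for several values of p.

def lp_norm(seq, p):
    return sum(abs(s) ** p for s in seq) ** (1.0 / p)

x = [(-1) ** j / (j + 1) for j in range(300)]
y = [1.0 / (j + 1) ** 1.5 for j in range(300)]
xy = [a + b for a, b in zip(x, y)]

for p in (1, 2, 3, 7):
    assert lp_norm(xy, p) <= lp_norm(x, p) + lp_norm(y, p) + 1e-12
```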
2.3 Examples of norms on a space of functions
Each of the following formulae defines a norm on C[0, 1], the space of all continuous
functions on [0, 1]:
³We do not start directly with n = ∞ because a priori we do not know convergence for some of the series involved in the proof.

1. the “sup(remum) norm”
‖ f ‖_∞ = sup_{t∈[0,1]} | f (t)| ;
2. the “L^1 norm”
‖ f ‖_{L^1} = ∫_0^1 | f (t)| dt ;
3. the “L^2 norm”
‖ f ‖_{L^2} = ( ∫_0^1 | f (t)|^2 dt )^{1/2} .
Exercise: Check that each of these formulae defines a norm. For the case of the L^2 norm, you will need a Cauchy-Schwarz inequality for integrals.
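All three norms can be approximated by sampling the function on a uniform grid. The sketch below does this for f(t) = t, whose exact norms are 1, 1/2 and 1/√3; it is a numerical illustration only, not part of the notes.

```python
# Approximate the sup, L^1 and L^2 norms of f(t) = t on [0, 1] by
# sampling at the midpoints of a uniform grid.

n = 100000
ts = [(i + 0.5) / n for i in range(n)]   # grid midpoints in (0, 1)
f = [t for t in ts]                       # f(t) = t

sup_norm = max(abs(v) for v in f)
l1_norm = sum(abs(v) for v in f) / n      # Riemann sum for the integral
l2_norm = (sum(v * v for v in f) / n) ** 0.5

assert abs(sup_norm - 1.0) < 1e-4
assert abs(l1_norm - 0.5) < 1e-6
assert abs(l2_norm - 3 ** -0.5) < 1e-6
```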
Example: Let k ∈ N. The space C^k[0, 1] consists of all continuous real-valued functions which have continuous derivatives up to order k. The norm on C^k[0, 1] is defined by
‖ f ‖_{C^k} = ∑_{j=0}^{k} sup_{t∈[0,1]} | f^{(j)}(t)| ,
where f^{(j)} denotes the derivative of order j.
2.4 Equivalence of norms
We have seen that various different norms can be introduced on a vector space. In
order to compare two norms it is convenient to introduce the following equivalence
relation.
Definition 2.6 Two norms ‖ · ‖_1 and ‖ · ‖_2 on a vector space V are equivalent if there are constants c_1, c_2 > 0 such that
c_1 ‖x‖_1 ≤ ‖x‖_2 ≤ c_2 ‖x‖_1   for all x ∈ V .
In this case we write ‖ · ‖_1 ∼ ‖ · ‖_2 .
Theorem 2.7 Any two norms on R^n are equivalent.⁴
⁴You already saw this statement in Analysis III and/or Differentiation in Year 2. The proof is based on the observation that the unit sphere S ⊂ R^n is sequentially compact. Then we checked that f (x) = ‖x‖_2 / ‖x‖_1 is continuous on S and consequently it is bounded and attains its lower and upper bounds on S. We set c_1 = min_S f and c_2 = max_S f .
Example: The norms ‖ · ‖_{L^1} and ‖ · ‖_∞ on C[0, 1] are not equivalent.

Proof:
Consider the sequence of functions f_n(t) = t^n with n ∈ N. Obviously f_n ∈ C[0, 1] and
‖ f_n ‖_∞ = max_{t∈[0,1]} |t|^n = 1 ,   ‖ f_n ‖_{L^1} = ∫_0^1 t^n dt = 1/(n + 1) .
Suppose the norms are equivalent. Then there is a constant c_2 > 0 such that for all n:
‖ f_n ‖_∞ / ‖ f_n ‖_{L^1} = n + 1 ≤ c_2 .
But this cannot hold for all n. This contradiction implies the norms are not equivalent.
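The unbounded ratio ‖f_n‖_∞ / ‖f_n‖_{L^1} = n + 1 that drives this proof can be reproduced numerically (an illustration only; the grid size is an arbitrary choice).

```python
# For f_n(t) = t^n the sup norm is 1 and the L^1 norm is 1/(n+1),
# so the ratio of the two norms grows like n + 1.

def l1_norm(f, n_grid=100000):
    """Midpoint Riemann sum for the integral of |f| over [0, 1]."""
    ts = [(i + 0.5) / n_grid for i in range(n_grid)]
    return sum(abs(f(t)) for t in ts) / n_grid

for n in (1, 5, 20):
    f_n = lambda t, n=n: t ** n
    sup = 1.0                     # attained at t = 1
    ratio = sup / l1_norm(f_n)
    assert abs(ratio - (n + 1)) < 0.1
```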
2.5 Linear Isometries
Suppose V and W are normed spaces.
Definition 2.8 If a linear map L : V → W preserves norms, i.e. ‖L(x)‖ = ‖x‖ for all x ∈ V , it is called a linear isometry.
This definition implies L is injective, i.e., L : V → L(V ) is bijective, but it does not imply L(V ) = W , i.e., L is not necessarily invertible. Note that sometimes the invertibility property is included into the definition of the isometry. Finally, in Metric Spaces the word “isometry” is used to denote distance-preserving transformations.
Definition 2.9 We say that two normed spaces are isometric, if there is an invertible
linear isometry between them.
A linear invertible map can be used to “pull back” a norm as follows.
Proposition 2.10 Let (V, ‖ · ‖_V) be a normed space, W a vector space, and L : W → V a linear isomorphism. Then
‖x‖_W := ‖L(x)‖_V
defines a norm on W .
Proof:
For any x, y ∈ W and any α ∈ K we have:
‖x‖_W = ‖L(x)‖_V ≥ 0 ,
‖αx‖_W = ‖L(αx)‖_V = |α| ‖L(x)‖_V = |α| ‖x‖_W .
If ‖x‖_W = ‖L(x)‖_V = 0, then L(x) = 0 due to non-degeneracy of the norm ‖ · ‖_V . Since L is invertible, we get x = 0. Therefore ‖ · ‖_W is non-degenerate.

Finally, the triangle inequality follows from the triangle inequality for ‖ · ‖_V :
‖x + y‖_W = ‖L(x) + L(y)‖_V ≤ ‖L(x)‖_V + ‖L(y)‖_V = ‖x‖_W + ‖y‖_W .
Therefore, ‖ · ‖_W is a norm.
Note that in the proposition the new norm is introduced in such a way that L : (W, ‖ · ‖_W) → (V, ‖ · ‖_V) is a linear isometry.
Let V be a finite dimensional vector space and n = dim V . We have seen that V is linearly isomorphic to K^n. Then the proposition implies the following statements.
Corollary 2.11 Any finite dimensional vector space V can be equipped with a norm.
Corollary 2.12 Any n-dimensional normed space V is isometric to K^n equipped with a suitable norm.
Since any two norms on R^n (and therefore on C^n) are equivalent we also get the following statement.
Theorem 2.13 If V is a finite-dimensional vector space, then all norms on V are equivalent.

3 Convergence in a normed space

3.1 Definition and examples
The norm on a vector space V can be used to measure distances between points x, y ∈ V .
So we can define the limit of a sequence.
Definition 3.1 A sequence (x_n)_{n=1}^{∞}, x_n ∈ V , converges to a limit x ∈ V if for any ε > 0 there is N ∈ N such that
‖x_n − x‖ < ε   for all n > N.
Then we write x_n → x (or lim x_n = x).
We see directly from this definition that the sequence of vectors x_n → x if and only if the sequence of non-negative real numbers ‖x_n − x‖ → 0.
Exercises: Prove the following statements.
1. The limit of a convergent sequence is unique.
2. Any convergent sequence is bounded.
3. If x_n converges to x, then ‖x_n‖ → ‖x‖.
It is possible to check convergence of a sequence of real numbers without actually
finding its limit: it is sufficient to check that it satisfies the following definition:
Definition 3.2 (Cauchy sequence) A sequence (x_n)_{n=1}^{∞} in a normed space V is Cauchy if for any ε > 0 there is an N such that
‖x_n − x_m‖ < ε   for all m, n > N.
Theorem 3.3 A sequence of real numbers converges iff it is Cauchy.
Exercises: Prove the following statements.
1. Any convergent sequence is Cauchy.
2. Any Cauchy sequence is bounded.
Example: Consider the sequence f_n ∈ C[0, 1] defined by f_n(t) = t^n.
1. f_n → 0 in the L^1 norm.
Proof:
We have already computed the norms:
‖ f_n ‖_{L^1} = 1/(n + 1) → 0 .
Consequently, f_n → 0.
2. f_n does not converge in the sup norm.
Proof:
If m > 2n ≥ 1 then
f_n(2^{−1/n}) − f_m(2^{−1/n}) = 1/2 − (1/2)^{m/n} ≥ 1/4 .
Consequently ( f_n ) is not Cauchy in the sup norm and hence not convergent.
This example shows that convergence in the L^1 norm does not imply pointwise convergence and, as a result, does not imply convergence in the sup norm (often called uniform convergence). Note that in contrast to the uniform and L^1 convergences, the notion of pointwise convergence is not based on a norm on the space of continuous functions.
Exercise: Pointwise convergence does not imply L^1 convergence.
Hint: Construct f_n with support in (0, 1/n) but make the maximum of f_n very large to ensure that ‖ f_n ‖_{L^1} > n. Then f_n(t) → 0 for every t ∈ [0, 1] but f_n is not bounded in the L^1 norm, hence not convergent.
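One concrete choice matching the hint is a triangular spike on (0, 1/n) of height 2n², so that the area under |f_n| is n. This particular f_n is an assumption chosen for illustration, not the construction intended in the notes.

```python
# Triangular spike supported on (0, 1/n): rises linearly to height 2*n^2
# at t = 1/(2n), then falls back to 0 at t = 1/n.

def f(n, t):
    if 0 < t < 1.0 / (2 * n):
        return 4 * n ** 3 * t
    if 1.0 / (2 * n) <= t < 1.0 / n:
        return 4 * n ** 3 * (1.0 / n - t)
    return 0.0

# L^1 norm = area of the triangle: (1/2) * (1/n) * (2*n^2) = n, which is
# unbounded, so (f_n) cannot converge in the L^1 norm.
assert abs(0.5 * (1.0 / 7) * (2 * 7 ** 2) - 7.0) < 1e-9

# Pointwise convergence to 0: once 1/n < t the spike lies left of t.
assert f(100, 0.3) == 0.0
```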
Proposition 3.4 If f_n ∈ C[0, 1] for all n ∈ N and f_n → f in the sup norm, then f_n → f in the L^1 norm, i.e.,
‖ f_n − f ‖_∞ → 0   =⇒   ‖ f_n − f ‖_{L^1} → 0 .
Proof:
0 ≤ ‖ f_n − f ‖_{L^1} = ∫_0^1 | f_n(t) − f (t)| dt ≤ sup_{0≤t≤1} | f_n(t) − f (t)| = ‖ f_n − f ‖_∞ → 0 .
Therefore ‖ f_n − f ‖_{L^1} → 0.
We have seen that different norms may lead to different conclusions about convergence of a given sequence, but sometimes convergence in one norm implies convergence in another one. The following lemma shows that equivalent norms give rise to the same notion of convergence.
Lemma 3.5 Suppose ‖ · ‖_1 and ‖ · ‖_2 are equivalent norms on a vector space V . Then for any sequence (x_n):
‖x_n − x‖_1 → 0   ⇔   ‖x_n − x‖_2 → 0 .
Proof:
Since the norms are equivalent, there are constants c_1, c_2 > 0 such that
0 ≤ c_1 ‖x_n − x‖_1 ≤ ‖x_n − x‖_2 ≤ c_2 ‖x_n − x‖_1
for all n. Then ‖x_n − x‖_2 → 0 implies ‖x_n − x‖_1 → 0, and vice versa.

3.2 Topology on a normed space
We say that a collection T of subsets of V is a topology on V if it satisfies the following properties:
1. ∅, V ∈ T ;
2. any finite intersection of elements of T belongs to T ;
3. any union of elements of T belongs to T .
A set equipped with a topology is called a topological space. The elements of T are called open sets. The topology can be used to define a convergent sequence and a continuous function.
A norm on V can be used to define a topology on V , i.e., to define the notion of an open set.
Definition 3.6 A subset X ⊂ V is open if for any x ∈ X there is ε > 0 such that the ball of radius ε centred around x belongs to X:
B(x, ε) = { y ∈ V : ‖y − x‖ < ε } ⊂ X .
Example: In any normed space V :
1. The unit ball centred around zero, B_0 = { x : ‖x‖ < 1 }, is open.
2. Any open ball B(x, ε) is open.
3. V is open.
4. The empty set is open.
It is not too difficult to check that the collection of open sets defines a topology
on V . You can easily check from the definition that equivalent norms generate the
same topology, i.e., open sets are exactly the same. The notion of convergence can be
defined in terms of the topology.
Definition 3.7 An open neighbourhood of x is an open set which contains x.
Lemma 3.8 A sequence x_n → x if and only if for any open neighbourhood X of x there is N ∈ N such that x_n ∈ X for all n > N.
Proof:
( =⇒ ). Let x_n → x. Take any open X such that x ∈ X. Then there is ε > 0 such that B(x, ε) ⊂ X. Since the sequence converges there is N such that ‖x_n − x‖ < ε for all n > N. Then x_n ∈ B(x, ε) ⊂ X for the same values of n.
(⇐=). Take any ε > 0. The ball B(x, ε) is open, therefore there is N such that x_n ∈ B(x, ε) for all n > N. Hence ‖x_n − x‖ < ε and x_n → x.

3.3 Closed sets
Definition 3.9 A set X ⊂ V is closed if its complement V \ X is open.
Example: In any normed space V :
1. The unit sphere S = { x : ‖x‖ = 1 } is closed.
2. Any closed ball B̄(x, ε) = { y ∈ V : ‖y − x‖ ≤ ε } is closed.
3. V is closed.
4. The empty set is closed.
Lemma 3.10 A subset X ⊂ V is closed if and only if any convergent sequence with
elements in X has its limit in X .
Exercise: Prove it. (You have seen the proof in Year 2).
Definition 3.11 We say that a subset L ⊂ V is a linear subspace if it is a vector space itself, i.e., if x_1, x_2 ∈ L and λ ∈ K imply x_1 + λx_2 ∈ L.
Proposition 3.12 A finite dimensional linear subspace W of a normed space V is closed.
Proof:
Since n = dim W < ∞, there is a finite Hamel basis in W :
E = { e_1, e_2, . . . , e_n } ,   Span(E) = W .
Suppose W is not closed. Then by Lemma 3.10 there is a convergent sequence x_k → x_∗ with x_k ∈ W but x_∗ ∈ V \ W . Then x_∗ is linearly independent from E (otherwise it would belong to W ). Consequently
Ẽ = { e_1, e_2, . . . , e_n, x_∗ }
is a Hamel basis in L̃ = Span(Ẽ). In this basis, the components of x_k are given by (α_1^k, . . . , α_n^k, 0) and x_∗ corresponds to the vector (0, . . . , 0, 1). In a finite dimensional normed vector space, a sequence of vectors converges iff each component converges. We get in the limit as k → ∞
(α_1^k, . . . , α_n^k, 0) → (0, . . . , 0, 1) ,
which is obviously impossible. Therefore W is closed.
Example: The subspace of polynomial functions is linear but not closed in C[0, 1]
equipped with the sup norm.

3.4 Compactness
Definition 3.13 (sequential compactness) A subset K of a normed space (V, ‖ · ‖_V) is (sequentially) compact if any sequence (x_n)_{n=1}^{∞} with x_n ∈ K has a convergent subsequence x_{n_j} → x_∗ with x_∗ ∈ K.
Proposition 3.14 A compact set is closed and bounded.
Theorem 3.15 A subset of R^n is compact iff it is closed and bounded.
Corollary 3.16 A subset of a finite-dimensional vector space is compact iff it is closed and bounded.
Example: The unit sphere S in ℓ^p(K) is closed, bounded but not compact.
Proof:
Take the sequence (e_j) where
e_j = (0, . . . , 0, 1, 0, . . .)   (with the 1 in the j-th place) .
We note that ‖e_j − e_k‖_{ℓ^p} = 2^{1/p} for all j ≠ k. Consequently, (e_j)_{j=1}^{∞} does not have any convergent subsequence, hence S is not compact.
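The computation ‖e_j − e_k‖_{ℓ^p} = 2^{1/p} can be verified numerically on finite truncations of the unit vectors (an illustration only; the truncation dimension is arbitrary).

```python
# Pairwise l^p distances between the unit vectors e_j: two entries of
# e_j - e_k have modulus 1 and the rest vanish, so the distance is 2^(1/p).

def lp_dist(u, v, p):
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

dim = 10
e = [[1.0 if i == j else 0.0 for i in range(dim)] for j in range(dim)]

for p in (1, 2, 3):
    for j in range(dim):
        for k in range(j + 1, dim):
            assert abs(lp_dist(e[j], e[k], p) - 2 ** (1.0 / p)) < 1e-12
```

Since every pair of terms stays at distance 2^{1/p} > 0, no subsequence can be Cauchy.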
Lemma 3.17 (Riesz’ Lemma) Let X be a normed vector space, Y a closed linear subspace of X such that Y ≠ X, and α ∈ R, 0 < α < 1. Then there is x_α ∈ X such that
‖x_α‖ = 1 and ‖x_α − y‖ > α for all y ∈ Y .
Proof:
Since Y ⊂ X and Y ≠ X there is x ∈ X \ Y . Since Y is closed, X \ Y is open and therefore
d := inf{ ‖x − y‖ : y ∈ Y } > 0 .
Since α^{−1} > 1 there is a point z ∈ Y such that ‖x − z‖ < d α^{−1}. Let x_α = (x − z) / ‖x − z‖. Then ‖x_α‖ = 1 and for any y ∈ Y ,
‖x_α − y‖ = ‖ (x − z)/‖x − z‖ − y ‖ = ‖ x − (z + ‖x − z‖ y) ‖ / ‖x − z‖ > d / (d α^{−1}) = α ,
as z + ‖x − z‖ y ∈ Y because Y is a linear subspace.
Theorem 3.18 A normed space is finite dimensional iff the unit sphere is compact.
Proof:
If the vector space is finite-dimensional, Corollary 3.16 implies the unit sphere is compact, as the unit sphere is bounded and closed.
So we only need to show that if the unit sphere S is sequentially compact, then the normed space V is finite dimensional. Suppose that dim V = ∞. Then Riesz’ Lemma can be used to construct an infinite sequence of x_n ∈ S such that ‖x_n − x_m‖ > 1/2 > 0 for all m ≠ n. This sequence does not have any convergent subsequence (none of its subsequences is Cauchy) and therefore S is not compact.
We construct x_n inductively. First take any x_1 ∈ S. Then suppose that for some n ≥ 1 we have found x_1, . . . , x_n ∈ S such that ‖x_l − x_k‖ > 1/2 for all 1 ≤ k, l ≤ n, k ≠ l (note that this property is satisfied for n = 1). The linear subspace Y_n = Span(x_1, . . . , x_n) is finite dimensional and hence closed (see Proposition 3.12). Since V is infinite dimensional, Y_n ≠ V . Then Riesz’ Lemma with Y = Y_n and α = 1/2 implies that there is x_{n+1} ∈ S such that ‖x_{n+1} − x_k‖ > 1/2 for all 1 ≤ k ≤ n.
Repeating this argument inductively we generate the sequence x_n for all n ∈ N.

4 Banach spaces

4.1 Completeness: Definition and examples
Definition 4.1 (Banach space) A normed space V is called complete if any Cauchy sequence in V converges to a limit in V . A complete normed space is called a Banach space.
Theorem 3.3 implies that R is complete, i.e., every Cauchy sequence of numbers
has a limit. We also know that C is complete (Do you know how to deduce it from the
completeness of R?).
Theorem 4.2 Every finite-dimensional normed space is complete.
Proof:
Let V be a vector space over K ∈ {R, C} with dim V = n < ∞. Take any basis in V . Then a sequence of vectors in V converges iff each component of the vectors converges, and a sequence of vectors is Cauchy iff each component is Cauchy. Therefore each component of a Cauchy sequence has a limit, and those limits constitute the limit vector for the original sequence. Hence V is complete.
In particular, R^n and C^n are complete.
Theorem 4.3 (ℓ^p is a Banach space) The space ℓ^p(K) equipped with the standard ℓ^p norm is complete.
Proof:
Suppose that x^k = (x_1^k, x_2^k, . . .) ∈ ℓ^p(K) is a Cauchy sequence. Then for every ε > 0 there is N such that
‖x^m − x^n‖_{ℓ^p}^p = ∑_{j=1}^{∞} |x_j^m − x_j^n|^p < ε
for all m, n > N. Consequently, for each j ∈ N the scalar sequence (x_j^k) is Cauchy, and the completeness of K implies that there is a_j ∈ K such that
x_j^k → a_j
as k → ∞. Let a = (a_1, a_2, . . .). First we note that for any M ≥ 1 and m, n > N:
∑_{j=1}^{M} |x_j^m − x_j^n|^p ≤ ∑_{j=1}^{∞} |x_j^m − x_j^n|^p < ε .
Taking the limit as n → ∞ we get
∑_{j=1}^{M} |x_j^m − a_j|^p ≤ ε .
This holds for any M, so we can take the limit as M → ∞:
∑_{j=1}^{∞} |x_j^m − a_j|^p ≤ ε .
We conclude that x^m − a ∈ ℓ^p(K). Since ℓ^p(K) is a vector space and x^m ∈ ℓ^p(K), then a ∈ ℓ^p(K). Moreover, ‖x^m − a‖_{ℓ^p} ≤ ε^{1/p} for all m > N. Consequently x^m → a in ℓ^p(K) with the standard norm, and so ℓ^p(K) is complete.
Theorem 4.4 (C[0,1] is a Banach space) The space $C[0,1]$ equipped with the sup norm is complete.

Proof: Let $f_k$ be a Cauchy sequence. Then for any $\varepsilon > 0$ there is $N$ such that
\[
\sup_{t \in [0,1]} |f_n(t) - f_m(t)| < \varepsilon
\]
for all $m, n > N$. In particular, $f_n(t)$ is Cauchy for any fixed t and consequently has a limit. Set
\[
f(t) = \lim_{n \to \infty} f_n(t) .
\]
Let us prove that $f_n(t) \to f(t)$ uniformly in t. Indeed, we already know that
\[
|f_n(t) - f_m(t)| < \varepsilon
\]
for all $n, m > N$ and all $t \in [0,1]$. Taking the limit as $m \to \infty$ we get
\[
|f_n(t) - f(t)| \le \varepsilon
\]
for all $n > N$ and all $t \in [0,1]$. Therefore $f_n$ converges uniformly:
\[
\|f_n - f\|_{\infty} = \sup_{t \in [0,1]} |f_n(t) - f(t)| \le \varepsilon
\]
for all $n > N$. The uniform limit of a sequence of continuous functions is continuous. Consequently $f \in C[0,1]$, which completes the proof of completeness.
Example: The space $C[0,2]$ equipped with the $L^1$ norm is not complete.

Proof: Consider the following sequence of functions:
\[
f_n(t) = \begin{cases} t^n & \text{for } 0 \le t \le 1, \\ 1 & \text{for } 1 \le t \le 2 . \end{cases}
\]
This is a Cauchy sequence in the $L^1$ norm. Indeed, for any $n < m$:
\[
\|f_n - f_m\|_{L^1} = \int_0^1 (t^n - t^m)\, dt = \frac{1}{n+1} - \frac{1}{m+1} < \frac{1}{n+1} ,
\]
and consequently for any $m, n > N$
\[
\|f_n - f_m\|_{L^1} < \frac{1}{N} .
\]
Now let us show that $f_n$ does not converge to a continuous function in the $L^1$ norm. Indeed, suppose such a limit exists and call it f. Then
\[
\|f_n - f\|_{L^1} = \int_0^1 |t^n - f(t)|\, dt + \int_1^2 |1 - f(t)|\, dt \to 0 .
\]
Since
\[
|f(t)| - |t^n| \le |t^n - f(t)| \le |f(t)| + |t^n|
\]
implies that
\[
\int_0^1 |f(t)|\, dt - \int_0^1 t^n\, dt \le \int_0^1 |t^n - f(t)|\, dt \le \int_0^1 |f(t)|\, dt + \int_0^1 t^n\, dt ,
\]
we have $\int_0^1 |t^n - f(t)|\, dt \to \int_0^1 |f(t)|\, dt$ as $n \to \infty$, and consequently
\[
\int_0^1 |f(t)|\, dt + \int_1^2 |1 - f(t)|\, dt = 0 .
\]
As f is assumed to be continuous, it follows that
\[
f(t) = \begin{cases} 0, & 0 < t < 1, \\ 1, & 1 < t < 2 . \end{cases}
\]
We see that the limit function f cannot be continuous at $t = 1$. This contradiction implies that $C[0,2]$ is not complete with respect to the $L^1$ norm.
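The exact $L^1$ distances computed above are easy to evaluate. A minimal numerical sketch using exact rational arithmetic (the helper name `l1_dist` is our own, for illustration only):

```python
from fractions import Fraction

# Exact L1 distance between f_n and f_m on [0, 2] for n < m: the functions
# agree on [1, 2], and on [0, 1] the integral of t^n - t^m was computed above.
def l1_dist(n, m):
    assert n < m
    return Fraction(1, n + 1) - Fraction(1, m + 1)

print(l1_dist(10, 20))    # 10/231
print(l1_dist(100, 200))  # 100/20301
# The distances shrink like 1/(n+1): the sequence is Cauchy in the L1 norm,
# even though its pointwise limit jumps at t = 1.
```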
4.2 The completion of a normed space

A normed space may be incomplete. However, every normed space X can be considered as a subset of a larger Banach space $\hat X$. The minimal among these spaces is called the completion of X.

Informally we can say that $\hat X$ consists of the limit points of all Cauchy sequences in X. Of course, every point $x \in X$ is the limit of the constant sequence ($x_n = x$ for all $n \in \mathbb{N}$) and therefore $X \subset \hat X$. If X is not complete, some of the limit points are not in X, so $\hat X$ is larger than the original set X.

Definition 4.5 (dense set) We say that a subset $X \subset V$ is dense in V if for any $v \in V$ and any $\varepsilon > 0$ there is $x \in X$ such that $\|x - v\| < \varepsilon$.

Footnote: In this context "minimal" means that if any other space $\tilde X$ has the same property, then the minimal $\hat X$ is isometric to a subspace of $\tilde X$. It turns out that this property can be achieved by requiring X to be dense in $\hat X$.

Note that X is dense in V iff for every point $v \in V$ there is a sequence $x_n \in X$ such that $x_n \to v$.
Theorem 4.6 Let $(X, \|\cdot\|_X)$ be a normed space. Then there is a complete normed space $(\overline{X}, \|\cdot\|_{\overline{X}})$ and a linear map $i : X \to \overline{X}$ such that $i$ is an isometric isomorphism between $(X, \|\cdot\|_X)$ and $(i(X), \|\cdot\|_{\overline{X}})$, and $i(X)$ is dense in $\overline{X}$.

Moreover, $\overline{X}$ is unique up to isometry, i.e., if there is another complete normed space $(\tilde X, \|\cdot\|_{\tilde X})$ with these properties, then $\overline{X}$ and $\tilde X$ are isometrically isomorphic.
Proof: The proof is relatively long, so we break it into a sequence of steps.

Construction of $\overline{X}$. Let Y be the set of all Cauchy sequences in X. We say that two Cauchy sequences $x = (x_n)_{n=1}^{\infty}$, $x_n \in X$, and $y = (y_n)_{n=1}^{\infty}$, $y_n \in X$, are equivalent, and write $x \sim y$, if
\[
\lim_{n \to \infty} \|x_n - y_n\|_X = 0 .
\]
Let $\overline{X}$ be the space of all equivalence classes in Y, i.e., it is the factor space $\overline{X} = Y/\sim$. The elements of $\overline{X}$ are collections of equivalent Cauchy sequences from X. We will use $[x]$ to denote the equivalence class of x.

Exercise: Show that $\overline{X}$ is a vector space.

Norm on $\overline{X}$. For $\eta \in \overline{X}$ take any representative $x = (x_n)_{n=1}^{\infty}$, $x_n \in X$, of the equivalence class $\eta$. Then the equation
\[
\|\eta\|_{\overline{X}} = \lim_{n \to \infty} \|x_n\|_X
\tag{4.1}
\]
defines a norm on $\overline{X}$. Indeed:

1. Equation (4.1) defines a function $\overline{X} \to \mathbb{R}$, i.e., for any $\eta \in \overline{X}$ and any representative $x \in \eta$ the limit exists and is independent of the choice of the representative. (Exercise)

2. The function defined by (4.1) satisfies the axioms of a norm. (Exercise)

Definition of $i : X \to \overline{X}$. For any $x \in X$ let
\[
i(x) = [(x, x, x, x, \dots)]
\]
(the equivalence class of the constant sequence). Obviously $i$ is a linear isometry, and it is a bijection $X \to i(X)$. Therefore the spaces X and $i(X)$ are isometrically isomorphic.
Completeness of $\overline{X}$. Let $(\eta^{(k)})_{k=1}^{\infty}$ be a Cauchy sequence in $(\overline{X}, \|\cdot\|_{\overline{X}})$. For every $k \in \mathbb{N}$ take a representative $x^{(k)} \in \eta^{(k)}$. Note that $x^{(k)} \in Y$ is a Cauchy sequence in the space $(X, \|\cdot\|_X)$. Then there is a strictly monotone sequence of integers $n_k$ such that
\[
\|x^{(k)}_j - x^{(k)}_l\|_X \le \frac{1}{k} \quad \text{for all } j, l \ge n_k .
\tag{4.2}
\]
Now consider the sequence $x^*$ defined by
\[
x^* = \bigl( x^{(k)}_{n_k} \bigr)_{k=1}^{\infty} .
\]
Next we will check that $x^*$ is Cauchy, and consider its equivalence class $\eta^* = [x^*] \in \overline{X}$. Then we will prove that $\eta^{(k)} \to \eta^*$ in $(\overline{X}, \|\cdot\|_{\overline{X}})$.
The sequence $x^*$ is Cauchy. Since the sequence $\eta^{(k)}$ is Cauchy, for any $\varepsilon > 0$ there is $M_\varepsilon$ such that
\[
\lim_{n \to \infty} \|x^{(k)}_n - x^{(l)}_n\|_X = \|\eta^{(k)} - \eta^{(l)}\|_{\overline{X}} < \varepsilon
\quad \text{for all } k, l > M_\varepsilon .
\]
Consequently, for every $k, l > M_\varepsilon$ there is $N^{k,l}_\varepsilon$ such that
\[
\|x^{(k)}_n - x^{(l)}_n\|_X < \varepsilon \quad \text{for all } n > N^{k,l}_\varepsilon .
\tag{4.3}
\]
Now fix any $\varepsilon > 0$. If $j, l > \max\{M_{\varepsilon/3}, 3/\varepsilon\}$ and $m > \max\{n_j, n_l, N^{j,l}_{\varepsilon/3}\}$, we have
\[
\|x^*_j - x^*_l\|_X = \|x^{(j)}_{n_j} - x^{(l)}_{n_l}\|_X
\le \|x^{(j)}_{n_j} - x^{(j)}_m\|_X + \|x^{(j)}_m - x^{(l)}_m\|_X + \|x^{(l)}_m - x^{(l)}_{n_l}\|_X
< \frac{1}{j} + \frac{\varepsilon}{3} + \frac{1}{l} < \varepsilon ,
\]
where we used (4.2) and (4.3). Therefore $x^*$ is Cauchy and $\eta^* = [x^*] \in \overline{X}$.
The sequence $\eta^{(k)} \to [x^*]$. Indeed, take any $\varepsilon > 0$ and any $k > \varepsilon^{-1}$ large enough that $\|x^*_k - x^*_j\|_X < \varepsilon$ for all sufficiently large j (possible since $x^*$ is Cauchy); then
\[
\|\eta^{(k)} - \eta^*\|_{\overline{X}}
= \lim_{j \to \infty} \|x^{(k)}_j - x^*_j\|_X
= \lim_{j \to \infty} \|x^{(k)}_j - x^{(j)}_{n_j}\|_X
\le \limsup_{j \to \infty} \Bigl( \|x^{(k)}_j - x^{(k)}_{n_k}\|_X + \|x^{(k)}_{n_k} - x^{(j)}_{n_j}\|_X \Bigr)
\le \frac{1}{k} + \varepsilon < 2\varepsilon .
\]
Therefore $\eta^{(k)} \to \eta^*$.

We have proved that any Cauchy sequence in $\overline{X}$ has a limit in $\overline{X}$, so $\overline{X}$ is complete.
Density of $i(X)$ in $\overline{X}$. Take $\eta \in \overline{X}$ and let $x \in \eta$. Take any $\varepsilon > 0$. Since x is Cauchy, there is $N_\varepsilon$ such that $\|x_m - x_k\|_X < \varepsilon$ for all $k, m > N_\varepsilon$. Then for any $k > N_\varepsilon$
\[
\|\eta - i(x_k)\|_{\overline{X}} = \lim_{m \to \infty} \|x_m - x_k\|_X \le \varepsilon .
\]
Therefore $i(X)$ is dense in $\overline{X}$.
Uniqueness of $\overline{X}$ up to isometry. Suppose that $\tilde X$ is a complete normed space, $\tilde\imath : X \to \tilde X$ is an isometry and $\tilde\imath(X)$ is dense in $\tilde X$. Then $\overline{X}$ and $\tilde X$ are isometric. Indeed, since $i$ and $\tilde\imath$ are isometries, the linear map $\hat L : \tilde\imath(X) \to i(X)$ defined by $\hat L = i \circ \tilde\imath^{-1}$ is also an isometry. Then define the map $L : \tilde X \to \overline{X}$ by continuously extending $\hat L$, i.e., if $\xi_n \to \xi$ and the $\xi_n$ belong to the domain of $\hat L$, set $L(\xi) = \lim_{n \to \infty} \hat L(\xi_n)$.

Exercise: show that

1. $L(\xi)$ is independent of the choice of the sequence $\xi_n \to \xi$.
2. $L(\xi) = \hat L(\xi)$ for all $\xi \in \tilde\imath(X)$.
3. L is a linear map defined for all $\xi \in \tilde X$.
4. L is an isometry and $L(\tilde X) = \overline{X}$.

This completes the uniqueness statement of the theorem.
The theorem provides an explicit construction for the completion of a normed space. Often this description is not sufficiently convenient and a more direct description is desirable.

Example: Let $\ell_f(\mathbb{K})$ be the space of all sequences which have only a finite number of non-zero elements. This space is not complete in the $\ell^p$ norm. The completion of $\ell_f(\mathbb{K})$ in the $\ell^p$ norm is isometric to $\ell^p(\mathbb{K})$.

Indeed, we have already seen that $\ell^p(\mathbb{K})$ is complete, so in order to prove the claim one only needs to check that $\ell_f(\mathbb{K})$ is dense in $\ell^p(\mathbb{K})$ (Exercise).

We see that the completion of a normed space depends both on the space and on the norm.
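The density of $\ell_f$ in $\ell^p$ can be illustrated numerically: truncating a sequence in $\ell^2$ to its first N terms gives an element of $\ell_f$, and the $\ell^2$ distance to the truncation is the tail of a convergent series. A small sketch (the helper `tail_p` and the choice $x_j = 1/j$ are our own, for illustration only):

```python
# Truncating x = (1, 1/2, 1/3, ...) in l^2 to its first N terms gives an
# element of l_f; the squared l^2 distance to x is the tail sum_{j>N} 1/j^2,
# approximated here by cutting the sum off at a large index.
def tail_p(N, p=2, terms=10**5):
    return sum((1.0 / j) ** p for j in range(N + 1, terms + 1))

# The tails decrease to 0, so l_f approximates x arbitrarily well in l^2.
print(tail_p(10), tail_p(100), tail_p(1000))
```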
4.3 Weierstrass Approximation Theorem

In this section we prove an approximation theorem which is independent of the discussions of the previous lectures. This theorem implies that polynomials are dense in the space of continuous functions on an interval.

Theorem 4.7 (Weierstrass Approximation Theorem) If $f : [0,1] \to \mathbb{R}$ is continuous on $[0,1]$, then the sequence of polynomials
\[
P_n(x) = \sum_{p=0}^{n} \binom{n}{p} f(p/n)\, x^p (1-x)^{n-p}
\]
converges uniformly to f on $[0,1]$.
Proof: First we derive several useful identities. The binomial theorem states that
\[
(x+y)^n = \sum_{p=0}^{n} \binom{n}{p} x^p y^{n-p} .
\]
Differentiating with respect to x and multiplying by x we get
\[
nx(x+y)^{n-1} = \sum_{p=0}^{n} p \binom{n}{p} x^p y^{n-p} .
\]
Differentiating the original identity twice with respect to x and multiplying by $x^2$ we get
\[
n(n-1)x^2 (x+y)^{n-2} = \sum_{p=0}^{n} p(p-1) \binom{n}{p} x^p y^{n-p} .
\]
Now substitute $y = 1 - x$ and denote
\[
r_p(x) = \binom{n}{p} x^p (1-x)^{n-p} .
\]
We get
\[
\sum_{p=0}^{n} r_p(x) = 1 , \qquad
\sum_{p=0}^{n} p\, r_p(x) = nx , \qquad
\sum_{p=0}^{n} p(p-1)\, r_p(x) = n(n-1)x^2 .
\]
Consequently,
\[
\sum_{p=0}^{n} (p - nx)^2 r_p(x)
= \sum_{p=0}^{n} p^2 r_p(x) - 2nx \sum_{p=0}^{n} p\, r_p(x) + n^2 x^2 \sum_{p=0}^{n} r_p(x)
= n(n-1)x^2 + nx - 2(nx)^2 + n^2 x^2 = nx(1-x) .
\]
Note that as f is continuous on a closed bounded interval, it is also uniformly continuous: take any $\varepsilon > 0$; there is $\delta > 0$ such that
\[
|x - y| < \delta \implies |f(x) - f(y)| < \varepsilon .
\]
Now we can estimate
\[
|f(x) - P_n(x)| = \Bigl| f(x) - \sum_{p=0}^{n} f(p/n)\, r_p(x) \Bigr|
= \Bigl| \sum_{p=0}^{n} \bigl( f(x) - f(p/n) \bigr) r_p(x) \Bigr|
\le \sum_{|x - p/n| < \delta} \bigl| f(x) - f(p/n) \bigr|\, r_p(x)
+ \sum_{|x - p/n| \ge \delta} \bigl| f(x) - f(p/n) \bigr|\, r_p(x) .
\]
The first sum is bounded by
\[
\sum_{|x - p/n| < \delta} \bigl| f(x) - f(p/n) \bigr|\, r_p(x) \le \varepsilon \sum_{|x - p/n| < \delta} r_p(x) \le \varepsilon .
\]
The second sum is bounded by
\[
\sum_{|x - p/n| \ge \delta} \bigl| f(x) - f(p/n) \bigr|\, r_p(x)
\le 2\|f\|_\infty \sum_{|nx - p| \ge n\delta} r_p(x)
\le 2\|f\|_\infty \sum_{p=0}^{n} \frac{(p - nx)^2}{n^2 \delta^2}\, r_p(x)
= 2\|f\|_\infty \frac{x(1-x)}{n\delta^2}
\le \frac{\|f\|_\infty}{2n\delta^2} ,
\]
which is less than $\varepsilon$ for any $n > \frac{\|f\|_\infty}{2\delta^2 \varepsilon}$. Therefore for these values of n
\[
|f(x) - P_n(x)| < 2\varepsilon
\]
for all $x \in [0,1]$. Consequently,
\[
\|f - P_n\|_\infty = \sup_{x \in [0,1]} |f(x) - P_n(x)| \to 0
\]
as $n \to \infty$.
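The polynomials $P_n$ in the theorem (the Bernstein polynomials) are easy to evaluate directly. A small numerical sketch (the function names and the test function $f(x) = |x - 1/2|$ are our own choices):

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate P_n(x) = sum_p C(n, p) f(p/n) x^p (1 - x)^(n - p)."""
    return sum(comb(n, p) * f(p / n) * x**p * (1 - x) ** (n - p)
               for p in range(n + 1))

def sup_error(n, grid=200):
    # approximate ||f - P_n||_inf on a uniform grid of [0, 1]
    f = lambda x: abs(x - 0.5)  # continuous but not smooth at 1/2
    return max(abs(f(i / grid) - bernstein(f, n, i / grid))
               for i in range(grid + 1))

print([sup_error(n) for n in (10, 40, 160)])  # decreasing towards 0
```

The error decays slowly (roughly like $n^{-1/2}$ at the kink), which matches the pessimistic bound $n > \|f\|_\infty / (2\delta^2 \varepsilon)$ in the proof.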
Consider the space P[0, 1] of all polynomial functions restricted to the interval [0, 1]
and equip this space with the sup norm. This space is not complete (think about Taylor
series). On the other hand polynomials are continuous, and therefore P[0, 1] can be
considered as a subspace of C[0, 1] which is complete. The Weierstrass approximation
theorem states that any continuous function on [0, 1] can be uniformly approximated by
polynomials. In other words, the polynomials are dense in C[0, 1]. Then Theorem 4.6
implies that the completion of the polynomials P[0, 1] is isometric to C[0, 1] equipped
with the sup norm.
Corollary 4.8 The set of polynomials is dense in C[0, 1] equipped with the supremum
norm.
5 Lebesgue spaces

Lebesgue spaces play an important role in Functional Analysis and some of its applications. These spaces consist of integrable functions. In this section we discuss the main definitions and the properties required later in this module. Most of the statements are given without proofs; a more detailed study of these topics is part of the MA359 Measure Theory module.
5.1 Lebesgue measure

First we need to define a measure, which can be considered as a generalisation of the length of an interval. A measure is defined on a class of subsets which are called measurable and form a σ-algebra.

Definition 5.1 A σ-algebra is a class Σ of subsets of a set X which has the following properties:

(a) $\emptyset, X \in \Sigma$;
(b) if $S \in \Sigma$ then $X \setminus S \in \Sigma$;
(c) if $S_n \in \Sigma$ for all $n \in \mathbb{N}$ then $\bigcup_{n=1}^{\infty} S_n \in \Sigma$.
Definition 5.2 A function $\mu : \Sigma \to [0, +\infty]$ is a measure if it has the following properties:

(a) $\mu(S) \ge 0$ for all $S \in \Sigma$;
(b) $\mu(\emptyset) = 0$;
(c) $\mu$ is countably additive, i.e., if the sets $S_n \in \Sigma$ are pairwise disjoint ($S_n \cap S_m = \emptyset$ for $n \ne m$), then
\[
\mu\Bigl( \bigcup_{n=1}^{\infty} S_n \Bigr) = \sum_{n=1}^{\infty} \mu(S_n) .
\]

The triple $(X, \Sigma, \mu)$ is called a measure space.
The Lebesgue measure is defined on $\mathbb{R}^n$ and coincides with the standard volume for those sets where the standard volume can be defined. It is the only measure we consider in this module. The Lebesgue measure is constructed in the following way. A box in $\mathbb{R}^n$ is a set of the form
\[
B = \prod_{i=1}^{n} [a_i, b_i] ,
\]
where $b_i \ge a_i$. The volume $\mathrm{vol}(B)$ of this box is defined to be
\[
\mathrm{vol}(B) = \prod_{i=1}^{n} (b_i - a_i) .
\]
For any subset A of $\mathbb{R}^n$ we can define its outer measure $\lambda^*(A)$ by
\[
\lambda^*(A) = \inf \Bigl\{ \sum_{B \in \mathcal{C}} \mathrm{vol}(B) : \mathcal{C} \text{ is a countable collection of boxes whose union covers } A \Bigr\} .
\]
Finally, the set A is called Lebesgue measurable if for every $S \subset \mathbb{R}^n$
\[
\lambda^*(S) = \lambda^*(A \cap S) + \lambda^*(S \setminus A) .
\]
If A is measurable, then its Lebesgue measure is defined by $\mu(A) = \lambda^*(A)$.

Lebesgue measurable sets form a σ-algebra, and the Lebesgue measure of a box coincides with the volume of the box. The class of Lebesgue measurable sets is very large, and the existence of sets which are not Lebesgue measurable cannot be established without the axiom of choice.

In particular, a Borel set is any set in a topological space that can be formed from open sets (or, equivalently, from closed sets) through the operations of countable union, countable intersection, and relative complement. The Borel sets also form a σ-algebra, and the σ-algebra of Lebesgue measurable sets includes all Borel sets.
Sets of measure zero

We say that a set A has measure zero if $\mu(A) = 0$.

Proposition 5.3 A set $A \subset \mathbb{R}$ has Lebesgue measure zero iff for any $\varepsilon > 0$ there is an (at most countable) collection of intervals that cover A and whose total length is less than $\varepsilon$:
\[
A \subset \bigcup_{j=1}^{\infty} [a_j, b_j]
\qquad \text{and} \qquad
\sum_{j=1}^{\infty} |b_j - a_j| < \varepsilon .
\]

Corollary 5.4 The Lebesgue measure has the following property: any subset of a measure zero set is measurable and itself has measure zero.

Exercise: Show that a countable union of measure zero sets has measure zero. Hint: for $A_n$ choose a cover with $\varepsilon_n = \varepsilon / 2^n$.

Examples: The set $\mathbb{Q}$ of all rational numbers has measure zero. The Cantor set has measure zero.
Definition 5.5 A property is said to hold “almost everywhere” or “for almost every
x” (and abbreviated to “a.e.”) if the set of points at which the property does not hold
has measure zero.
5.2 Lebesgue integral

Integrals of simple functions

We say that $\varphi : X \to \mathbb{R}$ is a simple function if it can be represented as a finite sum
\[
\varphi(x) = \sum_{j=1}^{n} c_j\, \chi_{S_j}(x) ,
\]
where $c_j \in \mathbb{R}$, $S_j \in \Sigma$ and $\chi_S$ is the characteristic function of a set $S \subset X$:
\[
\chi_S(x) = \begin{cases} 1, & x \in S , \\ 0, & x \notin S . \end{cases}
\]
We define the integral of a simple function by
\[
\int \varphi := \sum_{j=1}^{n} c_j\, \mu(S_j) .
\]
We note that if all $S_j$ are intervals, this sum equals the Riemann integral which you studied in Year 1, i.e., $\int \varphi$ is the "algebraic" area under the graph of the step function $\varphi$ (the area is counted as negative on those intervals where $\varphi(x) < 0$).
Lebesgue integrable functions

Definition 5.6 A function $f : X \to \mathbb{R}$ is measurable if the preimage of any interval is measurable.

We note that sums, products and pointwise limits of measurable functions are measurable. If f is measurable, then $|f|$ and $f^{\pm}$ are also measurable, where $f^+(x) = \max\{f(x), 0\}$ and $f^-(x) = -\min\{f(x), 0\}$. Note that both $f^+, f^- \ge 0$ and $f = f^+ - f^-$.

Definition 5.7 If a function $f : \mathbb{R} \to \mathbb{R}$ is measurable and $f \ge 0$ then
\[
\int f = \sup \Bigl\{ \int \varphi : \varphi \text{ is a simple function and } 0 \le \varphi(x) \le f(x) \text{ for all } x \Bigr\} .
\]
If f is measurable and $\int |f| < \infty$ then
\[
\int f = \int f^+ - \int f^-
\]
and we say that f is integrable on X.
Properties of Lebesgue integrals

First we state the main elementary properties of Lebesgue integration.

Theorem 5.8 If $f, f_1, f_2$ are integrable and $\lambda \in \mathbb{R}$, then

1. $f_1 + \lambda f_2$ is also integrable and $\int (f_1 + \lambda f_2) = \int f_1 + \lambda \int f_2$.
2. $|f|$ is also integrable and $\bigl| \int f \bigr| \le \int |f|$.
3. If additionally $f(x) \ge 0$ a.e., then $\int f \ge 0$.

We note that the integrability of $|f|$ alone does not imply that f is integrable: f must also be measurable.

Exercise: If f is integrable then $f^+$ and $f^-$ are integrable. (Hint: $f^+ = (f + |f|)/2$ and $f^- = (|f| - f)/2$.)
Integrals and limits

You should be careful when swapping $\lim$ and $\int$:

Examples:
\[
\lim_{n \to \infty} \int n\, \chi_{(0, \frac{1}{n})} = 1 \ne \int \lim_{n \to \infty} n\, \chi_{(0, \frac{1}{n})} = 0 ,
\]
\[
\lim_{n \to \infty} \int \frac{1}{n}\, \chi_{(0,n)} = 1 \ne \int \lim_{n \to \infty} \frac{1}{n}\, \chi_{(0,n)} = 0 .
\]
The following two theorems establish conditions which allow swapping the limit and integration. They play a fundamental role in the theory of Lebesgue integrals.

Theorem 5.9 (Monotone Convergence Theorem) Suppose that $f_n$ are integrable functions, $f_n(x) \le f_{n+1}(x)$ a.e., and $\int f_n < K$ for some constant K independent of n. Then there is an integrable function g such that $f_n(x) \to g(x)$ a.e. and
\[
\int g = \lim_{n \to \infty} \int f_n .
\]

Corollary 5.10 If f is integrable and $\int |f| = 0$, then $f(x) = 0$ a.e.

Footnote: Indeed, we can sketch an example of a non-measurable f with $|f|$ integrable. It is based on partitioning the interval $[0,1]$ into two very nasty subsets. Let $f(x) = 0$ outside $[0,1]$; for $x \in [0,1]$ let $f(x) = 1$ if x belongs to the Vitali set and $f(x) = -1$ otherwise. Then $|f| = \chi_{[0,1]}$ is integrable, but f is not integrable as the Vitali set is not measurable.
Proof: Let $f_n(x) = n|f(x)|$. This sequence satisfies the assumptions of the MCT (integrable, increasing, and $\int f_n = 0 < 1$), so there is an integrable function $g(x)$ such that $f_n(x) \to g(x)$ for a.e. x. Since the sequence is increasing, $f_n(x) \le g(x)$ a.e., which implies $|f(x)| \le g(x)/n$ for all n and a.e. x. Consequently $f(x) = 0$ a.e.
Theorem 5.11 (Dominated Convergence Theorem) Suppose that $f_n$ are integrable functions and $f_n(x) \to f(x)$ for a.e. x. If there is an integrable function g such that $|f_n(x)| \le g(x)$ for every n and a.e. x, then f is integrable and
\[
\int f = \lim_{n \to \infty} \int f_n .
\]

It is also possible to integrate complex-valued functions: $f : \mathbb{R} \to \mathbb{C}$ is integrable if its real and imaginary parts are both integrable, and
\[
\int f := \int \operatorname{Re} f + i \int \operatorname{Im} f .
\]
The MCT has no meaning for complex-valued functions. The DCT is valid without modifications (and indeed follows easily from the real version).
5.3 Lebesgue space $L^1(\mathbb{R})$

Definition 5.12 The Lebesgue space $L^1(\mathbb{R})$ is the space of Lebesgue integrable functions modulo the following equivalence relation: $f \sim g$ iff $f(x) = g(x)$ a.e. The Lebesgue space is equipped with the $L^1$ norm:
\[
\|f\|_{L^1} = \int |f| .
\]
Note that the value of the integral on the right-hand side is independent of the choice of a representative of the equivalence class.

It is convenient to think of elements of $L^1(\mathbb{R})$ as functions $\mathbb{R} \to \mathbb{R}$, interpreting the equality $f = g$ as $f(x) = g(x)$ a.e.

From the viewpoint of Functional Analysis, the equivalence relation is introduced to ensure non-degeneracy of the $L^1$ norm. Indeed, suppose f is an integrable function. Then $\|f\|_{L^1} = 0$ is equivalent to $\int |f| = 0$, which is equivalent to $f(x) = 0$ a.e.

Theorem 5.13 $L^1(\mathbb{R})$ is a Banach space.

The properties of the Lebesgue integral imply that $L^1(\mathbb{R})$ is a normed space. The completeness of $L^1(\mathbb{R})$ follows from the combination of the following two statements: the first lemma gives a criterion for completeness of a normed space, and the second implies that the assumptions of the first lemma are satisfied for $X = L^1(\mathbb{R})$.
Lemma 5.14 If $(X, \|\cdot\|)$ is a normed space in which
\[
\sum_{j=1}^{\infty} \|y_j\| < \infty
\]
implies that the series $\sum_{j=1}^{\infty} y_j$ converges, then X is complete.

Proof: Let $x_j \in X$ be a Cauchy sequence. Then there is a monotone increasing sequence $n_k \in \mathbb{N}$ such that for every $k \in \mathbb{N}$
\[
\|x_j - x_l\| < 2^{-k} \quad \text{for all } j, l \ge n_k .
\]
Let $y_1 = x_{n_1}$ and $y_k = x_{n_k} - x_{n_{k-1}}$ for $k \ge 2$. Since $\|y_k\| \le 2^{1-k}$ for $k \ge 2$,
\[
\sum_{k=1}^{\infty} \|y_k\| \le \|x_{n_1}\| + \sum_{k=1}^{\infty} 2^{-k} = \|x_{n_1}\| + 1 < \infty .
\]
By the assumption of the lemma the series converges, so there is $x^* \in X$ such that
\[
x^* = \sum_{j=1}^{\infty} y_j .
\]
On the other hand,
\[
\sum_{j=1}^{k} y_j = x_{n_1} + \sum_{j=2}^{k} (x_{n_j} - x_{n_{j-1}}) = x_{n_k} ,
\]
and therefore $x_{n_k} \to x^*$. Consequently $x_k \to x^*$ and the space X is complete.
Lemma 5.15 If $(f_k)_{k=1}^{\infty}$ is a sequence of integrable functions such that $\sum_{k=1}^{\infty} \|f_k\|_{L^1} < \infty$, then

1. $\sum_{k=1}^{\infty} |f_k(x)|$ converges a.e. to an integrable function,
2. $\sum_{k=1}^{\infty} f_k(x)$ converges a.e. to an integrable function.

Proof: The first statement follows from the MCT applied to the sequence $g_n = \sum_{k=1}^{n} |f_k|$ with $K = \sum_{k=1}^{\infty} \|f_k\|_{L^1}$. So there is an integrable function $g(x)$ such that
\[
g(x) = \sum_{k=1}^{\infty} |f_k(x)|
\]
for almost all x. For these values of x the partial sums $h_n(x) = \sum_{k=1}^{n} f_k(x)$ obviously converge, so let
\[
h(x) = \sum_{k=1}^{\infty} f_k(x) .
\]
Moreover,
\[
|h_n(x)| = \Bigl| \sum_{k=1}^{n} f_k(x) \Bigr| \le \sum_{k=1}^{n} |f_k(x)| \le \sum_{k=1}^{\infty} |f_k(x)| = g(x) .
\]
Therefore the partial sums $h_n$ satisfy the assumptions of the DCT and the second statement follows.

Exercise: Check that Lemma 5.15 implies that $L^1(\mathbb{R})$ satisfies the assumptions of Lemma 5.14.
In addition to $L^1(\mathbb{R})$ we will sometimes consider the Lebesgue spaces $L^1(a,b)$, where $(a,b)$ is an interval.

Proposition 5.16 The space $C[0,1]$ is dense in $L^1(0,1)$.

About the proof: The proof uses the fact that simple functions (= piecewise constant functions) are dense in $L^1(0,1)$. One then checks that every step function can be approximated by a piecewise linear continuous function.

Consequently $L^1(a,b)$ is isometric to the completion of $C[a,b]$ in the $L^1$ norm.
5.4 $L^p$ spaces

Another important class of Lebesgue spaces consists of the $L^p$ spaces for $1 \le p < \infty$, among which the $L^2$ space is the most remarkable (it is also a Hilbert space; see the next chapter for details). In this section we sketch the main definitions of these spaces, noting that a full discussion requires more knowledge of Measure Theory than we can fit into this module.

The Lebesgue space $L^p(I)$ is the space of all measurable functions f such that
\[
\|f\|_{L^p} = \Bigl( \int_I |f|^p \Bigr)^{1/p} < \infty ,
\]
modulo the equivalence relation $f = g$ iff $f(x) = g(x)$ a.e.

We note that for a bounded interval $L^p(a,b) \subset L^1(a,b)$.

We note that although $L^1(\mathbb{R}) \cap L^2(\mathbb{R}) \ne \emptyset$ (e.g. both spaces contain all "simple" functions), neither of these spaces is a subset of the other. For example,
\[
f(x) = \frac{1}{1 + |x|}
\]
belongs to $L^2(\mathbb{R})$ but not to $L^1(\mathbb{R})$: indeed $\int f^2 < \infty$ but $\int f = \infty$, so f is not integrable on $\mathbb{R}$. On the other hand,
\[
g(x) = \frac{\chi_{(0,1)}(x)}{|x|^{1/2}}
\]
belongs to $L^1(\mathbb{R})$ but not to $L^2(\mathbb{R})$.
Theorem 5.17 $L^p(\mathbb{R})$ and $L^p(I)$ are Banach spaces for any $p \ge 1$ and any interval I.

We will not give a complete proof but sketch the main ideas instead. Let $\frac{1}{p} + \frac{1}{q} = 1$ and $f \in L^p(\mathbb{R})$, $g \in L^q(\mathbb{R})$. Then the Hölder inequality states that
\[
\int |fg| \le \|f\|_p \, \|g\|_q .
\]
Note that the characteristic function $\chi_I \in L^q(\mathbb{R})$ for any bounded interval I and any $q \ge 1$; moreover $\|\chi_I\|_{L^q} = |I|^{1/q}$, where $|I| = b - a$ is the length of I. The Hölder inequality with $g = \chi_I$ implies that
\[
\int \chi_I |f| = \int_I |f| \le |I|^{1/q} \|f\|_p .
\]
The left-hand side of this inequality is the norm of f in $L^1(I)$:
\[
\|f\|_{L^1(I)} \le |I|^{1/q} \|f\|_{L^p(I)} .
\]
Consequently any Cauchy sequence in $L^p(I)$ is automatically a Cauchy sequence in $L^1(I)$. Since $L^1(I)$ is complete, the Cauchy sequence converges to a limit in $L^1(I)$. In order to prove completeness of $L^p$ it is sufficient to show that the p-th power of this limit is integrable. This can be done on the basis of the Dominated Convergence Theorem.
Exercise: The next two exercises show that $L^2(\mathbb{R})$ is complete (compare with the proof of completeness for $L^1(\mathbb{R})$).

1. Let $(f_k)_{k=1}^{\infty}$ be a sequence in $L^2(\mathbb{R})$ such that
\[
\sum_{k=1}^{\infty} \|f_k\|_{L^2} < \infty .
\]
Applying the MCT to the sequence
\[
g_n = \Bigl( \sum_{k=1}^{n} |f_k| \Bigr)^2
\]
show that $\sum_k f_k$ converges a.e. to a function f with integrable $f^2$.

2. Now use the DCT applied to $h_n = \bigl( f - \sum_{k=1}^{n} f_k \bigr)^2$ to deduce that $\sum_k f_k$ converges in the $L^2$ norm to a function in $L^2$.

Footnote: A proof of the Hölder inequality is similar to the proof of Lemma 2.3, provided we take for granted that a product of two measurable functions is measurable. We will not discuss this proof further.
35

6 Hilbert spaces

6.1 Inner product spaces

You have already seen the inner product on $\mathbb{R}^n$.

Definition 6.1 An inner product on a vector space V is a map $(\cdot, \cdot) : V \times V \to \mathbb{K}$ such that for all $x, y, z \in V$ and for all $\lambda \in \mathbb{K}$:

(i) $(x, x) \ge 0$, and $(x, x) = 0$ iff $x = 0$;
(ii) $(x + y, z) = (x, z) + (y, z)$;
(iii) $(\lambda x, y) = \lambda (x, y)$;
(iv) $(x, y) = \overline{(y, x)}$.

A vector space equipped with an inner product is called an inner product space.

• In a real vector space the complex conjugate in (iv) is not necessary.
• If $\mathbb{K} = \mathbb{C}$, then (iv) with $y = x$ implies that $(x, x)$ is real, and therefore the requirement $(x, x) \ge 0$ makes sense.
• (iii) and (iv) imply that $(x, \lambda y) = \bar\lambda (x, y)$.
1. Example: $\mathbb{R}^n$ is an inner product space with
\[
(x, y) = \sum_{k=1}^{n} x_k y_k .
\]

2. Example: $\mathbb{C}^n$ is an inner product space with
\[
(x, y) = \sum_{k=1}^{n} x_k \bar y_k .
\]

3. Example: $\ell^2(\mathbb{K})$ is an inner product space with
\[
(x, y) = \sum_{k=1}^{\infty} x_k \bar y_k .
\]
Note that the sum converges because $\sum_k |x_k \bar y_k| \le \frac{1}{2} \sum_k \bigl( |x_k|^2 + |y_k|^2 \bigr)$.

4. Example: $L^2(a,b)$ is an inner product space with
\[
(f, g) = \int_a^b f(x)\, \overline{g(x)}\, dx .
\]
6.2 Natural norms

Every inner product space is a normed space as well.

Proposition 6.2 If V is an inner product space, then
\[
\|v\| = \sqrt{(v, v)}
\]
defines a norm on V.

Definition 6.3 We say that $\|x\| = \sqrt{(x, x)}$ is the natural norm induced by the inner product.

The proof of the proposition uses the following inequality.

Lemma 6.4 (Cauchy-Schwarz inequality) If V is an inner product space and $\|v\| = \sqrt{(v, v)}$ for all $v \in V$, then
\[
|(x, y)| \le \|x\| \, \|y\|
\]
for all $x, y \in V$.

Proof of the lemma: The inequality is obvious if $y = 0$, so suppose that $y \ne 0$. Then for any $\lambda \in \mathbb{K}$:
\[
0 \le (x - \lambda y, x - \lambda y) = (x, x) - \bar\lambda (x, y) - \lambda \overline{(x, y)} + |\lambda|^2 (y, y) .
\]
Substituting $\lambda = (x, y)/\|y\|^2$ gives
\[
0 \le (x, x) - 2 \frac{|(x, y)|^2}{\|y\|^2} + \frac{|(x, y)|^2}{\|y\|^2} = \|x\|^2 - \frac{|(x, y)|^2}{\|y\|^2} ,
\]
which implies the desired inequality.
Proof of Proposition 6.2: We note that positive definiteness and homogeneity of $\|\cdot\|$ follow easily from (i), (iii) and (iv) in the definition of the inner product. In order to establish the triangle inequality we use the Cauchy-Schwarz inequality. Let $x, y \in V$. Then
\[
\|x + y\|^2 = (x + y, x + y) = (x, x) + (x, y) + (y, x) + (y, y)
\le \|x\|^2 + 2 \|x\| \, \|y\| + \|y\|^2 = (\|x\| + \|y\|)^2 ,
\]
and the triangle inequality follows by taking the square root. Therefore $\|\cdot\|$ is a norm.

We have already proved the Cauchy-Schwarz inequality for $\ell^2(\mathbb{K})$ using a different strategy (see Lemma 2.4).
The Cauchy-Schwarz inequality in $L^2(a,b)$ takes the form
\[
\Bigl| \int_a^b f(x)\, \overline{g(x)}\, dx \Bigr|
\le \Bigl( \int_a^b |f(x)|^2\, dx \Bigr)^{1/2} \Bigl( \int_a^b |g(x)|^2\, dx \Bigr)^{1/2} .
\]
In particular, it states that $f, g \in L^2(a,b)$ implies $fg \in L^1(a,b)$.
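For step functions the $L^2$ inner product reduces to a finite sum, so the inequality can be spot-checked on coefficient vectors. A quick sketch (the random vectors are our own choice; this illustrates, but of course does not prove, the inequality):

```python
import random

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(50)]
y = [random.uniform(-1, 1) for _ in range(50)]

inner = sum(a * b for a, b in zip(x, y))       # (x, y)
norm = lambda v: sum(a * a for a in v) ** 0.5  # natural norm

print(abs(inner) <= norm(x) * norm(y))  # True
```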
Lemma 6.5 If V is an inner product space equipped with the natural norm, then $x_n \to x$ and $y_n \to y$ imply that
\[
(x_n, y_n) \to (x, y) .
\]

Proof: Since any convergent sequence is bounded, the inequality
\[
|(x_n, y_n) - (x, y)| = |(x_n - x, y_n) + (x, y_n - y)|
\le |(x_n - x, y_n)| + |(x, y_n - y)|
\le \|x_n - x\| \, \|y_n\| + \|x\| \, \|y_n - y\|
\]
implies that $(x_n, y_n) \to (x, y)$.

The lemma implies that we can swap inner products and limits.
6.3 Parallelogram law and polarisation identity

Natural norms have some special properties.

Lemma 6.6 (Parallelogram law) If V is an inner product space with the natural norm $\|\cdot\|$, then
\[
\|x + y\|^2 + \|x - y\|^2 = 2 \bigl( \|x\|^2 + \|y\|^2 \bigr)
\]
for all $x, y \in V$.

Proof: The linearity of the inner product implies that for any $x, y \in V$
\[
\|x + y\|^2 + \|x - y\|^2 = (x + y, x + y) + (x - y, x - y)
= (x, x) + (x, y) + (y, x) + (y, y) + (x, x) - (x, y) - (y, x) + (y, y)
= 2 \bigl( \|x\|^2 + \|y\|^2 \bigr) .
\]

Example (some norms are not induced by an inner product): There is no inner product which could induce the following norms on $C[0,1]$:
\[
\|f\|_\infty = \sup_{t \in [0,1]} |f(t)|
\qquad \text{or} \qquad
\|f\|_{L^1} = \int_0^1 |f(t)|\, dt .
\]
Indeed, these norms do not satisfy the parallelogram law. For example, take $f(x) = x$ and $g(x) = 1 - x$; obviously $f, g \in C[0,1]$ and
\[
\|f\|_\infty = \|g\|_\infty = \|f - g\|_\infty = \|f + g\|_\infty = 1 .
\]
Substituting these numbers into the parallelogram law we see $2 \ne 4$.

Exercise: Is the parallelogram law for the $L^1$ norm satisfied for these f, g?
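The failure of the parallelogram law for the sup norm can be confirmed by evaluating the four norms of $f(x) = x$ and $g(x) = 1 - x$ on a grid. A small sketch (the grid size and helper names are our own):

```python
# Approximate the sup norm on [0, 1] by a maximum over a fine grid.
grid = [i / 1000 for i in range(1001)]
sup = lambda h: max(abs(h(t)) for t in grid)

f = lambda x: x
g = lambda x: 1 - x

lhs = sup(lambda x: f(x) + g(x)) ** 2 + sup(lambda x: f(x) - g(x)) ** 2
rhs = 2 * (sup(f) ** 2 + sup(g) ** 2)
print(lhs, rhs)  # approximately 2 and 4: the parallelogram law fails
```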
Lemma 6.7 (Polarisation identity) Let V be an inner product space with the natural norm $\|\cdot\|$. Then

1. if V is real,
\[
4(x, y) = \|x + y\|^2 - \|x - y\|^2 ;
\]
2. if V is complex,
\[
4(x, y) = \|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2 .
\]

Proof: Plug the definition of the natural norm into the right-hand side and use the linearity of the inner product.
Lemma 6.7 shows that the inner product can be recovered from its natural norm. Although the right-hand sides of the polarisation identities are meaningful for any norm, we should not rush to the conclusion that any normed space is automatically an inner product space. Indeed, the example above shows that for some norms these formulae cannot define an inner product. Nevertheless, if the norm satisfies the parallelogram law, we do get an inner product:
Proposition 6.8 Let V be a real normed space with the norm $\|\cdot\|$ satisfying the parallelogram law. Then
\[
(x, y) = \frac{\|x + y\|^2 - \|x - y\|^2}{4} = \frac{\|x + y\|^2 - \|x\|^2 - \|y\|^2}{2}
\]
defines an inner product on V.

Proof: Let us check that $(x, y)$ satisfies the axioms of an inner product. Positivity and symmetry are straightforward (Exercise). For additivity:
\[
4(x, y) + 4(z, y) = \|x + y\|^2 - \|x - y\|^2 + \|z + y\|^2 - \|z - y\|^2
= \tfrac{1}{2}\bigl( \|x + 2y + z\|^2 + \|x - z\|^2 \bigr) - \tfrac{1}{2}\bigl( \|x - 2y + z\|^2 + \|x - z\|^2 \bigr)
= \tfrac{1}{2} \|x + 2y + z\|^2 - \tfrac{1}{2} \|x - 2y + z\|^2
= \tfrac{1}{2}\bigl( 2\|x + y + z\|^2 + 2\|y\|^2 - \|x + z\|^2 \bigr) - \tfrac{1}{2}\bigl( 2\|x - y + z\|^2 + 2\|y\|^2 - \|x + z\|^2 \bigr)
= \|x + y + z\|^2 - \|x - y + z\|^2 = 4(x + z, y) .
\]
We have proved that
\[
(x, y) + (z, y) = (x + z, y) .
\]
Applying this identity several times and setting $z = x/m$ we obtain
\[
n\,(x/m, y) = (nx/m, y)
\qquad \text{and} \qquad
m\,(x/m, y) = (x, y)
\]
for any $n, m \in \mathbb{N}$. Consequently, for any positive rational $\lambda = \frac{n}{m}$
\[
(\lambda x, y) = \lambda (x, y) ;
\]
the case of negative $\lambda$ follows since $(-u, y) = -(u, y)$ directly from the definition. We note that the right-hand side of the definition involves only the norms, which commute with limits. Any real number is a limit of rational numbers, and therefore the homogeneity holds for all $\lambda \in \mathbb{R}$.

Footnote: Can you find a simpler proof?
6.4 Hilbert spaces: Definition and examples

Definition 6.9 A Hilbert space is a complete inner product space (equipped with the natural norm).
Of course, any Hilbert space is a Banach space.
1. Example: $\mathbb{R}^n$ is a Hilbert space with
\[
(x, y) = \sum_{k=1}^{n} x_k y_k ,
\qquad
\|x\| = \Bigl( \sum_{k=1}^{n} x_k^2 \Bigr)^{1/2} .
\]

2. Example: $\mathbb{C}^n$ is a Hilbert space with
\[
(x, y) = \sum_{k=1}^{n} x_k \bar y_k ,
\qquad
\|x\| = \Bigl( \sum_{k=1}^{n} |x_k|^2 \Bigr)^{1/2} .
\]

3. Example: $\ell^2(\mathbb{K})$ is a Hilbert space with
\[
(x, y) = \sum_{k=1}^{\infty} x_k \bar y_k ,
\qquad
\|x\| = \Bigl( \sum_{k=1}^{\infty} |x_k|^2 \Bigr)^{1/2} .
\]

4. Example: $L^2(a,b)$ is a Hilbert space with
\[
(f, g) = \int_a^b f(x)\, \overline{g(x)}\, dx ,
\qquad
\|f\| = \Bigl( \int_a^b |f(x)|^2\, dx \Bigr)^{1/2} .
\]
7 Orthonormal bases in Hilbert spaces

The goal of this section is to discuss properties of orthonormal bases in a Hilbert space H. Unlike Hamel bases, the orthonormal ones involve a countable number of elements: i.e., a vector x is represented in the form of an infinite sum
\[
x = \sum_{k=1}^{\infty} \alpha_k e_k
\]
for some $\alpha_k \in \mathbb{K}$.

We will mainly consider complex spaces with $\mathbb{K} = \mathbb{C}$; the real case $\mathbb{K} = \mathbb{R}$ is not very different. We will use $(\cdot, \cdot)$ to denote an inner product on H, and $\|\cdot\|$ will stand for the natural norm induced by the inner product.
7.1 Orthonormal sets

Definition 7.1 Two vectors $x, y \in H$ are called orthogonal if $(x, y) = 0$. Then we write $x \perp y$.

Theorem 7.2 (Pythagoras theorem) If $x \perp y$ then $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.

Proof: Since $(x, y) = 0$,
\[
\|x + y\|^2 = (x + y, x + y) = (x, x) + (x, y) + (y, x) + (y, y) = \|x\|^2 + \|y\|^2 .
\]

Definition 7.3 A set E is orthonormal if $\|e\| = 1$ for all $e \in E$ and $(e_1, e_2) = 0$ for all $e_1, e_2 \in E$ such that $e_1 \ne e_2$.
Note that this definition does not require the set E to be countable.

Exercise: Any orthonormal set is linearly independent.

Indeed, suppose $\sum_{k=1}^{n} \alpha_k e_k = 0$ with $e_k \in E$ and $\alpha_k \in \mathbb{K}$. Taking the inner product of this equality with $e_j$ we get
\[
0 = \Bigl( \sum_{k=1}^{n} \alpha_k e_k , e_j \Bigr) = \sum_{k=1}^{n} \alpha_k (e_k, e_j) = \alpha_j .
\]
Since $\alpha_j = 0$ for all j, we conclude that the set E is linearly independent.
Definition 7.4 (Kronecker delta) The Kronecker delta is the function defined by
\[
\delta_{jk} = \begin{cases} 1, & \text{if } j = k , \\ 0, & \text{if } j \ne k . \end{cases}
\]

Example: For every $j \in \mathbb{N}$, let $e_j = (\delta_{jk})_{k=1}^{\infty}$ (an infinite sequence of zeros with a 1 in the j-th position). The set $E = \{ e_j : j \in \mathbb{N} \}$ is orthonormal in $\ell^2$. Indeed, from the definition of the inner product in $\ell^2$ we see that $(e_j, e_k) = \delta_{jk}$ for all $j, k \in \mathbb{N}$.
Example: The set
\[
E = \Bigl\{ f_k = \frac{e^{ikx}}{\sqrt{2\pi}} : k \in \mathbb{Z} \Bigr\}
\]
is an orthonormal set in $L^2(-\pi, \pi)$. Indeed, since $|f_k(x)| = \frac{1}{\sqrt{2\pi}}$ for all x:
\[
\|f_k\|_{L^2}^2 = \int_{-\pi}^{\pi} |f_k(x)|^2\, dx = 1 ,
\]
and if $j \ne k$
\[
(f_k, f_j) = \int_{-\pi}^{\pi} f_k(x)\, \overline{f_j(x)}\, dx
= \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i(k-j)x}\, dx
= \frac{1}{2\pi} \left. \frac{e^{i(k-j)x}}{i(k-j)} \right|_{x=-\pi}^{x=\pi} = 0 .
\]
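The orthonormality relations above can be spot-checked by approximating the integrals with a Riemann sum. A sketch (the helper `inner` and the midpoint rule are our own choices):

```python
import cmath, math

def f(k, x):
    # f_k(x) = e^{ikx} / sqrt(2*pi)
    return cmath.exp(1j * k * x) / math.sqrt(2 * math.pi)

def inner(k, j, steps=2000):
    """Midpoint-rule approximation of (f_k, f_j) over [-pi, pi]."""
    h = 2 * math.pi / steps
    s = sum(f(k, -math.pi + (i + 0.5) * h) * f(j, -math.pi + (i + 0.5) * h).conjugate()
            for i in range(steps))
    return s * h

print(abs(inner(3, 3)))  # ~1: each f_k has unit norm
print(abs(inner(3, 5)))  # ~0: distinct f_k, f_j are orthogonal
```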
Lemma 7.5 If {e_1, . . . , e_n} is an orthonormal set in an inner product space V , then for any α_j ∈ K

‖ ∑_{j=1}^n α_j e_j ‖² = ∑_{j=1}^n |α_j|² .
Proof: The following computation is straightforward:

‖ ∑_{j=1}^n α_j e_j ‖² = ( ∑_{j=1}^n α_j e_j , ∑_{l=1}^n α_l e_l ) = ∑_{j=1}^n ∑_{l=1}^n α_j ᾱ_l (e_j, e_l) = ∑_{j=1}^n ∑_{l=1}^n α_j ᾱ_l δ_{jl} = ∑_{j=1}^n α_j ᾱ_j = ∑_{j=1}^n |α_j|² .
7.2 Gram-Schmidt orthonormalisation

Lemma 7.6 (Gram-Schmidt orthonormalisation) Let V be an inner product space and (v_k) be a sequence (finite or infinite) of linearly independent vectors in V . Then there is an orthonormal sequence (e_k) such that

Span{ v_1, . . . , v_k } = Span{ e_1, . . . , e_k }

for all k.
Footnote: Remember that for any x ∈ R and any k ∈ Z: e^{ikx} = cos kx + i sin kx. Then |e^{ikx}| = 1 and e^{±ikπ} = cos kπ ± i sin kπ = (−1)^k.
Proof: Let e_1 = v_1/‖v_1‖. Then

Span{ v_1 } = Span{ e_1 }

and the statement is true for k = 1, as the set E_1 = { e_1 } is obviously orthonormal.

Then we continue inductively. Suppose that for some k ≥ 2 we have found an orthonormal set E_{k−1} = { e_1, . . . , e_{k−1} } such that its span coincides with the span of { v_1, . . . , v_{k−1} }. Then set

ẽ_k = v_k − ∑_{j=1}^{k−1} (v_k, e_j) e_j .

Since ∑_{j=1}^{k−1} (v_k, e_j) e_j ∈ Span(E_{k−1}) = Span{ v_1, . . . , v_{k−1} } and v_1, . . . , v_k are linearly independent, we conclude that ẽ_k ≠ 0. For every j < k

( ẽ_k, e_j ) = (v_k, e_j) − ∑_{l=1}^{k−1} (v_k, e_l)(e_l, e_j) = (v_k, e_j) − (v_k, e_j) = 0 ,

which implies that ẽ_k ⊥ e_j. Finally let e_k = ẽ_k/‖ẽ_k‖. Then { e_1, . . . , e_k } is an orthonormal set such that

Span{ e_1, . . . , e_k } = Span{ v_1, . . . , v_k } .

If the original sequence is finite, the orthonormalisation procedure stops after a finite number of steps. Otherwise, we get an infinite sequence (e_k).
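The inductive construction above translates directly into code. Here is a minimal sketch for R^n with the standard inner product (the function names are mine, not from the notes):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vs):
    """Orthonormalise linearly independent vectors; spans are preserved at each step."""
    es = []
    for v in vs:
        # e~_k = v_k - sum_{j<k} (v_k, e_j) e_j
        w = list(v)
        for e in es:
            c = dot(v, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        n = math.sqrt(dot(w, w))   # nonzero by linear independence
        es.append([wi / n for wi in w])
    return es

es = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

Each es[k] has unit norm and is orthogonal to the previous vectors, so Span{v_1, …, v_k} = Span{e_1, …, e_k} for every k.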
Corollary 7.7 Any infinite-dimensional inner product space contains a countable orthonormal sequence.

Corollary 7.8 Any finite-dimensional inner product space has an orthonormal basis.

Proposition 7.9 Any finite-dimensional inner product space is isometric to C^n (or R^n if the space is real) equipped with the standard inner product.
Proof: Let n = dim V and let e_j, j = 1, . . . , n, be an orthonormal basis in V . Note that (e_k, e_j) = δ_{kj}. Any two vectors x, y ∈ V can be written as

x = ∑_{k=1}^n x_k e_k   and   y = ∑_{j=1}^n y_j e_j .

Then

(x, y) = ( ∑_{k=1}^n x_k e_k , ∑_{j=1}^n y_j e_j ) = ∑_{k=1}^n ∑_{j=1}^n x_k ȳ_j (e_k, e_j) = ∑_{k=1}^n x_k ȳ_k .

Therefore the map x ↦ (x_1, . . . , x_n) is an isometry.
We see that an arbitrary inner product, when written in orthonormal coordinates, takes the form of the “canonical” inner product on C^n (or R^n if the original space is real).
Footnote: For example, let k = 2. We define ẽ_2 = v_2 − (v_2, e_1)e_1. Then (ẽ_2, e_1) = (v_2, e_1) − (v_2, e_1)(e_1, e_1) = 0. Since v_1, v_2 are linearly independent, ẽ_2 ≠ 0. So we can define e_2 = ẽ_2/‖ẽ_2‖.
7.3 Bessel’s inequality

Lemma 7.10 (Bessel’s inequality) If V is an inner product space and E = (e_k)_{k=1}^∞ is an orthonormal sequence, then for every x ∈ V

∑_{k=1}^∞ |(x, e_k)|² ≤ ‖x‖² .
Proof: We note that for any n ∈ N:

‖ x − ∑_{k=1}^n (x, e_k) e_k ‖² = ( x − ∑_{k=1}^n (x, e_k) e_k , x − ∑_{k=1}^n (x, e_k) e_k )
= ‖x‖² − 2 ∑_{k=1}^n |(x, e_k)|² + ∑_{k=1}^n |(x, e_k)|² = ‖x‖² − ∑_{k=1}^n |(x, e_k)|² .

Since the left-hand side is non-negative,

∑_{k=1}^n |(x, e_k)|² ≤ ‖x‖²

and the lemma follows by taking the limit as n → ∞.
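A numerical illustration (a sketch with my own example): take f(x) = x in L²(−π, π) and the orthonormal sequence f_k(x) = e^{ikx}/√(2π) from Section 7.1. The partial sums of ∑ |( f, f_k )|² stay below ‖f‖² = 2π³/3, in agreement with Bessel’s inequality:

```python
import cmath
import math

def coeff(k, n=4000):
    # (f, f_k) = \int_{-pi}^{pi} x * conj(f_k(x)) dx by the midpoint rule
    h = 2 * math.pi / n
    s = 0j
    for i in range(n):
        x = -math.pi + (i + 0.5) * h
        s += x * cmath.exp(-1j * k * x)
    return s * h / math.sqrt(2 * math.pi)

norm_sq = 2 * math.pi ** 3 / 3        # ||f||^2 = \int_{-pi}^{pi} x^2 dx
bessel_sum = sum(abs(coeff(k)) ** 2 for k in range(-20, 21))
print(bessel_sum, "<=", norm_sq)
```

With more coefficients the left-hand side approaches ‖f‖², since this sequence is in fact a basis (Section 7.5, statement (c)).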
Corollary 7.11 If E is an orthonormal set in an inner product space V , then for any x ∈ V the set

E_x = { e ∈ E : (x, e) ≠ 0 }

is at most countable.

Proof: For any m ∈ N the set E_m = { e ∈ E : |(x, e)| > 1/m } has a finite number of elements. Otherwise there would be an infinite sequence (e_k)_{k=1}^∞ with e_k ∈ E_m; then the series ∑_{k=1}^∞ |(x, e_k)|² = +∞, which contradicts Bessel’s inequality. Therefore

E_x = ∪_{m=1}^∞ E_m

is a countable union of finite sets and hence at most countable.
7.4 Convergence

In this section we will discuss the convergence of series whose terms involve elements of an orthonormal set.
Lemma 7.12 Let H be a Hilbert space and E = (e_k)_{k=1}^∞ an orthonormal sequence. The series ∑_{k=1}^∞ α_k e_k converges iff ∑_{k=1}^∞ |α_k|² < +∞. Then

‖ ∑_{k=1}^∞ α_k e_k ‖² = ∑_{k=1}^∞ |α_k|² .    (7.1)
Proof: Let x_n = ∑_{k=1}^n α_k e_k and β_n = ∑_{k=1}^n |α_k|². Lemma 7.5 implies that ‖x_n‖² = β_n and that for any n > m

‖x_n − x_m‖² = ‖ ∑_{k=m+1}^n α_k e_k ‖² = ∑_{k=m+1}^n |α_k|² = β_n − β_m .

Consequently, (x_n) is a Cauchy sequence in H iff (β_n) is Cauchy in R. Since both spaces are complete, the sequences converge or diverge simultaneously.

If they converge, we take the limit as n → ∞ in the equality ‖x_n‖² = β_n to get (7.1) (the limit commutes with ‖ · ‖²).
Definition 7.13 A series ∑_{n=1}^∞ x_n in a Banach space X is unconditionally convergent if for every permutation σ : N → N the series ∑_{n=1}^∞ x_{σ(n)} converges.

In R^n a series is unconditionally convergent if and only if it is absolutely convergent. In general, every absolutely convergent series is unconditionally convergent, but the converse implication does not hold.
Example: Let (e_k) be an orthonormal sequence in a Hilbert space. Then

∑_{k=1}^∞ (1/k) e_k

converges unconditionally but not absolutely (by Lemma 7.12, since ∑ 1/k² < ∞ while ∑ 1/k = ∞).

The sum of an unconditionally convergent series is independent of the order of summation.
Lemma 7.12 and Bessel’s inequality imply:

Corollary 7.14 If H is a Hilbert space and E = (e_k)_{k=1}^∞ is an orthonormal sequence, then for every x ∈ H the series

∑_{k=1}^∞ (x, e_k) e_k

converges unconditionally.
Lemma 7.15 Let H be a Hilbert space, E = (e_k)_{k=1}^∞ an orthonormal sequence and x ∈ H. If x = ∑_{k=1}^∞ α_k e_k, then

α_k = (x, e_k)   for all k ∈ N.

Proof: Exercise.
7.5 Orthonormal basis in a Hilbert space

Definition 7.16 A set E is a basis for H if every x ∈ H can be written uniquely in the form

x = ∑_{k=1}^∞ α_k e_k

for some α_k ∈ K and e_k ∈ E. If additionally E is an orthonormal set, then E is an orthonormal basis.
If E is a basis, then it is a linearly independent set. Indeed, if ∑_{k=1}^n α_k e_k = 0 then α_k = 0 for all k, due to the uniqueness.
Note that in this definition the uniqueness is a delicate point. Indeed, the sum ∑_{k=1}^∞ α_k e_k is defined as a limit of partial sums x_n = ∑_{k=1}^n α_k e_k. A permutation of the e_k changes the partial sums and may lead to a different limit. In general, we cannot even guarantee that after a permutation the series remains convergent.

If E is countable, we can assume that the sum involves all elements of the basis (some α_k can be zero) and that the summation is taken following the order of a selected enumeration of E. The situation is more difficult if E is uncountable, since in this case there is no natural way of numbering the elements.

The situation is much simpler if E is orthonormal, as in this case the series converges unconditionally and the order of summation is not important.
Proposition 7.17 Let E = { e_j : j ∈ N } be an orthonormal set in a Hilbert space H. Then the following statements are equivalent:

(a) E is a basis in H;
(b) x = ∑_{k=1}^∞ (x, e_k) e_k for all x ∈ H;
(c) ‖x‖² = ∑_{k=1}^∞ |(x, e_k)|² for all x ∈ H;
(d) (x, e_n) = 0 for all n ∈ N implies x = 0;
(e) the linear span of E is dense in H.
Proof: (b) ⇒ (a): Take any x ∈ H and let α_k = (x, e_k). Then x = ∑_{k=1}^∞ α_k e_k. In order to check uniqueness of the coefficients, suppose that x = ∑_{k=1}^∞ α̃_k e_k. Then

α_j = (x, e_j) = ( ∑_{k=1}^∞ α̃_k e_k , e_j ) = ∑_{k=1}^∞ α̃_k (e_k, e_j) = α̃_j ,

i.e. the coefficients are unique. Therefore E is a basis.

(b) ⇒ (c): use Lemma 7.12.
(c) ⇒ (d): Let (x, e_k) = 0 for all k; then (c) implies that ‖x‖ = 0, hence x = 0.

(d) ⇒ (b): Let y = x − ∑_{k=1}^∞ (x, e_k) e_k. Corollary 7.14 implies that the series converges. Then Lemma 6.5 implies we can swap the limit and the inner product to get, for every n,

(y, e_n) = ( x − ∑_{k=1}^∞ (x, e_k) e_k , e_n ) = (x, e_n) − ∑_{k=1}^∞ (x, e_k)(e_k, e_n) = (x, e_n) − (x, e_n) = 0 .

Since (y, e_n) = 0 for all n, (d) implies that y = 0, which is equivalent to x = ∑_{k=1}^∞ (x, e_k) e_k, as required.

(e) ⇒ (d): Since Span(E) is dense in H, for any x ∈ H there is a sequence x_n ∈ Span(E) such that x_n → x. Take x such that (x, e_n) = 0 for all n. Then (x_n, x) = 0 and consequently

‖x‖² = ( lim_{n→∞} x_n , x ) = lim_{n→∞} (x_n, x) = 0 .

Therefore x = 0.

(a) ⇒ (e): Since E is a basis, any x = lim_{n→∞} x_n with x_n = ∑_{k=1}^n α_k e_k ∈ Span(E).
Example: The orthonormal sets from examples of Section 7.1 are also examples of
orthonormal bases.
7.6 Separable Hilbert spaces

Definition 7.18 A normed space is separable if it contains a countable dense subset.

In other words, a space H is separable if there is a countable set { x_n ∈ H : n ∈ N } such that for any u ∈ H and any ε > 0 there is n ∈ N such that

‖x_n − u‖ < ε .

Examples: R is separable (Q is dense). R^n is separable (Q^n is dense), and C^n is separable (Q^n + iQ^n is dense).
Example: ℓ² is separable. Indeed, the set of sequences (x_1, x_2, . . . , x_n, 0, 0, 0, . . .) with x_j ∈ Q is dense and countable.

Example: The space C[0, 1] is separable. Indeed, the Weierstrass approximation theorem states that every continuous function can be approximated (in the sup norm) by a polynomial. A dense countable set is given by the polynomials with rational coefficients.
Example: L²(0, 1) is separable. Indeed, continuous functions are dense in L²(0, 1) (in the L²-norm). The polynomials are dense in C[0, 1] (in the supremum norm and therefore in the L² norm as well). The set of polynomials with rational coefficients is dense in the set of all polynomials and, consequently, it is also dense in L²(0, 1) (in the L² norm).
Proposition 7.19 An infinite-dimensional Hilbert space is separable iff it has a countable orthonormal basis.

Proof: If a Hilbert space has a countable orthonormal basis, then we can construct a countable dense set by taking finite linear combinations of the basis elements with rational coefficients. Therefore the space is separable.

If H is separable, then it contains a countable dense subset V = { x_n : n ∈ N }. Obviously, the closed linear span of V coincides with H. First we construct a linearly independent set Ṽ which has the same linear span as V, by eliminating from V those x_n which are not linearly independent from { x_1, . . . , x_{n−1} }. Then the Gram-Schmidt process gives an orthonormal sequence with the same linear span, i.e., it is a basis by characterisation (e) of Proposition 7.17.
The following theorem shows that all infinite-dimensional separable Hilbert spaces are isometric to ℓ². In this sense, ℓ² is the “only” separable infinite-dimensional Hilbert space.

Theorem 7.20 Any infinite-dimensional separable Hilbert space is isometric to ℓ².

Proof: Let { e_j : j ∈ N } be an orthonormal basis in H. The map A : H → ℓ² defined by

A : u ↦ ( (u, e_1), (u, e_2), (u, e_3), . . . )

is invertible. Indeed, the image of A is in ℓ² due to Lemma 7.12, and the inverse map is given by

A^{−1} : (x_k)_{k=1}^∞ ↦ ∑_{k=1}^∞ x_k e_k .

The characterisation of a basis in Proposition 7.17 implies that ‖u‖_H = ‖A(u)‖_{ℓ²}.
Note that there are Hilbert spaces which are not separable.

Example: Let J be an uncountable set. The space H of all functions f : J → R such that

‖ f ‖² := ∑_{j∈J} | f (j)|² < ∞

is a Hilbert space. It is not separable. Indeed, let χ_k(j) = δ_{kj}, where δ_{kj} is the Kronecker delta. The set { χ_k : k ∈ J } ⊂ H is not countable, and ‖χ_k − χ_{k′}‖ = √2 whenever k ≠ k′. Consequently, if k ≠ k′, then B(χ_k, 1/2) ∩ B(χ_{k′}, 1/2) = ∅. So we have found an uncountable number of non-intersecting balls of radius 1/2. This obviously contradicts the existence of a countable dense set.
Footnote: How do we define the sum over an uncountable set? For any n ∈ N the set J_n = { j ∈ J : | f (j)| > 1/n } is finite (otherwise the sum is obviously infinite). Consequently, the set J( f ) := { j ∈ J : | f (j)| > 0 } is countable, because it is a countable union of finite sets: J( f ) = ∪_{n=1}^∞ J_n. Therefore the number of non-zero terms in the sum is countable and the usual definition of an infinite sum can be used.
8 Closest points and approximations

8.1 Closest points in convex subsets

Definition 8.1 A subset A of a vector space V is convex if λx + (1 − λ)y ∈ A for any two vectors x, y ∈ A and any λ ∈ [0, 1].
Lemma 8.2 If A is a non-empty closed convex subset of a Hilbert space H, then for any x ∈ H there is a unique a* ∈ A such that

‖x − a*‖ = inf_{a∈A} ‖x − a‖ .
Proof: The parallelogram rule implies:

‖(x − u) + (x − v)‖² + ‖(x − u) − (x − v)‖² = 2‖x − u‖² + 2‖x − v‖² .

Then

‖u − v‖² = 2‖x − u‖² + 2‖x − v‖² − 4‖x − ½(u + v)‖² .

Let d = inf_{a∈A} ‖x − a‖. Since A is convex, ½(u + v) ∈ A for any u, v ∈ A, and consequently ‖x − ½(u + v)‖ ≥ d. Then

‖u − v‖² ≤ 2‖x − u‖² + 2‖x − v‖² − 4d² .    (8.1)

Since d is the infimum, for any n there is a_n ∈ A such that ‖x − a_n‖² < d² + 1/n. Then equation (8.1) implies that

‖a_n − a_m‖² ≤ 2d² + 2/n + 2d² + 2/m − 4d² = 2/n + 2/m .

Consequently (a_n) is Cauchy and, since H is complete, it converges to some a*. Since A is closed, a* ∈ A. Then

‖x − a*‖² = lim_{n→∞} ‖x − a_n‖² = d² .

Therefore a* is a point closest to x. Now suppose that there is another point ã ∈ A such that ‖x − ã‖ = d; then (8.1) implies

‖a* − ã‖² ≤ 2‖x − a*‖² + 2‖x − ã‖² − 4d² = 2d² + 2d² − 4d² = 0 .

So ã = a* and a* is unique.
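A concrete instance of Lemma 8.2 (a sketch with my own names): the closed unit ball of R^n is closed and convex, and the unique closest point to x is x itself when ‖x‖ ≤ 1, and x/‖x‖ otherwise.

```python
import math

def closest_in_unit_ball(x):
    # unique minimiser of ||x - a|| over the closed ball {a : ||a|| <= 1}
    n = math.sqrt(sum(c * c for c in x))
    return list(x) if n <= 1 else [c / n for c in x]

a_star = closest_in_unit_ball([3.0, 4.0])
print(a_star)   # the point (0.6, 0.8) on the unit sphere
```

Any other point a of the ball satisfies ‖x − a‖ ≥ ‖x‖ − 1, with equality only at a*, matching the uniqueness statement of the lemma.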
8.2 Orthogonal complements

Definition 8.3 Let X ⊆ H. The orthogonal complement of X in H is the set

X^⊥ = { u ∈ H : (u, x) = 0 for all x ∈ X } .
In an infinite-dimensional space a linear subspace does not need to be closed. For example, the space ℓ_f of all sequences with only a finite number of non-zero elements is a linear subspace of ℓ² but it is not closed in ℓ² (e.g. consider the sequence x_n = (1, 2^{−1}, 2^{−2}, . . . , 2^{−n}, 0, 0, . . .), which converges in ℓ² to a limit outside ℓ_f).
Proposition 8.4 If X ⊆ H, then X^⊥ is a closed linear subspace of H.

Proof: If u, v ∈ X^⊥ and α ∈ K, then

(u + αv, x) = (u, x) + α(v, x) = 0

for all x ∈ X. Therefore X^⊥ is a linear subspace. Now suppose that u_n ∈ X^⊥ and u_n → u ∈ H. Then for all x ∈ X

(u, x) = ( lim_{n→∞} u_n , x ) = lim_{n→∞} (u_n, x) = 0 .

Consequently, u ∈ X^⊥ and so X^⊥ is closed.
Exercises:
1. If E is a basis in H, then E^⊥ = { 0 }.
2. If Y ⊆ X, then X^⊥ ⊆ Y^⊥.
3. X ⊆ (X^⊥)^⊥.
4. If X is a closed linear subspace in H, then X = (X^⊥)^⊥.
Definition 8.5 The closed linear span \overline{Span}(E) of E ⊂ H is the smallest closed set which contains Span(E):

\overline{Span}(E) = { u ∈ H : ∀ε > 0 ∃x ∈ Span(E) such that ‖x − u‖ < ε } .

Proposition 8.6 If E ⊂ H then E^⊥ = (Span(E))^⊥ = (\overline{Span}(E))^⊥.
Proof: Since E ⊆ Span(E) ⊆ \overline{Span}(E), we have (\overline{Span}(E))^⊥ ⊆ (Span(E))^⊥ ⊆ E^⊥. So we need to prove the inverse inclusion. Take u ∈ E^⊥ and x ∈ \overline{Span}(E). Then there is x_n ∈ Span(E) such that x_n → x. Then

(x, u) = ( lim_{n→∞} x_n , u ) = lim_{n→∞} (x_n, u) = 0 .

Consequently, u ∈ (\overline{Span}(E))^⊥ and we have proved E^⊥ ⊆ (\overline{Span}(E))^⊥.
Theorem 8.7 If U is a closed linear subspace of a Hilbert space H then:

1. Any x ∈ H can be written uniquely in the form x = u + v with u ∈ U and v ∈ U^⊥.
2. u is the closest point to x in U.
3. The map P_U : H → U defined by P_U x = u is linear and satisfies

P_U² x = P_U x   and   ‖P_U(x)‖ ≤ ‖x‖   for all x ∈ H.

Definition 8.8 The map P_U is called the orthogonal projector onto U.
Proof: Any linear subspace is obviously convex. Then Lemma 8.2 implies that there is a unique u ∈ U such that

‖x − u‖ = inf_{a∈U} ‖x − a‖ .

Let v = x − u. Let us show that v ∈ U^⊥. Indeed, take any y ∈ U and consider the function ∆ : C → R defined by

∆(t) = ‖v + ty‖² = ‖x − (u − ty)‖² .

Since the definition of u together with u − ty ∈ U implies that ∆(t) ≥ ∆(0) = ‖x − u‖², the function ∆ has a minimum at t = 0. On the other hand,

∆(t) = ‖v + ty‖² = (v + ty, v + ty) = (v, v) + t(y, v) + t̄(v, y) + |t|²(y, y) .

First suppose that t is real. Then t̄ = t, and d∆/dt(0) = 0 implies

(y, v) + (v, y) = 0 .

Then suppose that t is purely imaginary. Then t̄ = −t, and d∆/dt(0) = 0 implies

(y, v) − (v, y) = 0 .

Taking the sum of these two equalities we conclude that

(y, v) = 0   for every y ∈ U.

Therefore v ∈ U^⊥.

In order to prove the uniqueness of the representation, suppose x = u_1 + v_1 = u + v with u_1, u ∈ U and v_1, v ∈ U^⊥. Then u_1 − u = v − v_1. Since u_1 − u ∈ U and v − v_1 ∈ U^⊥,

‖v − v_1‖² = (v − v_1, v − v_1) = (v − v_1, u_1 − u) = 0 .

Therefore u and v are unique. The uniqueness also implies that P_U is linear: if x = u + v and x′ = u′ + v′ are the decompositions of x and x′, then αu + βu′ + (αv + βv′) is the decomposition of αx + βx′.

Finally, x = u + v with u ⊥ v implies ‖x‖² = ‖u‖² + ‖v‖². Consequently ‖P_U(x)‖ = ‖u‖ ≤ ‖x‖. We also note that P_U(u) = u for any u ∈ U. So P_U²(x) = P_U(x), as P_U(x) ∈ U.
Corollary 8.9 If U is a closed linear subspace in a Hilbert space H and x ∈ H, then P_U(x) is the closest point to x in U.
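Theorem 8.7 in coordinates (a sketch; the example data are mine): take U = Span{e_1, e_2} ⊂ R³ for an orthonormal pair e_1, e_2 and project by P_U x = (x, e_1)e_1 + (x, e_2)e_2.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj(x, basis):
    # P_U x = sum_j (x, e_j) e_j for an orthonormal basis of U
    p = [0.0] * len(x)
    for e in basis:
        c = dot(x, e)
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p

s = 1 / math.sqrt(2)
U = [[s, s, 0.0], [0.0, 0.0, 1.0]]   # orthonormal basis of a plane in R^3
x = [1.0, 3.0, 2.0]
Px = proj(x, U)
```

One checks that P(Px) = Px, ‖Px‖ ≤ ‖x‖, and x − Px is orthogonal to U, exactly as the theorem asserts.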
8.3 Best approximations

Theorem 8.10 Let E = { e_j : j ∈ J } be an orthonormal set, where J is either a finite or a countable index set. Then for any x ∈ H, the closest point to x in \overline{Span}(E) is given by

y = ∑_{j∈J} (x, e_j) e_j .
Proof: Corollary 7.14 implies that y = ∑_{j∈J} (x, e_j) e_j converges. Then obviously y ∈ \overline{Span}(E), which is a closed linear subspace. Let v = x − y. Since (v, e_k) = (x, e_k) − (y, e_k) = 0 for all k ∈ J, we conclude v ∈ E^⊥ = (\overline{Span}(E))^⊥ (Proposition 8.6). Theorem 8.7 implies that y is the closest point.
Corollary 8.11 If E = { e_j : j ∈ J } is an orthonormal basis in a closed subspace U ⊂ H, then the orthogonal projection onto U is given by

P_U(x) = ∑_{j∈J} (x, e_j) e_j .
Example: The best approximation of an element x ∈ ℓ² in terms of the elements of the standard basis (e_j)_{j=1}^n is given by

∑_{j=1}^n (x, e_j) e_j = (x_1, x_2, . . . , x_n, 0, 0, . . .) .
Example: Let (e_j)_{j=1}^∞ be an orthonormal basis in H. The best approximation of an element x ∈ H in terms of the first n elements of the orthonormal basis is given by

∑_{j=1}^n (x, e_j) e_j .
Now suppose that the set E is not orthonormal. If the set E is finite or countable we
can use the Gram-Schmidt orthonormalisation procedure to construct an orthonormal
basis in Span(E). After that the theorem above gives us an explicit expression for the
best approximation. Let’s consider some examples.
Example: Find the best approximation of a function f ∈ L²(−1, 1) by polynomials of degree up to n. In other words, let E = { 1, x, x², . . . , x^n }. We need to find u ∈ Span(E) such that

‖ f − u‖_{L²} = inf_{p∈Span(E)} ‖ f − p‖_{L²} .

The set E is not orthonormal. Let’s apply the Gram-Schmidt orthonormalisation procedure to construct an orthonormal basis in Span(E). For the sake of shortness, let’s write ‖ · ‖ = ‖ · ‖_{L²(−1,1)}.
First note that ‖1‖ = √2 and let

e_1 = 1/√2 .

Then (x, 1) = ∫_{−1}^1 x dx = 0 and ‖x‖² = ∫_{−1}^1 |x|² dx = 2/3, so let

e_2 = √(3/2) x .

Then

ẽ_3 = x² − (x², e_2) e_2 − (x², e_1) e_1 = x² − √(3/2) x ∫_{−1}^1 t² √(3/2) t dt − (1/√2) ∫_{−1}^1 t² (1/√2) dt = x² − ½ ∫_{−1}^1 t² dt = x² − 1/3 .

Taking into account that

‖ẽ_3‖² = ∫_{−1}^1 ( x² − 1/3 )² dx = 8/45 ,

we obtain

e_3 = ẽ_3/‖ẽ_3‖ = √(5/8) (3x² − 1) .
Exercise: Show that e_4 = √(7/8) (5x³ − 3x) is orthogonal to e_1, e_2 and e_3.
The best approximation of any function f ∈ L²(−1, 1) by a polynomial of degree three is given by

(7/8)(5x³ − 3x) ∫_{−1}^1 f (t)(5t³ − 3t) dt + (5/8)(3x² − 1) ∫_{−1}^1 f (t)(3t² − 1) dt + (3/2) x ∫_{−1}^1 t f (t) dt + ½ ∫_{−1}^1 f (t) dt .
For example, if f (x) = |x| its best approximation by a third degree polynomial is

p_3(x) = (15x² + 3)/16 .

We can check (after computing the corresponding integrals, using ‖ f − p_3‖² = ‖ f ‖² − ∑_j |( f , e_j)|²) that

‖ f − p_3‖² = 2/3 − 1/2 − 5/32 = 1/96 .
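A numerical check of this example (a sketch; the quadrature helper is my own): projecting f(x) = |x| onto the four orthonormal polynomials constructed above reproduces p_3(x) = (15x² + 3)/16, and the squared L² error evaluates to 1/96 ≈ 0.0104.

```python
import math

def quad(g, n=20000):
    # midpoint rule for \int_{-1}^{1} g(x) dx
    h = 2.0 / n
    return sum(g(-1 + (i + 0.5) * h) for i in range(n)) * h

# orthonormal basis of the cubic polynomials in L^2(-1, 1), from the text
e = [
    lambda x: 1 / math.sqrt(2),
    lambda x: math.sqrt(3 / 2) * x,
    lambda x: math.sqrt(5 / 8) * (3 * x * x - 1),
    lambda x: math.sqrt(7 / 8) * (5 * x ** 3 - 3 * x),
]

f = abs
coeffs = [quad(lambda x, ek=ek: f(x) * ek(x)) for ek in e]
p3 = lambda x: sum(c * ek(x) for c, ek in zip(coeffs, e))
err_sq = quad(lambda x: (f(x) - p3(x)) ** 2)
print(err_sq)   # close to 1/96
```

The odd coefficients vanish because |x| is even, so only e_1 and e_3 contribute, in agreement with the closed-form p_3.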
Note that the best approximation in the L² norm is not necessarily the best approximation in the sup norm. Indeed, for example,

sup_{x∈[−1,1]} | |x| − (15x² + 3)/16 | ≥ 3/16

(the value at x = 0). At the same time

sup_{x∈[−1,1]} | |x| − (x² + 1/8) | = 1/8 < 3/16 ,

so the polynomial x² + 1/8 is a better approximation of |x| in the sup norm.
9 Linear maps between Banach spaces

A linear map on a vector space is traditionally called a linear operator. All linear maps defined on a finite-dimensional normed space are continuous. This statement is no longer true in the case of an infinite-dimensional space.

We will begin our study with continuous operators: this class has a rich theory and numerous applications. We will only touch on some of them (the most remarkable examples will be the shift operators on ℓ², and integral operators and multiplication operators on L²).

Of course many interesting linear maps are not continuous. For example, consider the differential operator A : f ↦ f′ on the space of continuously differentiable functions. More accurately, let D(A) = C¹[0, 1] ⊂ L²(0, 1) be the domain of A. Obviously, A : D(A) → L²(0, 1) is linear but not continuous. Indeed, consider the sequence x_n(t) = n^{−1} sin(nt). Obviously ‖x_n‖_{L²} ≤ n^{−1}, so x_n → 0, but A(x_n) = cos(nt) does not converge to A(0) = 0 in the L² norm, so A is not continuous.
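The failure of continuity can be seen numerically (a sketch, with my own quadrature helper): the L²(0, 1) norms of x_n(t) = n^{−1} sin(nt) shrink while the norms of their images cos(nt) do not.

```python
import math

def l2_norm(g, m=20000):
    # midpoint rule for the L^2(0, 1) norm of g
    h = 1.0 / m
    return math.sqrt(sum(g((i + 0.5) * h) ** 2 for i in range(m)) * h)

for n in (1, 10, 100):
    xn = l2_norm(lambda t: math.sin(n * t) / n)
    dxn = l2_norm(lambda t: math.cos(n * t))
    print(n, xn, dxn)
```

So ‖x_n‖ → 0 while ‖A x_n‖ stays of order one: no constant M can satisfy ‖Ax‖ ≤ M‖x‖ on D(A).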
Some definitions and properties from the theory of continuous linear operators can be literally extended to unbounded ones, but sometimes subtle differences appear: e.g., we will see that a bounded operator is self-adjoint iff it is symmetric, which is no longer true for unbounded operators. In a study of unbounded operators special attention should be paid to their domains.
9.1 Continuous linear maps

Let U and V be vector spaces over K.

Definition 9.1 A function A : U → V is called a linear operator if

A(αx + βy) = αA(x) + βA(y)

for all x, y ∈ U and α, β ∈ K.

We will often write Ax to denote A(x).

The collection of all linear operators from U to V is a vector space. If A, B : U → V are linear operators and α, β ∈ K then we define

(αA + βB)(x) = αAx + βBx .

Obviously, αA + βB is also linear.
Definition 9.2 A linear operator A : U → V is bounded if there is a constant M such that

‖Ax‖_V ≤ M‖x‖_U   for all x ∈ U.    (9.1)

If an operator is bounded, then the image of a bounded set is also bounded. Since A(αx) = αA(x) for all α ∈ K, a bounded operator is rarely a bounded function. Rather, it is a locally bounded function (i.e. every point has a neighbourhood such that the restriction of A to the neighbourhood is bounded).
Lemma 9.3 A linear operator A : U → V is continuous iff it is bounded.

Proof: Suppose A is bounded. Then there is M > 0 such that

‖A(x) − A(y)‖ = ‖A(x − y)‖ ≤ M‖x − y‖

for all x, y ∈ U, and consequently A is continuous.

Now suppose A is continuous. Obviously A(0) = 0. Then for ε = 1 there is δ > 0 such that ‖A(x)‖ < ε = 1 for all ‖x‖ < δ. For any u ∈ U, u ≠ 0,

A(u) = (2‖u‖/δ) A( (δ/(2‖u‖)) u ) .

Since ‖ (δ/(2‖u‖)) u ‖ = δ/2 < δ, we get ‖A(u)‖ ≤ 2‖u‖/δ, and consequently A is bounded.
The space of all bounded linear operators from U to V is denoted by B(U, V ).

Definition 9.4 The operator norm of A : U → V is

‖A‖_{B(U,V)} = sup_{x≠0} ‖A(x)‖_V / ‖x‖_U .

We will often write ‖A‖_{op} instead of ‖A‖_{B(U,V)}. Since A is linear,

‖A‖_{B(U,V)} = sup_{‖x‖_U = 1} ‖A(x)‖_V .

We note that ‖A‖_{B(U,V)} is the smallest M such that (9.1) holds: indeed, it is easy to see that the definition of the operator norm implies

‖A(x)‖_V ≤ ‖A‖_{B(U,V)} ‖x‖_U ,

so (9.1) holds with M = ‖A‖_{B(U,V)}. On the other hand, (9.1) implies M ≥ ‖Ax‖_V/‖x‖_U for any x ≠ 0, and consequently M ≥ ‖A‖_{B(U,V)}.
Theorem 9.5 Let U be a normed space and V be a Banach space. Then B(U, V ) is a Banach space.

Proof: Let (A_n)_{n=1}^∞ be a Cauchy sequence in B(U, V ). Take a vector u ∈ U. The sequence v_n = A_n(u) is a Cauchy sequence in V :

‖v_n − v_m‖ = ‖A_n(u) − A_m(u)‖ = ‖(A_n − A_m)(u)‖ ≤ ‖A_n − A_m‖_{op} ‖u‖ .

Since V is complete there is v ∈ V such that v_n → v. Let A(u) = v.
The operator A is linear. Indeed,

A(α_1 u_1 + α_2 u_2) = lim_{n→∞} A_n(α_1 u_1 + α_2 u_2) = lim_{n→∞} ( α_1 A_n(u_1) + α_2 A_n(u_2) ) = α_1 lim_{n→∞} A_n u_1 + α_2 lim_{n→∞} A_n u_2 = α_1 Au_1 + α_2 Au_2 .

The operator A is bounded. Indeed, (A_n) is Cauchy and hence bounded: there is a constant M ∈ R such that ‖A_n‖_{op} < M for all n. Taking the limit in the inequality ‖A_n u‖ ≤ M‖u‖ implies ‖Au‖ ≤ M‖u‖. Therefore A ∈ B(U, V ).

Finally, A_n → A in the operator norm. Indeed, since (A_n) is Cauchy, for any ε > 0 there is N such that ‖A_n − A_m‖_{op} < ε, i.e.

‖A_n(u) − A_m(u)‖ ≤ ε‖u‖   for all m, n > N.

Taking the limit as m → ∞,

‖A_n(u) − A(u)‖ ≤ ε‖u‖   for all n > N.

Consequently ‖A_n − A‖ ≤ ε and so A_n → A. Therefore B(U, V ) is complete.
9.2 Examples

1. Example: Shift operators T_l, T_r : ℓ² → ℓ²:

T_r(x) = (0, x_1, x_2, x_3, . . .)   and   T_l(x) = (x_2, x_3, x_4, . . .) .

Both operators are obviously linear. Moreover,

‖T_r(x)‖²_{ℓ²} = ∑_{k=1}^∞ |x_k|² = ‖x‖²_{ℓ²} .

Consequently, ‖T_r‖_{op} = 1. We also have

‖T_l(x)‖²_{ℓ²} = ∑_{k=2}^∞ |x_k|² ≤ ‖x‖²_{ℓ²} .

Consequently, ‖T_l‖_{op} ≤ 1. However, if x = (0, x_2, x_3, x_4, . . .) then ‖T_l(x)‖_{ℓ²} = ‖x‖_{ℓ²}. Therefore ‖T_l‖_{op} = 1.
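On finitely supported sequences (a sketch; the helper names are mine) the two shifts are easy to experiment with: T_r preserves the ℓ² norm exactly, while T_l can only decrease it.

```python
import math

def t_r(x):
    # right shift: (x_1, x_2, ...) -> (0, x_1, x_2, ...)
    return [0.0] + list(x)

def t_l(x):
    # left shift: (x_1, x_2, ...) -> (x_2, x_3, ...)
    return list(x[1:])

def norm(x):
    return math.sqrt(sum(c * c for c in x))

x = [1.0, 2.0, 2.0]
print(norm(t_r(x)), norm(x))   # equal: T_r is an isometry
print(norm(t_l(x)), norm(x))   # first is smaller: |x_1|^2 is dropped
```

Taking x with x_1 = 0 shows that the bound ‖T_l(x)‖ ≤ ‖x‖ is attained, so both operator norms equal 1.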
2. Example: Multiplication operator: Let f be a continuous function on [a, b]. The equation

(Ax)(t) = f (t)x(t)

defines a bounded linear operator A : L²[a, b] → L²[a, b]. Indeed, A is obviously linear. It is bounded since

‖Ax‖² = ∫_a^b | f (t)x(t)|² dt ≤ ‖ f ‖²_∞ ∫_a^b |x(t)|² dt = ‖ f ‖²_∞ ‖x‖²_{L²} .
Consequently ‖A‖_{op} ≤ ‖ f ‖_∞. Now let t_0 be a point where | f | attains its maximum. If t_0 ≠ b, consider the characteristic function

x_ε = χ_{[t_0, t_0+ε]} .

(If t_0 = b let x_ε = χ_{[t_0−ε, t_0]}.) Since f is continuous,

‖Ax_ε‖²/‖x_ε‖² = (1/ε) ∫_{t_0}^{t_0+ε} | f (t)|² dt → | f (t_0)|²   as ε → 0.

Therefore ‖A‖_{op} = ‖ f ‖_∞.
3. Example: Integral operator on L²(a, b):

(Ax)(t) = ∫_a^b K(t, s)x(s) ds   for all t ∈ [a, b],

where

∫_a^b ∫_a^b |K(t, s)|² ds dt < +∞ .

Let us estimate the norm of A:

‖Ax‖² = ∫_a^b | ∫_a^b K(t, s)x(s) ds |² dt ≤ ∫_a^b ( ∫_a^b |K(t, s)|² ds ) ( ∫_a^b |x(s)|² ds ) dt   (Cauchy-Schwarz)
= ( ∫_a^b ∫_a^b |K(t, s)|² ds dt ) ‖x‖² .

Consequently

‖A‖²_{op} ≤ ∫_a^b ∫_a^b |K(t, s)|² ds dt .

Note that this example requires a bit more from the theory of Lebesgue integrals than we discussed in Section 5. If you are not taking Measure Theory and feel uncomfortable with these integrals, you may assume that x, y and K are continuous functions.
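A sketch with a concrete kernel (my own choice, K(t, s) = ts on [0, 1]²): the computed ‖Ax‖ for a sample x stays below the bound (∫∫ |K|² ds dt)^{1/2} ‖x‖ derived above.

```python
import math

def quad(g, n=400):
    # midpoint rule for \int_0^1 g(u) du
    h = 1.0 / n
    return sum(g((i + 0.5) * h) for i in range(n)) * h

K = lambda t, s: t * s       # kernel with \int\int |K|^2 = 1/9
x = lambda s: 1.0            # sample input function, ||x|| = 1

Ax = lambda t: quad(lambda s: K(t, s) * x(s))
norm_Ax = math.sqrt(quad(lambda t: Ax(t) ** 2))
hs_bound = math.sqrt(quad(lambda t: quad(lambda s: K(t, s) ** 2)))
norm_x = math.sqrt(quad(lambda s: x(s) ** 2))
print(norm_Ax, "<=", hs_bound * norm_x)
```

Here (Ax)(t) = t/2, so ‖Ax‖ = 1/(2√3) ≈ 0.289, comfortably below the bound 1/3; the bound is generally not attained.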
9.3 Kernel and range

Definition 9.6 The kernel of A is Ker A = { x ∈ U : Ax = 0 }. The range of A is Range A = { y ∈ V : ∃x ∈ U such that y = Ax }.
We note that 0 ∈ Ker A for any linear operator A. We say that Ker A is trivial if Ker A = { 0 }.

Proposition 9.7 If A ∈ B(U, V ) then Ker A is a closed linear subspace of U.

Proof: If x, y ∈ Ker A and α, β ∈ K, then

A(αx + βy) = αA(x) + βA(y) = 0 .

Consequently αx + βy ∈ Ker A, so Ker A is a linear subspace. Furthermore, if x_n → x and A(x_n) = 0 for all n, then A(x) = 0 due to the continuity of A.

Note that the range is a linear subspace but not necessarily a closed one. Exercise: construct an example (see Examples sheet 3).
10 Linear functionals

10.1 Definition and examples

Definition 10.1 If U is a vector space, then a linear map U → K is called a linear functional on U.

Definition 10.2 The space of all continuous linear functionals on a normed space U is called the dual space: U* = B(U, K).

The dual space equipped with the operator norm is a Banach space. Indeed, K = R or C, which are both complete; then Theorem 9.5 implies that U* is Banach.
1. Example: δ_x( f ) = f (x), x ∈ [a, b], is a bounded linear functional on C[a, b].

2. Example: Let φ ∈ C[a, b] (or φ ∈ L²(a, b)). Then ℓ_φ(x) = ∫_a^b φ(t)x(t) dt is a bounded linear functional on L²(a, b).

3. Example: Let H be a Hilbert space and y ∈ H. Then ℓ_y : H → K defined by

ℓ_y(x) = (x, y)

is a bounded functional (by the Cauchy-Schwarz inequality), with ‖ℓ_y‖_{op} = ‖y‖_H.
10.2 Riesz representation theorem

The following theorem is one of the fundamental results of Functional Analysis: it states that the map y ↦ ℓ_y is an isometry between H and its dual space H*.

Theorem 10.3 (Riesz Representation Theorem) Let H be a Hilbert space. For any bounded linear functional f : H → K there is a unique y ∈ H such that

f (x) = (x, y)   for all x ∈ H.

Moreover, ‖ f ‖_{H*} = ‖y‖_H.
Proof: Let K = Ker f . It is a closed linear subspace of H.

If K = H then f (x) = 0 for all x and the statement of the theorem is true with y = 0.

If K ≠ H, we first prove that dim K^⊥ = 1. Indeed, since K^⊥ ≠ {0}, there is a vector z ∈ K^⊥ with ‖z‖_H = 1 (note that f (z) ≠ 0, since z ∉ K). Now take any u ∈ K^⊥. Since K^⊥ is a linear subspace,

v = f (z)u − f (u)z ∈ K^⊥ .

On the other hand,

f (v) = f ( f (z)u − f (u)z ) = f (z) f (u) − f (u) f (z) = 0 ,

and so v ∈ K. For any linear subspace K ∩ K^⊥ = { 0 }, and so v = 0. Then f (z)u − f (u)z = 0, i.e. u = ( f (u)/ f (z) ) z. Consequently { z } is a basis in K^⊥ and dim K^⊥ = 1.
Since K is closed, Theorem 8.7 implies that every vector x ∈ H can be written uniquely in the form

x = u + v   where u ∈ K and v ∈ K^⊥.

Since {z} is an orthonormal basis in K^⊥, we have v = (x, z)z. Moreover, since f (u) = 0,

f (x) = f (u) + f (v) = f (v) = (x, z) f (z) = (x, \overline{f (z)} z) .

Set y = \overline{f (z)} z to get the desired equality:

f (x) = (x, y)   ∀x ∈ H.

If there is another y′ ∈ H such that f (x) = (x, y′) for all x ∈ H, then (x, y) = (x, y′) for all x, i.e., (x, y − y′) = 0. Setting x = y − y′ we conclude ‖y − y′‖² = 0, i.e. y = y′, so y is unique.

Finally, the Cauchy-Schwarz inequality implies

| f (x)| = |(x, y)| ≤ ‖x‖ ‖y‖ ,

i.e., ‖ f ‖_{H*} = ‖ f ‖_{op} ≤ ‖y‖. On the other hand,

‖ f ‖_{op} ≥ | f (y)|/‖y‖ = |(y, y)|/‖y‖ = ‖y‖ .

Consequently, ‖ f ‖_{H*} = ‖y‖_H.
11 Linear operators on Hilbert spaces

11.1 Complexification

In the next lectures we will discuss the spectral theory of linear operators. The spectral theory looks more natural in complex spaces. In particular, the theory studies eigenvalues and eigenvectors of linear maps (i.e. non-zero solutions of the equation Ax = λx). In a finite-dimensional space a linear operator can be described by a matrix. You already know that a matrix (even a real one) can have complex eigenvalues. Fortunately, a real Hilbert space can always be considered as a part of a complex one due to the “complexification” procedure.

Definition 11.1 Let H be a real Hilbert space. The complexification of H is the complex vector space

H_C = { x + iy : x, y ∈ H }

where the addition and multiplication are respectively defined by

(x + iy) + (u + iw) = (x + u) + i(y + w) ,
(α + iβ)(x + iy) = (αx − βy) + i(αy + βx) .

The inner product is defined by

(x + iy, u + iw) = (x, u) − i(x, w) + i(y, u) + (y, w) .

Exercise: Show that H_C is a Hilbert space.

Example: The complexification of ℓ²(R) is ℓ²(C).

Exercise: Show that ‖x + iy‖²_{H_C} = ‖x‖² + ‖y‖² for all x, y ∈ H.
The following lemma states that any bounded operator on H can be extended to a bounded operator on H_C.

Lemma 11.2 Let H be a real Hilbert space and A : H → H a bounded operator. Then

A_C(x + iy) = A(x) + iA(y)

defines a bounded operator H_C → H_C.

Exercise: Prove the lemma.
11.2 Adjoint operators

Theorem 11.3 If A : H → H is a bounded linear operator on a Hilbert space H, then there is a unique bounded operator A* : H → H such that

(Ax, y) = (x, A*y)   for all x, y ∈ H.

Moreover, ‖A*‖_{op} ≤ ‖A‖_{op}.

Definition 11.4 The operator A* is called the adjoint operator of a bounded operator A if (Ax, y) = (x, A*y) for all x, y ∈ H.

Proof: Let y ∈ H and f (x) = (Ax, y) for all x ∈ H. The map f : H → K is linear and

| f (x)| = |(Ax, y)| ≤ ‖Ax‖ ‖y‖ ≤ ‖A‖_{op} ‖x‖ ‖y‖ ,

where we have used the Cauchy-Schwarz inequality. Consequently, f is a bounded functional on H. The Riesz representation theorem implies that there is a unique z ∈ H such that

(Ax, y) = (x, z)   for all x ∈ H.

Define the function A* : H → H by A*y = z. Then

(Ax, y) = (x, A*y)   for all x, y ∈ H.
First, A* is linear, since for any x, y_1, y_2 ∈ H and α_1, α_2 ∈ K

(x, A*(α_1 y_1 + α_2 y_2)) = (Ax, α_1 y_1 + α_2 y_2) = ᾱ_1 (Ax, y_1) + ᾱ_2 (Ax, y_2)
= ᾱ_1 (x, A*y_1) + ᾱ_2 (x, A*y_2) = (x, α_1 A*y_1 + α_2 A*y_2) .

Since the equality is valid for all x ∈ H, it implies

A*(α_1 y_1 + α_2 y_2) = α_1 A*y_1 + α_2 A*y_2 .
Second, A* is bounded, since

‖A*y‖² = (A*y, A*y) = (AA*y, y) ≤ ‖AA*y‖ ‖y‖ ≤ ‖A‖_{op} ‖A*y‖ ‖y‖ .

If ‖A*y‖ ≠ 0 we divide by ‖A*y‖ and obtain

‖A*y‖ ≤ ‖A‖_{op} ‖y‖ .

If A*y = 0, this inequality is obvious. Therefore the inequality holds for all y. Thus A* is bounded and ‖A*‖_{op} ≤ ‖A‖_{op}.
1. Example: If A : C^n → C^n, then A* is the Hermitian conjugate of A, i.e. A* = Ā^T (the complex conjugate of the transposed matrix).
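In C^n the defining identity (Ax, y) = (x, A*y) can be checked numerically; the following sketch (using NumPy, with the inner product conjugate-linear in the second argument, as in these notes) verifies it for a random complex matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A_star = A.conj().T  # Hermitian conjugate: complex conjugate of the transpose

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# inner product on C^n, conjugate-linear in the second argument
inner = lambda u, v: np.sum(u * v.conj())

assert np.isclose(inner(A @ x, y), inner(x, A_star @ y))
```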

2. Example: Integral operator on L²(0, 1):
(Ax)(t) = ∫₀¹ K(t, s)x(s) ds .
The adjoint operator is
(A*y)(s) = ∫₀¹ K̄(t, s)y(t) dt .
Indeed, for any x, y ∈ L²(0, 1):
(Ax, y) = ∫₀¹ ( ∫₀¹ K(t, s)x(s) ds ) ȳ(t) dt
= ∫₀¹ ∫₀¹ K(t, s)x(s) ȳ(t) ds dt
= ∫₀¹ x(s) ( ∫₀¹ K(t, s)ȳ(t) dt ) ds = (x, A*y) .
Note that we used Fubini's Theorem to change the order of integration.
3. Example: Shift operators: T_l* = T_r and T_r* = T_l. Indeed,
(T_r x, y) = ∑_{k=1}^∞ x_k ȳ_{k+1} = (x, T_l y) .
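On a finite truncation of ℓ² the shifts become matrices, and the adjoint relation can be observed directly; a small NumPy sketch (the truncation size 6 is an arbitrary choice):

```python
import numpy as np

n = 6
# truncated right shift: maps (x1,...,xn) to (0, x1, ..., x_{n-1})
Tr = np.diag(np.ones(n - 1), k=-1)
# truncated left shift: maps (x1,...,xn) to (x2, ..., xn, 0)
Tl = np.diag(np.ones(n - 1), k=1)

# the adjoint of a real matrix is its transpose: Tr* = Tl and Tl* = Tr
assert np.array_equal(Tr.T, Tl)

x = np.arange(1.0, n + 1)
y = np.arange(2.0, n + 2)
# defining identity of the adjoint: (Tr x, y) = (x, Tl y)
assert np.isclose(np.dot(Tr @ x, y), np.dot(x, Tl @ y))
```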
The following lemma states some elementary properties of adjoint operators.
Lemma 11.5 If A, B : H → H are bounded operators on a Hilbert space H and α, β ∈ C, then
1. (αA + βB)* = ᾱA* + β̄B*
2. (AB)* = B*A*
3. (A*)* = A
4. ‖A*‖ = ‖A‖
5. ‖A*A‖ = ‖AA*‖ = ‖A‖²
Proof:
Statements 1–3 follow directly from the definition of an adjoint operator (Exercise). Statement 4 follows from 3 and the estimate of Theorem 11.3: indeed,
‖A*‖ ≤ ‖A‖ = ‖(A*)*‖ ≤ ‖A*‖.
Finally, in order to prove statement 5 we note that
‖Ax‖² = (Ax, Ax) = (x, A*Ax) ≤ ‖x‖ ‖A*Ax‖ ≤ ‖A*A‖ ‖x‖²
implies ‖A‖² ≤ ‖A*A‖. On the other hand ‖A*A‖ ≤ ‖A*‖ ‖A‖ = ‖A‖² and consequently ‖A*A‖ = ‖A‖². Applying this identity to A* in place of A and using statements 3 and 4, we also get ‖AA*‖ = ‖A*‖² = ‖A‖².

11.3 Self-adjoint operators
Definition 11.6 A linear operator A is self-adjoint if A* = A.
Lemma 11.7 An operator A ∈ B(H, H) is self-adjoint iff it is symmetric:
(x, Ay) = (Ax, y) for all x, y ∈ H.
1. Example: H = R^n; a linear map defined by a symmetric matrix is self-adjoint.
2. Example: H = C^n; a linear map defined by a Hermitian matrix is self-adjoint.
3. Example: A : L²(0, 1) → L²(0, 1),
(A f)(t) = ∫₀¹ K(t, s) f(s) ds
with real symmetric K, K(t, s) = K(s, t), is self-adjoint.
Let A : H → H be a linear operator. A scalar λ ∈ K is an eigenvalue of A if there is x ∈ H, x ≠ 0, such that Ax = λx. The vector x is called an eigenvector of A.
Theorem 11.8 Let A be a self-adjoint operator on a Hilbert space H. Then all eigenvalues of A are real and the eigenvectors corresponding to distinct eigenvalues are orthogonal.
Proof:
Suppose Ax = λx with x ≠ 0. Then
λ‖x‖² = (λx, x) = (Ax, x) = (x, A*x) = (x, Ax) = (x, λx) = λ̄‖x‖² .
Consequently, λ is real.
Now if λ_1 and λ_2 are distinct eigenvalues and Ax_1 = λ_1 x_1, Ax_2 = λ_2 x_2, then
0 = (Ax_1, x_2) − (x_1, Ax_2) = (λ_1 x_1, x_2) − (x_1, λ_2 x_2) = (λ_1 − λ_2)(x_1, x_2) ,
using that λ_2 is real. Since λ_1 − λ_2 ≠ 0, we conclude (x_1, x_2) = 0.
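For a Hermitian matrix both conclusions of the theorem can be seen numerically; a sketch with NumPy (a random 5×5 example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2   # Hermitian matrix: self-adjoint on C^n

# eigenvalues computed by the general (non-symmetric) routine are real up to round-off
eigvals = np.linalg.eigvals(A)
assert np.allclose(eigvals.imag, 0, atol=1e-10)

# eigh returns an orthonormal system of eigenvectors (the columns of E)
lam, E = np.linalg.eigh(A)
assert np.allclose(E.conj().T @ E, np.eye(n))
```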
Exercise: Let A be a self-adjoint operator on a real Hilbert space H. Show that its complexification A_C : H_C → H_C is also self-adjoint. Show that if λ is an eigenvalue of A_C, then there is x ∈ H, x ≠ 0, such that Ax = λx.
Theorem 11.9 If A is a bounded self-adjoint operator then
1. (Ax, x) is real for all x ∈ H;
2. ‖A‖_op = sup_{‖x‖=1} |(Ax, x)|.
Proof:
For any x ∈ H we have (Ax, x) = (x, Ax), and (x, Ax) is the complex conjugate of (Ax, x); hence (Ax, x) is real. Now let
M = sup_{‖x‖=1} |(Ax, x)| .
The Cauchy-Schwarz inequality implies
|(Ax, x)| ≤ ‖Ax‖ ‖x‖ ≤ ‖A‖_op ‖x‖² = ‖A‖_op
for all x ∈ H such that ‖x‖ = 1. Consequently M ≤ ‖A‖_op. On the other hand, for any u, v ∈ H we have
4 Re(Au, v) = (A(u + v), u + v) − (A(u − v), u − v)
≤ M (‖u + v‖² + ‖u − v‖²) = 2M (‖u‖² + ‖v‖²)
using the parallelogram law. If Au ≠ 0 let
v = (‖u‖/‖Au‖) Au
to obtain, since ‖u‖ = ‖v‖ and (Au, v) = ‖u‖ ‖Au‖, that
‖u‖ ‖Au‖ ≤ M‖u‖² .
Consequently ‖Au‖ ≤ M‖u‖ (for all u, including those with Au = 0) and ‖A‖_op ≤ M. Therefore ‖A‖_op = M.

Unbounded operators and their adjoint operators (optional topic)
The notions of adjoint and self-adjoint operators play an important role in the general theory of linear operators. If an operator is not bounded, special care must be taken with its domain of definition.
Let D(A) be a linear subspace of a Hilbert space H, and A : D(A) → H be a linear operator. If D(A) is dense in H we say that A is densely defined.
Example: Consider the operator A(f) = df/dt on the set of all continuously differentiable functions, i.e., D(A) = C¹[0, 1] ⊂ L²(0, 1). Since continuously differentiable functions are dense in L², this operator is densely defined.
Given a densely defined linear operator A on H, its adjoint A* is defined as follows:
• D(A*), the domain of A*, consists of all vectors x ∈ H such that
y ↦ (x, Ay)
is a continuous linear functional D(A) → K. By continuity and density of D(A), it extends to a unique continuous linear functional on all of H.
• By the Riesz representation theorem, if x ∈ D(A*), there is a unique vector z ∈ H such that
(x, Ay) = (z, y) for all y ∈ D(A).
This vector z is defined to be A*x.
It can be shown that A* : D(A*) → H is linear.
The definition implies (Ax, y) = (x, A*y) for all x ∈ D(A) and y ∈ D(A*).
Note that two properties play a key role in this definition: the density of the domain of A in H, and the uniqueness part of the Riesz representation theorem.
A linear operator is symmetric if
(Ax, y) = (x, Ay) for all x, y ∈ D(A).
If A is symmetric then D(A) ⊆ D(A*) and A coincides with the restriction of A* to D(A). An operator is self-adjoint if A = A*, i.e., it is symmetric and D(A) = D(A*). In general, the condition for a linear operator on a Hilbert space to be self-adjoint is stronger than being symmetric. If an operator is bounded then it is normally assumed that D(A) = D(A*) = H, and therefore a symmetric operator is self-adjoint.
The Hellinger-Toeplitz theorem states that an everywhere defined symmetric operator on a Hilbert space is bounded.

12 Introduction to Spectral Theory
12.1 Point spectrum
Let H be a complex Hilbert space and A : H → H a linear operator. If Ax = λx for some x ∈ H, x ≠ 0, and λ ∈ C, then λ is an eigenvalue of A and x is an eigenvector. The space
E_λ = { x ∈ H : Ax = λx }
is called the eigenspace.
Exercise: Prove the following: If A ∈ B(H, H) and λ is an eigenvalue of A, then E_λ is a closed linear subspace in H. Moreover, E_λ is invariant, i.e., A(E_λ) = E_λ (if λ ≠ 0).
Definition 12.1 The point spectrum of A consists of all eigenvalues of A:
σ_p(A) = { λ ∈ C : Ax = λx for some x ∈ H, x ≠ 0 } .
Proposition 12.2 If A : H → H is bounded and λ is its eigenvalue then |λ| ≤ ‖A‖_op.
Proof:
If Ax = λx with x ≠ 0, then
‖A‖_op = sup_{y≠0} ‖Ay‖/‖y‖ ≥ ‖Ax‖/‖x‖ = |λ| .
Examples:
1. A linear map on an n-dimensional complex vector space has at least one and at most n different eigenvalues.
2. The right shift T_r : ℓ² → ℓ² has no eigenvalues, i.e., the point spectrum is empty. Indeed, suppose T_r x = λx, then
(0, x_1, x_2, x_3, x_4, . . .) = λ(x_1, x_2, x_3, x_4, . . .)
implies 0 = λx_1, x_1 = λx_2, x_2 = λx_3, . . . If λ ≠ 0, we divide by λ and conclude x_1 = x_2 = · · · = 0. If λ = 0 we also get x = 0. Consequently
σ_p(T_r) = ∅.
3. The point spectrum of the left shift T_l : ℓ² → ℓ² is the open unit disk. Indeed, suppose T_l x = λx with λ ∈ C. Then
(x_2, x_3, x_4, . . .) = λ(x_1, x_2, x_3, x_4, . . .)
is equivalent to x_2 = λx_1, x_3 = λx_2, x_4 = λx_3, . . . Consequently, x = (x_k)_{k=1}^∞ with
x_k = λ^{k−1} x_1 for all k ≥ 2. With x_1 ≠ 0, this sequence belongs to ℓ² if and only if
∑_{k=1}^∞ |x_k|² = ∑_{k=1}^∞ |x_1|² |λ|^{2(k−1)}
converges, or equivalently |λ| < 1. Therefore
σ_p(T_l) = { λ ∈ C : |λ| < 1 } .
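The eigenvectors found above are geometric sequences, and the eigenvalue relation can be checked on a truncation; a small NumPy sketch (the eigenvalue 0.5 and the length 50 are arbitrary choices):

```python
import numpy as np

lam = 0.5            # any |lam| < 1 gives an eigenvector of the left shift
k = np.arange(50)
x = lam ** k         # x_k = lam^(k-1) with x_1 = 1 (0-based indexing here)

# the left shift drops the first coordinate: (T_l x)_k = x_{k+1} = lam * x_k
assert np.allclose(x[1:], lam * x[:-1])

# the sequence is square-summable: the full series sums to 1 / (1 - |lam|^2)
assert np.isclose(np.sum(x ** 2), 1 / (1 - lam ** 2))
```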

12.2 Invertible operators
Let us discuss the concept of an inverse operator.
Definition 12.3 (injective operator) We say that A : U → V is injective if the equation Ax = y has a unique solution for every y ∈ Range(A).
Definition 12.4 (bijective operator) We say that A : U → V is bijective if the equation Ax = y has exactly one solution for every y ∈ V.
Definition 12.5 (inverse operator) We say that A is invertible if it is bijective. Then the equation Ax = y has a unique solution for all y ∈ V and we define A⁻¹y = x.
1. Exercise: Show that A⁻¹ is a linear operator.
2. Exercise: Show that a linear operator A : U → V is invertible iff
Ker(A) = { 0 } and Range(A) = V.
3. Exercise: Show that if A is invertible, then A⁻¹ is also invertible and (A⁻¹)⁻¹ = A.
4. Exercise: Show that if A and B are two invertible linear operators, then AB is also invertible and (AB)⁻¹ = B⁻¹A⁻¹.
We will use I_V : V → V to denote the identity operator on V, i.e., I_V(x) = x for all x ∈ V. Moreover, we will omit the subscript V if there is no danger of confusion. It is easy to see that if A : U → V is invertible then
AA⁻¹ = I_V and A⁻¹A = I_U .
Example: The right shift T_r : ℓ² → ℓ² has a trivial kernel and
T_l T_r = I,
but it is not invertible since Range(T_r) ≠ ℓ². (Indeed, any sequence in the range of T_r has zero in the first coordinate.) Consequently, the equality AB = I alone does not imply that B = A⁻¹.
Lemma 12.6 If A : U → V and B : V → U are linear operators such that
AB = I_V and BA = I_U
then A and B are both invertible and B = A⁻¹.
Proof:
The equality ABy = y for all y ∈ V implies that Ker B = { 0 } and Range A = V. On the other hand BAx = x for all x ∈ U implies Ker A = { 0 } and Range B = U. Therefore both A and B satisfy the definition of an invertible operator.

12.3 Resolvent and spectrum
Let A : V → V be a linear operator on a vector space V. A complex number λ is an eigenvalue of A if Ax = λx for some x ≠ 0. This equation is equivalent to (A − λI)x = 0. Then we immediately see that A − λI is not invertible since 0 has infinitely many preimages: αx with α ∈ C.
If V is finite dimensional the converse is also true: if A − λI is not invertible then λ is an eigenvalue of A (recall the Fredholm alternative from first year Linear Algebra). In the infinite dimensional case this is not necessarily true.
Definition 12.7 (resolvent set and spectrum) The resolvent set of a linear operator A : H → H is defined by
R(A) = { λ ∈ C : (A − λI)⁻¹ ∈ B(H, H) } .
The resolvent set consists of regular values. The spectrum is the complement of the resolvent set in C:
σ(A) = C \ R(A) .
Note that the definition of the resolvent set assumes the existence of the inverse operator (A − λI)⁻¹ for λ ∈ R(A). If λ ∈ σ_p(A) then (A − λI) is not invertible. Consequently every eigenvalue belongs to σ(A), and
σ_p(A) ⊆ σ(A) .
The spectrum of A can be larger than the point spectrum.
Example: The point spectrum of the right shift operator T_r is empty, but since Range T_r ≠ ℓ² it is not invertible, and therefore 0 ∈ σ(T_r). So σ_p(T_r) ≠ σ(T_r).
Technical lemmas
The following two lemmas will help us in the study of the resolvent set: they establish useful conditions which guarantee that an operator has a bounded inverse.
Lemma 12.8 If T ∈ B(H, H) and ‖T‖ < 1, then (I − T)⁻¹ ∈ B(H, H). Moreover
(I − T)⁻¹ = I + T + T² + T³ + . . .
and
‖(I − T)⁻¹‖ ≤ (1 − ‖T‖)⁻¹ .
Proof:
Consider the sequence V_n = I + T + T² + · · · + T^n. Since
‖T^n x‖ ≤ ‖T‖ ‖T^{n−1} x‖
we conclude that ‖T^n‖ ≤ ‖T‖^n. Consequently for any m > n we have
‖V_m − V_n‖ = ‖T^{n+1} + T^{n+2} + · · · + T^m‖ ≤ ‖T‖^{n+1} + ‖T‖^{n+2} + · · · + ‖T‖^m
= (‖T‖^{n+1} − ‖T‖^{m+1}) / (1 − ‖T‖) ≤ ‖T‖^{n+1} / (1 − ‖T‖) .
Since ‖T‖ < 1, V_n is a Cauchy sequence in the operator norm. The space B(H, H) is complete and there is V ∈ B(H, H) such that V_n → V. Moreover,
‖V‖ ≤ 1 + ‖T‖ + ‖T‖² + · · · = (1 − ‖T‖)⁻¹ .
Finally, taking the limit as n → ∞ in the equalities
V_n(I − T) = V_n − V_n T = I − T^{n+1} ,
(I − T)V_n = V_n − T V_n = I − T^{n+1}
and using that T^{n+1} → 0 in the operator norm we get V(I − T) = (I − T)V = I. Lemma 12.6 implies (I − T)⁻¹ = V.
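The Neumann series above can be observed numerically for a matrix of small norm; a sketch (a random T scaled so its spectral norm is 0.5, with 60 terms of the series):

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((4, 4))
T *= 0.5 / np.linalg.norm(T, 2)   # now the operator (spectral) norm is 0.5

I = np.eye(4)
# partial sums V_n = I + T + ... + T^n of the Neumann series
V = I.copy()
P = I.copy()
for _ in range(60):
    P = P @ T
    V += P

assert np.allclose(V, np.linalg.inv(I - T))
# the norm bound ||(I - T)^{-1}|| <= (1 - ||T||)^{-1} = 2
assert np.linalg.norm(np.linalg.inv(I - T), 2) <= 2 + 1e-9
```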
Lemma 12.9 Let H be a Hilbert space and T, T⁻¹ ∈ B(H, H). If U ∈ B(H, H) and ‖U‖ ‖T⁻¹‖ < 1, then the operator T + U is invertible and
‖(T + U)⁻¹‖ ≤ ‖T⁻¹‖ / (1 − ‖U‖ ‖T⁻¹‖) .
Proof:
Consider the operator V = T⁻¹(T + U) = I + T⁻¹U. Since
‖T⁻¹U‖ ≤ ‖T⁻¹‖ ‖U‖ < 1,
Lemma 12.8 implies that V is invertible and
‖V⁻¹‖ ≤ (1 − ‖T⁻¹‖ ‖U‖)⁻¹ .
Moreover, the definition of V implies that T + U = TV. The composition of the invertible operators T and V is invertible and consequently
(T + U)⁻¹ = V⁻¹T⁻¹ .
Finally, ‖(T + U)⁻¹‖ ≤ ‖V⁻¹‖ ‖T⁻¹‖ implies the desired upper bound for the norm of the inverse operator.

Properties of the spectrum
Lemma 12.10 If A : H → H is bounded and λ ∈ σ(A) then λ̄ ∈ σ(A*).
Proof:
If λ ∈ R(A) then A − λI has a bounded inverse:
(A − λI)(A − λI)⁻¹ = I = (A − λI)⁻¹(A − λI) .
Taking adjoints we obtain
((A − λI)⁻¹)* (A* − λ̄I) = I = (A* − λ̄I) ((A − λI)⁻¹)* .
Consequently, (A* − λ̄I) has a bounded inverse ((A − λI)⁻¹)* (the adjoint of a bounded operator). Therefore λ ∈ R(A) iff λ̄ ∈ R(A*). Since the spectrum is the complement of the resolvent set we also get λ ∈ σ(A) iff λ̄ ∈ σ(A*).
Proposition 12.11 If A is bounded and λ ∈ σ(A) then |λ| ≤ ‖A‖_op.
Proof:
Take λ ∈ C such that |λ| > ‖A‖_op. Since ‖λ⁻¹A‖_op < 1, Lemma 12.8 implies that I − λ⁻¹A is invertible and the inverse operator is bounded. Consequently, A − λI = −λ(I − λ⁻¹A) also has a bounded inverse and so λ ∈ R(A). The proposition follows immediately since σ(A) is the complement of R(A).
Proposition 12.12 If A is bounded then R(A) is open and σ(A) is closed.
Proof:
Let λ ∈ R(A). Then T = (A − λI) has a bounded inverse. Set U = −δI. Obviously, ‖U‖ = |δ|. Let
|δ| < ‖T⁻¹‖⁻¹ ,
then Lemma 12.9 implies that T + U = A − (λ + δ)I also has a bounded inverse. So λ + δ ∈ R(A). Consequently R(A) is open and σ(A) = C \ R(A) is closed.
Example: The spectra of T_l and of T_r are both equal to the closed unit disk in the complex plane.
Indeed, σ_p(T_l) = { λ ∈ C : |λ| < 1 }. Since σ_p(T_l) ⊂ σ(T_l) and σ(T_l) is closed, we conclude that σ(T_l) includes the closed unit disk. On the other hand, Proposition 12.11 implies that σ(T_l) is a subset of the closed disk |λ| ≤ ‖T_l‖_op = 1. Therefore
σ(T_l) = { λ ∈ C : |λ| ≤ 1 } .
Since T_r = T_l* and σ(T_l) is symmetric with respect to the real axis, Lemma 12.10 implies σ(T_r) = σ(T_l).

13 Compact operators
13.1 Definition, properties and examples
Definition 13.1 Let X be a normed space and Y be a Banach space. Then a linear operator A : X → Y is compact if the image of any bounded sequence has a convergent subsequence.
Obviously a compact operator is bounded. Indeed, otherwise there is a sequence x_n with ‖x_n‖ = 1 such that ‖Ax_n‖ > n for each n. The sequence Ax_n does not contain a convergent subsequence (it does not even contain a bounded subsequence).
Example: Any bounded operator with finite-dimensional range is compact. Indeed, in a finite dimensional space any bounded sequence has a convergent subsequence.
Proposition 13.2 Let X be a normed space and Y be Banach. A linear operator A : X → Y is compact iff every sequence in the image of the unit sphere has a convergent subsequence.
Theorem 13.3 If X is a normed space and Y is a Banach space, then the compact linear operators form a closed linear subspace in B(X, Y).
Proof:
If K_1, K_2 are compact operators and α_1, α_2 ∈ K, then α_1 K_1 + α_2 K_2 is also compact. Indeed, take any bounded sequence (x_n) in X. There is a subsequence x_{n_{1j}} such that K_1 x_{n_{1j}} converges. This subsequence is also bounded, so it contains a subsequence x_{n_{2j}} such that K_2 x_{n_{2j}} converges. Obviously K_1 x_{n_{2j}} also converges, therefore α_1 K_1 x_{n_{2j}} + α_2 K_2 x_{n_{2j}} is convergent, and consequently α_1 K_1 + α_2 K_2 is compact. Therefore the compact operators form a linear subspace.
Let us prove that this subspace is closed. Let K_n be a convergent sequence of compact operators: K_n → K in B(X, Y). Take any bounded sequence (x_n) in X. Since K_1 is compact, there is a subsequence x_{n_{1j}} such that K_1 x_{n_{1j}} converges. Since x_{n_{1j}} is bounded and K_2 is compact, there is a subsequence x_{n_{2j}} such that K_2 x_{n_{2j}} converges. Repeat this inductively: for each k there is a subsequence x_{n_{kj}} of the original sequence such that K_l x_{n_{kj}} converges as j → ∞ for all l ≤ k.
Consider the diagonal sequence y_j = x_{n_{jj}}. Obviously (y_j)_{j=k}^∞ is a subsequence of (x_{n_{kj}})_{j=1}^∞. Consequently K_l y_j converges as j → ∞ for every l.
In order to show that K is compact it is sufficient to prove that Ky_j is Cauchy:
‖Ky_j − Ky_l‖ ≤ ‖Ky_j − K_n y_j‖ + ‖K_n y_j − K_n y_l‖ + ‖K_n y_l − Ky_l‖
≤ ‖K − K_n‖ (‖y_j‖ + ‖y_l‖) + ‖K_n y_j − K_n y_l‖ .
Given ε > 0, choose n sufficiently large to ensure that the first term is less than ε/2, then choose N sufficiently large to guarantee that the second term is less than ε/2 for all j, l > N. So Ky_j is Cauchy and consequently converges. Therefore K is a compact operator, and the subspace formed by compact operators is closed.

Proposition 13.4 The integral operator A : L²(a, b) → L²(a, b) defined by
(A f)(t) = ∫_a^b K(t, s) f(s) ds
with
∫_a^b ∫_a^b |K(t, s)|² ds dt < ∞
is compact.
Proposition 13.5 If X is a normed space and Y is a Banach space, then the operators with finite-dimensional range are dense among compact operators in B(X, Y).
13.2 Spectral theory for compact self-adjoint operators
Lemma 13.6 If T : H → H is a compact self-adjoint operator on a Hilbert space H, then at least one of λ_± = ±‖T‖_op is an eigenvalue of T.
Proof:
Assume T ≠ 0 (otherwise the lemma is trivial). Since
‖T‖_op = sup_{‖x‖=1} |(T x, x)|
there is a sequence x_n ∈ H such that ‖x_n‖ = 1 and |(T x_n, x_n)| → ‖T‖_op. Since T is compact, y_n = T x_n has a convergent subsequence. Passing to a further subsequence we may assume that (T x_n, x_n) → α with α = ±‖T‖_op. Relabel this subsequence as x_n and let y = lim_{n→∞} T x_n. Then, since (T x_n, x_n) is real and ‖T x_n‖ ≤ ‖T‖_op = |α|,
‖T x_n − αx_n‖² = ‖T x_n‖² − 2α(T x_n, x_n) + α²‖x_n‖² ≤ 2α² − 2α(T x_n, x_n) .
The right hand side converges to 0 as n → ∞. Consequently T x_n − αx_n → 0. On the other hand T x_n → y and consequently x_n also converges:
x_n → x = α⁻¹y .
The operator T is continuous and consequently T x = αx. Finally, since ‖x_n‖ = 1 for all n, we have ‖x‖ = 1, so x ≠ 0 and α is an eigenvalue.
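For a real symmetric matrix the lemma says that the operator norm is attained as |λ| for some eigenvalue λ; a NumPy sketch (random 5×5 example):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
T = (M + M.T) / 2                 # symmetric, hence self-adjoint on R^5

eigvals = np.linalg.eigvalsh(T)   # real eigenvalues of a symmetric matrix
op_norm = np.linalg.norm(T, 2)    # operator (spectral) norm

# either +||T|| or -||T|| occurs among the eigenvalues
assert np.isclose(np.max(np.abs(eigvals)), op_norm)
```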
Proposition 13.7 Let H be an infinite-dimensional Hilbert space and T : H → H a compact self-adjoint operator. Then σ_p(T) is either a finite set or a countable sequence tending to zero. Moreover, every non-zero eigenvalue corresponds to a finite-dimensional eigenspace.
Proof:
Suppose there is ε > 0 such that T has infinitely many different eigenvalues λ_n with |λ_n| > ε. Let x_n be corresponding eigenvectors with ‖x_n‖ = 1. Since the operator is self-adjoint, this sequence is orthonormal and for any n ≠ m
‖T x_n − T x_m‖² = ‖λ_n x_n − λ_m x_m‖² = (λ_n x_n − λ_m x_m, λ_n x_n − λ_m x_m) = |λ_n|² + |λ_m|² > 2ε² .
Consequently, (T x_n) does not have a convergent subsequence (none of its subsequences is Cauchy). This contradicts the compactness of T. Consequently, σ_p(T) is either finite or a sequence converging to zero.
Now let λ ≠ 0 be an eigenvalue and E_λ the corresponding eigenspace. Let T̃ : E_λ → E_λ be the restriction of T onto E_λ. Since T̃x = λx for any x ∈ E_λ, the operator T̃ maps the unit sphere into the sphere of radius |λ|. Since T is compact, the image of the unit sphere is sequentially compact. Therefore the sphere of radius |λ| is compact. Since E_λ is a Hilbert (and consequently Banach) space itself, Theorem 3.18 implies that E_λ is finite-dimensional.
Theorem 13.8 (Hilbert-Schmidt theorem) Let H be a Hilbert space and T : H → H be a compact self-adjoint operator. Then there is a finite or countable orthonormal sequence (e_n) of eigenvectors of T with corresponding real eigenvalues (λ_n) such that
T x = ∑_j λ_j (x, e_j) e_j for all x ∈ H.
Proof:
We construct the sequence e_j inductively. Let H_1 = H and T_1 = T : H_1 → H_1. Lemma 13.6 implies that there is an eigenvector e_1 ∈ H_1 with ‖e_1‖ = 1 and an eigenvalue λ_1 ∈ R such that |λ_1| = ‖T_1‖_{B(H_1,H_1)}.
Then let H_2 = { x ∈ H_1 : x ⊥ e_1 }. If x ∈ H_2 then T x ∈ H_2. Indeed, since T is self-adjoint
(T x, e_1) = (x, T e_1) = λ_1(x, e_1) = 0
and T x ∈ H_2. Therefore the restriction of T onto H_2 is an operator T_2 : H_2 → H_2. Since H_2 is an orthogonal complement, it is closed and so a Hilbert space itself. Lemma 13.6 implies that there is an eigenvector e_2 ∈ H_2 with ‖e_2‖ = 1 and an eigenvalue λ_2 ∈ R such that |λ_2| = ‖T_2‖_{B(H_2,H_2)}. Then let H_3 = { x ∈ H_2 : x ⊥ e_2 } and repeat the procedure as long as T_n is not zero.
Suppose T_n = 0 for some n ∈ N. Then for any x ∈ H let
y = x − ∑_{j=1}^{n−1} (x, e_j) e_j .
Applying T to this equality we get:
T y = T x − ∑_{j=1}^{n−1} (x, e_j) T e_j = T x − ∑_{j=1}^{n−1} (x, e_j) λ_j e_j .
Since y ⊥ e_j for j < n we have y ∈ H_n and consequently T y = T_n y = 0. Therefore
T x = ∑_{j=1}^{n−1} (x, e_j) λ_j e_j
which is the required formula for T.
Suppose T_n ≠ 0 for all n ∈ N. Then for any x ∈ H and any n consider
y_n = x − ∑_{j=1}^{n−1} (x, e_j) e_j .
Since y_n ⊥ e_j for j < n we have y_n ∈ H_n and
‖x‖² = ‖y_n‖² + ∑_{j=1}^{n−1} |(x, e_j)|² .
Consequently ‖y_n‖² ≤ ‖x‖². On the other hand ‖T_n‖ = |λ_n| and
‖T x − ∑_{j=1}^{n−1} (x, e_j) λ_j e_j‖ = ‖T y_n‖ ≤ ‖T_n‖ ‖y_n‖ ≤ |λ_n| ‖x‖ ,
and since λ_n → 0 as n → ∞ we have
T x = ∑_{j=1}^∞ (x, e_j) λ_j e_j .
Corollary 13.9 Let H be an infinite-dimensional separable Hilbert space and T : H → H a compact self-adjoint operator. Then there is an orthonormal basis E = { e_j : j ∈ N } in H such that T e_j = λ_j e_j for all j ∈ N and
T x = ∑_{j=1}^∞ λ_j (x, e_j) e_j for all x ∈ H.
Exercise: Deduce that operators with finite-dimensional range are dense among compact self-adjoint operators.
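In finite dimensions Corollary 13.9 is the spectral theorem for symmetric matrices, and the expansion can be checked directly; a NumPy sketch (random 6×6 example):

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((6, 6))
T = (M + M.T) / 2            # symmetric matrix: self-adjoint on R^6

lam, E = np.linalg.eigh(T)   # eigenvalues lam_j, orthonormal columns e_j

x = rng.standard_normal(6)
# T x = sum_j lam_j (x, e_j) e_j
expansion = sum(lam[j] * np.dot(x, E[:, j]) * E[:, j] for j in range(6))
assert np.allclose(T @ x, expansion)
```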
Theorem 13.10 If H is an infinite-dimensional Hilbert space and T : H → H is a compact self-adjoint operator, then σ(T) is the closure of σ_p(T).
Proposition 13.7 states that σ_p(T) is either a finite or countably infinite set. Moreover, if σ_p(T) is not finite, zero is the unique limit point of σ_p(T). If σ_p(T) is finite, then the Hilbert-Schmidt theorem implies that the kernel of T is not trivial (since Ker T = { e_n }^⊥) and consequently 0 ∈ σ_p(T). Therefore Theorem 13.10 means that σ(T) = σ_p(T) ∪ { 0 }. In particular, σ(T) = σ_p(T) if zero is an eigenvalue.
Proof:
We will prove the theorem assuming that H is separable, since the proof uses a countable basis in H. (The theorem remains valid for a non-separable H, but the proof requires a modification based on the following observation: Proposition 13.7 implies that (Ker T)^⊥ has an at most countable orthonormal basis of eigenvectors { e_j }. Then for any vector x ∈ H write x = P_{Ker T}(x) + ∑_{j=1}^∞ (x, e_j) e_j, where P_{Ker T} is the orthogonal projection onto the kernel of T, and follow the arguments of the proof, adding this term when necessary.)
Corollary 13.9 implies that
T x = ∑_{j=1}^∞ λ_j (x, e_j) e_j
where { e_j } is an orthonormal basis in H. Then x = ∑_{j=1}^∞ (x, e_j) e_j and for any µ ∈ C
(T − µI)x = ∑_{j=1}^∞ (λ_j − µ)(x, e_j) e_j .
Let µ belong to the complement of the closure of σ_p(T), which is an open subset of C. Consequently there is ε > 0 such that |µ − λ| > ε for all λ in the closure of σ_p(T). Consider the operator S defined by
S y = ∑_{k=1}^∞ ((y, e_k)/(λ_k − µ)) e_k .
Lemma 7.12 implies that the series converges since |λ_k − µ| > ε and
‖S y‖² = ∑_{k=1}^∞ |(y, e_k)|²/|λ_k − µ|² ≤ ε⁻² ∑_{k=1}^∞ |(y, e_k)|² = ε⁻² ‖y‖² .
In particular we see that S is bounded with ‖S‖_op ≤ ε⁻¹. Moreover S = (T − µI)⁻¹. Indeed,
(T − µI)S y = ∑_{j=1}^∞ (λ_j − µ)(S y, e_j) e_j = ∑_{j=1}^∞ ((λ_j − µ)/(λ_j − µ))(y, e_j) e_j = y
and
S(T − µI)x = ∑_{j=1}^∞ (((T − µI)x, e_j)/(λ_j − µ)) e_j = ∑_{j=1}^∞ ((λ_j − µ)/(λ_j − µ))(x, e_j) e_j = x .
Then S = (T − µI)⁻¹ and µ ∈ R(T), so σ(T) is contained in the closure of σ_p(T). On the other hand, σ_p(T) ⊆ σ(T) and σ(T) is closed, so the closure of σ_p(T) is contained in σ(T). We conclude that σ(T) is the closure of σ_p(T).

14 Sturm-Liouville problems
In this chapter we will study the Sturm-Liouville problem: a differential equation of the form
−(d/dx)(p(x) du/dx) + q(x)u = λu with u(a) = u(b) = 0,
where p and q are given functions on the interval [a, b]. The values of λ for which the problem has a non-trivial solution are called eigenvalues of the Sturm-Liouville problem, and the corresponding solutions u are called eigenfunctions.
An eigenvalue is called simple if the corresponding eigenspace is one-dimensional. The main conclusion of this chapter is the following theorem:
Theorem 14.1 If p ∈ C¹[a, b], q ∈ C⁰[a, b], p(x) > 0 and q(x) ≥ 0 for all x ∈ [a, b], then
(i) the eigenvalues of the Sturm-Liouville problem are all simple,
(ii) they form an unbounded monotone sequence,
(iii) the eigenfunctions of the Sturm-Liouville problem form an orthonormal basis in L²(a, b).
For a function u ∈ C²[a, b] we define
L(u) = −(d/dx)(p(x) du/dx) + q(x)u .
Let L_0 : D_0 → C⁰[a, b] be the restriction of L onto the space
D_0 := { u ∈ C²[a, b] : u(a) = u(b) = 0 } .
We can equip both D_0 and C⁰[a, b] with the L² norm. Integrating by parts, we can check that L_0 is symmetric, i.e. (u, L_0 v) = (L_0 u, v) for all u, v ∈ D_0. On the other hand, considering L_0 on the sequence u_n = n⁻¹ sin(πn(x − a)/(b − a)), we can check that L_0 is not bounded. We have not studied unbounded operators.
Eigenfunctions of the Sturm-Liouville problem are eigenvectors of L_0. In order to prove Theorem 14.1, we will show that L_0 is invertible and L_0⁻¹ coincides with the restriction to C⁰[a, b] of a compact self-adjoint operator A : L²(a, b) → L²(a, b). The Hilbert-Schmidt theorem (see Corollary 13.9) implies that the eigenfunctions of A form an orthonormal basis in L²(a, b). Moreover, we will see that all eigenfunctions of A belong to D_0 and, consequently, L_0 has the same eigenfunctions as A.

Differential equation Lu = f
Lemma 14.2 If both u_1 and u_2 satisfy the equation Lu = 0, i.e.
−(pu′)′ + qu = 0,   (14.1)
then
W_p(u_1, u_2) = p(u_1′ u_2 − u_1 u_2′)
is constant. Moreover, if W_p(u_1, u_2) ≠ 0 then u_1 and u_2 are linearly independent.
Proof:
Differentiating W_p with respect to x and using pu″ = −p′u′ + qu we obtain
W_p′ = p′(u_1′ u_2 − u_1 u_2′) + p(u_1″ u_2 − u_1 u_2″)
= p′(u_1′ u_2 − u_1 u_2′) + ((−p′u_1′ + qu_1)u_2 − (−p′u_2′ + qu_2)u_1) = 0 .
Therefore W_p is constant.
Suppose u_1 and u_2 are linearly dependent; then there are constants α_1, α_2 such that α_1 u_1 + α_2 u_2 = 0 and at least one of the constants does not vanish. Suppose α_2 ≠ 0 (otherwise swap u_1 and u_2). Then u_2 = −α_1 u_1/α_2 and u_2′ = −α_1 u_1′/α_2. Substituting these equalities into W_p(u_1, u_2) we see that W_p(u_1, u_2) = 0. Therefore W_p(u_1, u_2) ≠ 0 implies that u_1, u_2 are linearly independent.
Lemma 14.3 The equation (14.1) has two linearly independent solutions, u_1, u_2 ∈ C²[a, b], such that u_1(a) = u_2(b) = 0.
Proof:
Let u_1, u_2 be solutions of the Cauchy problems
−(pu_1′)′ + qu_1 = 0, u_1(a) = 0, u_1′(a) = 1,
−(pu_2′)′ + qu_2 = 0, u_2(b) = 0, u_2′(b) = 1 .
According to the theory of linear ordinary differential equations u_1 and u_2 exist, belong to C²[a, b] and are unique.
Moreover, u_1 and u_2 are linearly independent. Indeed, suppose Lu = 0 for some u ∈ C²[a, b] with u(a) = u(b) = 0. Then
0 = (Lu, u) = ∫_a^b (−(pu′)′u + qu²) dx (using the definition of L)
= [−p(x)u′(x)u(x)]_a^b + ∫_a^b (p(u′)² + qu²) dx (using integration by parts)
= ∫_a^b (p(u′)² + qu²) dx .
Since p > 0 on [a, b], we conclude that u′ ≡ 0. Then u(a) = u(b) = 0 implies u(x) = 0 for all x ∈ [a, b].
Consequently, since u_2(b) = 0 and u_2 is not identically zero, we must have u_2(a) ≠ 0 (if u_2(a) = 0, the argument above would give u_2 ≡ 0, contradicting u_2′(b) = 1), and so
W_p(u_1, u_2) = p(a)(u_1′(a)u_2(a) − u_1(a)u_2′(a)) = p(a)u_1′(a)u_2(a) ≠ 0.
Therefore u_1, u_2 are linearly independent by Lemma 14.2.

Lemma 14.4 If u_1 and u_2 are linearly independent solutions of the equation Lu = 0 such that u_1(a) = u_2(b) = 0 and
G(x, y) = (1/W_p(u_1, u_2)) u_1(x)u_2(y) for a ≤ x < y ≤ b,
G(x, y) = (1/W_p(u_1, u_2)) u_1(y)u_2(x) for a ≤ y ≤ x ≤ b,
then for any f ∈ C⁰[a, b] the function
u(x) = ∫_a^b G(x, y) f(y) dy
belongs to C²[a, b], satisfies the equation Lu = f and the boundary conditions u(a) = u(b) = 0.
Proof:
The statement is proved by direct substitution of
u(x) = (u_2(x)/W_p(u_1, u_2)) ∫_a^x u_1(y) f(y) dy + (u_1(x)/W_p(u_1, u_2)) ∫_x^b u_2(y) f(y) dy
into the differential equation. Moreover, u_1(a) = u_2(b) = 0 implies u(a) = u(b) = 0.
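For the simplest case p ≡ 1, q ≡ 0 on (0, 1) one can take u_1(x) = x and u_2(x) = 1 − x, so W_p = 1 and G(x, y) = x(1 − y) for x ≤ y and y(1 − x) for y ≤ x. A sketch checking numerically that u(x) = ∫ G(x, y)f(y) dy solves −u″ = f with f ≡ 1, whose exact solution is x(1 − x)/2:

```python
import numpy as np

def G(x, y):
    # Green's function of -u'' with u(0) = u(1) = 0
    return np.where(x <= y, x * (1 - y), y * (1 - x))

# midpoint quadrature on a fine grid
n = 2000
y = (np.arange(n) + 0.5) / n
f = np.ones_like(y)          # right-hand side f = 1

x = np.linspace(0.1, 0.9, 9)
u = np.array([np.sum(G(xi, y) * f) / n for xi in x])

# exact solution of -u'' = 1, u(0) = u(1) = 0
assert np.allclose(u, x * (1 - x) / 2, atol=1e-6)
```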
Integral operator
Lemma 14.5 The operator A : L²(a, b) → L²(a, b) defined by
(A f)(x) = ∫_a^b G(x, y) f(y) dy
is compact and self-adjoint. Moreover, Range(A) is dense in L²(a, b), Ker A = { 0 }, and all eigenfunctions, Au = µu, belong to C²[a, b] and satisfy u(a) = u(b) = 0.
Proof:
Since the kernel G is continuous, the operator A is compact by Proposition 13.4. Moreover, G is real and symmetric, so A is self-adjoint. Lemma 14.4 implies that the range of A contains all functions from C²[a, b] such that u(a) = u(b) = 0. This set is dense in L²(a, b).
Now suppose Au = 0 for some u ∈ L²(a, b). Then for any v ∈ L²
0 = (Au, v) = (u, Av) ,
which implies u = 0 because u is orthogonal to a dense set (the range of A). Thus Ker(A) = { 0 }.
Finally, let u ∈ L²(a, b) be an eigenfunction of A, i.e., Au = µu. Since Ker(A) = { 0 }, µ ≠ 0. So we can write u = µ⁻¹Au, which takes the form of the following integral equation:
u(x) = µ⁻¹ ∫_a^b G(x, y)u(y) dy .
Obviously, |G(x, y)u(y)| ≤ ‖G‖_∞ |u(y)| for all x, y ∈ [a, b]. Since G is continuous, the Dominated Convergence Theorem implies that we can swap a limit x → x_0 and the integration, and thus the integral on the right-hand side is a continuous function of x. Consequently, u is continuous. For a continuous u the integral is in C²[a, b] and satisfies the boundary conditions u(a) = u(b) = 0 due to Lemma 14.4. Thus u ∈ D_0. Therefore, the eigenfunctions of A belong to D_0.
Proof of Theorem 14.1:
Since A : L²(a, b) → L²(a, b) is compact and self-adjoint, Corollary 13.9 implies that its eigenvectors form an orthonormal basis in L²(a, b). If u is an eigenfunction of A, then Lemma 14.5 implies that u ∈ C²[a, b] and u(a) = u(b) = 0. Moreover, Lemma 14.4 and Au = µu with µ ≠ 0 imply that Lu = λu with λ = µ⁻¹. Consequently, u is also an eigenfunction of the Sturm-Liouville problem.
Finally, suppose that u is an eigenfunction of the Sturm-Liouville problem and ũ is another eigenfunction which corresponds to the same eigenvalue. Both eigenfunctions satisfy the linear ordinary differential equation L(u) = λu and u(a) = ũ(a) = 0. Then ũ(x) = u(x)ũ′(a)/u′(a) due to the uniqueness of the solution of the Cauchy problem. Thus the eigenspace is one-dimensional.
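A standard way to see the theorem at work is to discretise the Sturm-Liouville operator by finite differences; for p ≡ 1, q ≡ 0 on (0, 1) the matrix eigenvalues should approach (kπ)². A sketch (the grid size 200 is an arbitrary choice):

```python
import numpy as np

n = 200
h = 1.0 / n
# finite-difference matrix of -u'' with u(0) = u(1) = 0 (interior points only)
main = 2.0 * np.ones(n - 1) / h**2
off = -1.0 * np.ones(n - 2) / h**2
L = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

lam = np.sort(np.linalg.eigvalsh(L))   # symmetric matrix: real eigenvalues

# the lowest eigenvalues approximate (k*pi)^2, k = 1, 2, 3
for k in (1, 2, 3):
    assert abs(lam[k - 1] - (k * np.pi) ** 2) < 0.1
```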
Example: An application to Fourier series
Consider the Sturm-Liouville problem
−d²u/dx² = λu, u(0) = u(1) = 0 .
It corresponds to the choice p = 1, q = 0. Theorem 14.1 implies that the normalised eigenfunctions of this problem form an orthonormal basis in L²(0, 1). In this example the eigenfunctions are easy to find:
{ √2 sin kπx : k ∈ N } .
Consequently any function f ∈ L²(0, 1) can be written in the form
f(x) = ∑_{k=1}^∞ α_k sin kπx
where
α_k = 2 ∫₀¹ f(x) sin kπx dx .
The series converges in the L² norm.
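The expansion can be tested numerically; a sketch computing the sine coefficients of f(x) = x(1 − x) by quadrature and checking that the partial sums converge to f in L²(0, 1):

```python
import numpy as np

n = 4000
x = (np.arange(n) + 0.5) / n          # midpoint grid on (0, 1)
f = x * (1 - x)

def alpha(k):
    # alpha_k = 2 * integral of f(x) sin(k pi x) dx, by midpoint quadrature
    return 2 * np.sum(f * np.sin(k * np.pi * x)) / n

partial = np.zeros_like(x)
for k in range(1, 30):
    partial += alpha(k) * np.sin(k * np.pi * x)

# the L^2 error of the partial sum is already small
l2_err = np.sqrt(np.sum((f - partial) ** 2) / n)
assert l2_err < 1e-3
```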